Python regular expressions extract the first uppercase word or the first and second words if both are capitalized

My current regular expression only matches when the first two words of the string are both capitalized. If the second word is not capitalized, I want to extract only the first word.

Here are some examples:

s = 'Smith John went to ss for Jones.'
s = 'Jones, Greg went to 2b for Smith.'
s = 'Doe went to ss for Jones.'

Essentially, I just want the regular expression to output the following:

'Smith John'
'Jones, Greg'
'Doe'

My current regular expression is as follows, but it doesn’t capture the Doe example:

new = re.findall(r'([A-Z][\w-]*(?:\s+[A-Z][\w-]*)+)', s)

Solution

Regular expressions are overkill here. str.isupper() works well:

In [11]: def getName(s):
    ...:     first, second = s.split()[:2]
    ...:     if first[0].isupper():
    ...:         if second[0].isupper():
    ...:             return ' '.join([first, second])
    ...:         return first
    ...:     

This gives:

In [12]: getName('Smith John went to ss for Jones.')
Out[12]: 'Smith John'

In [13]: getName('Jones, Greg went to 2b for Smith.')
Out[13]: 'Jones, Greg'

In [14]: getName('Doe went to ss for Jones.')
Out[14]: 'Doe'

Add a check for strings that contain only one word (the two-value unpacking raises a ValueError in that case) and you’re good to go.
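One way those checks might look is sketched below. The name get_name_safe is hypothetical, and returning None when the string is empty or the first word is not capitalized is an assumption, since the question does not say what should happen in that case.

```python
def get_name_safe(s):
    """Return the first word, plus the second one if both start with an uppercase letter.

    Hypothetical variant of getName: returns None for an empty string or
    when the first word is not capitalized (an assumption, not part of the question).
    """
    words = s.split()[:2]  # at most two words; may be shorter
    if not words or not words[0][0].isupper():
        return None
    if len(words) == 2 and words[1][0].isupper():
        return ' '.join(words)
    return words[0]
```

Slicing first and checking the length afterwards avoids the unpacking error, so a one-word input such as 'Doe' simply returns 'Doe'.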


If you insist on using regular expressions, you can use a pattern like this:

In [36]: pattern = re.compile(r'([A-Z].*? ){1,2}')

In [37]: pattern.match('Smith John went to ss for Jones.').group(0).rstrip()
Out[37]: 'Smith John'

In [38]: pattern.match('Doe went to ss for Jones.').group(0).rstrip()
Out[38]: 'Doe'

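r'([A-Z].*? ){1,2}' matches the first word if it starts with an uppercase letter, and also the second word if that one starts with an uppercase letter too. The group matches an uppercase letter and then, lazily, everything up to and including the next space; {1,2} repeats it once or twice.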
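The pattern above needs a space after each word, so a one-word string such as 'Doe' does not match and the .group(0) call fails on None. A possible alternative, sketched here with the hypothetical names NAME_RE and first_name_words, makes the second capitalized word an optional group instead. It assumes that words are separated by whitespace and that trailing punctuation such as the comma in 'Jones,' should be kept.

```python
import re

# A capitalized word, optionally followed by whitespace and a second capitalized word.
# \S* keeps punctuation attached to a word (e.g. the comma in 'Jones,').
NAME_RE = re.compile(r'[A-Z]\S*(?:\s+[A-Z]\S*)?')

def first_name_words(s):
    """Return the leading capitalized word(s), or None if the string does not start with one."""
    m = NAME_RE.match(s)
    return m.group(0) if m else None
```

Because the second word is optional rather than required, no rstrip() is needed and the Doe example is handled by the same pattern.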
