Python regular expressions match 2 different delimiters

I’m trying to make a regular expression that matches something like:

[[uid::Page name|page alias]].

For example:

[[nw::Home|Home]].

Both the uid and the page alias are optional.

I want each of the separators :: and | to appear at most once, and only in the order shown. However, a single : should be allowed anywhere after the uid. That’s the problem.

The following regular expression works just fine, except it matches a string where :: appears twice or appears in the wrong place:

import re

regex = r'\[\[([\w]+::)?([^|\t\n\r\f\v]+)(\|[^|\t\n\r\f\v]+)?\]\]'
re.match(regex, '[[Home]]') # matches, good
re.match(regex, '[[Home|Home page]]') # matches, good
re.match(regex, '[[nw::Home]]') # matches, good
re.match(regex, '[[nw::Home|Home page]]') # matches, good
re.match(regex, '[[nw|Home|Home page]]') # doesn't match, good
re.match(regex, '[[nw|Home::Home page]]') # matches, bad
re.match(regex, '[[nw::Home::Home page]]') # matches, bad

I’ve read all about negative lookahead and lookbehind assertions, but I don’t know how to apply them in this case. Any suggestions would be appreciated.

EDIT: I would also like to know how to prevent the delimiters from being included in the matched groups, as they are below:

('nw::', 'Home', '|Home page').

Solution

If I understand your needs correctly, you can use this (named groups are written with Python’s (?P<name>...) syntax):

\[\[(?:(?P<uid>\w+)::)?(?!.*::)(?P<page>[^|\t\n\r\f\v]+)(?:\|(?P<alias>[^|\t\n\r\f\v]+))?\]\]
                       ^^^^^^^^

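I added a negative lookahead, (?!.*::), right after the uid capture. It rejects the match if another :: appears anywhere later on the line, while a single : is still allowed. The delimiters sit inside non-capturing groups, (?:...), so they are not included in the captured results.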
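To check the lookahead against the test strings from the question, here is a minimal sketch. It assumes the named groups are spelled `(?P<name>...)`, since Python's `re` module does not accept `(?<name>...)`. The names `WIKI_LINK`, `cases`, and `results` are illustrative and not part of the original answer, and the last test string is an extra case added to show that a single `:` is still accepted.

```python
import re

# Pattern from the answer, split across lines for readability.
WIKI_LINK = re.compile(
    r'\[\[(?:(?P<uid>\w+)::)?'             # optional uid followed by ::
    r'(?!.*::)'                            # reject if another :: follows
    r'(?P<page>[^|\t\n\r\f\v]+)'           # page name (may contain a single :)
    r'(?:\|(?P<alias>[^|\t\n\r\f\v]+))?'   # optional | followed by the alias
    r'\]\]'
)

cases = [
    '[[Home]]',
    '[[Home|Home page]]',
    '[[nw::Home]]',
    '[[nw::Home|Home page]]',
    '[[nw|Home|Home page]]',
    '[[nw|Home::Home page]]',
    '[[nw::Home::Home page]]',
    '[[nw::Home: part 2|Home]]',   # extra case: single colon in the page name
]

# Map each test string to True (matched) or False (rejected).
results = {text: WIKI_LINK.match(text) is not None for text in cases}

for text, matched in results.items():
    print(f'{text!r}: {"match" if matched else "no match"}')
```

One caveat worth keeping in mind: the lookahead scans to the end of the line, so with `re.search` on longer text a `::` that appears after the closing `]]` on the same line would also block the match.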

I’ve named the capture groups; if you don’t want the names, here is the same pattern with unnamed groups:

\[\[(?:(\w+)::)?(?!.*::)([^|\t\n\r\f\v]+)(?:\|([^|\t\n\r\f\v]+))?\]\]
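As a rough illustration of the EDIT in the question, the sketch below calls `.groups()` on the unnamed variant. Because `::` and `|` sit inside non-capturing `(?:...)` groups, they should not appear in the result. The variable names `full` and `bare` are illustrative only.

```python
import re

# Unnamed variant: (?:...) consumes the delimiters without capturing them.
regex = r'\[\[(?:(\w+)::)?(?!.*::)([^|\t\n\r\f\v]+)(?:\|([^|\t\n\r\f\v]+))?\]\]'

# All three parts present: uid, page name, alias.
full = re.match(regex, '[[nw::Home|Home page]]').groups()

# Only the page name present: the optional groups come back as None.
bare = re.match(regex, '[[Home]]').groups()

print(full)
print(bare)
```

Optional groups that do not take part in the match are reported as `None`, so code that consumes the tuple may want to handle that case explicitly.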
