Python – How do I apply a pipe (regex alternation, |) to a specific part of a pattern so that it matches a group followed by a set of characters or EOL?

I have a list of questions and answers, and I want to split by question.

import re

s = 'Q1 blah1 Ans BLAH1 Q2 blah2 Ans BLAH2'
re.split(r'(Q\d.*?) Q\d', s)

Result:

['', 'Q1 blah1 Ans BLAH1 ', ' blah2 Ans BLAH2']

I want to capture the part that starts with “Q#” and runs up to either the next “Q#” or the end of the line. So I tried this:

re.split(r'(Q\d.*?) Q\d|$', s)
['', 'Q1 blah1 Ans BLAH1 ', ' blah2 Ans BLAH2']

And this:

re.split(r'(Q\d.*?) (Q\d|$)', s)
['', 'Q1 blah1 Ans BLAH1 ', 'Q2', ' blah2 Ans BLAH2']

However, neither gave me the result I wanted. The first attempt does not work because the | is used incorrectly, but I do not know how to correct it. In the second attempt, Q2 is not captured together with blah2 Ans BLAH2.

Edit:

Expected output:

['', 'Q1 blah1 Ans BLAH1 ', 'Q2 blah2 Ans BLAH2']

Solution

Try splitting as follows:

\s+(?=Q\d+)

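This uses a positive lookahead, which asserts that the next question is beginning but does not consume it. The split therefore happens on the whitespace immediately before each “Q#”, and the “Q#” itself stays with the text that follows.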

import re

s = 'Q1 blah1 Ans BLAH1 Q2 blah2 Ans BLAH2'
# Split on whitespace that is immediately followed by "Q" and one or more digits
print(re.split(r'\s+(?=Q\d+)', s))

['Q1 blah1 Ans BLAH1', 'Q2 blah2 Ans BLAH2']
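
If you would rather match each question than split between them, the alternation from the original attempts can be scoped by putting it inside the lookahead, so that | applies only to "next Q#" versus "end of string". The following is a minimal sketch of that alternative, not part of the answer above; the helper name split_questions is hypothetical.

```python
import re

# Alternative sketch: match each question lazily, stopping just before
# whitespace followed by the next "Q#", or at the end of the string.
# The lookahead parentheses limit the scope of the | alternation.
QUESTION = re.compile(r'Q\d+.*?(?=\s+Q\d+|$)')

def split_questions(text):
    """Return each 'Q# ...' chunk in text (hypothetical helper)."""
    return QUESTION.findall(text)

s = 'Q1 blah1 Ans BLAH1 Q2 blah2 Ans BLAH2'
print(split_questions(s))
# ['Q1 blah1 Ans BLAH1', 'Q2 blah2 Ans BLAH2']
```

Because the pattern has no capturing groups, re.findall returns the full text of each match. Without the lookahead parentheses, `Q\d+.*? Q\d+|$` would be read as "the whole left-hand pattern OR `$`", which is the precedence problem in the first attempt.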
