How do I apply a pipe (regex alternation |) to a specific part of a pattern so that it matches a group followed by either a set of characters or EOL?
I have a list of questions and answers, and I want to split it by question.
import re
s = 'Q1 blah1 Ans BLAH1 Q2 blah2 Ans BLAH2'
re.split(r'(Q\d.*?) Q\d', s)
Result:
['', 'Q1 blah1 Ans BLAH1', ' blah2 Ans BLAH2']
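Part of what makes this output confusing is how re.split treats capturing groups: text matched by a group in the separator pattern is inserted into the result list, and a separator that matches at position 0 produces a leading empty string. A small sketch, assuming Python 3, showing both effects on the same string:

```python
import re

s = 'Q1 blah1 Ans BLAH1 Q2 blah2 Ans BLAH2'

# Without a group, the separators "Q1" and "Q2" are discarded.
plain = re.split(r'Q\d', s)
# With a group, each matched separator is inserted between the pieces.
grouped = re.split(r'(Q\d)', s)

print(plain)    # ['', ' blah1 Ans BLAH1 ', ' blah2 Ans BLAH2']
print(grouped)  # ['', 'Q1', ' blah1 Ans BLAH1 ', 'Q2', ' blah2 Ans BLAH2']
```

In both lists the first element is '' because the separator matches at the very start of the string.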
I want to capture each part that starts with “Q#” and is followed by either another “Q#” or the end of the line. So I tried this:
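re.split(r'(Q\d.*?) Q\d|$', s)
['', 'Q1 blah1 Ans BLAH1', ' blah2 Ans BLAH2']
(That is the Python 2 output; since Python 3.7, re.split also splits on the empty match of $, so the list additionally ends with None, ''.)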
And this:
re.split(r'(Q\d.*?) (Q\d|$)', s)
['', 'Q1 blah1 Ans BLAH1', 'Q2', ' blah2 Ans BLAH2']
Neither gives the result I want. In the first case, | is used incorrectly (it seems to split the whole pattern into two alternatives instead of applying only to Q\d), but I don't know how to correct it. In the second case, Q2 ends up as a separate element instead of being kept together with blah2 Ans BLAH2.
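The precedence problem can be worked around by scoping the alternation. In a regex, | has the lowest precedence, so it only applies to a specific part of a pattern when that part is wrapped in parentheses. A minimal sketch, assuming Python 3 and using re.findall instead of re.split (a swapped-in approach, not the accepted answer below): the alternation " Q\d|$" sits inside a lookahead group, so each match is a question followed by either another question or the end of the string, and that follower is not consumed.

```python
import re

s = 'Q1 blah1 Ans BLAH1 Q2 blah2 Ans BLAH2'

# The | applies only inside the (?= ... ) group: a question ends where
# " Q<digit>" or the end of the string begins. The lookahead checks
# this without consuming it, so the next match can start at "Q2".
scoped = r'Q\d.*?(?= Q\d|$)'

questions = re.findall(scoped, s)
print(questions)  # ['Q1 blah1 Ans BLAH1', 'Q2 blah2 Ans BLAH2']
```

Without the surrounding group, as in r'(Q\d.*?) Q\d|$', the | instead offers the entire left-hand pattern or $ as the two alternatives.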
Edit:
Expected output:
['', 'Q1 blah1 Ans BLAH1 ', 'Q2 blah2 Ans BLAH2']
Solution
Try splitting as follows:
\s+(?=Q\d+)
This uses a positive lookahead assertion, which checks that the next question is beginning but does not consume it, so each piece keeps its Q# prefix.
import re
s = 'Q1 blah1 Ans BLAH1 Q2 blah2 Ans BLAH2'
print(re.split(r'\s+(?=Q\d+)', s))
['Q1 blah1 Ans BLAH1', 'Q2 blah2 Ans BLAH2']
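To see why the lookahead matters, compare splitting on a pattern that consumes the question marker with the lookahead version. A small sketch, assuming Python 3; the second string s2, with a two-digit question number, is a hypothetical input added for illustration.

```python
import re

s = 'Q1 blah1 Ans BLAH1 Q2 blah2 Ans BLAH2'

# Consuming "Q2" as part of the separator removes it from the output.
consumed = re.split(r'\s+Q\d+', s)
# Lookahead: only the whitespace is consumed, so "Q2" stays with its text.
kept = re.split(r'\s+(?=Q\d+)', s)

print(consumed)  # ['Q1 blah1 Ans BLAH1', ' blah2 Ans BLAH2']
print(kept)      # ['Q1 blah1 Ans BLAH1', 'Q2 blah2 Ans BLAH2']

# Hypothetical input with a multi-digit question number.
s2 = 'Q9 blah9 Ans BLAH9 Q10 blah10 Ans BLAH10'
multi = re.split(r'\s+(?=Q\d)', s2)
print(multi)     # ['Q9 blah9 Ans BLAH9', 'Q10 blah10 Ans BLAH10']
```

Because a lookahead only checks what follows without consuming it, (?=Q\d) behaves the same as (?=Q\d+) here.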