Python regular expressions replace all numbers unless they are part of a substring
I want to remove all numbers unless they are part of one of a set of special substrings. In the example below, the special substrings whose numbers should be kept are 1s, 2s, s4, and 3s. I guess I need to use a negative lookahead:
import re
s = "a61s8sa92s3s3as4s4af3s"
pattern = r"(?!1s|2s|s4|3s)[0-9\.]"
re.sub(pattern, ' ', s)
As I understand it, the pattern above means:
- Match any digit or decimal point (the [...] at the end).
- Only do so if the text at that position does not match the pattern inside (?!...).
- The excluded patterns are 1s, 2s, s4, or 3s (| = OR).
Everything makes sense until you try it. The example s above returns
a 1s sa 2s3s as s af3s
which indicates that every exclusion works except when the number is at the end of a special substring (s4), in which case it is still matched.
I believe this operation should return
a 1s sa 2s3s as4s4af3s
How do I fix my pattern?
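The behaviour described above can be reproduced by looking at what a lookahead is able to inspect. A lookahead is tested at the position of the digit itself, so it only sees the digit and the characters after it, never the character before it. The sketch below is illustrative only; the helper name `lookahead_views` is hypothetical and not part of the original post.

```python
import re

s = "a61s8sa92s3s3as4s4af3s"

def lookahead_views(text, width=2):
    """Return (index, slice) pairs: the text a lookahead sees at each digit."""
    return [(m.start(), text[m.start():m.start() + width])
            for m in re.finditer(r"\d", text)]

views = lookahead_views(s)
for index, seen in views:
    # A lookahead for "s4" can never succeed here: every slice starts with a digit.
    print(index, repr(seen))
```

At the two 4s the lookahead sees "4s" and "4a", neither of which is in the exclusion list, so both are replaced.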
Solution
You can use an alternation with a capturing group and a callable replacement:
import re
s = "a61s8sa92s3s3as4s4af3s"
pattern = r"(1s|2s|s4|3s)|[\d.]"
print(re.sub(pattern, lambda x: x.group(1) or ' ', s))
# => a 1s sa 2s3s as4s4af3s
Details:
- (1s|2s|s4|3s): Group 1, matching 1s, 2s, s4, or 3s
- |: or
- [\d.]: a digit or a dot.
If Group 1 matches, the match is replaced with the Group 1 value (so the special substring is kept unchanged); otherwise, the match is replaced with a space.
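For comparison, a lookaround-only pattern can also produce the expected result. The original attempt misses s4 because a lookahead only looks forward from the digit, while in s4 the distinguishing character comes before it, so that case needs a lookbehind. The following is a minimal sketch that assumes the same four special substrings; it is an alternative to, not a restatement of, the solution above.

```python
import re

s = "a61s8sa92s3s3as4s4af3s"

# (?![123]s)   : do not match a 1, 2 or 3 that is followed by "s"
# (?!(?<=s)4)  : do not match a 4 that is preceded by "s"
pattern = r"(?![123]s)(?!(?<=s)4)[\d.]"

result = re.sub(pattern, " ", s)
print(result)
# => a 1s sa 2s3s as4s4af3s
```

The two approaches can differ when special substrings overlap. Lookarounds consume no text, so in "1s4" this pattern keeps both digits, whereas the alternation consumes "1s" first and then replaces the 4.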