Use regular expressions to filter pandas rows with ~ at the beginning and end of a string
I’m trying to use regular expressions in pandas to filter out rows where the value in a given column both begins and ends with ~. For example, take the following pandas DataFrame:
import pandas as pd
df = pd.DataFrame({'line': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'Unit': ['LF', 'LS~', '~~SF', 'CY', '~SF~', 'PC', '~~', '~LF', '~PC~']})
This is the output I want:
df[df.Unit.str.contains(MY_EXPRESSION, regex=True)]
line Unit
0 1 LF
1 2 LS~
2 3 ~~SF
3 4 CY
5 6 PC
7 8 ~LF
What I’ve tried so far:
- MY_EXPRESSION = '^[^~].*[^~]$'
This filters out anything with a ~ at the beginning or the end of the string. I only want to filter out rows with ~ at both the beginning and the end of the string.
- MY_EXPRESSION = '^([^~])(.*)([^~])$'
This also filters out rows with a ~ at the beginning or the end of the string. Again, I only want to filter out rows with ~ at both the beginning and the end of the string.
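A quick check outside pandas makes the problem with the first attempt concrete. This is only a sketch using Python's built-in re module on the same Unit values; the names units, attempt, and kept are illustrative and not part of the original question.

```python
import re

units = ['LF', 'LS~', '~~SF', 'CY', '~SF~', 'PC', '~~', '~LF', '~PC~']

# First attempt: the first AND the last character must both be
# something other than ~, so a ~ at either end rejects the value.
attempt = re.compile(r'^[^~].*[^~]$')

kept = [u for u in units if attempt.search(u)]
print(kept)  # ['LF', 'CY', 'PC']
```

Values such as LS~, ~~SF, and ~LF are dropped even though they have a ~ at only one end, which is why the result is narrower than the desired output.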
What regular expression do I need (i.e. MY_EXPRESSION in the example) to filter the DataFrame the way I want?
I’m using pandas v0.23.4.
Solution
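Use pandas.Series.str.match to find the values that both begin and end with ~, then invert the resulting boolean mask with the ~ operator:
df[~df.Unit.str.match('^~.*~$')]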
Unit line
0 LF 1
1 LS~ 2
2 ~~SF 3
3 CY 4
5 PC 6
7 ~LF 8
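For reference, here is a self-contained sketch that reproduces the approach above and also shows one way to keep the str.contains call from the question. The negative-lookahead pattern assigned to MY_EXPRESSION is an assumption added for illustration, not part of the original answer, and the names wrapped, result_match, and result_contains are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'line': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'Unit': ['LF', 'LS~', '~~SF', 'CY', '~SF~', 'PC', '~~', '~LF', '~PC~']})

# Approach from the answer: flag values that start AND end with ~,
# then invert the boolean mask to keep everything else.
wrapped = df.Unit.str.match(r'^~.*~$')
result_match = df[~wrapped]

# Alternative (assumption, not from the answer): a negative lookahead
# that succeeds only when the whole value is NOT of the form ~...~,
# so it can be passed directly to str.contains.
MY_EXPRESSION = r'^(?!~.*~$)'
result_contains = df[df.Unit.str.contains(MY_EXPRESSION, regex=True)]

print(result_match.line.tolist())     # [1, 2, 3, 4, 6, 8]
print(result_contains.line.tolist())  # [1, 2, 3, 4, 6, 8]
```

Both variants keep the same six rows. Inverting a positive match is usually easier to read, while the lookahead version may be convenient when the pattern has to be supplied as a single expression.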