Python – Use regular expressions to filter Pandas rows with ~ at both the beginning and end of a string

I’m trying to use regular expressions in pandas to filter out rows whose value in a given column has ~ at both the beginning and the end. For example, take the following pandas DataFrame:

import pandas as pd
df = pd.DataFrame({'line': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'Unit': ['LF', 'LS~', '~~SF', 'CY', '~SF~', 'PC', '~~', '~LF', '~PC~']})

This is the output I want:

df[df.Unit.str.contains(MY_EXPRESSION, regex=True)]
   line  Unit
0     1    LF
1     2   LS~
2     3  ~~SF
3     4    CY
5     6    PC
7     8   ~LF

What I’ve tried so far:

  1. MY_EXPRESSION = '^[^~].*[^~]$'

This filters out every row with a ~ at the beginning or at the end of the string. I only want to filter out rows with ~ at both the beginning and the end of the string.

  2. MY_EXPRESSION = '^([^~])(.*)([^~])$'

This also filters out rows with a ~ at the beginning or at the end of the string. Again, I only want to filter out rows with ~ at both the beginning and the end of the string.
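
The "or" behaviour of both attempts can be reproduced with a short sketch on the same data. Both patterns require a non-~ first character and a non-~ last character, so a ~ at either end is enough to reject a row. The variable names `attempt` and `kept` are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'line': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'Unit': ['LF', 'LS~', '~~SF', 'CY', '~SF~', 'PC', '~~', '~LF', '~PC~']})

# Demands a non-~ first character AND a non-~ last character,
# so a row with ~ at only one end is dropped as well.
attempt = r'^[^~].*[^~]$'
kept = df[df.Unit.str.contains(attempt, regex=True)]

print(kept.Unit.tolist())  # ['LF', 'CY', 'PC']
```

Rows such as 'LS~', '~~SF' and '~LF' are lost here even though they should be kept.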

What regular expression do I need (i.e. MY_EXPRESSION in the example) to filter the DataFrame the way I want?

I’m using pandas v0.23.4.

Solution

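Use pandas.Series.str.match to find the rows that both start and end with ~, then invert the resulting boolean mask with the ~ operator:

df[~df.Unit.str.match(r'^~.*~$')]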

   Unit  line
0    LF     1
1   LS~     2
2  ~~SF     3
3    CY     4
5    PC     6
7   ~LF     8
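
For reference, here is a self-contained sketch of the same approach, together with two variants that are not part of the original answer: a negative lookahead that can be used directly as MY_EXPRESSION with str.contains, and a regex-free mask built from str.startswith and str.endswith. The names `both_ends`, `lookahead` and `plain_mask` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'line': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'Unit': ['LF', 'LS~', '~~SF', 'CY', '~SF~', 'PC', '~~', '~LF', '~PC~']})

# True where Unit starts AND ends with ~ (needs at least two characters).
both_ends = df.Unit.str.match(r'^~.*~$')

# ~ on a boolean Series inverts it, so every other row is kept.
result = df[~both_ends]

# Variant 1: negative lookahead, usable as MY_EXPRESSION with str.contains.
lookahead = r'^(?!~.*~$)'
via_contains = df[df.Unit.str.contains(lookahead, regex=True)]

# Variant 2: no regex. Note that a lone '~' would count as both
# starting and ending with ~ here, unlike with the patterns above.
plain_mask = df.Unit.str.startswith('~') & df.Unit.str.endswith('~')
via_plain = df[~plain_mask]

print(result.Unit.tolist())  # ['LF', 'LS~', '~~SF', 'CY', 'PC', '~LF']
```

On this data all three selections return the same rows. They would differ only for edge cases such as a single '~' or values containing newlines, which the sample does not include.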
