Python – Delete column rows that contain any numeric substrings



I noticed that when a column element in a Pandas DataFrame consists of several numeric substrings separated by spaces, the method isnumeric returns False.

For example:

row 1, column 1 has the following: 0002 0003 1289
row 2, column 1 has the following: 89060 324 123431132
row 3, column 1 has the following: 890GB 32A 34311TT
row 4, column 1 has the following: 82A 34311TT
row 5, column 1 has the following: 82A 34311TT 889 9999C
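To reproduce the problem, here is a minimal sketch. It assumes the column is named `a` (the name used in the answer below) and holds plain strings; the values are the five rows listed above.

```python
import pandas as pd

# The five example rows; the column name 'a' is an assumption
df = pd.DataFrame({'a': [
    '0002 0003 1289',
    '89060 324 123431132',
    '890GB 32A 34311TT',
    '82A 34311TT',
    '82A 34311TT 889 9999C',
]})

# isnumeric is applied to the whole cell, so the spaces make every row False
print(df['a'].str.isnumeric().tolist())
# [False, False, False, False, False]
```

Each individual substring of row 1 is numeric (for example `'0002'.isnumeric()` is True); it is only the space characters that make the whole-string check fail.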

Obviously, rows 1 and 2 are made up entirely of numbers, but isnumeric returns False for both of them.

I found a workaround: split the substrings into separate columns, create a boolean column for each substring, and add the boolean values together to show whether a row is all numbers. However, this is tedious, and the resulting features are messy. I also don't want to remove the spaces (compressing all substrings into one number), because I need to keep the original substrings.
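For comparison, the column-splitting workaround might look roughly like this. This is one possible reading of the description, not the asker's actual code: `str.split(expand=True)` creates one column per substring, and a row counts as all-numeric when its number of numeric substrings equals its number of substrings.

```python
import pandas as pd

df = pd.DataFrame({'a': [
    '0002 0003 1289',
    '89060 324 123431132',
    '890GB 32A 34311TT',
    '82A 34311TT',
    '82A 34311TT 889 9999C',
]})

# One column per substring; rows with fewer substrings are padded with missing values
parts = df['a'].str.split(expand=True)

# Boolean column per substring; padding is filled with '' so it counts as non-numeric
flags = parts.apply(lambda col: col.fillna('').str.isnumeric())

# All-numeric when the count of numeric substrings equals the count of substrings
all_numeric = flags.sum(axis=1) == parts.notna().sum(axis=1)
print(all_numeric.tolist())
# [True, True, False, False, False]
```

It works, but it creates as many helper columns as the longest row has substrings, which is the clutter the question wants to avoid.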

Does anyone know of a simpler technique that correctly reports these elements, made up of one or more numeric substrings, as all numeric? My ultimate goal is to remove the rows that contain only numbers.

Solution

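Split each value on whitespace and use all to check whether every substring is numeric; negating the result gives a mask of the rows to keep. Either a Series.apply or a plain list comprehension works: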

mask = ~df['a'].apply(lambda x: all(s.isnumeric() for s in x.split()))

mask = [not all(s.isnumeric() for s in x.split()) for x in df['a']]

df = df[mask]
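An end-to-end run of the all-based mask on the example data, as a sketch assuming the column is named `a`:

```python
import pandas as pd

df = pd.DataFrame({'a': [
    '0002 0003 1289',
    '89060 324 123431132',
    '890GB 32A 34311TT',
    '82A 34311TT',
    '82A 34311TT 889 9999C',
]})

# True where at least one substring is NOT numeric
mask = ~df['a'].apply(lambda x: all(s.isnumeric() for s in x.split()))

# Keeping those rows drops the rows made up entirely of numbers
result = df[mask]
print(result['a'].tolist())
# ['890GB 32A 34311TT', '82A 34311TT', '82A 34311TT 889 9999C']
```

The original substrings are untouched, since the split happens only inside the check. If the column can contain NaN, `x.split()` raises AttributeError on the float value, so such rows would need to be handled first (for example with an `isinstance(x, str)` guard).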

To instead drop every row that contains at least one numeric substring, use any:

mask = ~df['a'].apply(lambda x: any(s.isnumeric() for s in x.split()))

mask = [not any(s.isnumeric() for s in x.split()) for x in df['a']]

df = df[mask]
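The any-based variant is stricter. On the same example data (again assuming a column named `a`), row 5 is also dropped because its substring `889` is numeric:

```python
import pandas as pd

df = pd.DataFrame({'a': [
    '0002 0003 1289',
    '89060 324 123431132',
    '890GB 32A 34311TT',
    '82A 34311TT',
    '82A 34311TT 889 9999C',
]})

# True only where no substring is numeric
mask = [not any(s.isnumeric() for s in x.split()) for x in df['a']]

# A list of booleans of the same length works as a row filter
result = df[mask]
print(result['a'].tolist())
# ['890GB 32A 34311TT', '82A 34311TT']
```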
