Check whether a pandas DataFrame column contains strings
I have a fairly large pandas DataFrame (11k rows and 20 columns). One column has mixed data types: mostly numbers (floats), with a small number of strings scattered throughout.
Before running some statistical analysis on the data in the mixed column, I subset the DataFrame by querying other columns (the query does not check whether a string is present). 99% of the time the column is purely numeric once subset, but occasionally a string value ends up in the subset, and I need to catch that case.
What is the most efficient/Pythonic way to loop through a mixed-type pandas column and check for strings (or, conversely, to check that the entire column is numeric)?
If there is a string in the column I want to throw an error, otherwise continue.
Solution
Here’s one way. I’m not sure if it can be vectorized.
import pandas as pd
df = pd.DataFrame({'A': [1, None, 'hello', True, 'world', 'mystr', 34.11]})
# Flag each element of column A that is a Python str
df['stringy'] = [isinstance(x, str) for x in df.A]
print(df)
#        A  stringy
# 0      1    False
# 1   None    False
# 2  hello     True
# 3   True    False
# 4  world     True
# 5  mystr     True
# 6  34.11    False
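To get the "throw an error, otherwise continue" behaviour the question asks for, the same isinstance check can be wrapped in a small guard. This is only a sketch: assert_no_strings is a hypothetical helper name, not a pandas function, and the choice of TypeError and the wording of the message are assumptions.

```python
import pandas as pd


def assert_no_strings(series: pd.Series) -> None:
    """Hypothetical helper: raise TypeError if any element is a Python str."""
    # Element-wise isinstance check; None, bools and floats all map to False.
    is_str = series.map(lambda x: isinstance(x, str)).astype(bool)
    if is_str.any():
        bad = series[is_str]
        raise TypeError(
            f"Column {series.name!r} contains {len(bad)} string value(s) "
            f"at index {bad.index.tolist()}"
        )


df = pd.DataFrame({'A': [1, None, 'hello', True, 'world', 'mystr', 34.11]})

# A subset without strings passes silently and the analysis can continue.
clean_subset = df[~df.index.isin([2, 4, 5])]
assert_no_strings(clean_subset['A'])

# The full column contains strings, so the guard raises.
try:
    assert_no_strings(df['A'])
except TypeError as exc:
    message = str(exc)
    print(message)
```

Series.map still calls isinstance once per element, so this is no faster than the list comprehension above; it only packages the check so it can be called on each subset before the analysis.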