Python DataFrame: simple string split that includes '-'
Happy New Year, everyone. I have a DataFrame whose columns hold ints or strings. In one string column, some values contain a "-" in the middle, and I want to remove everything after the "-". Take a look at my df below.
input:
    buzz_id facet   facet_cls facet_val  p_buzz_date
0  95713207    A3       Small        MN     20160101
1  95713207    S3   Small-box       Tbd     20160101
2  95713207    F1      Medium        es     20160101
3  95713207    A2  Medium-box       esf     20160101
4  95713207    A1     Dum-pal       ess     20160101
...
output:
    buzz_id facet facet_cls facet_val  p_buzz_date
0  95713207    A3     Small        MN     20160101
1  95713207    S3     Small       Tbd     20160101
2  95713207    F1    Medium        es     20160101
3  95713207    A2    Medium       esf     20160101
4  95713207    A1       Dum       ess     20160101
...
So in my "facet_cls" column, everything from the "-" onward (including the "-" itself) needs to be removed. My data is very large, so I want the fastest approach I can find. Any ideas?
Thanks in advance!
Solution
Use str.split and then str[0], which selects only the first value of each list:
df['facet_cls'] = df['facet_cls'].str.split('-').str[0]
print (df)
    buzz_id facet facet_cls facet_val  p_buzz_date
0  95713207    A3     Small        MN     20160101
1  95713207    S3     Small       Tbd     20160101
2  95713207    F1    Medium        es     20160101
3  95713207    A2    Medium       esf     20160101
4  95713207    A1       Dum       ess     20160101
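For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the five sample rows in the question, which is an assumption about how the real data looks; the split line itself is the one from the answer.

```python
import pandas as pd

# Sample rows copied from the question; the real data is assumed
# to have the same column names and types.
df = pd.DataFrame({
    "buzz_id": [95713207] * 5,
    "facet": ["A3", "S3", "F1", "A2", "A1"],
    "facet_cls": ["Small", "Small-box", "Medium", "Medium-box", "Dum-pal"],
    "facet_val": ["MN", "Tbd", "es", "esf", "ess"],
    "p_buzz_date": [20160101] * 5,
})

# Split each string on '-' and keep only the first piece.
df["facet_cls"] = df["facet_cls"].str.split("-").str[0]

print(df["facet_cls"].tolist())
# ['Small', 'Small', 'Medium', 'Medium', 'Dum']
```

Values without a "-" come back unchanged, because splitting them yields a one-element list.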
Details:
print (df['facet_cls'].str.split('-'))
0          [Small]
1     [Small, box]
2         [Medium]
3    [Medium, box]
4       [Dum, pal]
Name: facet_cls, dtype: object
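Since the question asks about speed on large data, two variants may be worth timing against the plain split. This is only a sketch: nothing was benchmarked here, and which one wins likely depends on the pandas version and the data. The first variant passes n=1 so splitting stops at the first "-"; the second uses an ordinary list comprehension and assumes every value is a string (no NaN).

```python
import pandas as pd

# Stand-in for df['facet_cls'], using the sample values from the question.
s = pd.Series(
    ["Small", "Small-box", "Medium", "Medium-box", "Dum-pal"],
    name="facet_cls",
)

# Baseline: the approach from the answer.
baseline = s.str.split("-").str[0]

# Variant 1: stop after the first '-' so each list has at most two pieces.
first_split = s.str.split("-", n=1).str[0]

# Variant 2: plain Python loop over the values.
# Assumes every value is a str; a NaN would raise AttributeError here.
via_loop = pd.Series(
    [v.split("-", 1)[0] for v in s],
    index=s.index,
    name=s.name,
)

# Both variants produce the same values as the baseline.
print(first_split.tolist() == baseline.tolist())  # True
print(via_loop.tolist() == baseline.tolist())     # True
```

To compare them on real data, the standard-library timeit module (or %timeit in IPython) can be run over each expression.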