Using .combine_first to merge duplicate rows
I have a pandas DataFrame (df) in which some rows are duplicated. In these duplicate rows, some columns are NaN in one row but hold values in the other row. I want to merge the duplicate rows so that the missing values are filled with the values from the other rows, and then remove the duplicates. For example, these two rows are duplicates:
  id col1 col2 col3
0 01  abc  NaN  123
9 01  NaN   xy  NaN
The result should look like this:
  id col1 col2 col3
0 01  abc   xy  123
I tried .combine_first with df.iloc[0:1,].combine_first(df.iloc[9:10,]), without success. Can someone help me? Thanks!
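A likely explanation, offered as a sketch that assumes the frame looks like the example above: iloc is positional, so df.iloc[9:10,] selects the tenth row rather than the row labelled 9, and combine_first aligns on index labels, so rows labelled 0 and 9 never fill each other. The snippet rebuilds the two example rows (the variable names first, second, unaligned and merged are illustrative) and shows that combine_first only fills the gaps once both rows share the same label.

```python
import numpy as np
import pandas as pd

# The two example rows: same id, different index labels (0 and 9),
# with complementary NaN cells.
df = pd.DataFrame(
    {
        "id": ["01", "01"],
        "col1": ["abc", np.nan],
        "col2": [np.nan, "xy"],
        "col3": [123, np.nan],
    },
    index=[0, 9],
)

# Select by label with loc; iloc would select by position instead.
first = df.loc[[0]]
second = df.loc[[9]]

# combine_first aligns on index labels. Labels 0 and 9 never match,
# so the result just stacks both rows and no NaN is filled.
unaligned = first.combine_first(second)
print(unaligned)

# Give both rows the same label (here via reset_index) so that
# combine_first fills the NaN cells of `first` from `second`.
merged = first.reset_index(drop=True).combine_first(second.reset_index(drop=True))
print(merged)
```

This only merges one known pair of rows; for many duplicate groups, a groupby-based approach avoids pairing rows by hand.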
Solution
I think you need groupby with a forward and backward fill of the NaN values within each group (ffill and bfill), and then drop_duplicates:
print(df)
  id col1 col2  col3
0  1  abc  NaN 123.0
9  1  NaN   xy   NaN
0  2  abc  NaN  17.0
9  2  NaN   xr   NaN
9  2  NaN   xu   NaN
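# group_keys=False keeps the original index; since pandas 2.0, apply otherwise prepends the id to the index
df = df.groupby('id', group_keys=False).apply(lambda x: x.ffill().bfill()).drop_duplicates()
print(df)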
  id col1 col2  col3
0  1  abc   xy 123.0
0  2  abc   xr  17.0
9  2  abc   xu  17.0
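A self-contained variant of the same idea, sketched for reference. It rebuilds the sample frame printed above and, instead of groupby(...).apply, fills only the non-key columns with groupby(...).transform and assigns them back. This swap is an assumption on my part, intended to sidestep how newer pandas versions treat group keys in apply; treat it as a sketch rather than the answer's exact method. The result should match the answer's output: id 2 keeps two rows because col2 holds two different non-missing values (xr and xu), so the filled rows are not exact duplicates.

```python
import numpy as np
import pandas as pd

# Sample frame from the answer; index labels repeat, "id" marks duplicates.
df = pd.DataFrame(
    {
        "id": [1, 1, 2, 2, 2],
        "col1": ["abc", np.nan, "abc", np.nan, np.nan],
        "col2": [np.nan, "xy", np.nan, "xr", "xu"],
        "col3": [123, np.nan, 17, np.nan, np.nan],
    },
    index=[0, 9, 0, 9, 9],
)

# Forward-fill then back-fill every non-key column within each id group.
# transform returns a frame with df's original index and row order,
# so it can be assigned back without touching "id" or the index.
value_cols = df.columns.drop("id")
df[value_cols] = df.groupby("id")[value_cols].transform(lambda s: s.ffill().bfill())

# Rows that are now identical in every column collapse into one.
df = df.drop_duplicates()
print(df)
```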