Python – Using .combine_first to merge multiple rows

I have a pandas DataFrame (df) in which some rows are duplicated. In these duplicates, some columns hold NaN in one row while the same columns hold values in the other row. I want to merge the duplicate rows so that the missing values are filled in from the matching row, and then remove the leftover duplicates. For example, the following rows are duplicates:

     id   col1   col2   col3
0    01   abc           123
9    01          xy

The result should be like this:

     id   col1   col2   col3
0    01   abc    xy     123

I tried .combine_first, using df.iloc[0:1,].combine_first(df.iloc[9:10,]), without success. Can someone help me? Thanks!
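
A possible reason the attempt has no visible effect, assuming the two rows carry the index labels 0 and 9 as in the example: combine_first aligns both frames on their index, so rows with different labels are never matched and the result simply contains both rows. Giving both slices the same label lets the values line up. The frame below is a hypothetical reconstruction of the two rows shown above, not the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the two duplicate rows (index labels 0 and 9).
df = pd.DataFrame(
    {
        "id": ["01", "01"],
        "col1": ["abc", np.nan],
        "col2": [np.nan, "xy"],
        "col3": [123.0, np.nan],
    },
    index=[0, 9],
)

first, second = df.iloc[0:1], df.iloc[1:2]

# Labels 0 and 9 do not match, so nothing is filled: both rows come back.
misaligned = first.combine_first(second)

# With a shared label, NaN values in the first row are filled from the second.
aligned = first.reset_index(drop=True).combine_first(second.reset_index(drop=True))
print(aligned)
```

This only handles one pair of rows at a time, so it does not scale to a whole frame with many ids.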

Solution

I think you need groupby, fill the NaN values forward and backward within each group, and then drop_duplicates:

print(df)
   id col1 col2   col3
0   1  abc  NaN  123.0
9   1  NaN   xy    NaN
0   2  abc  NaN   17.0
9   2  NaN   xr    NaN
9   2  NaN   xu    NaN

# Within each id group, fill NaN forward and then backward; rows that end up identical are dropped.
df = df.groupby('id').apply(lambda x: x.ffill().bfill()).drop_duplicates()
print(df)
   id col1 col2   col3
0   1  abc   xy  123.0
0   2  abc   xr   17.0
9   2  abc   xu   17.0
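
For anyone who wants to run this end to end, here is a self-contained sketch of the same idea. It rebuilds the example data and uses the group-wise ffill and bfill methods instead of groupby.apply, because how apply treats group keys and the grouping column differs between pandas versions. The default unique index, the value_cols list and the variable names are assumptions made for this sketch. The second variant, groupby(...).first(), is a different technique that keeps exactly one row per id.

```python
import numpy as np
import pandas as pd

# Same values as the example above, but with a default unique index
# (an assumption made here to keep index alignment simple).
df = pd.DataFrame(
    {
        "id": [1, 1, 2, 2, 2],
        "col1": ["abc", np.nan, "abc", np.nan, np.nan],
        "col2": [np.nan, "xy", np.nan, "xr", "xu"],
        "col3": [123.0, np.nan, 17.0, np.nan, np.nan],
    }
)

value_cols = ["col1", "col2", "col3"]

# Variant 1: fill within each id group (forward, then backward),
# then drop rows that have become identical.
filled = df.copy()
filled[value_cols] = filled.groupby("id")[value_cols].ffill()
filled[value_cols] = filled.groupby("id")[value_cols].bfill()
deduplicated = filled.drop_duplicates()
print(deduplicated)

# Variant 2: keep one row per id by taking the first non-null value per column.
one_per_id = df.groupby("id", as_index=False).first()
print(one_per_id)
```

Variant 1 keeps both xr and xu for id 2, as in the output above. Variant 2 keeps only xr and silently discards xu, so it fits only when a single row per id is really what is wanted.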
