Python – How to populate the np.nan value with data from another row based on a match

I need to do the following:

import numpy as np
import pandas as pd

a = [1, 2, 3, 4, 5]
c = [100, 100, 200, 200, 100]
b = ['2013-06-10', np.nan, '2013-02-15', np.nan, '2013-05-15']
df = pd.DataFrame({'a': a, 'b': b, 'c': c})

This will give:

   a           b    c
0  1  2013-06-10  100
1  2         NaN  100
2  3  2013-02-15  200
3  4         NaN  200
4  5  2013-05-15  100

For each row where column b is empty, I want to look up the previous row with the same value in column c and fill in its date.
It should end up like this:

   a           b    c
0  1  2013-06-10  100
1  2  2013-06-10  100
2  3  2013-02-15  200
3  4  2013-02-15  200
4  5  2013-05-15  100

I currently fill in the dates with apply and a row-wise lambda, but my raw data has millions of rows, so this is very slow. Does anyone know a faster way to fill values from other rows that share the same value in column c?

Solution

You can use ffill within each group of c:

df['b'] = df.groupby('c')['b'].ffill()
print(df)
   a           b    c
0  1  2013-06-10  100
1  2  2013-06-10  100
2  3  2013-02-15  200
3  4  2013-02-15  200
4  5  2013-05-15  100
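
To show why the groupby matters, here is a minimal sketch that compares a group-wise forward fill with a plain one. The interleaved data below is made up for illustration and is not from the question; with it, a plain ffill would copy a date across groups, while the grouped version only looks back at rows with the same c.

```python
import numpy as np
import pandas as pd

# Hypothetical data: groups 100 and 200 alternate row by row
demo = pd.DataFrame({
    'b': ['2013-06-10', '2013-02-15', np.nan, np.nan],
    'c': [100, 200, 100, 200],
})

# Forward-fill within each group of c; the result keeps the original index
grouped = demo.groupby('c')['b'].ffill()

# Plain forward-fill ignores c and just copies the value from the row above
plain = demo['b'].ffill()

print(pd.DataFrame({'c': demo['c'], 'grouped': grouped, 'plain': plain}))
```

In the grouped column, row 2 takes its date from row 0 (both have c equal to 100), whereas the plain column takes it from row 1, which belongs to a different group.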

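Also, if the first value of b in some group is NaN, a forward fill alone leaves it empty, so chain ffill and bfill within each group. Use transform for this: its result is aligned with the original index, whereas in recent pandas versions groupby(...).apply may add the group keys to the index and break the assignment:

print(df)
   a           b    c
0  1         NaN  100 <- NaN
1  1  2013-06-10  100
2  2         NaN  100
3  3  2013-02-15  200
4  4         NaN  200
5  5  2013-05-15  100

df['b'] = df.groupby('c')['b'].transform(lambda x: x.ffill().bfill())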
print(df)
   a           b    c
0  1  2013-06-10  100
1  1  2013-06-10  100
2  2  2013-06-10  100
3  3  2013-02-15  200
4  4  2013-02-15  200
5  5  2013-05-15  100
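
Since the question concerns millions of rows, it may be worth avoiding the per-group lambda altogether. The sketch below is one possible alternative, not part of the original answer: it runs the built-in group-wise ffill and then a group-wise bfill as two separate passes, using the same data as the example above. Whether this is actually faster depends on the data and the pandas version, so it should be timed on the real dataset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 2, 3, 4, 5],
    'b': [np.nan, '2013-06-10', np.nan, '2013-02-15', np.nan, '2013-05-15'],
    'c': [100, 100, 100, 200, 200, 100],
})

# Pass 1: forward-fill within each group of c (a leading NaN in a group stays NaN)
forward = df.groupby('c')['b'].ffill()

# Pass 2: back-fill what is left, grouping the intermediate Series by the same keys
df['b'] = forward.groupby(df['c']).bfill()

print(df)
```

Both passes return a Series on the original index, so the final assignment lines up row by row without any reindexing.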
