in Pandas : loc or join? is the most efficient way to join a data frame… here is a solution to the problem.
in Pandas : loc or join? is the most efficient way to join a data frame
Let’s say I have two data frames; One holds the transaction, trans
, the other holds the product information, prod
, I want to add the product price, variable price
, to the transaction data frame, repeat them for each column. Which of the following methods is more effective/popular:
Method 1:
trans = trans.set_index('product_id').join(trans.set_index('product_id'))
Method 2:
trans.set_index('product_id',inplace=True)
trans['price'] = prod.loc[trans.product_id, 'price']
Solution
It seems that you need to map
:
trans = pd. DataFrame({'product_id':[1,2,3],
'price':[4,5,6]})
print (trans)
price product_id
0 4 1
1 5 2
2 6 3
prod = pd. DataFrame({'product_id':[1,2,4],
'price':[40,50,60]})
print (prod)
price product_id
0 40 1
1 50 2
2 60 4
d = prod.set_index('product_id')['price'].to_dict()
trans['price'] = trans['product_id'].map(d)
print (trans)
price product_id
0 40.0 1
1 50.0 2
2 NaN 3