Python – in Pandas : loc or join? is the most efficient way to join a data frame

in Pandas : loc or join? is the most efficient way to join a data frame… here is a solution to the problem.

in Pandas : loc or join? is the most efficient way to join a data frame

Let’s say I have two data frames; One holds the transaction, trans, the other holds the product information, prod, I want to add the product price, variable price, to the transaction data frame, repeat them for each column. Which of the following methods is more effective/popular:

Method 1:

trans = trans.set_index('product_id').join(trans.set_index('product_id'))

Method 2:

trans.set_index('product_id',inplace=True)
trans['price'] = prod.loc[trans.product_id, 'price']

Solution

It seems that you need to map :

trans = pd. DataFrame({'product_id':[1,2,3],
                   'price':[4,5,6]})

print (trans)
   price  product_id
0      4           1
1      5           2
2      6           3

prod = pd. DataFrame({'product_id':[1,2,4],
                   'price':[40,50,60]})

print (prod)
   price  product_id
0     40           1
1     50           2
2     60           4

d = prod.set_index('product_id')['price'].to_dict()
trans['price'] = trans['product_id'].map(d)
print (trans)
   price  product_id
0   40.0           1
1   50.0           2
2    NaN           3

Related Problems and Solutions