How Python merges two csv files
I’m trying to merge two csv files (A and B) into one (C). The CSV file looks like this:
— CSV A —
Field_1, Key
a0 , k0
a1 , k1
a2 , k2
a3 , k0
a4 , k2
a5 , k0
— CSV B —
Key, Field_2, Field_3
k0 , b0 , c0
k1 , b1 , c1
k2 , b2 , c2
k3 , b3 , c3
— Expected CSV C (merged csv) —
Field_1, Key, Field_2, Field_3
a0 , k0 , b0 , c0
a1 , k1 , b1 , c1
a2 , k2 , b2 , c2
a3 , k0 , b0 , c0
a4 , k2 , b2 , c2
a5 , k0 , b0 , c0
So basically the field in CSV B that matches the key of CSV A should be added to get CSV C. But I leave the next merge field empty
— actual CSV C —
Field_1, Key, Field_2, Field_3
a0 , k0 , ,
a1 , k1 , ,
a2 , k2 , ,
a3 , k0 , ,
a4 , k2 , ,
a5 , k0 , ,
This is the code I’m trying to use to merge these fields. But as I said, I can’t get the data from the merged csv b, I can only get the header.
a = pd.read_csv("a.csv")
b = pd.read_csv("b.csv").rename(columns={'Key': ' Key'})
result = a.merge(b, on=" Key", how="left")
result.to_csv("c.csv", index=False)
So what can I do to properly merge data from CSV B? Thank you.
Solution
The problem is that you have spaces (_) in your key fields. In a.csv you have “_key” (e.g. “k0″) and in b.csv you have “key” (e.g. “k0_”), so the keys do not match. This code works if you remove spaces in the CSV file:
import pandas as pd
a = pd.read_csv("a.csv",sep=",")
b = pd.read_csv("b.csv",sep=",")
pd.merge(a,b,on="Key",how="left")
You can use skipinitialspace=True during the import of a.csv because the space is preceded, as shown below:
a = pd.read_csv("a.csv",sep=",",skipinitialspace=True)
Or you define a function to remove any spaces and apply it on import:
def trim(dataset):
trim = lambda x: x.strip() if type(x) is str else x #Stripping whitespaces in values
dataset = dataset.rename(columns=lambda x: x.strip()) #Stripping whitespaces in colnames
return dataset.applymap(trim)
a = trim(pd.read_csv("a.csv",sep=","))
b = trim(pd.read_csv("b.csv",sep=","))