How Python merges two csv files

How Python merges two csv files … here is a solution to the problem.

How Python merges two csv files

I’m trying to merge two csv files (A and B) into one (C). The CSV file looks like this:

— CSV A —

Field_1, Key
a0     , k0
a1     , k1
a2     , k2
a3     , k0
a4     , k2
a5     , k0

— CSV B —

Key, Field_2, Field_3
k0 , b0     , c0
k1 , b1     , c1
k2 , b2     , c2
k3 , b3     , c3

— Expected CSV C (merged csv) —

Field_1, Key, Field_2, Field_3
a0     , k0 , b0     , c0
a1     , k1 , b1     , c1
a2     , k2 , b2     , c2
a3     , k0 , b0     , c0
a4     , k2 , b2     , c2
a5     , k0 , b0     , c0

So basically the field in CSV B that matches the key of CSV A should be added to get CSV C. But I leave the next merge field empty

— actual CSV C —

Field_1, Key, Field_2, Field_3
a0     , k0 ,        , 
a1     , k1 ,        , 
a2     , k2 ,        , 
a3     , k0 ,        , 
a4     , k2 ,        , 
a5     , k0 ,        , 

This is the code I’m trying to use to merge these fields. But as I said, I can’t get the data from the merged csv b, I can only get the header.

a = pd.read_csv("a.csv")
b = pd.read_csv("b.csv").rename(columns={'Key': ' Key'})
result = a.merge(b, on=" Key", how="left")
result.to_csv("c.csv", index=False)

So what can I do to properly merge data from CSV B? Thank you.

Solution

The problem is that you have spaces (_) in your key fields. In a.csv you have “_key” (e.g. “k0″) and in b.csv you have “key” (e.g. “k0_”), so the keys do not match. This code works if you remove spaces in the CSV file:

import pandas as pd

a = pd.read_csv("a.csv",sep=",")
b = pd.read_csv("b.csv",sep=",")

pd.merge(a,b,on="Key",how="left")

You can use skipinitialspace=True during the import of a.csv because the space is preceded, as shown below:

a = pd.read_csv("a.csv",sep=",",skipinitialspace=True)

Or you define a function to remove any spaces and apply it on import:

def trim(dataset):

trim = lambda x: x.strip() if type(x) is str else x #Stripping whitespaces in values

dataset = dataset.rename(columns=lambda x: x.strip()) #Stripping whitespaces in colnames

return dataset.applymap(trim)

a = trim(pd.read_csv("a.csv",sep=","))
b = trim(pd.read_csv("b.csv",sep=","))

Related Problems and Solutions