Python merges two columns based on the keys in the first column



Let’s say I have two columns in an Excel file as follows:

1 1
1 2
2 3
3 4
4 5
5 6
1 3

My goal is to build a mapping between the two columns: when the same value appears in the first column of several rows, the corresponding values in the second column are added together. So my output should look like this: [1:6, 2:3, 3:4, 4:5, 5:6].

Logic: The number “1” appears in three rows, where it corresponds to the values 1, 2, and 3. Therefore, the total for key 1 is 1+2+3=6.
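The rule above amounts to a running total per key. A minimal sketch in plain Python, assuming the seven sample rows are typed in by hand instead of read from the Excel file; the helper name `sum_by_key` is hypothetical:

```python
from collections import defaultdict


def sum_by_key(rows):
    """Add up the second value of each (key, value) pair, grouped by key.

    Keys keep the order in which they first appear (dicts preserve
    insertion order in Python 3.7+).
    """
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


# The sample data from the question: one (first column, second column) pair per row
rows = [(1, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 3)]
print(sum_by_key(rows))  # {1: 6, 2: 3, 3: 4, 4: 5, 5: 6}
```

A dictionary keyed on the first-column value is a natural fit here because each new row only needs a lookup and an addition, and the rows do not have to be sorted first.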

I started with the following approach but got stuck:

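# Note: xlrd 2.0 and later only read .xls files, so an .xlsx file needs an older xlrd
import xlrd

book = xlrd.open_workbook('C:\\Users\\a593977\\Desktop\\ExcelTest.xlsx')
sheet = book.sheet_by_name('Sheet1')
# Build a list of columns: cell_value(row, col), inner loop over rows (c), outer loop over columns (r)
data = [[sheet.cell_value(c, r) for c in range(sheet.nrows)] for r in range(sheet.ncols)]
firstColumn = data[0]
firstColumn = sorted(firstColumn)
secondColumn = data[1]
secondColumn = sorted(secondColumn)
# Pair up the two (separately sorted) columns element by element
print(list(zip(firstColumn, secondColumn)))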

The output of this code is:

[(1.0, 1.0), (1.0, 2.0), (1.0, 3.0), (2.0, 3.0), (3.0, 4.0), (4.0, 5.0), (5.0, 6.0)]

But the goal is: [1:6, 2:3, 3:4, 4:5, 5:6]. How do I proceed?
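One way to continue from the attempt above without pandas is sketched below. It assumes the two lists hold the values from the question's sample sheet, typed in by hand instead of read with xlrd (which returns numeric cells as floats, hence 1.0 rather than 1). The key change is sorting the (key, value) pairs together rather than each column on its own, so every value stays attached to its own row; `itertools.groupby` can then sum the now-adjacent pairs that share a key. The names `pairs` and `result` are illustrative.

```python
from itertools import groupby
from operator import itemgetter

# Stand-ins for data[0] and data[1] from the xlrd code (numeric cells come back as floats)
firstColumn = [1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 1.0]
secondColumn = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 3.0]

# Sort whole (key, value) pairs by key so each value stays with its own row's key
pairs = sorted(zip(firstColumn, secondColumn), key=itemgetter(0))

# groupby only merges adjacent items, which is why the pairs are sorted first
result = {key: sum(value for _, value in group)
          for key, group in groupby(pairs, key=itemgetter(0))}
print(result)  # {1.0: 6.0, 2.0: 3.0, 3.0: 4.0, 4.0: 5.0, 5.0: 6.0}
```

If integer keys are wanted, as in the desired [1:6, ...] output, the floats can be converted with `int()` once it is known that every cell holds a whole number.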

Solution

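Use pandas: groupby collects the rows that share a value in the first column, sum adds up their second-column values, and agg joins each key and its total into a string.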

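import pandas as pd

# header=None: the sheet has no header row, so the columns are labeled 0 and 1
df = pd.read_excel('C:\\Users\\a593977\\Desktop\\ExcelTest.xlsx', header=None)
res = (df
      # group on the first column, keeping keys in order of first appearance
      .groupby(df.columns[0], as_index=False, sort=False)[df.columns[1]]
      # add up the second-column values within each group
      .sum()
      # convert both columns to strings so they can be joined
      .astype(str)
      # join each row's key and total with ':' (axis 1 means row by row)
      .agg(':'.join, 1)
      .tolist()
)

print(res)
# ['1:6', '2:3', '3:4', '4:5', '5:6']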
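The question asks for a key-to-total mapping, while the code above builds "key:total" strings. If an actual dict is preferred, `Series.to_dict` can replace the string formatting step. A minimal sketch, assuming the spreadsheet is replaced by an in-memory DataFrame with the same seven rows so it runs without the Excel file; the names `totals` and `mapping` are illustrative:

```python
import pandas as pd

# Stand-in for pd.read_excel(..., header=None): same rows, columns labeled 0 and 1
df = pd.DataFrame([[1, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [1, 3]])

# Group on column 0 (keys in order of first appearance) and sum column 1
totals = df.groupby(0, sort=False)[1].sum()

# Convert the resulting Series (index = key, value = total) to a plain dict
mapping = totals.to_dict()
print(mapping)  # {1: 6, 2: 3, 3: 4, 4: 5, 5: 6}
```

Leaving out `as_index=False` here keeps the keys in the index of the result, which is exactly what `to_dict` uses as the dict keys.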
