Python – The average of the elements in the list grouped by the first item in the list


My list looks like this: my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]].

I want to find the average of the other two columns in each internal list, grouped by the first column in each internal list.

[['A', 5, 7.5], ['B', 9.5, 5], ['C', 1, 1]]

Here ['A', 5, 7.5] comes from ['A', (6+4)/2, (7+8)/2].

I don't mind if I end up with a dictionary or something else, but I would prefer the result to remain a list.

I tried the following:


  1. Split the original list so that the first column becomes the key and the second and third columns become the value, expecting that converting this to a dictionary would aggregate the rows:

    my_list1 = [i[0] for i in my_list]
    my_list2 = [i[1:] for i in my_list]
    new_dict = {k: v for k, v in zip(my_list1, my_list2)}

The problem is that I want to keep the decimal places, but the result is rounded and gives me integers instead of floats:

my_list1 = ['A', 'A', 'B', 'C', 'B']

my_list2 = [[6, 7], [4, 8], [9, 3], [1, 1], [10, 7]]

new_dict= {'A': [5, 8], 'B': [10, 5], 'C': [1, 1]}

Ideally I want it to be [['A', 5, 7.5], ['B', 9.5, 5], ['C', 1, 1]] (I don't mind whether it is a dictionary or not).


  2. Use a for loop to convert the second and third columns to float first, hoping to get floats once the data is converted to a dictionary. It makes no difference: the values are still rounded to integers.

    for i in range(0, len(my_list)):
      for j in range(1, len(my_list[i])):
        my_list[i][j].astype(float)
    
    dict = {}
    
    for l2 in my_list:
      dict[l2[0]] = l2[1:]
    

I need to keep the decimal places because the second and third columns are x and y coordinates…

So, all in all, the goal is to find the average of the other two columns in each inner list, grouped by the first column in each inner list, while keeping as many decimal places as possible.

Solution

Suppose you intend to use the following list:

In [4]: my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]
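
It may help to first see why the dictionary attempt from the question cannot produce averages. A plain dict stores one value per key, so a later row with the same key replaces the earlier one and nothing is aggregated. A minimal sketch (the loop variable `row` is only illustrative):

```python
my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]

# Each repeated key overwrites the value stored earlier,
# so only the last row for 'A' and for 'B' survives.
new_dict = {row[0]: row[1:] for row in my_list}
print(new_dict)  # {'A': [4, 8], 'B': [10, 7], 'C': [1, 1]}
```

The rows have to be collected per key before any mean can be computed, which is what the grouping step below does.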

Use a defaultdict to group the rows by their first element, then take the mean of each column:

In [6]: from collections import defaultdict

In [7]: grouper = defaultdict(list)

In [8]: for k, *tail in my_list:
    ...:     grouper[k].append(tail)
    ...:

In [9]: grouper
Out[9]:
defaultdict(list,
            {'A': [[6, 7], [4, 8]], 'B': [[9, 3], [10, 7]], 'C': [[1, 1]]})

In [10]: import statistics

In [11]: {k: list(map(statistics.mean, zip(*v))) for k,v in grouper.items()}
Out[11]: {'A': [5, 7.5], 'B': [9.5, 5], 'C': [1, 1]}
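
Since the question prefers a list as the final result, the dictionary above can be turned back into a list of lists. The following is a sketch that assumes Python 3.7 or later, where dicts preserve insertion order; the name `averaged` is introduced here for illustration only:

```python
from collections import defaultdict
import statistics

my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]

# Collect the numeric columns of every row under its key.
grouper = defaultdict(list)
for k, *tail in my_list:
    grouper[k].append(tail)

# One output row per key: the key followed by the column-wise means.
averaged = [[k, *map(statistics.mean, zip(*v))] for k, v in grouper.items()]
print(averaged)  # [['A', 5, 7.5], ['B', 9.5, 5], ['C', 1, 1]]
```

The keys appear in the order in which they were first seen in `my_list`.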

Note that if you are using Python 2, you do not need to call list after map, and you should use iteritems instead of items.

You will also have to write the grouping loop like this, because the starred unpacking in the for loop is Python 3 only:

for sub in my_list:
    grouper[sub[0]].append(sub[1:])

instead of the cleaner version available in Python 3.

Finally, there is no statistics module in Python 2, so define your own mean:

def mean(seq):
    return float(sum(seq))/len(seq)

Then use mean instead of statistics.mean.
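
Putting those pieces together, a version that avoids both the statistics module and starred unpacking could look like the sketch below. The function name `group_means` is hypothetical, and the code has only been reasoned through for Python 3, although it uses no Python 3-only syntax:

```python
from collections import defaultdict

def mean(seq):
    # float() forces true division even where / on two ints would truncate.
    return float(sum(seq)) / len(seq)

def group_means(rows):
    """Group rows by their first item and average the remaining columns."""
    grouper = defaultdict(list)
    for sub in rows:
        grouper[sub[0]].append(sub[1:])
    # zip(*v) transposes the grouped rows into columns.
    return {k: list(map(mean, zip(*v))) for k, v in grouper.items()}

my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]
result = group_means(my_list)
print(result)  # {'A': [5.0, 7.5], 'B': [9.5, 5.0], 'C': [1.0, 1.0]}
```

Unlike statistics.mean, this helper always returns floats, so whole-number averages come out as 5.0 rather than 5.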
