Average of the elements in a list of lists, grouped by the first item of each inner list
My list looks like my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]].
I want to find the average of the other two columns in each internal list, grouped by the first column in each internal list.
The expected result is [['A', 5, 7.5], ['B', 9.5, 5], ['C', 1, 1]], where ['A', 5, 7.5] comes from ['A', (6+4)/2, (7+8)/2].
I don't mind if I end up with a dictionary or something similar, but I would prefer a list.
I tried the following:
my_list1 = [i[0] for i in my_list]
my_list2 = [i[1:] for i in my_list]
new_dict = {k: v for k, v in zip(my_list1, my_list2)}
I split the original list so that the first column becomes the key and the second and third columns become the value. Converting this to a dictionary gives me an aggregation, but the problem is that I want to keep the decimal places: it rounds and gives me integers instead of floats.
my_list1 = ['A', 'A', 'B', 'C', 'B']
my_list2 = [[6, 7], [4, 8], [9, 3], [1, 1], [10, 7]]
new_dict= {'A': [5, 8], 'B': [10, 5], 'C': [1, 1]}
When ideally I want [['A', 5, 7.5], ['B', 9.5, 5], ['C', 1, 1]] (I don't mind whether it is a dictionary or not).
I also tried a for loop to convert the second and third columns to float, hoping to get floats after converting to a dictionary. But it makes no difference: it still rounds and gives integers.
for i in range(0, len(my_list)):
    for j in range(1, len(my_list[i])):
        my_list[i][j].astype(float)
dict = {}
for l2 in my_list:
    dict[l2[0]] = l2[1:]
I need to keep the decimal places because the second and third columns are x and y coordinates…
So, all in all, the goal is to find the average of the other two columns in each inner list, grouped by the first column in each inner list, and to keep as many decimal places as possible.
Solution
Suppose you intend to use the following list:
In [4]: my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]
Just use a defaultdict to group by the first element, then find the mean of each group:
In [6]: from collections import defaultdict
In [7]: grouper = defaultdict(list)
In [8]: for k, *tail in my_list:
   ...:     grouper[k].append(tail)
   ...:
In [9]: grouper
Out[9]:
defaultdict(list,
{'A': [[6, 7], [4, 8]], 'B': [[9, 3], [10, 7]], 'C': [[1, 1]]})
In [10]: import statistics
In [11]: {k: list(map(statistics.mean, zip(*v))) for k,v in grouper.items()}
Out[11]: {'A': [5, 7.5], 'B': [9.5, 5], 'C': [1, 1]}
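Since the question asks for a list where possible, the dictionary above can be turned back into a list of lists. The following is only a minimal sketch, not part of the original answer: the function name group_means is made up for illustration, and it assumes Python 3 and that every inner list has the same number of numeric columns.

```python
from collections import defaultdict
import statistics


def group_means(rows):
    """Average the numeric columns of rows, grouped by each row's first item.

    Hypothetical helper for illustration only.
    """
    grouper = defaultdict(list)
    for key, *tail in rows:
        grouper[key].append(tail)
    # zip(*values) transposes the grouped rows into columns,
    # so each column can be averaged on its own.
    return [[key] + [statistics.mean(col) for col in zip(*values)]
            for key, values in grouper.items()]


my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]
result = group_means(my_list)
print(result)
```

On Python 3.7 and later, dictionaries keep insertion order, so the groups come out in the order in which each key first appears in the input.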
Note that if you are using Python 2, you do not need to call list after map, because map already returns a list there. Also, you should use iteritems instead of items.
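In addition, Python 2 does not support the starred unpacking (for k, *tail in my_list), so you must write the grouping loop as follows, which is not as clean as the Python 3 version:
for sub in my_list:
    grouper[sub[0]].append(sub[1:])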
Finally, there is no statistics module in Python 2, so just define your own mean:
def mean(seq):
    return float(sum(seq))/len(seq)
And use mean instead of statistics.mean.
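Putting the Python 2 notes together, here is a hedged sketch of a version that avoids starred unpacking and the statistics module, so it should run unchanged on both Python 2 and Python 3. The function name group_means_compat is hypothetical, and it calls items rather than iteritems so that the same code also works on Python 3.

```python
from collections import defaultdict


def mean(seq):
    # float() forces true division on Python 2, where int / int truncates.
    return float(sum(seq)) / len(seq)


def group_means_compat(rows):
    """Group rows by their first item and average the remaining columns.

    Hypothetical helper for illustration only.
    """
    grouper = defaultdict(list)
    for sub in rows:
        grouper[sub[0]].append(sub[1:])
    # zip(*v) transposes the grouped rows into columns.
    return dict((k, [mean(col) for col in zip(*v)])
                for k, v in grouper.items())


my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]
averages = group_means_compat(my_list)
print(averages)
```

Unlike statistics.mean, this mean always returns a float, so the averages for 'C' come out as 1.0 rather than 1.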