Dataframe: How to select a different column for each row
Let’s consider a data frame A, which contains three columns: a, b, and c. Let’s say we also have series B of the same size as A. In each row, it contains the name of one of column A. I want to build a Series that contains the values in the column specified by B in table A.
The simplest example is as follows:
idxs = np.arange(0, 5)
A = pd. DataFrame({
'a': [3, 1, 5, 7, 8],
'b': [5, 6, 7, 3, 1],
'c': [2, 7, 8, 2, 1],
}, index=idxs)
B = pd. Series(['b', 'c', 'c', 'a', 'a'], index=idxs)
I need to apply some operations to get the same result as the following series:
C = pd. Series([5, 7, 8, 7, 8], index=idxs)
In such a simple example, the following “broadcast” can be performed on a pure numpy array:
d = {'a':0, 'b':1, 'c':2 }
AA = A.rename(columns=d).as_matrix()
BB = B.apply(lambda x: d[x]).as_matrix()
CC = AA[idxs, BB]
This works, but in my actual problem, where I have multiple indexes of Dataframes, things get more complicated.
Is it possible to do this using the pandas tool?
The first thing that comes to my mind is:
A['idx'] = B;
C = A.apply(lambda x: x[x['idx']], axis=1)
Effective!
Solution
You can use DataFrame.lookup
:
pd. Series(A.lookup(B.index, B), index=B.index)
0 5
1 7
2 8
3 7
4 8
dtype: int64
The NumPy solution involving broadcasting is:
A.values[B.index, (A.columns.values == B[:, None]).argmax(1)]
# array([5, 7, 8, 7, 8])