Python – Dataframe: How to select a different column for each row

Dataframe: How to select a different column for each row… here is a solution to the problem.

Dataframe: How to select a different column for each row

Let’s consider a data frame A, which contains three columns: a, b, and c. Let’s say we also have series B of the same size as A. In each row, it contains the name of one of column A. I want to build a Series that contains the values in the column specified by B in table A.

The simplest example is as follows:

idxs = np.arange(0, 5)
A = pd. DataFrame({
    'a': [3, 1, 5, 7, 8],
    'b': [5, 6, 7, 3, 1],
    'c': [2, 7, 8, 2, 1],
}, index=idxs)
B = pd. Series(['b', 'c', 'c', 'a', 'a'], index=idxs)

I need to apply some operations to get the same result as the following series:

C = pd. Series([5, 7, 8, 7, 8], index=idxs)

In such a simple example, the following “broadcast” can be performed on a pure numpy array:

d = {'a':0, 'b':1, 'c':2 }
AA = A.rename(columns=d).as_matrix()
BB = B.apply(lambda x: d[x]).as_matrix()

CC = AA[idxs, BB]

This works, but in my actual problem, where I have multiple indexes of Dataframes, things get more complicated.

Is it possible to do this using the pandas tool?

The first thing that comes to my mind is:

A['idx'] = B;
C = A.apply(lambda x: x[x['idx']], axis=1)

Effective!

Solution

You can use DataFrame.lookup:

pd. Series(A.lookup(B.index, B), index=B.index)

0    5
1    7
2    8
3    7
4    8
dtype: int64

The NumPy solution involving broadcasting is:

A.values[B.index, (A.columns.values == B[:, None]).argmax(1)]
# array([5, 7, 8, 7, 8])

Related Problems and Solutions