Python – Vectorized lookup of Pandas data frame column values in a separate list

I’m looking for a fast (vectorized) way to perform calculations using the contents of a Pandas data frame.

My data frame contains two labels per row. For each row I want to look up the value that corresponds to each label (from a dictionary/list), perform a calculation on the two values, and store the result in a new column of the data frame.

Below is a working example that uses a loop:

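import numpy as np
import pandas as pd

# Two label columns; every combination of A, B and C appears once
label1s = np.array(['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'], dtype=str)
label2s = np.array(['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'], dtype=str)
data = np.column_stack([label1s, label2s])

# Lookup table: label -> numeric value
label_values = {'A': 1, 'B': 2, 'C': 3}

df = pd.DataFrame(data=data, columns=['Label1', 'Label2'])

# Pre-allocate the output column
new_col = np.zeros_like(label1s, dtype=float)

# Row-by-row loop: look up both labels and subtract their values
for index, row in df.iterrows():
    val1 = label_values[row['Label1']]
    val2 = label_values[row['Label2']]
    new_col[index] = val1 - val2

df['result'] = new_col
print(df)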

However, for large datasets this row-by-row loop is very slow.

Is there a vectorized way to do this?

I’ve explored some pandas features such as DataFrame.lookup, but it seems to require row and column label arrays matching the data frame, whereas in my case I need to look up values in an external dictionary whose size differs from the data frame’s.
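The size mismatch is not actually an obstacle once the dictionary is turned into a pandas Series indexed by label: indexing that Series with `.loc` and a list-like of labels returns one value per requested label, so the output length follows the data frame rather than the lookup table. The sketch below assumes a small hypothetical frame; note that, unlike `Series.map`, `.loc` raises a `KeyError` if a label is missing from the dictionary.

```python
import pandas as pd

# Hypothetical small frame; labels may repeat any number of times
df = pd.DataFrame({'Label1': ['A', 'B', 'C', 'A'],
                   'Label2': ['C', 'A', 'B', 'A']})
label_values = {'A': 1, 'B': 2, 'C': 3}

# Turn the dict into a Series indexed by label (3 entries, independent of len(df))
lookup = pd.Series(label_values)

# .loc with an array of labels returns one value per requested label,
# so each result has len(df) entries, not len(lookup)
v1 = lookup.loc[df['Label1'].to_numpy()].to_numpy()
v2 = lookup.loc[df['Label2'].to_numpy()].to_numpy()
df['result'] = v1 - v2
print(df)
```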

Solution

You can map the dictionary onto each label column with Series.map and subtract the two results, i.e.

df['result'] = df.Label1.map(label_values) - df.Label2.map(label_values)
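A self-contained version of this answer, rebuilt from the question’s data, is sketched below. `Series.map` with a dict replaces every label by its value without a Python-level loop over rows; labels that have no entry in the dict become NaN rather than raising an error, which the second part illustrates with a hypothetical label `'D'`.

```python
import pandas as pd

label1s = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']
label2s = ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']
df = pd.DataFrame({'Label1': label1s, 'Label2': label2s})
label_values = {'A': 1, 'B': 2, 'C': 3}

# Map each label column through the dict, then subtract element-wise
df['result'] = df['Label1'].map(label_values) - df['Label2'].map(label_values)
print(df)

# A label with no entry in the dict ('D', hypothetical) maps to NaN
missing = pd.Series(['A', 'D']).map(label_values)
print(missing.isna().tolist())
```

When every label is present, the mapped column keeps an integer dtype, whereas the loop in the question produced floats; append `.astype(float)` if the float column is required.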
