Python – Replace a string in pandas only if it matches the exact string

I can’t get string replacement in pandas to work the way I want, and I’m not sure whether it can be done with pandas alone.

This is what my data frame looks like:

 (ID: 10)              247333605                      0.0  
  (ID: 20)               36738870                      0.0  
  (ID: 40)             4668036427                      0.0  
  (ID: 50)             1918647972                      0.0  
  (ID: 60)             4323165902                  44125.0  
  (ID: 80)              145512255                      0.0  
 Assigned (ID: 30)       42050340                      0.0  
 Assigned (ID: 40)   130880371376                      0.0  
 Assigning (ID: 30)    1095844753                      0.0  
 Cancelled (ID: 40)        937280                      0.0  
 Cancelled (ID: 80)   16857720813                      0.0  
 Planned (ID: 20)      9060392597                      0.0  
 Planning (ID: 10)   108484297031                      0.0  
 Processed (ID: 70)  133289880880                      0.0  
 Revoked (ID: 50)      2411903072                      0.0  
 Writing (ID: 50)    146408550024                      0.0  
 Written (ID: 60)    139458227923                1018230.0  

Each bare (ID: x) should be matched to its labelled form, such as Assigned (ID: x) or Dismissed (ID: x), with the correct ID.

I am using a line similar to this:

input_data['last_status'] = input_data.last_status.str.replace('(ID: 10)', 'Planning (ID: 10)')

My output is:

(Assigned (ID: 40))                                0.0  
  (Cancelled (ID: 80))                               0.0  
  (Planned (ID: 20))                                 0.0  
  (Planning (ID: 10))                                0.0  
  (Writing (ID: 50))                                 0.0  
  (Written (ID: 60))                             44125.0  
 Assigned (Assigned (ID: 40))                        0.0  
 Assigned (ID: 30)                                   0.0  
 Assigning (ID: 30)                                  0.0  
 Cancelled (Assigned (ID: 40))                       0.0  
 Cancelled (Cancelled (ID: 80))                      0.0  
 Planned (Planned (ID: 20))                          0.0  
 Planning (Planning (ID: 10))                        0.0  
 Processed (ID: 70)                                  0.0  
 Revoked (Writing (ID: 50))                          0.0  
 Writing (Writing (ID: 50))                          0.0  
 Written (Written (ID: 60))                    1018230.0  

As you can see, every (ID: x) has been replaced, even inside rows that already had a label, and the results still don’t carry the correct status.

My ideal data frame would look like this:

 Assigned (ID: 30)       42050340                      0.0  
 Assigned (ID: 40)   130880371376                      0.0  
 Assigning (ID: 30)    1095844753                      0.0  
 Cancelled (ID: 40)        937280                      0.0  
 Cancelled (ID: 80)   16857720813                      0.0  
 Planned (ID: 20)      9060392597                      0.0  
 Planning (ID: 10)   108484297031                      0.0  
 Processed (ID: 70)  133289880880                      0.0  
 Revoked (ID: 50)      2411903072                      0.0  
 Writing (ID: 50)    146408550024                      0.0  
 Written (ID: 60)    139458227923                1018230.0 

I definitely want to use pandas because the datasets are large. I have other implementations, but they take a few days to run. Is there a way to do this in pandas?

I’ve never asked any questions on StackOverflow before. I hope my question is clear.

Solution

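To generalize, you can use str.replace with start-of-line (^) and end-of-line ($) anchors and escaped parentheses, so that only cells consisting of exactly (ID: x) are matched. Pass regex=True explicitly, because recent pandas versions treat the pattern as a literal string by default.

df['last_status'].str.replace(r'^(\(ID: \d+\))$', r'Planning: \1', regex=True)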

0     Planning: (ID: 10)
1     Planning: (ID: 20)
2     Planning: (ID: 40)
3     Planning: (ID: 50)
4     Planning: (ID: 60)
5     Planning: (ID: 80)
6      Assigned (ID: 30)
7      Assigned (ID: 40)
8     Assigning (ID: 30)
9     Cancelled (ID: 40)
10    Cancelled (ID: 80)
11      Planned (ID: 20)
12     Planning (ID: 10)
13    Processed (ID: 70)
14      Revoked (ID: 50)
15      Writing (ID: 50)
16      Written (ID: 60)
Name: last_status, dtype: object
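
To see why the anchors and the escaped parentheses matter, here is a minimal sketch comparing the pattern from the question with an anchored one. The three sample values are only modeled on the question's data, and regex=True is passed explicitly in both calls so the behavior does not depend on the pandas version.

```python
import pandas as pd

# Small sample modeled on the question's column (illustrative values only).
s = pd.Series(
    ["(ID: 10)", "Planning (ID: 10)", "Assigned (ID: 40)"],
    name="last_status",
)

# Unescaped parentheses form a regex group, so this pattern matches the inner
# text "ID: 10" wherever it occurs, including inside already-labelled rows.
unanchored = s.str.replace("(ID: 10)", "Planning (ID: 10)", regex=True)

# Escaped parentheses plus ^...$ match only cells that are exactly "(ID: 10)".
anchored = s.str.replace(r"^\(ID: 10\)$", "Planning (ID: 10)", regex=True)

print(unanchored.tolist())
print(anchored.tolist())
```

The first call reproduces the nested output from the question, such as Planning (Planning (ID: 10)), while the second leaves the already-labelled rows untouched.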

If you only want to replace a specific ID, change your regular expression to:

r'^(\(ID: 10\))$'

Or,

r'^(\(ID: {}\))$'.format(number)

where number is a variable holding the ID value to substitute.
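
If several IDs each need their own label, one possible alternative to repeating the single-ID call is to pass a function as the replacement. The sketch below assumes a hypothetical labels dictionary and an add_label helper, neither of which comes from the question. The question's data does not say which label a bare ID should receive, so the mapping is purely illustrative.

```python
import pandas as pd

s = pd.Series(
    ["(ID: 10)", "(ID: 20)", "Assigned (ID: 30)", "(ID: 99)"],
    name="last_status",
)

# Hypothetical mapping from ID to status label (an assumption for this sketch).
labels = {10: "Planning", 20: "Planned"}


def add_label(match):
    """Prefix a bare "(ID: x)" cell with its label, if one is known."""
    number = int(match.group(1))
    label = labels.get(number)
    # Leave the cell unchanged when the ID has no entry in the mapping.
    if label is None:
        return match.group(0)
    return "{} {}".format(label, match.group(0))


# The anchored pattern only matches cells that are exactly "(ID: <digits>)".
result = s.str.replace(r"^\(ID: (\d+)\)$", add_label, regex=True)
print(result.tolist())
```

Rows that already carry a label never match the anchored pattern, so they pass through unchanged, as do bare IDs missing from the mapping.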
