Replace a string in pandas (Python) only if it matches the exact string
I can’t get string replacement in pandas to work the way I want. I’m not sure whether this can be done with pandas alone.
This is what my data frame looks like:
(ID: 10) 247333605 0.0
(ID: 20) 36738870 0.0
(ID: 40) 4668036427 0.0
(ID: 50) 1918647972 0.0
(ID: 60) 4323165902 44125.0
(ID: 80) 145512255 0.0
Assigned (ID: 30) 42050340 0.0
Assigned (ID: 40) 130880371376 0.0
Assigning (ID: 30) 1095844753 0.0
Cancelled (ID: 40) 937280 0.0
Cancelled (ID: 80) 16857720813 0.0
Planned (ID: 20) 9060392597 0.0
Planning (ID: 10) 108484297031 0.0
Processed (ID: 70) 133289880880 0.0
Revoked (ID: 50) 2411903072 0.0
Writing (ID: 50) 146408550024 0.0
Written (ID: 60) 139458227923 1018230.0
Each bare (ID: x) should be matched with its status, i.e. Assigned (ID: x), Dismissed (ID: x), and so on, using the correct ID.
I am using a line similar to this one:
input_data['last_status'] = input_data.last_status.str.replace('(ID: 10)', 'Planning (ID: 10)')
My output is:
(Assigned (ID: 40)) 0.0
(Cancelled (ID: 80)) 0.0
(Planned (ID: 20)) 0.0
(Planning (ID: 10)) 0.0
(Writing (ID: 50)) 0.0
(Written (ID: 60)) 44125.0
Assigned (Assigned (ID: 40)) 0.0
Assigned (ID: 30) 0.0
Assigning (ID: 30) 0.0
Cancelled (Assigned (ID: 40)) 0.0
Cancelled (Cancelled (ID: 80)) 0.0
Planned (Planned (ID: 20)) 0.0
Planning (Planning (ID: 10)) 0.0
Processed (ID: 70) 0.0
Revoked (Writing (ID: 50)) 0.0
Writing (Writing (ID: 50)) 0.0
Written (Written (ID: 60)) 1018230.0
As you can see, every occurrence of (ID: x) has been replaced, including the ones inside entries that already had a status, so the result still doesn’t show the correct status.
My ideal data frame would look like this:
Assigned (ID: 30) 42050340 0.0
Assigned (ID: 40) 130880371376 0.0
Assigning (ID: 30) 1095844753 0.0
Cancelled (ID: 40) 937280 0.0
Cancelled (ID: 80) 16857720813 0.0
Planned (ID: 20) 9060392597 0.0
Planning (ID: 10) 108484297031 0.0
Processed (ID: 70) 133289880880 0.0
Revoked (ID: 50) 2411903072 0.0
Writing (ID: 50) 146408550024 0.0
Written (ID: 60) 139458227923 1018230.0
I definitely want to use pandas because the datasets are large; I have other implementations, but they take a few days to run. Is there a way to do this in pandas?
I’ve never asked any questions on StackOverflow before. I hope my question is clear.
Solution
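If you want to generalize, you can use str.replace with start-of-string and end-of-string anchors (^ and $), so that only values consisting of nothing but (ID: x) are matched. The parentheses are escaped because unescaped parentheses form a regex group instead of matching literally. On pandas 2.0 and later, pass regex=True, since str.replace treats the pattern as a literal string by default.
df['last_status'].str.replace(r'^(\(ID: \d+\))$', r'Planning: \1', regex=True)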
0 Planning: (ID: 10)
1 Planning: (ID: 20)
2 Planning: (ID: 40)
3 Planning: (ID: 50)
4 Planning: (ID: 60)
5 Planning: (ID: 80)
6 Assigned (ID: 30)
7 Assigned (ID: 40)
8 Assigning (ID: 30)
9 Cancelled (ID: 40)
10 Cancelled (ID: 80)
11 Planned (ID: 20)
12 Planning (ID: 10)
13 Processed (ID: 70)
14 Revoked (ID: 50)
15 Writing (ID: 50)
16 Written (ID: 60)
Name: last_status, dtype: object
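To make the difference concrete, here is a minimal sketch that runs the anchored pattern next to the unanchored call from the question. The sample values are a hand-picked subset of the question's data, and regex=True is passed explicitly on the assumption that a recent pandas version is installed.

```python
import pandas as pd

# Small sample taken from the question's last_status column.
s = pd.Series(
    ["(ID: 10)", "(ID: 20)", "Assigned (ID: 30)", "Planning (ID: 10)"],
    name="last_status",
)

# Anchored and escaped: only values that are exactly "(ID: <digits>)" change.
anchored = s.str.replace(r"^(\(ID: \d+\))$", r"Planning: \1", regex=True)

# Unanchored and unescaped: the parentheses act as a regex group, so the
# pattern matches the text "ID: 10" anywhere, which nests the replacement.
unanchored = s.str.replace("(ID: 10)", "Planning (ID: 10)", regex=True)

print(anchored.tolist())
print(unanchored.tolist())
```

The second result reproduces the nested values such as Planning (Planning (ID: 10)) shown in the question, which is why both the escaping and the anchors are needed.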
If you only want to replace a specific ID, change your regular expression to:
r'^(\(ID: 10\))$'
Or,
r'^(\(ID: {}\))$'.format(number)
where number is a variable that holds the ID value for which to perform the substitution.
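The per-ID variant could be wrapped in a small helper along these lines. This is only a sketch: the function name label_bare_id and its status parameter are hypothetical and not part of the original answer, and a callable is used for the replacement so the status text is inserted verbatim.

```python
import pandas as pd

def label_bare_id(series, number, status):
    """Prefix `status` to values that are exactly "(ID: <number>)".

    Hypothetical helper for illustration; values that already carry a
    status, or that have a different ID, are returned unchanged.
    """
    # Build the anchored pattern for one specific ID.
    pattern = r"^(\(ID: {}\))$".format(int(number))
    # A callable replacement avoids any backslash handling in `status`.
    return series.str.replace(
        pattern, lambda m: "{} {}".format(status, m.group(1)), regex=True
    )

s = pd.Series(["(ID: 10)", "(ID: 100)", "Planning (ID: 10)", "(ID: 20)"])
result = label_bare_id(s, 10, "Planning")
print(result.tolist())
```

Because the pattern is anchored, (ID: 100) is left alone even though it starts with the same digits as (ID: 10).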