Python 3 encoding replaces unicode characters
According to the documentation, the following call
'Brückenspinne'.encode("utf-8", errors='replace')
should give me the byte sequence b'Br??ckenspinne'.
However, the Unicode character is not replaced but encoded:
b'Br\xc3\xbcckenspinne'
Can you tell me how I can actually eliminate the Unicode characters? (I am using replace for testing and plan to use 'xmlcharrefreplace' later. To be honest, I want to convert the Unicode characters to their XML character references while leaving everything as a string.)
Thank you.
Solution
UTF-8 can represent the character ü, so no substitution occurs: the errors handler is only invoked for characters that the target encoding cannot represent.
Use an encoding that cannot represent the character instead, for example ascii:
>>> 'Brückenspinne'.encode("ascii", errors='replace')
b'Br?ckenspinne'
>>> 'Brückenspinne'.encode("ascii", errors='xmlcharrefreplace')
b'Br&#252;ckenspinne'
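
The question also asks for the result to stay a string rather than bytes. One possible way to do that, as a minimal sketch, is to decode the ASCII bytes again after encoding. The helper name to_xmlcharref_str is hypothetical and chosen only for illustration:

```python
def to_xmlcharref_str(text: str) -> str:
    """Replace every non-ASCII character with its XML character reference.

    Hypothetical helper for illustration; it is not part of the standard library.
    """
    # encode() applies the error handler only to characters that ASCII
    # cannot represent; everything else is passed through unchanged.
    ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
    # The bytes are now pure ASCII, so decoding them cannot fail
    # and gives back an ordinary str.
    return ascii_bytes.decode("ascii")


print(to_xmlcharref_str("Brückenspinne"))  # Br&#252;ckenspinne
```

Characters that are already ASCII are left untouched, so the function can be applied to any string.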