Python 3 encoding replaces unicode characters

According to the documentation, the following call

'Brückenspinne'.encode("utf-8",errors='replace')

should give me the byte sequence b'Br??ckenspinne'. However, the Unicode character is not replaced; it is encoded instead:

b'Br\xc3\xbcckenspinne'

Can you tell me how I can actually get rid of the Unicode characters? (I am using 'replace' for testing and plan to use 'xmlcharrefreplace' later. What I really want is to convert the Unicode characters to their XML character references while keeping everything as a string.)

Thank you.

Solution

UTF-8 can represent the character ü, so no substitution occurs. The errors handler is only used for characters that the target encoding cannot encode.
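As a small sketch of that behaviour (the variable names `word` and `failed` are only illustrative), the snippet below checks that `errors='replace'` changes nothing for UTF-8 and shows where a strict ASCII encode fails:

```python
word = "Brückenspinne"

# UTF-8 can encode every character in this string, so the error
# handler is never consulted and the result is unchanged.
assert word.encode("utf-8", errors="replace") == word.encode("utf-8")

# ASCII cannot encode "ü"; with the default errors="strict" this raises.
try:
    word.encode("ascii")
except UnicodeEncodeError as exc:
    # start/end give the slice of the input that could not be encoded.
    failed = (exc.start, exc.end, exc.object[exc.start:exc.end])
    print(failed)
```
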

Use an encoding that cannot represent the character, for example ASCII:

>>> 'Brückenspinne'.encode("ascii", errors='replace')
b'Br?ckenspinne'

>>> 'Brückenspinne'.encode("ascii", errors='xmlcharrefreplace')
b'Br&#252;ckenspinne'
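The question also asks to keep everything as a string. One way to do that, sketched here with a hypothetical helper named `to_xmlcharref` (not part of the standard library), is to decode the ASCII bytes back into a str:

```python
def to_xmlcharref(text: str) -> str:
    """Return text with every non-ASCII character replaced by &#NNN;."""
    # Encode to ASCII, letting the handler substitute character references,
    # then decode the now pure-ASCII bytes back into a str.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_xmlcharref("Brückenspinne"))
```

The decode step cannot fail here, because the encode step only ever produces ASCII bytes.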
