How to check if a string contains only UTF-8 characters
So far I’m doing something like this:
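def is_utf8(s):
    try:
        # Encode the str to UTF-8 bytes, then decode them back strictly
        x = bytes(s, 'utf-8').decode('utf-8', 'strict')
        print(x)
        return 1
    except UnicodeError:
        # Catch only encoding/decoding errors, not every exception
        return 0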
The only problem is that I don’t want it to print anything. I want to remove print(x), but when I do, the function stops working.
For example, with print(is_utf8("H tst")), the function returns 0 while print(x) is inside it, but returns 1 once print(x) is removed. Did I approach this problem the wrong way?
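A minimal sketch of the check without any printing, assuming the goal is to tell whether input is valid UTF-8. One plausible cause of the behavior described above, not confirmed in the question, is that print(x) itself raised UnicodeEncodeError because the console could not display some character, and the except clause turned that into 0. The conversion on its own only fails for a Python 3 str that contains lone surrogates (U+D800 to U+DFFF). The names is_valid_utf8 and is_utf8_encodable are hypothetical: the first checks raw bytes, the second is the str counterpart.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")  # "strict" is the default error handler
    except UnicodeDecodeError:
        return False
    return True


def is_utf8_encodable(s: str) -> bool:
    """Return True if s can be encoded as UTF-8.

    For a Python 3 str this fails only for lone surrogates (U+D800 to U+DFFF).
    """
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


print(is_valid_utf8("H\u00e9llo".encode("utf-8")))  # True
print(is_valid_utf8(b"\xff\xfe"))                   # False: 0xFF never appears in UTF-8
print(is_utf8_encodable("H\u00e9llo"))              # True
print(is_utf8_encodable("\ud800"))                  # False: lone surrogate
```

These return True/False rather than 1/0; since bool is a subclass of int in Python, True == 1 and False == 0 still hold for existing callers.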
Solution
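You can use the chardet module to detect the encoding of data whose encoding is unknown. The result is a statistical guess reported together with a confidence value. For example, if a is a byte string (bytes or bytearray), you can guess its encoding like this: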
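import chardet

# a must be bytes or bytearray; this sample value is only for illustration
a = "Grüße aus Köln".encode("utf-8")
b = chardet.detect(a)  # dict with keys such as "encoding" and "confidence"
print(b["encoding"])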
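Because chardet only guesses, one possible design, sketched here under assumptions, is to try a strict UTF-8 decode first (which gives a definite answer) and only ask chardet when that fails. The helper name decode_bytes is hypothetical. chardet is a third-party package, so it is imported only inside the fallback branch; ASCII and valid UTF-8 input never reach it.

```python
from typing import Tuple


def decode_bytes(data: bytes) -> Tuple[str, str]:
    """Decode data, preferring strict UTF-8 and falling back to chardet.

    Returns (text, encoding_used). decode_bytes is a hypothetical helper.
    """
    try:
        # Exact check: valid UTF-8 (including plain ASCII) decodes here
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass

    # Fallback: third-party statistical guess, imported only when needed
    import chardet

    guess = chardet.detect(data)
    encoding = guess["encoding"]
    if encoding is None:
        raise ValueError("chardet could not guess an encoding")
    # errors="replace" keeps an imperfect guess from raising
    return data.decode(encoding, errors="replace"), encoding


print(decode_bytes("H\u00e9llo".encode("utf-8")))  # ('Héllo', 'utf-8')
```

Input that is valid UTF-8 is always reported as "utf-8" here, even if chardet would have guessed a different encoding for it.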