Python – How to check if a string contains only UTF-8 characters

So far I’m doing something like this:

def is_utf8(s):
    try:
        x=bytes(s,'utf-8').decode('utf-8', 'strict')
        print(x)
        return 1
    except:
        return 0

The only problem is that I don’t want it to print anything. I want to remove print(x), but when I do, the function stops working.
For example, print(is_utf8("H tst")) prints 0 when print(x) is left in the function and 1 when it is removed. Am I approaching this problem the wrong way?

Solution

You can use the chardet module to detect unknown encodings. For example, if a is a bytes object (or bytearray), you can estimate its encoding like this. Note that the result is a statistical guess, not a guarantee:

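import chardet

a = "Übergröße".encode("utf-8")  # sample bytes; substitute your own data

b = chardet.detect(a)  # dict with "encoding" and "confidence" keys
print(b["encoding"])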
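
If the goal is only to validate UTF-8 rather than guess an encoding, a strict decode from the standard library may be enough. The sketch below is one possible rewrite of the is_utf8 function from the question, not a definitive implementation. It assumes the input may be either bytes or str; the parameter name data is illustrative. A likely explanation for the behaviour in the question is that the bare except also swallowed an error raised by print(x) itself (for instance, a console that cannot encode the character), so the 0 came from printing and not from the UTF-8 check. Catching only UnicodeError avoids that.

```python
def is_utf8(data):
    """Return True if data is valid UTF-8, False otherwise.

    Assumption: data is bytes, bytearray, or str.
    """
    try:
        if isinstance(data, str):
            # A str is already decoded text, so encoding it to UTF-8
            # only fails for lone surrogates such as "\ud800".
            data.encode("utf-8")
        else:
            # Raw bytes: a strict decode raises on any invalid sequence.
            bytes(data).decode("utf-8", errors="strict")
        return True
    except UnicodeError:
        # Covers both UnicodeEncodeError and UnicodeDecodeError,
        # without hiding unrelated errors the way a bare except does.
        return False
```

With this version, is_utf8(b"\xff\xfe") returns False and is_utf8("héllo".encode("utf-8")) returns True, and nothing is printed. Almost every Python str passes the check, so it is mainly useful on bytes read from a file or socket.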
