Java – How to convert Vietnamese text to normal text?



I have the following Vietnamese text:

String text = "Xin chào Việt Nam";

I want to convert it to plain text without diacritics. My expected result:

String result = "Xin chao Viet Nam";

What should I do? Thank you.

Solution

You are looking for java.text.Normalizer.
It maps between precomposed (accented) Unicode characters and their decomposed forms.
In normalization form NFD, each accented character is split into its base letter followed by one or more combining diacritical marks. You can then remove those marks with a regular expression.
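To see the decomposition in action, here is a small sketch (the class NfdDemo and method codePoints are illustrative names, not part of any library) that lists the code points NFD produces for the Vietnamese letter U+1EC7 (e with circumflex and dot below). The single precomposed character becomes the base letter e followed by two combining marks, a dot below and a circumflex, both of which fall in the range matched by \p{InCombiningDiacriticalMarks}.

```java
import java.text.Normalizer;
import java.util.List;
import java.util.stream.Collectors;

public class NfdDemo {
    // Returns the code points of the NFD form of s, formatted as U+XXXX
    public static List<String> codePoints(String s) {
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        return nfd.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // U+1EC7 is a single precomposed code point before normalization
        System.out.println(codePoints("\u1EC7")); // [U+0065, U+0323, U+0302]
    }
}
```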

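    import java.text.Normalizer;
    import java.util.regex.Pattern;

    public class DeAccent {
        // Matches runs of combining diacritical marks (the U+0300 to U+036F block)
        private static final Pattern DIACRITICS = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

        public static void main(String[] args) {
            System.out.println(deAccent("Xin chào Việt Nam")); // prints: Xin chao Viet Nam
        }

        public static String deAccent(String str) {
            // NFD splits each accented character into a base letter plus combining marks
            String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
            // Remove the combining marks, leaving only the base letters
            return DIACRITICS.matcher(nfdNormalizedString).replaceAll("");
        }
    }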
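One caveat for Vietnamese: the letters đ (U+0111) and Đ (U+0110) are distinct letters with a stroke, not a base letter plus a combining mark, so NFD leaves them unchanged. Stripping diacritics from "Đà Nẵng" with the approach above therefore gives "Đa Nang". A minimal sketch that maps them explicitly, assuming you want đ to become d (the class VietnameseText and method toPlainText are hypothetical names):

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class VietnameseText {
    // Matches runs of combining diacritical marks (U+0300 to U+036F)
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    // Hypothetical helper: removes Vietnamese diacritics, including the stroke on d
    public static String toPlainText(String str) {
        String nfd = Normalizer.normalize(str, Normalizer.Form.NFD);
        String stripped = DIACRITICS.matcher(nfd).replaceAll("");
        // U+0111 and U+0110 have no canonical decomposition, so map them by hand
        return stripped.replace('\u0111', 'd').replace('\u0110', 'D');
    }

    public static void main(String[] args) {
        // Escapes spell "Da Nang" with its Vietnamese diacritics
        System.out.println(toPlainText("\u0110\u00E0 N\u1EB5ng")); // Da Nang
    }
}
```

The explicit replace calls run after the regex, so they only need to handle the two stroked letters that normalization cannot split.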
