How to convert Vietnamese text to normal text?
I have the following Vietnamese text:
String text = "Xin chào Việt Nam";
I want to convert it to plain text without diacritics. My expected result:
String result = "Xin chao Viet Nam";
What should I do? Thank you.
Solution
You are looking for the Normalizer class, java.text.Normalizer.
It lets you convert between accented Unicode characters and their decomposed forms.
Normalizing to NFD splits each accented character into its unaccented base letter followed by its combining diacritical marks. You can then remove those marks with a regular expression.
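To make the decomposition step concrete, here is a minimal sketch that prints the code points of one Vietnamese letter before and after NFD normalization. The class name NfdDemo and the helper codePoints are made up for this illustration; only Normalizer.normalize comes from the JDK API discussed above.

```java
import java.text.Normalizer;

// Hypothetical demo class, not part of the original answer
class NfdDemo {
    // Returns the code points of s as space-separated "U+XXXX" strings
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String composed = "\u1EC7"; // the letter ệ as one precomposed character
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);
        System.out.println(codePoints(composed));   // U+1EC7
        System.out.println(codePoints(decomposed)); // U+0065 U+0323 U+0302
    }
}
```

After NFD the single character becomes a plain "e" (U+0065) followed by two combining marks, the dot below (U+0323) and the circumflex (U+0302). Both marks fall in the Combining Diacritical Marks block, which is what the regular expression targets.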
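import java.text.Normalizer;
import java.util.regex.Pattern;

public class DeAccent {
    public static void main(String[] args) {
        System.out.println(deAccent("Xin chào Việt Nam")); // prints "Xin chao Viet Nam"
    }
    public static String deAccent(String str) {
        // NFD splits each accented character into a base letter plus combining marks
        String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
        // Match runs of combining diacritical marks (U+0300 to U+036F) and drop them
        Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
        return pattern.matcher(nfdNormalizedString).replaceAll("");
    }
}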
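One caveat that matters for Vietnamese in particular: the letters đ (U+0111) and Đ (U+0110) have no canonical decomposition, so NFD leaves them unchanged and the regex never sees a combining mark to remove. The sketch below is one possible way to handle that, assuming you want them mapped to plain d and D. The class name VietnameseDeAccent is hypothetical, and the explicit replace calls are an addition, not part of the answer above.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical variant of deAccent that also handles đ/Đ
class VietnameseDeAccent {
    // Compiled once and reused, since Pattern instances are immutable
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    public static String deAccent(String str) {
        String nfd = Normalizer.normalize(str, Normalizer.Form.NFD);
        String stripped = DIACRITICS.matcher(nfd).replaceAll("");
        // đ/Đ do not decompose under NFD, so map them explicitly (assumed desired)
        return stripped.replace('\u0111', 'd').replace('\u0110', 'D');
    }

    public static void main(String[] args) {
        System.out.println(deAccent("Đà Nẵng, Việt Nam")); // prints "Da Nang, Viet Nam"
    }
}
```

Keeping the pattern in a static field avoids recompiling it on every call, which can help if the method runs over many strings.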