Remove non ascii characters from String in Java example – Java Code Examples

This example shows how to remove non-ASCII characters from a String in Java, including how to do it with a regular expression.
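Here is a minimal sketch of the regular-expression approach. It assumes the pattern [^\\x00-\\x7F], which matches any character outside the 7-bit ASCII range. The class and method names are illustrative. The string literals use \u escapes so the file compiles under any source encoding; "Caf\u00e9 \u00c4pfel" is "Café Äpfel".

```java
class RemoveNonAscii {

    // Removes every character outside the 7-bit ASCII range (U+0000 to U+007F).
    static String removeNonAscii(String input) {
        // [^\x00-\x7F] is a negated class: it matches any char whose code is above 127
        return input.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // "Caf" + e-acute + " " + A-umlaut + "pfel"
        String text = "Caf\u00e9 \u00c4pfel";
        System.out.println(removeNonAscii(text)); // prints "Caf pfel"
    }
}
```

The accented letters are dropped entirely rather than replaced, which is the limitation the next section addresses.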

How to replace non-ASCII characters with their ASCII equivalents?

All of the above approaches remove non-ASCII characters from the String. What if you want to replace “ä” with “a” instead? You can do that by normalizing the string first and then removing the remaining non-ASCII characters (the accents split off by normalization), as given below.
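The following sketch shows the normalize-then-remove approach. It uses java.text.Normalizer with Normalizer.Form.NFD and assumes the same [^\\x00-\\x7F] pattern for the removal step. The class and method names are hypothetical.

```java
import java.text.Normalizer;

class ReplaceWithAsciiEquivalent {

    static String toAsciiEquivalent(String input) {
        // NFD splits a precomposed letter such as U+00E4 (a-umlaut) into
        // the base letter 'a' followed by the combining mark U+0308
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // the combining marks lie outside ASCII, so only the base letters survive
        return decomposed.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // "Caf" + e-acute + " " + A-umlaut + "pfel"
        String text = "Caf\u00e9 \u00c4pfel";
        System.out.println(toAsciiEquivalent(text)); // prints "Cafe Apfel"
    }
}
```

Characters that have no canonical decomposition, such as “ß” or “ø”, are still removed rather than replaced. NFD leaves them as single non-ASCII characters.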


Alternatively, you can use the “[^\\p{ASCII}]” pattern to remove all non-ASCII characters after normalizing the string, as given below.
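Here is a sketch using the POSIX-style class, with illustrative names. In java.util.regex, \\p{ASCII} is defined as [\\x00-\\x7F], so “[^\\p{ASCII}]” is equivalent to “[^\\x00-\\x7F]”.

```java
import java.text.Normalizer;

class NormalizeWithPosixAscii {

    static String toAsciiEquivalent(String input) {
        // decompose accented letters into base letter + combining mark
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{ASCII} is Java's POSIX class for [\x00-\x7F]; the negated class removes everything else
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // "na" + i-diaeresis + "ve"
        System.out.println(toAsciiEquivalent("na\u00efve")); // prints "naive"
    }
}
```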


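If the text contains non-Latin Unicode characters that should be kept (for example Greek, Cyrillic, or CJK text), use the “[\\p{M}]” pattern instead of “[^\\p{ASCII}]”, as given below. “[\\p{M}]” removes only the combining marks produced by normalization, while “[^\\p{ASCII}]” removes every character outside the ASCII range.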
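The following sketch compares the two patterns on text that mixes an accented Latin word with 東京 (Tokyo). The class and method names are hypothetical. Only the “[\\p{M}]” version keeps the CJK characters.

```java
import java.text.Normalizer;

class RemoveMarksOnly {

    static String removeAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // [\p{M}] matches only Unicode marks (e.g. combining accents),
        // so letters from other scripts are left untouched
        return decomposed.replaceAll("[\\p{M}]", "");
    }

    static String removeNonAscii(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // removes the accents AND every other non-ASCII character
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // "Caf" + e-acute + " " + the two CJK ideographs for Tokyo
        String text = "Caf\u00e9 \u6771\u4eac";
        System.out.println(removeAccents(text).length());  // 7: "Cafe", space, two ideographs
        System.out.println(removeNonAscii(text).length()); // 5: "Cafe" and the space
    }
}
```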


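In a regular expression, \\p{M} matches any character in the Unicode “Mark” category, such as a combining accent. \\P{M} matches any character that is not a mark, such as the base letter (glyph) of a Unicode character.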
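Here is a small demonstration of the two classes on a single decomposed “é”. The helper names are illustrative. Removing \\p{M} keeps the base letter, while removing \\P{M} keeps only the accent.

```java
import java.text.Normalizer;

class MarkCategoryDemo {

    // NFD turns U+00E9 (e-acute) into 'e' followed by U+0301 (combining acute accent)
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // delete the marks, keep the base letters
    static String baseLetters(String s) {
        return decompose(s).replaceAll("\\p{M}", "");
    }

    // delete everything that is not a mark, keep only the accents
    static String marksOnly(String s) {
        return decompose(s).replaceAll("\\P{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(baseLetters("\u00e9"));                            // e
        System.out.println(Integer.toHexString(marksOnly("\u00e9").charAt(0))); // 301
    }
}
```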

Finally, if you are using the Apache Commons Lang library, you can use the stripAccents method of the StringUtils class to remove accents from Unicode characters, as given below.


How to remove only non-printable characters from the String?

If you want to keep only printable characters and remove all non-printable characters from the String, you can use the code given below.
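One way to do this, shown as a sketch rather than the article's exact code, is to remove everything matched by \\P{Print}. This is the negation of Java's POSIX class \\p{Print}, which covers printable ASCII from U+0020 to U+007E. By default these POSIX classes are ASCII-only, so this also removes non-ASCII characters such as accented letters. The class and method names are hypothetical.

```java
class RemoveNonPrintable {

    static String keepPrintable(String input) {
        // \p{Print} = letters, digits, punctuation and the space character (ASCII only);
        // \P{Print} matches everything else, including control characters
        return input.replaceAll("\\P{Print}", "");
    }

    public static void main(String[] args) {
        // contains a tab, CR LF, a BEL control character and an e-acute
        String text = "Name:\tJohn\r\nBell\u0007 Caf\u00e9";
        System.out.println(keepPrintable(text)); // prints "Name:JohnBell Caf"
    }
}
```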

Please note that the above code also removes the \t (tab), \n (new line), and \r (carriage return) characters.
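If you want to keep tabs and line breaks, one option is to add them to a negated character class so they are no longer matched. This is a sketch, and the names are hypothetical.

```java
class RemoveNonPrintableKeepWhitespace {

    static String keepPrintableAndLineBreaks(String input) {
        // negated class: matches anything that is neither printable ASCII nor \t, \n, \r
        return input.replaceAll("[^\\p{Print}\\t\\n\\r]", "");
    }

    public static void main(String[] args) {
        // 'a', tab, 'b', BEL, CR, LF, 'c'
        String result = keepPrintableAndLineBreaks("a\tb\u0007\r\nc");
        System.out.println(result.length()); // 6: only the BEL character was removed
    }
}
```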


Source: Remove non ascii characters from String in Java example – Java Code Examples

This entry was posted in Compus.