Java regular expression escape characters

When matching certain characters, such as line breaks, you can use the regular expression "\\n" or just "\n". For example, the following splits a string into an array of lines:

String[] lines = allContent.split("\\r?\\n");

But the following is equally valid:

String[] lines = allContent.split("\r?\n");

My question:

Do the two forms above behave identically, or are there any subtle differences? If the latter, can you give an example where they produce different results?

Or is there only a difference in [possible/theoretical] performance?

Solution

There is no difference in this case. An ordinary string escape sequence is a single backslash followed by a valid escape character ("\n", "\r", etc.), whereas a regex escape sequence is a literal backslash (that is, a double backslash in a Java string literal) followed by a valid regex escape character ("\\n", "\\d", etc.). In the first case the compiler places the control character itself into the string and the regex engine matches it literally; in the second case the regex engine receives the backslash sequence and interprets it.

"\n" (string escape sequence) is a literal LF (line feed), and "\\n" is a regex escape sequence that matches an LF.

"\r" (string escape sequence) is a literal CR (carriage return), and "\\r" is a regex escape sequence that matches a CR.

"\t" (string escape sequence) is a literal tab character, and "\\t" is a regex escape sequence that matches a tab.

See the Java regex documentation (java.util.regex.Pattern) for the list of supported regex escapes.
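The equivalence described above can be checked directly. The following is a minimal sketch; the class name EscapeDemo, the helper method names, and the sample text are made up for illustration. It shows that the two string literals differ in length but split the same input identically.

```java
import java.util.Arrays;

public class EscapeDemo {
    // Regex escapes: the regex engine receives backslash-r and backslash-n
    // and interprets them as CR and LF.
    public static String[] splitWithRegexEscapes(String text) {
        return text.split("\\r?\\n");
    }

    // String escapes: the compiler puts real CR and LF characters into the
    // pattern, and the regex engine matches them literally.
    public static String[] splitWithStringEscapes(String text) {
        return text.split("\r?\n");
    }

    public static void main(String[] args) {
        String text = "one\r\ntwo\nthree";

        System.out.println("\n".length());   // 1 (a single LF character)
        System.out.println("\\n".length());  // 2 (a backslash and the letter n)

        String[] a = splitWithRegexEscapes(text);
        String[] b = splitWithStringEscapes(text);
        System.out.println(Arrays.toString(a));   // [one, two, three]
        System.out.println(Arrays.equals(a, b));  // true
    }
}
```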

However, if you use the Pattern.COMMENTS flag (which lets you add comments and lay the pattern out readably, because the regex engine ignores all unescaped whitespace in the pattern), you will need to use either "\\n" or "\\\n" in the Java string literal to define a newline (LF), and either "\\r" or "\\\r" to define a carriage return (CR).

See this Java test:

String s = "\n";
System.out.println(s.replaceAll("\n", "LF"));          // => LF
System.out.println(s.replaceAll("\\n", "LF"));         // => LF
System.out.println(s.replaceAll("(?x)\\n", "LF"));     // => LF
System.out.println(s.replaceAll("(?x)\\\n", "LF"));    // => LF
System.out.println(s.replaceAll("(?x)\n", "<LF>"));    // => <LF>
                                                       //    <LF>

Why does the last one produce <LF> + newline + <LF>? Because in comments mode the unescaped LF is ignored, so "(?x)\n" is equivalent to "", an empty pattern, which matches the empty positions before and after the newline.
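The same effect appears in longer patterns, where it is easier to overlook. Here is a small sketch, assuming a hypothetical helper matchesInCommentsMode and made-up inputs, that compiles both forms with Pattern.COMMENTS and compares the results.

```java
import java.util.regex.Pattern;

public class CommentsFlagDemo {
    // Compiles the regex with Pattern.COMMENTS and checks whether it
    // matches the entire input.
    public static boolean matchesInCommentsMode(String regex, String input) {
        return Pattern.compile(regex, Pattern.COMMENTS).matcher(input).matches();
    }

    public static void main(String[] args) {
        String twoLines = "a\nb";

        // Regex escape: the engine sees backslash-n and matches an LF.
        System.out.println(matchesInCommentsMode("a\\nb", twoLines)); // true

        // String escape: the raw LF is whitespace, so the pattern is just "ab".
        System.out.println(matchesInCommentsMode("a\nb", twoLines));  // false
        System.out.println(matchesInCommentsMode("a\nb", "ab"));      // true
    }
}
```

So the two spellings are interchangeable only as long as the pattern is not compiled in comments mode, whether via the flag or an inline (?x).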
