Java Regex APIs and Quoting

I’ve just been digging through the J2SE 1.4 regex stuff, and every time I have to do regex work in Java I keep thinking how much easier it is in other languages. Specifically, I’m talking about the clunkiness of the regex APIs in Java and the requirement to double every backslash in a regex, because the pattern lives inside an ordinary string literal. We need a better way. Ruby and Perl both have regex literals built into the language, so a single backslash is just fine. Python, which doesn’t have native regex support (it’s in the library), does have “raw” string quoting, which lets you avoid doubling up the backslashes. So what I have to write like this in Java:

    Pattern p =
        Pattern.compile("(\\(\\d+\\))?\\s*(\\d{3})\\s*-\\s*(\\d{3})");
    Matcher m = p.matcher(my_string);
    if (m.matches())
    {
        // ...
    }

or

    if (Pattern.matches("(\\(\\d+\\))?\\s*(\\d{3})\\s*-\\s*(\\d{3})", my_string))
    {
        // ...
    }

looks like this in Python:

    if re.match(r"(\(\d+\))?\s*(\d{3})\s*-\s*(\d{3})", my_string):
        ...

and could be even more easily written in Ruby thus:

    if my_string =~ /(\(\d+\))?\s*(\d{3})\s*-\s*(\d{3})/
        # ...
    end

See the difference? The built-in regex support is really nice, and the ease of quoting is a beautiful thing. I doubt that we’ll ever see either of these in Java, since both would almost certainly be considered non-trivial to add.
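For anyone who wants to see the backslash doubling in action, here is a rough, self-contained sketch of the Java version. The class name `PhonePatternDemo`, the `isPhone` helper, and the sample input are placeholders of mine, and I'm assuming the second group is meant to close right after the first `\d{3}`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhonePatternDemo {
    // The regex we actually mean is (\(\d+\))?\s*(\d{3})\s*-\s*(\d{3}).
    // Each backslash is doubled because the Java string literal consumes
    // one level of escaping before the regex engine ever sees the pattern.
    static final Pattern PHONE =
        Pattern.compile("(\\(\\d+\\))?\\s*(\\d{3})\\s*-\\s*(\\d{3})");

    // Hypothetical helper: true only if the WHOLE string matches the pattern.
    static boolean isPhone(String s) {
        return PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String sample = "(02) 555 - 123"; // made-up input for illustration
        Matcher m = PHONE.matcher(sample);
        if (m.matches()) {
            // Group 1 is the optional parenthesised prefix, groups 2 and 3
            // are the two three-digit runs.
            System.out.println("prefix: " + m.group(1));
            System.out.println("first:  " + m.group(2));
            System.out.println("second: " + m.group(3));
        }
    }
}
```

Counting the backslashes in that one literal (ten of them for five regex escapes) is pretty much the whole complaint.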