I’m a big fan of regular expressions, because they let you parse text in very concise, and sometimes complicated, ways. Though I agree with jwz about regular expressions in lots of cases, I still use them frequently. Perl was the first language that really let me use regex, way back in 1991. After that I used implementations in C, Python, Ruby, Java and various other languages. While I was glad that Java 1.4 finally added regex support, I was disappointed with the implementation. It’s kind of clunky, and because Java has neither regex syntax baked into the language nor support for “raw” strings, you end up with a regex with twice as many backslashes as necessary.
Last night while reading Programming in Scala, I came upon the discussion of Scala’s regex class. Scala has raw strings, so you have exactly as many backslashes in your pattern as necessary. Better still, each regex you create also defines a Scala extractor, so you can easily bind local variables to groups within the expression.
First, let’s look at the Java code.
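[java]import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern emailParser = Pattern.compile("([\\w\\d\\-\\_]+)(\\+\\d+)?@([\\w\\d\\-\\.]+)");
String s = "email@example.com";
Matcher m = emailParser.matcher(s);
if (m.matches()) { // group() throws IllegalStateException unless a match was attempted first
    String name = m.group(1);
    String num = m.group(2);
    String domain = m.group(3);
    System.out.printf("Name: [%s], Num: [%s], Domain: [%s]\n", name, num, domain);
}[/java]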
That’s pretty simple, though the double-backslashes really annoy me. Running this code results in the local variables name and domain getting assigned parts of the email address. The variable called num is assigned null, because the email address didn’t contain a plus sign followed by a number. Now, here’s the same program in Scala.
[scala highlight="1, 5"]val EmailParser = """([\w\d\-\_]+)(\+\d+)?@([\w\d\-\.]+)""".r

val s = "email@example.com"

val EmailParser(name, num, domain) = s
printf("Name: %s, Domain: %s\n", name, domain)[/scala]
In line 1, we are calling the “r” method on a raw string. This method converts the String into a Regex and returns it. We’re then assigning it to a local val called EmailParser. Also notice line 5. In that one line, we are declaring three local vals and assigning them whatever the groups in the regex matched, or null if they didn’t match. Just like with the Java example, num will be null since there was no plus sign followed by a number. If you change the email address in either example to something like “email+1@example.com”, then all three variables will be assigned parts of the string.
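To make the plus-sign case concrete, here is a minimal sketch. The pattern is the one from above; the `EmailGroups` object and the `parts` helper are names I made up for illustration, and wrapping `num` in an `Option` is just one way to avoid passing a null around.

```scala
object EmailGroups {
  // Same pattern as above; the triple quotes mean no doubled backslashes.
  val EmailParser = """([\w\d\-\_]+)(\+\d+)?@([\w\d\-\.]+)""".r

  // Hypothetical helper: returns (name, optional "+N" suffix, domain).
  def parts(address: String): (String, Option[String], String) = {
    // The extractor binds all three groups at once.
    // An optional group that did not take part in the match comes back as null.
    val EmailParser(name, num, domain) = address
    (name, Option(num), domain) // Option(null) becomes None
  }
}

// EmailGroups.parts("email+1@example.com") returns ("email", Some("+1"), "example.com")
// EmailGroups.parts("email@example.com")   returns ("email", None, "example.com")
```

Note that the second group captures the plus sign along with the digits, so you get "+1" rather than "1". Newer Scala 3 compilers may also warn about a val binding like this one, since the pattern is not guaranteed to match.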
Do you have to have this level of support to do regex? No. Does it make things a lot nicer? Indeed.
Now, I just discovered that in Scala, if the regex doesn’t match at all, a MatchError is thrown. That’s sort of a bummer, because it means you’ll have to add a try/catch around a val binding like this one, or fall back to a match expression with a default case. Still, I like the extractor syntax that binds regex groups to local variables in one step.
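One way to sidestep the MatchError without a try/catch is to use the same extractor inside a match expression. This is only a sketch: the `SafeEmail` object and its `parse` method are hypothetical names, and returning an `Option` is my assumption about what a caller would want.

```scala
object SafeEmail {
  val EmailParser = """([\w\d\-\_]+)(\+\d+)?@([\w\d\-\.]+)""".r

  // Hypothetical helper: Some((name, optional "+N" suffix, domain)) on a match,
  // None when the whole string does not match the pattern.
  def parse(address: String): Option[(String, Option[String], String)] =
    address match {
      case EmailParser(name, num, domain) => Some((name, Option(num), domain))
      case _                              => None // no MatchError, just a fallback
    }
}

// SafeEmail.parse("email@example.com") returns Some(("email", None, "example.com"))
// SafeEmail.parse("not an email")      returns None
```

The extractor still binds the groups in one step; the default case simply gives the non-matching input somewhere to go.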
You can see some more examples of regex in Scala over here.