99 Scala Problems #28 – I Like My Solution Better

I’ve been working through this list of 99 Scala Problems, which is modeled after this list of 99 Prolog Problems. As I’ve been going through them, I have been comparing my solutions to those provided (obviously). Sometimes, my solution is more or less the same as the “official” solution. Sometimes, theirs is better. In the case of problem 28, I think mine is far easier to read and understand.

Problem 28 has two parts. The first part reads:

a) We suppose that a list contains elements that are lists themselves. The objective is to sort the elements of the list according to their length. E.g. short lists first, longer lists later, or vice versa.

Running the function should look like this:

[scala]
scala> lsort(List(List("a", "b", "c"), List("d", "e"), List("f", "g", "h"), List("d", "e"), List("i", "j", "k", "l"), List("m", "n"), List("o")))

res0: List[List[java.lang.String]] = List(List(o), List(d, e), List(d, e), List(m, n), List(a, b, c), List(f, g, h), List(i, j, k, l))
[/scala]

For this part, my solution was almost identical. Here’s what I came up with:

[scala]
def lsort[T](ls: List[List[T]]) = {
ls.sortWith {(a, b) => a.length < b.length}
}
[/scala]

You can see that this function takes a List of type T, and then calls the sortWith method on that list, passing in a function value that sorts the lists based on their length, shortest to longest. The “official” solution was only slightly different:

[scala]
def lsort[A](ls: List[List[A]]): List[List[A]] =
ls sort { _.length < _.length }
[/scala]

Here, they used A instead of T, but that doesn’t affect anything, and they specified the return type, while I left mine inferred. Instead of assigning each bucket of the list to a named variable, as I did, they use the underscore placeholder. The two functions are functionally (get it?) identical, but theirs is a bit shorter because they removed the outer braces, and were able to skip the parameter list, since they used the underscores.

Now, the second part is where I diverge from the official solution. Here’s the problem description:

b) Again, we suppose that a list contains elements that are lists themselves. But this time the objective is to sort the elements according to their length frequency; i.e. in the default, sorting is done ascendingly, lists with rare lengths are placed [first], others with a more frequent length come later.

And the expected call and result is

[scala]
scala> lsortFreq(List(List("a", "b", "c"), List("d", "e"), List("f", "g", "h"), List("d", "e"), List("i", "j", "k", "l"), List("m", "n"), List("o")))

res1: List[List[java.lang.String]] = List(List(i, j, k, l), List(o), List(a, b, c), List(f, g, h), List(d, e), List(d, e), List(m, n))[/scala]

First, let’s look at what they presented as the solution. It referenced functions from other files, but I have included them all here for easy of viewing.

[scala]
def lsortFreq[A](ls: List[List[A]]): List[List[A]] = {
val freqs = Map(encode(ls map { _.length } sort { _ < _ }) map { _.swap }:_*)
ls sort { (e1, e2) => freqs(e1.length) < freqs(e2.length) }
}

def encode[T](ls: List[T]): List[(Int, T)] = {
val packedList = pack(ls)
packedList map {list => (list.length, list.head)}
}

def pack[T](ls: List[T]): List[List[T]] = ls match {
case Nil => Nil
case h :: tail => (h :: tail.takeWhile(_ == h)) :: pack(tail.dropWhile(_ == h))
}
[/scala]

I think this is very confusing code. It’s calling the encode function which does run-length encoding of the passed-in thing. It then uses a Map of these encodings to sort the passed-in list. The presence of five underscores in the first line, obscures where those parameters are coming from, and the final underscore is actually part of the _* method of the Array class!

My solution, while being a longer function, is far more readable, in my opinion. And, it’s the same number of lines as the three-method solution. Here it is

[scala]
def lsortFreq[T](ls: List[List[T]]) = {
val lengthMap = scala.collection.mutable.Map[Int, Int]()

for (l <- ls) {
val len = l.length
if (!lengthMap.contains(len)) {
lengthMap(len) = 1
} else {
lengthMap(len) += 1
}
}

ls sortWith {(a, b) => lengthMap(a.length) < lengthMap(b.length)}
}
[/scala]

In my function, I created a mutable Map and then iterate over the list, getting each item’s length, and then keep a running tally of how many items had that length. The map has these lengths as its keys, and the number of items with that length as its values. Get it? I then sort the original list by having each item in the comparison lookup how many items share its length, and use that as the sort criterion.

I have no idea which of these solutions is more efficient. For small problems like this, I doubt there’s any measurable difference. But I do believe that my solution is easier to read and understand. So much so, in fact, that I think someone who is not familiar with Scala would be able to easily figure out what it’s doing. I don’t know that the same can be said of the other solution.

I got criticized for promoting terse code in this article, so this is my attempt at balance. 🙂

Note: I did change the inputs to these functions from symbols to strings. The code formatter I use on the blog wasn’t colorizing things properly when there were symbols involved.

Procedural vs. Functional

With the rise of Scala and Clojure, there’s been a lot of talk lately about procedural vs. functional styles of coding. Most developers are accustomed to procedural coding, and functional can be hard to get a handle on. I was working through Programming in Scala (again) this morning, and I came upon this function:

// Procedural implementation
def longestWord(words: Array[String]) = {
  var word = words(0)
  var idx = 0

  for (i <- 1 until words.length)
    if (words(i).length > word.length) {
      word = words(i)
      idx = i
    }

  (word, idx)
}

The purpose of this function is to find the longest word in the passed-in array, and return a tuple with that longest word, and its index in the array. You can see that in this function, we have two vars, one for the current longest word, and another for its index in the array. We then use a for expression to walk the array, reassigning word and idx when we find a longer word. This is very much like how you would write this in Java.

I decided to rewrite this function in a more functional style, just to see how my functional chops are coming along. Here’s what I ended up with:

// A more functional implementation
def longestWord(words: Array[String]) =
  (("", -1) /: words.zipWithIndex) {(old, cur) =>
    if (cur._1.length > old._1.length) cur
    else old
  }

First of all, notice how much shorter this function is than the first one. Also, notice that there is only a single expression in the function, so the outer curly braces aren’t necessary. What this expression is doing is calling zipWithIndex on the passed-in array, which results in an array of tuples containing each word and its index. We then call foldLeft using its operator name of /:, with its initial argument being a tuple with an empty string and -1 for an index. What foldLeft does is apply the function value passed to it to pairs of arguments. On the first pass, the arguments are what was passed in and the first element in the array. On the second iteration, the arguments are the result of the first pass and the second element in the array. This then continues through the entire array. What is returned after the final pass will be a tuple that contains the longest word in the array, and its index.

Now, I don’t claim to be a functional master or anything, but I think this is a decent illustration of how the functional style can reduce the lines of code, and the number of mutable variables, while making the code easier to read and understand.

Slides From My Presentation on Operator Overloading In Scala

Last night I spoke at the Atlanta Scala Enthusiats meeting about operator overloading and a little on implicit conversions. I think the talk went well as I got lots of really good questions from the audience, and they laughed at my jokes. This presentation grew out of a blog post I wrote a few months ago entitled Scala Gets Operator Overloading Right; I beefed it up and made some slides and more code samples. Incidentally, if you Google for “scala operator overloading” that blog post is the first result.

For those of you who weren’t there, here are my slides and the code samples that go with them. I wrote these samples against Scala 2.7.7.final. They should work with the latest Scala 2.8, but I haven’t verified this.

And here’s the source code: oopres.zip

I’m Liking Scala’s XML Literals

How many times have you written a class that you needed to save and restore to/from XML? How did you do it? There are libraries that will do this for you, but I don’t know if any of them have taken the lead. Usually, you have to either have external XML templates that will be filled in, or you have to create the XML through code using DOM or JDOM or some other DOM. I think Scala has a better way.

Enter Scala’s XML literals. In Scala, you can embed XML right in the middle of regular Scala code. For example,

val num = 23

val xml = 
	<person>
		<name>Tom</name>
	</person>
	
println(num)
println(xml)

In this code, we assign to the val called num the Int value of 23. We then assign the xml val an XML Node, that begins with “person”. That’s neat, but not terribly useful. Here’s where it gets really interesting.

import scala.xml.{Node, XML}

class Person(val id: Int,
			 val firstName: String,
			 val lastName: String,
			 val title: String) {

	override def toString = {
		id + ", " + firstName + " " + lastName + ": " + title
	}
	
	def toXML = {
		<person>
			<id>{id}</id>
			<firstName>{firstName}</firstName>
			<lastName>{lastName}</lastName>
			<title>{title}</title>
		</person>
	}
}

object Person {
	def fromXML(node: Node): Person = {
		new Person(
			(node  "id").text.toInt,
			(node  "firstName").text,
			(node  "lastName").text,
			(node  "title").text
		)
	}
}

val p = new Person(23, "Tom", "Servo", "Robot")
println(p)

val xml = p.toXML

println(xml)

XML.saveFull("Person.xml", xml, "UTF-8", true, null)

val xmlFromFile = XML.loadFile("Person.xml")

val p1 = Person.fromXML(xmlFromFile)
println(p1)

That’s a longer example, but there’s a lot of interesting stuff in there. Lines 3 – 20 are defining a Scala class, but look at lines 12 – 19. Those lines define a method called toXML that returns the current instance of the class, serialized to XML. I’ve defined the XML structure that I want, and Scala will fill in the values by replacing the stuff between the curly braces with whatever those expressions evaluate to. In this case, it’s just the instance variables, but it could be anything you want, including more XML. So if you have a more complex data structure, it could call the toXML method on its instance variables, and they could handle their own serialization, which would then be placed where it should be in the parent XML document. Nifty, eh?

Now, look at lines 22 – 31. This bit of code is defining a companion object to the Person class, which defines a method to deserialize XML into an instance of class Person. Strictly speaking, I didn’t have to use a companion object; I could have just defined the fromXML method at the top level, but I think this makes the code cleaner. What this method does is take an XML node, tear it apart, and return an instance of class Person, with its instance variables populated.

Lines 33 – 45 exercise this code. The result of running this example look like this

box:~/tmp> scala person.scala 
23, Tom Servo: Robot
<person>
			<id>23</id>
			<firstName>Tom</firstName>
			<lastName>Servo</lastName>
			<title>Robot</title>
		</person>
23, Tom Servo: Robot
box:~/tmp>

Pretty neat, eh? In the exercise code, we created a Person object, printed it out, serialized it to XML, printed that out, then saved the XML to a file. We then read that file back in, and deserialized it into another instance of class Person. Using XML literals for de/serialization is a nice application. I can see using this quite a lot. I like how it keeps the XML format right there with the code it will represent, so that when you change the code, you’ll remember to change the XML, too.

Bogus Search Results From Sys-Con

Updated! Be sure to scroll down for the latest!

I’m writing a blog post dealing with Scala’s XML literal syntax and how to use it for object de/serialization and so I wanted to get a list of existing Java XML de/serialization libraries. I went to Google and searched for “java xml serialization” and here’s the first result that I got

googleres

Notice the date, which is April 23, 2009. That is, ostensibly, the publication date of the article, right? Now check what happens when you click the link and go to read the article

syscon

Again, notice the date. It’s six years ago! So, where did Google get the 2009 date? That’s the date of the latest comment on the article. I’m not sure how they are getting Google to display the latest comment date in such a way that it looks like the publication date, but it looks like they are. I can’t imagine that Google just picked that date from the article when it spidered the site. Am I reading too much into this, or is Sys-Con gaming Google for better placement?

09/24/2009 4:26PM Update: Just for grins, I did my Google search for “java xml serialization” again. The Sys-Con article is still the top hit, but notice what’s missing from the search result

google2

Notice, that there is no longer a date showing. Very interesting.

Scala’s Nice Regex Class

I’m a big fan of regular expressions, because they let you parse text in very concise, and sometimes complicated, ways. Though I agree with jwz about regular expressions in lots of cases, I still use them frequently. Perl was the first language that really let me use regex, way back in 1991. After that I used various implementations in C, Python, Ruby, Java and various other languages. While I was glad that Java 5 finally added regex support, I was disappointed at the implementation. It’s kind of clunky, and because Java doesn’t have regex syntax baked into the language, and no support for “raw” strings, you end up with a regex with twice as many backslashes as necessary.

Last night while reading Programming In Scala, I came upon the discussion of Scala’s regex class. Scala has a raw string, so you have exactly as many backslashes in your pattern as necessary, but each regex you create also defines a Scala extractor, so you can easily bind local variables to groups within the expression.

First, let’s look at the Java code.

[java]Pattern emailParser = Pattern.compile("([\w\d\-\_]+)(\+\d+)?@([\w\d\-\.]+)");

String s = "zippy@scalaisgreat.com";

Matcher m = emailParser.matcher(s);

if (m.matches())
{
String name = m.group(1);
String num = m.group(2);
String domain = m.group(3);

System.out.printf("Name: [%s], Num: [%s], Domain: [%s]n", name, num, domain);
}[/java]

That’s pretty simple, though the double-backslashes really annoy me. Running this code results in the local variables name and domain getting assigned parts of the email address. The variable called num is assigned null, because the email address didn’t contain a plus sign followed by a number. Now, here’s the same program in Scala.

[scala highlight=”1, 5″]val EmailParser = """([wd-_]+)(+d+)?@([wd-.]+)""".r

val s = "zippy@scalaisgreat.com"

val EmailParser(name, num, domain) = s

printf("Name: %s, Domain: %sn", name, domain)[/scala]

In line 1, we are calling the “r” method on a raw string. This method converts the String into a Regex and returns it. We’re then assigning it to a local val called EmailParser. Also notice line 5. In that one line, we are declaring three local vals and assigning them whatever the groups in the regex matched, or null if they didn’t match. Just like with the Java example, num will be null since there was no plus sign followed by a number. If you change the email address in either example to something like “zippy+23@scalaisgreat.com”, then all three variables will be assigned parts of the string.

Do you have to have this level of support to do regex? No. Does it make things a lot nicer? Indeed.

Now, I just discovered that in Scala if the regex doesn’t match at all, then a MatchError is thrown. That’s sort of a bummer, because it means you’ll have to add a try/catch around your code. Still, I like the extractor syntax that binds regex groups to local variables in one step.

You can see some more examples of regex in Scala over here.

Announcing JUnitLaunchFixer Eclipse Plugin

My friend Chris and I both work for the same company. Our product has around 900 JUnit tests, and for some of them, the default heap size that Eclipse runs JUnit tests with has become too small. You can change the heap size on individual JUnit launchers, but if you have lots of tests, this gets tedious. It also means that you have to run a test, see if it runs out of heap, change the setting and then rerun the test. We both thought there had to be a better way.

We were wrong. There is apparently no way within Eclipse to set a default heap size for JUnit launchers, which means you have to set them individually. We don’t like that.

Thus, JUnitLaunchFixer was born. This is an Eclipse plugin that will set the max heap size on a JUnit launcher when it is created to a value you have previously specified in the plugin preferences. The default is 1G, but you can tailor it to whatever you want. The first time you start Eclipse after installing the plugin, you will be presented with a selectable list of launchers that you can update, and a chance to set the default heap size. You can select as many or as few as you want. This will only be run once, unless you ask for it to run again by setting a preference.

JUnitLaunchFixer is released under the Eclipse Public License. It’s an open source project, so if there’s something you’d like to see it do, or you want to help out, let us know.

As of this moment, it’s only been tested with Eclipse Ganymede (3.4) on Windows 7 and Snow Leopard. I will be testing with Europa and Galileo soon.

You can download the source and build it yourself, but we also have an Eclipse update site to easily install it from http://junitlaunchfixer.googlecode.com/svn/trunk/JUnitLaunchFixer-update-site/