Get a REPL On Your Current Java Project

I like to test things out interactively, so I love working with languages that provide a REPL. I’m currently working on a Java project, but Java doesn’t have a REPL. Several languages built on top of the JVM do have them, and these langauges can access the Java classes on their classpaths. Groovy, Scala and Clojure are just three such examples, that I happen to work with.

I got this tip from this response on a Stackoverflow.com post. His tip was for Scala, which looks like this:

scala -cp target/classes:`/usr/bin/mvn \
dependency:build-classpath \
| grep "^[^\[]"`

The bit between the backticks runs a Maven goal that outputs the jars that your project depends on, and then extracts just the list of fully-qualified jar files to append to the Scala classpath. If you want to use Groovy for your REPL, it would look like this:

groovysh -cp target/classes:`/usr/bin/mvn \
dependency:build-classpath \
| grep "^[^\[]"`

Similarly, if you’d rather use Clojure, do this:

java -cp target/classes:`/usr/bin/mvn \
dependency:build-classpath \
| grep "^[^\[]"`:/path/to/clojure.jar clojure.main

(Note: In the examples above, I’ve split the code across multiple lines for clarity. In reality, it can be all on one line.)

99 Scala Problems #28 – I Like My Solution Better

I’ve been working through this list of 99 Scala Problems, which is modeled after this list of 99 Prolog Problems. As I’ve been going through them, I have been comparing my solutions to those provided (obviously). Sometimes, my solution is more or less the same as the “official” solution. Sometimes, theirs is better. In the case of problem 28, I think mine is far easier to read and understand.

Problem 28 has two parts. The first part reads:

a) We suppose that a list contains elements that are lists themselves. The objective is to sort the elements of the list according to their length. E.g. short lists first, longer lists later, or vice versa.

Running the function should look like this:

[scala]
scala> lsort(List(List("a", "b", "c"), List("d", "e"), List("f", "g", "h"), List("d", "e"), List("i", "j", "k", "l"), List("m", "n"), List("o")))

res0: List[List[java.lang.String]] = List(List(o), List(d, e), List(d, e), List(m, n), List(a, b, c), List(f, g, h), List(i, j, k, l))
[/scala]

For this part, my solution was almost identical. Here’s what I came up with:

[scala]
def lsort[T](ls: List[List[T]]) = {
ls.sortWith {(a, b) => a.length < b.length}
}
[/scala]

You can see that this function takes a List of type T, and then calls the sortWith method on that list, passing in a function value that sorts the lists based on their length, shortest to longest. The “official” solution was only slightly different:

[scala]
def lsort[A](ls: List[List[A]]): List[List[A]] =
ls sort { _.length < _.length }
[/scala]

Here, they used A instead of T, but that doesn’t affect anything, and they specified the return type, while I left mine inferred. Instead of assigning each bucket of the list to a named variable, as I did, they use the underscore placeholder. The two functions are functionally (get it?) identical, but theirs is a bit shorter because they removed the outer braces, and were able to skip the parameter list, since they used the underscores.

Now, the second part is where I diverge from the official solution. Here’s the problem description:

b) Again, we suppose that a list contains elements that are lists themselves. But this time the objective is to sort the elements according to their length frequency; i.e. in the default, sorting is done ascendingly, lists with rare lengths are placed [first], others with a more frequent length come later.

And the expected call and result is

[scala]
scala> lsortFreq(List(List("a", "b", "c"), List("d", "e"), List("f", "g", "h"), List("d", "e"), List("i", "j", "k", "l"), List("m", "n"), List("o")))

res1: List[List[java.lang.String]] = List(List(i, j, k, l), List(o), List(a, b, c), List(f, g, h), List(d, e), List(d, e), List(m, n))[/scala]

First, let’s look at what they presented as the solution. It referenced functions from other files, but I have included them all here for easy of viewing.

[scala]
def lsortFreq[A](ls: List[List[A]]): List[List[A]] = {
val freqs = Map(encode(ls map { _.length } sort { _ < _ }) map { _.swap }:_*)
ls sort { (e1, e2) => freqs(e1.length) < freqs(e2.length) }
}

def encode[T](ls: List[T]): List[(Int, T)] = {
val packedList = pack(ls)
packedList map {list => (list.length, list.head)}
}

def pack[T](ls: List[T]): List[List[T]] = ls match {
case Nil => Nil
case h :: tail => (h :: tail.takeWhile(_ == h)) :: pack(tail.dropWhile(_ == h))
}
[/scala]

I think this is very confusing code. It’s calling the encode function which does run-length encoding of the passed-in thing. It then uses a Map of these encodings to sort the passed-in list. The presence of five underscores in the first line, obscures where those parameters are coming from, and the final underscore is actually part of the _* method of the Array class!

My solution, while being a longer function, is far more readable, in my opinion. And, it’s the same number of lines as the three-method solution. Here it is

[scala]
def lsortFreq[T](ls: List[List[T]]) = {
val lengthMap = scala.collection.mutable.Map[Int, Int]()

for (l <- ls) {
val len = l.length
if (!lengthMap.contains(len)) {
lengthMap(len) = 1
} else {
lengthMap(len) += 1
}
}

ls sortWith {(a, b) => lengthMap(a.length) < lengthMap(b.length)}
}
[/scala]

In my function, I created a mutable Map and then iterate over the list, getting each item’s length, and then keep a running tally of how many items had that length. The map has these lengths as its keys, and the number of items with that length as its values. Get it? I then sort the original list by having each item in the comparison lookup how many items share its length, and use that as the sort criterion.

I have no idea which of these solutions is more efficient. For small problems like this, I doubt there’s any measurable difference. But I do believe that my solution is easier to read and understand. So much so, in fact, that I think someone who is not familiar with Scala would be able to easily figure out what it’s doing. I don’t know that the same can be said of the other solution.

I got criticized for promoting terse code in this article, so this is my attempt at balance. :-)

Note: I did change the inputs to these functions from symbols to strings. The code formatter I use on the blog wasn’t colorizing things properly when there were symbols involved.

Procedural vs. Functional

With the rise of Scala and Clojure, there’s been a lot of talk lately about procedural vs. functional styles of coding. Most developers are accustomed to procedural coding, and functional can be hard to get a handle on. I was working through Programming in Scala (again) this morning, and I came upon this function:

// Procedural implementation
def longestWord(words: Array[String]) = {
  var word = words(0)
  var idx = 0

  for (i <- 1 until words.length)
    if (words(i).length > word.length) {
      word = words(i)
      idx = i
    }

  (word, idx)
}

The purpose of this function is to find the longest word in the passed-in array, and return a tuple with that longest word, and its index in the array. You can see that in this function, we have two vars, one for the current longest word, and another for its index in the array. We then use a for expression to walk the array, reassigning word and idx when we find a longer word. This is very much like how you would write this in Java.

I decided to rewrite this function in a more functional style, just to see how my functional chops are coming along. Here’s what I ended up with:

// A more functional implementation
def longestWord(words: Array[String]) =
  (("", -1) /: words.zipWithIndex) {(old, cur) =>
    if (cur._1.length > old._1.length) cur
    else old
  }

First of all, notice how much shorter this function is than the first one. Also, notice that there is only a single expression in the function, so the outer curly braces aren’t necessary. What this expression is doing is calling zipWithIndex on the passed-in array, which results in an array of tuples containing each word and its index. We then call foldLeft using its operator name of /:, with its initial argument being a tuple with an empty string and -1 for an index. What foldLeft does is apply the function value passed to it to pairs of arguments. On the first pass, the arguments are what was passed in and the first element in the array. On the second iteration, the arguments are the result of the first pass and the second element in the array. This then continues through the entire array. What is returned after the final pass will be a tuple that contains the longest word in the array, and its index.

Now, I don’t claim to be a functional master or anything, but I think this is a decent illustration of how the functional style can reduce the lines of code, and the number of mutable variables, while making the code easier to read and understand.

Unmotivated

As you can probably tell, I haven’t been motivated to write anything in well over a month. I don’t know why, but that’s what’s happened. I didn’t finish my Lenten project, though I am still occasionally working on it. I did just earn my orange belt in Tae Kwon Do, so that’s cool. And I’m going to be 40 on Tuesday, which is not cool, but it’s sort of unavoidable. I had a lovely birthday dinner with my entire family on Saturday night at Stoney River, which is the most-bestest steakhouse in the world. I loves me some garlic mashed potatoes.

I have an idea for a post about direct vs. indirect quotations in the Greek NT text, but I haven’t fully scoped it out yet. Maybe it will be coming soon. Maybe not.

I’m doing a lot more playing around with Scala, though I am still a lightweight. Sometimes when I’m reading other people’s Scala code, I feel very uneasy about my skills. Scala is beautiful and elegant, but sometimes the terseness of it make it a little overwhelming.

I’m testing out Mercurial for version control. I also tried Git, but based on what I’ve read, and what I’ve experienced, I think Mercurial is the better choice, at least right now. Git is the new, sexy thing, but Mercurial is better established, and the tooling is far better than for Git. Git is gaining ground, but I’m going to stick with Hg for now. I just bought “Mercurial: The Definitive Guide” from O’Reilly, and I’m reading it now. My VP has started asking questions about DVCS and should we switch from SVN and such, so this experiment will be useful shortly.

I’m playing a lot of backgammon. I taught my mother-in-law to play a few weeks ago, and my mother last night. Both picked it up quickly. I’m reading “The Backgammon Book” by Jacoby and Crawford, and trying to commit all the charts and probabilities to memory and get some real strategy going. My game is improving, but I’m still easily beatable.

Oh, and I”ve lost 22 pounds since February 28. Yay, me!

So, there you have it. All two of you now know why there hasn’t been anything new here for a while.

Slides From My Presentation on Operator Overloading In Scala

Last night I spoke at the Atlanta Scala Enthusiats meeting about operator overloading and a little on implicit conversions. I think the talk went well as I got lots of really good questions from the audience, and they laughed at my jokes. This presentation grew out of a blog post I wrote a few months ago entitled Scala Gets Operator Overloading Right; I beefed it up and made some slides and more code samples. Incidentally, if you Google for “scala operator overloading” that blog post is the first result.

For those of you who weren’t there, here are my slides and the code samples that go with them. I wrote these samples against Scala 2.7.7.final. They should work with the latest Scala 2.8, but I haven’t verified this.

And here’s the source code: oopres.zip

What the Heck Is a Tuple, Anyway?

Yesterday I was talking with a friend about Scala and the subject of tuples came up. We both had a bit of a laugh that neither of us was sure how to pronounce it, though we both leaned toward TUH-ple instead of TOO-ple. Anyway, the utility of tuples in Scala was not immediately apparent to him, so I thought I’d take a whack at explaining it here.

A Tuple in Scala is an immutable container used for storing two or more objects, possibly of different types. While a List or Array can only store objects that all have the same type, Tuples can store objects of any type. The most common use of tuples is when you have a method that needs to return more than one value, but creating a class for that return value is more trouble than it’s worth. It’s true that for same-type objects you could return a List, and for different-type objects you could return a List[Any], but both of these have downsides, which we’ll discuss.

Let’s look at a very contrived example. Let’s say you created a function that takes a string and returns the starting index of the first numbers if finds and the numbers themselves. That code might look like this

def reFind(str: String) = {
	val re = """(d+)""".r

	val m = re findFirstMatchIn str

	m match {
		case Some(m) => (m.start, str.substring(m.start, m.end))
		case None => (0, "")
	}
}

(I’ve removed any error checking for brevity.) You can see here that we’re creating a regular expression that looks for one or more numbers grouped together. We then match that against the passed-in string. The matching method returns a Some[Match], so we pattern match against that to see if we actually got a match. If we did, we create a tuple with the starting index of the match, and the match itself, and return it. If not, we return a tuple with 0 for the starting index and an empty string.

Calling this function looks like this

scala> val t = reFind("foo 123 bar")
t: (Int, java.lang.String) = (4,123)

You can see that what was returned is something with type (Int, java.lang.String); that’s actually an instance of Scala’s Tuple2 class. (There’s a synonym for Tuple2, called Pair.)

Now that we have this tuple, what do we do with it? If you want to access the values it contains, you do it in a way that might seem a bit strange at first. To get at the elements, you could do this

scala> val i = t._1
i: Int = 4

scala> val m = t._2
m: java.lang.String = 123

There are two things to point out here. First, unlike Lists and Arrays, you don’t use the () notation. You use a method consisting of an underscore and the index of the part you want. Second, unlike Lists and Arrays, tuples are 1-based instead of 0-based. (According to Programming In Scala, this is a nod to Haskell and ML.) Also notice the types of the vals you are assigning. That’s one of the benefits of using a Tuple instead of something like List[Any]; you still get compile-time type safety. Had you instead written the function like this

def reFind(str: String) = {
	val re = """(d+)""".r

	val m = re findFirstMatchIn str

	m match {
		case Some(m) => List[Any](m.start, str.substring(m.start, m.end))
		case None => List[Any](0, "")
	}
}

and called it, look what happens when you try to store the Int index in a local variable

scala> val l = reFind("foo 123 bar")
l: List[Any] = List(4, 123)

scala> val i: Int = l(0)
<console>:10: error: type mismatch;
 found   : Any
 required: Int
       val i: Int = l(0)
                    ^

You would get a similar error trying to assign the String element to a local String val. That’s the major downside to using a List[Any]. (In the first example I used Scala’s type inference to set the types of the local variables; this time I wanted to be explicit to show the failure.)

As I mentioned earlier, you could define a class just to handle the return values of this function. There is nothing wrong with that solution, and some will find it superior to using a tuple, because you can assign meaningful names to the elements. You could define it like this

class ReResult(val index: Int, val part: String)

def reFind(str: String) = {
	val re = """(d+)""".r

	val m = re findFirstMatchIn str

	m match {
		case Some(m) => new ReResult(m.start, str.substring(m.start, m.end))
		case None => new ReResult(0, "")
	}
}

and call it like this

scala> val l = reFind("foo 123 bar")
l: ReResult = ReResult@57c52e72

scala> val i: Int = l.index
i: Int = 4

scala> val m: String = l.part
m: String = 123

If you think this is more maintainable, then by all means, use it. If you just want to easily return more than one value from a function, then consider using a tuple.

Another point on tuples is that you can assign all the elements of a tuple to local variables in a single step, rather than using multiple calls. So this is equivalent to all the assignments from the earlier examples

scala> val (i: Int, m: String) = l
i: Int = 4
m: String = 123

Depending on what you’re doing, this could be a useful way to get at the elements.

And one more thing. There are tuple classes that range from two elements all the way up to twenty-two. The classes are named Tuple2, Tuple3 … Tuple22. The () notation for creating tuples applies all the way up to twenty-two arguments, so you rarely need to actually use the class names. For example,

scala> val t = (23, "foo", 18.0)
t: (Int, java.lang.String, Double) = (23,foo,18.0)

scala> t.getClass
res31: java.lang.Class[_] = class scala.Tuple3

scala> val t1 = ('a', "quick", 23, "year-old", """foxy""".r, List(1, 2, 3))
t1: (Char, java.lang.String, Int, java.lang.String, scala.util.matching.Regex, List[Int]) = (a,quick,23,year-old,foxy,List(1, 2, 3))

scala> t1.getClass
res32: java.lang.Class[_] = class scala.Tuple6

I’m not going to provide an example of creating a Tuple22; that is left as an exercise. :-) I would argue that if you need more than three elements, you really should define a class to hold them. I think that beyond three elements it gets difficult to keep them straight. Tuples are great for holding two or three pieces of information, but don’t go crazy with them.

2 Solutions To Project Euler Problem #1

In an effort to not go a whole month without blogging, and in the interest of posting some code samples, I give you two solutions to Project Euler: Problem #1. If you’ve never heard of it, Project Euler (pronounced “oiler” after the Swiss mathematician Leonhard Paul Euler) is a set of increasingly difficult programming challenges. Participants can write their programs in any language and the only goal is to solve the problems and learn something. There are no prizes and you don’t have to show your work.

I had looked at this project years ago, and I swear I thought I had already solved some of them, but maybe I only thought about doing it. Anyway, I have two solutions to the first problem; one in Groovy and the other in Scala. Here, then, is how Project Euler describes the problem

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

So, the goal is to take a series of numbers from 0 to 1000, exclusive, find all the numbers divisible by 3 or 5 and add them together. Here’s the Groovy solution.

def subList = (0..<1000).findAll {it % 3 == 0 || it % 5 == 0} 

def sum = subList.inject(0) {i, sum -> sum + i}

println "Sum = ${sum}"

This is a very simple program, and I could have written it as a one-line program. I broke it up into a few lines for clarity. As you can see, the first line creates an exclusive range from 0 to 1000. It then calls the findAll method on that range, passing in a closure that will return true if the passed-in digit from the range, called “it” here, is evenly divisible by 3 or 5. The result of findAll is another collection, containing only those values that passed the divisibility test. We then take that list, passing 0 into the inject method, which will neatly sum the values up and return that value. Easy peasy.

Now here’s the Scala version. You’ll notice it is very similar to the Groovy solution.

val subList = for {
    i <- List.range(0, 1000)
    if i % 3 == 0 || i % 5 == 0
} yield i 

val sum = subList.foldLeft(0) {(i, j) => i + j}

println("Sum = " + sum)

I used a sequence comprehension to generate the sublist here. The bit beginning with “for” generates an exclusive range from 0 to 1000, which is then iterated over, assigning each value to “i”. Then, if it is divisible by 3 or 5, it is yielded up by the comprehension. The result is a collection of just those numbers that we want, assigned to the val called subList. We then call foldLeft on that sublist, doing exactly what we did in the Groovy solution. Again, pretty simple.

Now, I could have solved this one in an almost identical fashion to the Groovy solution by using the filter method of lists, but I wanted to solve it first using a list comprehension. Here is the second solution

val subList = List.range(0, 1000) filter {i => i % 3 == 0 || i % 5 == 0}

val sum = subList.foldLeft(0) {(i, j) => i + j}

println("Sum = " + sum)

I timed the solutions and all three finished in just over a second. The second Scala solution seemed ever-so-slightly faster than the other two.

As I get time, I will work on additional problems from the site and post the answers here. I don’t know that I’ll always do solutions in two languages, but I might.