LoLClojure – Chapter 3, Continued

I didn’t intend to wait a month between installments, but here we are. When we left off, we were discussing a couple of implementation os exponentiation, and how to print things out. We made it up to page 35.

Next, the book discusses the main Lisp data structure: the list. It discusses how Lisp assumes that in any list, the first item in that list is a command of some sort, and to change that behavior, you place a single quote, ', before the list. Clojure has the same expectation, so that if you enter this at a REPL:

(foo bar baz)

you will get the following error,

CompilerException java.lang.RuntimeException: Unable to resolve symbol: foo in this context...

indicating that it tried to execute that list as a function call, but failed to find a function called “foo”. Just as in Lisp, you can quote the list by prefixing it with a single quote. Thus, this

'(foo bar baz)

is now treated as data, instead of a function call. Similarly, this is a function call

(expt 2 3)

is a call to the expt function, while this

'(expt 2 3)

is a list with three elements.

Barski then goes into a length discussion about the list and its structure, specifically the cons cell. In Lisp, a cons cell is a pair of items, and its list is implemented as a linked list, with the second part of each cons cell linking to the next cons cell in the list. Clojure does not have a cons cell, and so does not implement its lists in terms of them. Lisp uses cons cells individually as simple pairs, too, and you can simulate this use by using a 2-element vector, like so

(def pair [:a :b])

Beginning on page 38, Barski starts discussing the various functions that operate on lists. I will provide the Clojure equivalent functions, where possible.

The cons Function

In Lisp, you create cons cells using the cons function. The result is a cons cell with the two arguments set as the left and right portions of the cell, or just a left-side, if the right is nil. You can simulate this, using the vector function

(vector 'chicken 'cat)

or as a literal vector

['chicken 'cat]

Clojure does have a cons function, but it functions somewhat differently than its Lisp counterpart. It prepends its first argument onto its second argument, which must be a list-like thing, returning the result as a list. So this in Lisp

(cons 'chicken 'cat)

can’t be done using Clojure’s cons function, since the second argument is not a list-like thing. If you try, you will get this error

IllegalArgumentException Don't know how to create ISeq from: clojure.lang.Symbol clojure.lang.RT.seqFrom (RT.java:505)

If you need to create a sequence of a chicken and a cat, use one of these:

(vector 'chicken 'cat)
['chicken 'cat]
'(chicken cat)

Obviously, the first two create vectors, while the last one create a list.

In Lisp, the empty list and nil are equivalent, but not so in Clojure. However, these two examples of consing a value and and empty list, and a value and nil, behave essentially the same, returning a list of just the first element.

(cons 'chicken 'nil) ; (chicken)
(cons 'chicken ())   ; (chicken)

The next three examples function effectively the same, though remember that what you are getting back is not a linked list of cons cells, as you would in Lisp.

(cons 'pork '(beef chicken))           ; (pork beef chicken)
(cons 'beef (cons 'chicken '()))       ; (beef chicken)
(cons 'pork 
      (cons 'beef (cons 'chicken ()))) ; (pork beef chicken)

The car and cdr Functions

Lisp has the basic functions car and cdr for accessing the first element of a lisp, and the remaining elements of a list, respectively. These names are directly related to the original hardware on which Lisp ran. Clojure does not have functions with these names, but it does provide its own functions that give the same results.

Instead of car, Clojure gives us first, which actually is a better name for what it does. It works like this

(first '(pork beef chicken)) ; pork

Instead of cdr, Clojure actually has two functions, next and rest. With a listy thing of 2+ elements, both these function behave the same, returning everything but the first element

(rest '(pork beef chicken)) ; (beef chicken)
(next '(port beef chicken)) ; (beef chicken)

They differ, however, on what happens if there’s nothing after the first element, or the entire list is the empty list. next will return nil in both these cases, while rest will return an empty list. As I said, in Lisp, these are the same thing, but in Clojure, they are very different. The empty list is truthy, while nil is not

(when nil :truthy) ; nil
(when '() :truthy) ; :truthy

The c*r Functions

Lisp defines combinations of car and cdr to allow you to get the second item, the third, the second item from the remainder of the remainder of a list, etc. Most CL implementations provide these functions up to four levels deep. Here’s a partial list to illustrate

(cadr '(pork beef chicken))          ; BEEF
(cdar '((peas carrots tomatoes)
        (pork beef chicken)))        ; (CARROTS TOMATOES)
(cddr '((peas carrots tomatoes)
        (pork beef chicken) duck))   ; (DUCK)
(caddr '((peas carrots tomatoes)
         (pork beef chicken) duck))  ; DUCK
(cddar '((peas carrots tomatoes)
         (pork beef chicken) duck))  ; (TOMATOES)
(cadadr '((peas carrots tomatoes)
          (pork beef chicken) duck)) ; BEEF

And let’s be honest, keeping all those ‘a’s and ‘d’s straight can be pretty confusing (at least, it is to me).

Since Clojure doesn’t have car and cdr, it also lacks these functions. It does have one analog function, and that is for the CL function cadr, which provides the car of the cdr of a list, also known as the second item. Thus, Clojure has second that does the same thing,

(second '(pork beef chicken)) ; beef

but that’s it for built-ins. You can roll your own versions of these CL functions, but from what I’ve read, this is considered a code smell. Even though I don’t think you should try to implement these functions, let me show you how you might do it, if you wanted to.

Instead of explaining each one, and its possible implementation in Clojure, I’m just going to include the code for all of them in one, big blob.

(defn cdar
  [lst]
  (rest (first lst)))

(cdar '((peas carrots tomatoes)
        (pork beef chicken)))       ; (carrots tomatoes)

(defn cddr
  [lst]
  (rest (rest lst)))

(cddr '((peas carrots tomatoes)
        (pork beef chicken) duck))  ; (duck)

(defn caddr
  [lst]
  (first (rest (rest lst))))

(caddr '((peas carrots tomatoes)
         (pork beef chicken) duck)) ; duck

(defn cddar
  [lst]
  (rest (rest (first lst))))

(cddar '((peas carrots tomatoes)
         (pork beef chicken) duck)) ; (tomatoes)

(defn cadadr
  [lst]
  (first (rest (first (rest lst)))))

(cadadr '((peas carrots tomatoes)
          (pork beef chicken) duck)) ; beef

So, if you shouldn’t create analogous functions, how do you get at the elements you need? One way would be to use Clojure’s destructuring. I’m not going to go into a full explanation of how destructuring works, but suffice it to say that it’s a way to assign elements of a listy thing to individual locals.

Here’s how you could achieve the same results as cdar

(let [[fst] '((peas carrots tomatoes) (pork beef chicken))]
  (rest fst)) ; (carrots tomatoes) ; instead of cdar

As you can see, this code assigns the first element '(peas carrots tomatoes) to the local fst. It then calls rest on that, which returns the remainder of that list.

Here, then, is a code dump of destructuring implementation of the rest of the c*r functions that I listed earlier.

(let [[fst] '((peas carrots tomatoes)
              (pork beef chicken))]
  (rest fst)) ; (carrots tomatoes) ; instead of cdar

(let [[_ & rst] '((peas carrots tomatoes)
                  (pork beef chicken) duck)]
  (rest rst)) ; (duck) ; instead of cddr

(let [[_ & rst] '((peas carrots tomatoes)
                  (pork beef chicken) duck)
      tail (rest rst)]
  (first tail)) ; duck ; instead of caddr

(let [[fst] '((peas carrots tomatoes)
              (pork beef chicken) duck)
      tail (rest fst)]
  (rest tail)) ; (tomatoes) ; instead of cddar

(let [[_ & rst] '((peas carrots tomatoes)
                  (pork beef chicken) duck)
      fst (first rst)
      tail (rest fst)]
  (first tail)) ; beef ; instead of cadadr

That’s It, For Now

OK, that brings us to the end of chapter 3. I think this may be the longest post so far. I will move on to chapter 4, and will be back soon with the next installment.

As always, full code is available at https://github.com/joeygibson/lolclojure.

LoLClojure – Chapter 3

Chapter three of Land of Lisp is all about Lisp syntax. This post will be sort of scattered as far as content goes, since the chapter covered a lot. Many things are the same in Clojure, but there are some serious differences. The first is how to define a function.

Defining Functions

In Lisp, you use defun, but in Clojure, it’s defn. Here’s a square function in Lisp.

(defun square (n) 
  (* n n))

And here’s the same function in Clojure. Notice that the function arguments are enclosed in square brackets (it’s a vector), instead of parens.

(defn square
  "Returns the square of the passed-in number"
  [n]
  (* n n))

That string is known as the docstring. It stays with the function, and is available in the REPL by running the doc function, like this (doc square). Lisp also supports docstrings in functions, but it comes after the argument list, instead of before. While docstrings are optional, I highly encourage you to include them. They can span multiple lines, and since they stay with your function, they are useful from the REPL.

Equality

In Lisp, there are may functions for determining equality, and you have to choose the right one for any given circumstance. Among these are eq, equal, equalp, and a few others. In Clojure, there’s just =. If you’re coming from Java, you know = by itself is assignment, not an equality check. For that, you have to use ==, but even that only computes reference equality, and is not always what you want. In Clojure, = does everything you want, in every circumstance. It is your friend.

Exponentiation

Starting on page 34, there are a few examples using the expt function, which raises its first argument to the power of its second. This is a built-in function in CL, but Clojure doesn’t have one. You could use Math.pow from Java, but this only works with doubles, and once the numbers get really large, it switches to scientific notation.

(Math/pow 2 2)       ; 4.0
(Math/pow 2 3)       ; 8.0
(Math/pow 2.0 3)     ; 8.0
(Math/pow 2 10)      ; 1024.0
(Math/pow 53N 53)    ; 2.4356848165022712E91

(In case you haven’t seen it, appending an N to a number literal causes the number to be of type clojure.lang.BigInt. Appending an M makes it a java.math.BigDecimal.)

You can write your own exponentiation function that gives better results than using the one from Java. Here are two different ways to write it. Both versions are tail-recursive, which means they won’t exhaust the stack, but the first uses a nested function, while the second is recursive on a loop. Here’s the nested function version

(defn expt
  "Raise x to the nth power"
  [x n]
  (letfn [(rexpt [n acc]
                 (if (zero? n)
                   acc
                   (recur (dec n) (* acc x))))]
    (rexpt n 1)))

Notice the letfn that contains a local function called rexpt. This function does all the work, and is called as the last line of the main function. It takes a parameter to be used as an accumulator, and this is returned once the exponent is decremented to zero. This nested function is also a closure, because the value of x is referenced directly. We don’t need to change it like we do n, so we just use its name.

Now, here is the version that uses loop. While CL has a loop macro, Clojure’s loop is completely different. All it does is provide a recursion point. This means that when you use the recur function later, execution will jump back to where the loop call is, instead of back to the beginning of the function. The locals declared in the loop’s vector are rebound with the values specified by the recur call. I think this version is easier to understand than the first one.

(defn expt
  "Raise x to the nth power"
  [x n]
  (loop [n n
         acc 1]
    (if (zero? n)
      acc
      (recur (dec n) (* acc x)))))

Notice that the code inside the loop is identical to that in the rexpt local function from the previous example. It’s just not wrapped inside another function. Also of note is in the let we assign n to n. This is a common technique, and will result in a local called n being assigned the value of the passed-in n. The local n can then be decremented with each recursion, without affecting the outer n.

Both of these function provide identical results.

(expt 2 2)       ; 4
(expt 2 3)       ; 8
(expt 2.0 3)     ; 8.0
(expt 2 10)      ; 1024
(expt 53N 53)    ; 24356848165022712132477606520104725518533453128685640844505130879576720609150223301256150373

Notice that passing in an integer results in an integer. Passing in a double results in a double. And passing in a BigInt results in a very large number (Hint: scroll horizontally… it goes on for a while).

Printing Things

CL uses (princ), (prin1), (print), etc., to output things to the console. In Clojure, you use (print) and (println).

(print "He yelled \"Stop that thief!\" from the busy street.") ; no newline
(println "He yelled \"Stop that thief!\" from the busy street.") ; newline

(print) outputs the string, but does not append a newline. (println) appends a newline, as you would expect, given its name.

To Be Continued…

This has gotten very long, so I will stop now, and continue in another post. Stay tuned for Chapter Three, Part Two.

LoLClojure – Locals

Just like in Lisp, Clojure uses let to define locals. The only real difference is that Clojure uses a vector of names and their bindings, whereas Lisp uses a nested list.

This Lisp code

(let ((a 5)
	  (b 6))
  (+ a b))

looks like this in Clojure

(let [a 5
      b 6]
  (+ a b))

I think the Clojure way is a little easier to read.

The biggest difference between the two is when it comes to local functions. CL has flet for defining local functions, and labels for defining local functions that need to be able to call each other (or call themselves recursively). Here’s an example of each

;; A single local function
(flet ((f (n)
		  (+ n 10)))
  (f 5))

;; Two local functions that don't need to call each other
(flet ((f (n)
		  (+ n 10))
	   (g (n)
		  (- n 3)))
  (g (f 5)))

Here’s the equivalent code in Clojure. Note how the square brackets make things stand out a little bit more.

(letfn [(f [n]
           (+ n 10))]
  (f 5))

(letfn [(f [n]
           (+ n 10))
        (g [n]
           (- n 3))]
  (g (f 5)))

If the functions need to reference each other, in CL you have to use labels, instead of flet. Here’s how that looks (the only difference is the form used; the arguments remain the same)

(labels ((a (n)
			(+ n 5))
		 (b (n)
			(+ (a n) 6)))
  (b 10))

In Clojure, you don’t need to use anything other than letfn, because it already supports the recursive nature that labels provides

(letfn [(a [n]
           (+ n 5))
        (b [n]
           (+ (a n) 6))]
  (b 10))

Finally, if you have local functions and other local bindings you need to establish, you can use a let, but no recursion is supported. This is sort of like CL’s flet but you can also use it for binding locals that are not functions

(let [a (fn [n]
          (+ n 5))
      b (fn [n]
          (+ (a n) 6))
      c 10]
  (b c))

I think the way Clojure uses square brackets in certain places that CL uses parentheses makes the code easier to read, overall.

LoLClojure – Land of Lisp In Clojure

I read Conrad Barski’s excellent book Land of Lisp a couple of years ago, and worked through all the examples using CLisp, but I thought it might be fun to go through it again, but use Clojure instead. Other people have done it already, but what’s one more, eh?

As I work through the book, I will be putting all the code on Github at https://github.com/joeygibson/lolclojure

So, the first example is for a program to guess a number you are thinking of. In Lisp, defparameter allows you to rebind values, but Clojure’s def is immutable. Using a ref gets around this, though it is a bit clunky (since refs are intended for multi-threaded access.) The code is not great, and you wouldn’t write a Clojure program like this (or a Lisp program, really); it’s just to get the discussion moving. Better code is coming.

Anyway, here’s the number-guessing program in non-idiomatic Clojure. To run it, load it into a REPL, then execute (guess-my-number). If you are so enraptured with the game that you want to play it again, execute (start-over) and then (guess-my-number).

(ns lolclojure.guess)

;; Using refs for these is overkill, but the original
;; used mutable global variables.
(def small (ref 1))
(def big (ref 100))

(defn guess-my-number
  "This is, effectively, a binary search between the two extremes."
  []
  (Math/round (Math/floor (double (/ (+ @small @big) 2)))))

(defn smaller
  "The guess was too high, so lower it."
  []
  (dosync
   (ref-set big (dec (guess-my-number)))))

(defn bigger
  "The guess was too low, so raise it."
  []
  (dosync
   (ref-set small (inc (guess-my-number)))))

(defn start-over
  "Reset everything and prepare to start over."
  []
  (dosync
   (ref-set small 1)
   (ref-set big 100)))

Get a REPL On Your Current Java Project

I like to test things out interactively, so I love working with languages that provide a REPL. I’m currently working on a Java project, but Java doesn’t have a REPL. Several languages built on top of the JVM do have them, and these langauges can access the Java classes on their classpaths. Groovy, Scala and Clojure are just three such examples, that I happen to work with.

I got this tip from this response on a Stackoverflow.com post. His tip was for Scala, which looks like this:

scala -cp target/classes:`/usr/bin/mvn \
dependency:build-classpath \
| grep "^[^\[]"`

The bit between the backticks runs a Maven goal that outputs the jars that your project depends on, and then extracts just the list of fully-qualified jar files to append to the Scala classpath. If you want to use Groovy for your REPL, it would look like this:

groovysh -cp target/classes:`/usr/bin/mvn \
dependency:build-classpath \
| grep "^[^\[]"`

Similarly, if you’d rather use Clojure, do this:

java -cp target/classes:`/usr/bin/mvn \
dependency:build-classpath \
| grep "^[^\[]"`:/path/to/clojure.jar clojure.main

(Note: In the examples above, I’ve split the code across multiple lines for clarity. In reality, it can be all on one line.)

Once More, This Time With Clojure

If you happened to read my post from the other day entitled My New “Top Artists Last 7 Days” Widget, you know that I went through three iterations of getting it going. The final solution, written in Ruby worked well. Until bands like Motörhead, Mötley Crüe and Einstürzende Neubauten showed up in the list. At that point, the HTML parsing library I was using would barf, and processing would stop, leaving the list showing on the blog in an incomplete state. It wasn’t the library’s fault; apparently Ruby still has problems dealing with non-ASCII characters. I did everything I thought I needed to do to tell Ruby that it would be dealing with UTF-8 encoding, but it just kept right on barfing.

I was left with only two choices: stop listening to any band with an umlaut in the name (and God help me if any of my Scandinavian bands popped up, with the Ø or å characters), or rewrite the stupid program, again, in a language that I knew could easily deal with UTF-8.

Since I’ve been working in Clojure a lot lately, it seemed lika the logical choice. I spent about an hour working on it last night, and I ended up with a working program and a bit more Clojure experience. Here’s the program for your edification, with a description to follow:

(ns lastfmfetch.core
(:gen-class))

(require '[clj-http.client :as client])
(import '(java.io PrintStream)
'(org.htmlcleaner HtmlCleaner))

(defn get-artist-and-playcount [cell]
(let [title (.getAttributeByName cell "title")
[match artist playcount] (re-matches #"^(.+), played ([wd]+)s*S*$" title)
playcountStr (if (= playcount "once") "1" playcount)]
[artist playcountStr]))

(defn get-url [cell]
(let [links (.getElementsByName cell "a" true)
a (first links)
href (.getAttributeByName a "href")]
(str "http://last.fm" href)))

(defn fetch-data [filename]
(let [response (client/get "http://www.last.fm/user/joeyGibson/charts?rangetype=week&subtype=artists")
cleaner (HtmlCleaner.)]
(if (= (:status response) 200)
(with-open [out (PrintStream. filename "UTF-8")]
(.println out "<html><head><meta charset="UTF-8"/></head><body><ol>")
(doto (.getProperties cleaner)
(.setOmitComments true)
(.setPruneTags "script,style"))
(when-let [node (.clean cleaner (:body response))]
(let [subjectCells (take 5 (.getElementsByAttValue node "class" "subjectCell" true true))]
(doseq [cell subjectCells]
(let [[artist playcount] (get-artist-and-playcount cell)
url (get-url cell)]
(.println out (str "<li><a href='" url "'>" artist "</a>, Plays: " playcount "</li>"))))))
(.println out "</ol></body></html>")))))

;; Main
(defn -main [& args]
(if (< (count args) 1)
(println "Usage: lastfmfetch <output_file>")
(fetch-data (first args))))

I ended up using a library called clj-http to handle the fetching of the URL. It’s a Clojure wrapper for the Apache HTTP Commons library, and was really easy to use. I’m using Leningen, by the way, so including clj-http was just a matter of including a line in the project.clj file. I also used a Java library called HTMLCleaner, that fixes broken HTML and makes it available as a DOM. Since it is also in Maven Central, it was easy to include by adding another line to the project file.

(defproject lastfmfetch "1.0.0-SNAPSHOT"
:description "Fetch chart data from Last.fm"
:dependencies [[org.clojure/clojure "1.3.0"]
[net.sourceforge.htmlcleaner/htmlcleaner "2.2"]
[ clj-http "0.2.6"]]
:main lastfmfetch.core)

The -main function begins on line 38, but all it really does is check that there is a single command-line argument, and exits with a usage message if there is not. It then calls the fetch-data function, which begins on line 20.

On line 21, we declare two locals; one that will contain the results of fetching the web page, and one that is the HTML cleaner. If the fetch of the URL was successful, the status code will be the standard HTTP 200. If we got that, we then open a PrintStream on the filename given, specifying that it should be encoded with UTF-8. (I’ve been working with Java for a very long time, and I always assumed that since Java strings are Unicode, files created with Java would default to UTF-8. That is not the case. That’s why there’s a second argument when creating the PrintStream, and why I’m not using a PrintWriter.) We then print the first part of the output HTML file, set a couple of options to HTML Cleaner that cause it to strip comments, style and script sections from the HTML, and then start doing the real work.

On line 29, we declare a local called node that will contain the output of HTML Cleaner if it successfully parsed and cleaned the HTML. That’s what when-let does; it assigns the local as long as the function returns something truthy and then executes its body. If that function doesn’t return something truthy, the rest of the code is skipped. We then take the first five elements from the HTML that have an attribute called “class” with a value of “subjectCell”. These are table cells. We then loop over them, extracting the artist and playcount value, and the URL. We do these things in two separate functions.

The function called get-artist-and-playcount, starting on line 8, takes the table cell as input. It then gets the attribute called “title” and uses a regular expression to pull out the artist and playcount values. If the playcount is the word “once,” it converts it to a 1, so all the values are numeric. It then returns the two values as a vector.

The function called get-url, starting on line 14, also takes the table cell as input. It then gets all the “a” elements from the cell (there’s only one), and then gets the “href” attribute’s value, which is the URL.

Back at line 34, we take the three values we extracted with the two support functions and concatenates them together into HTML that will be a single line in an ordered list. We then output all the necessary closing tags to make the HTML well-formed, and we’re done.

While the Clojure code is a bit more dense than the Ruby code, it’s actually four lines shorter. And it handles Unicode characters, which makes me happy.

JUnitLaunchFixer 1.0.4 Released

I’ve just released a new version of my Eclipse plugin called JUnitLaunchFixer. If you’ve never heard of it, it lets you set the default heap size for new debug launchers in Eclipse. You can read more about it in the original announcement. This new version fixes a bug that caused it not to accept new parameters in one case, and it also adds the ability to set a default for the maximum permgen size, which is something I’d been wanting to add for a while.

If you already have it installed, go to Help>Check for Updates in Eclipse, and you’ll see it. If you don’t have it installed, the update site is here:

http://junitlaunchfixer.googlecode.com/svn/trunk/JUnitLaunchFixer-update-site/

 

99 Scala Problems #28 – I Like My Solution Better

I’ve been working through this list of 99 Scala Problems, which is modeled after this list of 99 Prolog Problems. As I’ve been going through them, I have been comparing my solutions to those provided (obviously). Sometimes, my solution is more or less the same as the “official” solution. Sometimes, theirs is better. In the case of problem 28, I think mine is far easier to read and understand.

Problem 28 has two parts. The first part reads:

a) We suppose that a list contains elements that are lists themselves. The objective is to sort the elements of the list according to their length. E.g. short lists first, longer lists later, or vice versa.

Running the function should look like this:

[scala]
scala> lsort(List(List("a", "b", "c"), List("d", "e"), List("f", "g", "h"), List("d", "e"), List("i", "j", "k", "l"), List("m", "n"), List("o")))

res0: List[List[java.lang.String]] = List(List(o), List(d, e), List(d, e), List(m, n), List(a, b, c), List(f, g, h), List(i, j, k, l))
[/scala]

For this part, my solution was almost identical. Here’s what I came up with:

[scala]
def lsort[T](ls: List[List[T]]) = {
ls.sortWith {(a, b) => a.length < b.length}
}
[/scala]

You can see that this function takes a List of type T, and then calls the sortWith method on that list, passing in a function value that sorts the lists based on their length, shortest to longest. The “official” solution was only slightly different:

[scala]
def lsort[A](ls: List[List[A]]): List[List[A]] =
ls sort { _.length < _.length }
[/scala]

Here, they used A instead of T, but that doesn’t affect anything, and they specified the return type, while I left mine inferred. Instead of assigning each bucket of the list to a named variable, as I did, they use the underscore placeholder. The two functions are functionally (get it?) identical, but theirs is a bit shorter because they removed the outer braces, and were able to skip the parameter list, since they used the underscores.

Now, the second part is where I diverge from the official solution. Here’s the problem description:

b) Again, we suppose that a list contains elements that are lists themselves. But this time the objective is to sort the elements according to their length frequency; i.e. in the default, sorting is done ascendingly, lists with rare lengths are placed [first], others with a more frequent length come later.

And the expected call and result is

[scala]
scala> lsortFreq(List(List("a", "b", "c"), List("d", "e"), List("f", "g", "h"), List("d", "e"), List("i", "j", "k", "l"), List("m", "n"), List("o")))

res1: List[List[java.lang.String]] = List(List(i, j, k, l), List(o), List(a, b, c), List(f, g, h), List(d, e), List(d, e), List(m, n))[/scala]

First, let’s look at what they presented as the solution. It referenced functions from other files, but I have included them all here for easy of viewing.

[scala]
def lsortFreq[A](ls: List[List[A]]): List[List[A]] = {
val freqs = Map(encode(ls map { _.length } sort { _ < _ }) map { _.swap }:_*)
ls sort { (e1, e2) => freqs(e1.length) < freqs(e2.length) }
}

def encode[T](ls: List[T]): List[(Int, T)] = {
val packedList = pack(ls)
packedList map {list => (list.length, list.head)}
}

def pack[T](ls: List[T]): List[List[T]] = ls match {
case Nil => Nil
case h :: tail => (h :: tail.takeWhile(_ == h)) :: pack(tail.dropWhile(_ == h))
}
[/scala]

I think this is very confusing code. It’s calling the encode function which does run-length encoding of the passed-in thing. It then uses a Map of these encodings to sort the passed-in list. The presence of five underscores in the first line, obscures where those parameters are coming from, and the final underscore is actually part of the _* method of the Array class!

My solution, while being a longer function, is far more readable, in my opinion. And, it’s the same number of lines as the three-method solution. Here it is

[scala]
def lsortFreq[T](ls: List[List[T]]) = {
val lengthMap = scala.collection.mutable.Map[Int, Int]()

for (l <- ls) {
val len = l.length
if (!lengthMap.contains(len)) {
lengthMap(len) = 1
} else {
lengthMap(len) += 1
}
}

ls sortWith {(a, b) => lengthMap(a.length) < lengthMap(b.length)}
}
[/scala]

In my function, I created a mutable Map and then iterate over the list, getting each item’s length, and then keep a running tally of how many items had that length. The map has these lengths as its keys, and the number of items with that length as its values. Get it? I then sort the original list by having each item in the comparison lookup how many items share its length, and use that as the sort criterion.

I have no idea which of these solutions is more efficient. For small problems like this, I doubt there’s any measurable difference. But I do believe that my solution is easier to read and understand. So much so, in fact, that I think someone who is not familiar with Scala would be able to easily figure out what it’s doing. I don’t know that the same can be said of the other solution.

I got criticized for promoting terse code in this article, so this is my attempt at balance. 🙂

Note: I did change the inputs to these functions from symbols to strings. The code formatter I use on the blog wasn’t colorizing things properly when there were symbols involved.

Procedural vs. Functional

With the rise of Scala and Clojure, there’s been a lot of talk lately about procedural vs. functional styles of coding. Most developers are accustomed to procedural coding, and functional can be hard to get a handle on. I was working through Programming in Scala (again) this morning, and I came upon this function:

// Procedural implementation
def longestWord(words: Array[String]) = {
  var word = words(0)
  var idx = 0

  for (i <- 1 until words.length)
    if (words(i).length > word.length) {
      word = words(i)
      idx = i
    }

  (word, idx)
}

The purpose of this function is to find the longest word in the passed-in array, and return a tuple with that longest word, and its index in the array. You can see that in this function, we have two vars, one for the current longest word, and another for its index in the array. We then use a for expression to walk the array, reassigning word and idx when we find a longer word. This is very much like how you would write this in Java.

I decided to rewrite this function in a more functional style, just to see how my functional chops are coming along. Here’s what I ended up with:

// A more functional implementation
def longestWord(words: Array[String]) =
  (("", -1) /: words.zipWithIndex) {(old, cur) =>
    if (cur._1.length > old._1.length) cur
    else old
  }

First of all, notice how much shorter this function is than the first one. Also, notice that there is only a single expression in the function, so the outer curly braces aren’t necessary. What this expression is doing is calling zipWithIndex on the passed-in array, which results in an array of tuples containing each word and its index. We then call foldLeft using its operator name of /:, with its initial argument being a tuple with an empty string and -1 for an index. What foldLeft does is apply the function value passed to it to pairs of arguments. On the first pass, the arguments are what was passed in and the first element in the array. On the second iteration, the arguments are the result of the first pass and the second element in the array. This then continues through the entire array. What is returned after the final pass will be a tuple that contains the longest word in the array, and its index.

Now, I don’t claim to be a functional master or anything, but I think this is a decent illustration of how the functional style can reduce the lines of code, and the number of mutable variables, while making the code easier to read and understand.

Slides From My Presentation on Operator Overloading In Scala

Last night I spoke at the Atlanta Scala Enthusiats meeting about operator overloading and a little on implicit conversions. I think the talk went well as I got lots of really good questions from the audience, and they laughed at my jokes. This presentation grew out of a blog post I wrote a few months ago entitled Scala Gets Operator Overloading Right; I beefed it up and made some slides and more code samples. Incidentally, if you Google for “scala operator overloading” that blog post is the first result.

For those of you who weren’t there, here are my slides and the code samples that go with them. I wrote these samples against Scala 2.7.7.final. They should work with the latest Scala 2.8, but I haven’t verified this.

And here’s the source code: oopres.zip