Once More, This Time With Clojure

If you happened to read my post from the other day entitled My New “Top Artists Last 7 Days” Widget, you know that I went through three iterations of getting it going. The final solution, written in Ruby worked well. Until bands like Motörhead, Mötley Crüe and Einstürzende Neubauten showed up in the list. At that point, the HTML parsing library I was using would barf, and processing would stop, leaving the list showing on the blog in an incomplete state. It wasn’t the library’s fault; apparently Ruby still has problems dealing with non-ASCII characters. I did everything I thought I needed to do to tell Ruby that it would be dealing with UTF-8 encoding, but it just kept right on barfing.

I was left with only two choices: stop listening to any band with an umlaut in the name (and God help me if any of my Scandinavian bands popped up, with the Ø or å characters), or rewrite the stupid program, again, in a language that I knew could easily deal with UTF-8.

Since I’ve been working in Clojure a lot lately, it seemed lika the logical choice. I spent about an hour working on it last night, and I ended up with a working program and a bit more Clojure experience. Here’s the program for your edification, with a description to follow:

(ns lastfmfetch.core
(:gen-class))

(require '[clj-http.client :as client])
(import '(java.io PrintStream)
'(org.htmlcleaner HtmlCleaner))

(defn get-artist-and-playcount [cell]
(let [title (.getAttributeByName cell "title")
[match artist playcount] (re-matches #"^(.+), played ([wd]+)s*S*$" title)
playcountStr (if (= playcount "once") "1" playcount)]
[artist playcountStr]))

(defn get-url [cell]
(let [links (.getElementsByName cell "a" true)
a (first links)
href (.getAttributeByName a "href")]
(str "http://last.fm" href)))

(defn fetch-data [filename]
(let [response (client/get "http://www.last.fm/user/joeyGibson/charts?rangetype=week&subtype=artists")
cleaner (HtmlCleaner.)]
(if (= (:status response) 200)
(with-open [out (PrintStream. filename "UTF-8")]
(.println out "<html><head><meta charset="UTF-8"/></head><body><ol>")
(doto (.getProperties cleaner)
(.setOmitComments true)
(.setPruneTags "script,style"))
(when-let [node (.clean cleaner (:body response))]
(let [subjectCells (take 5 (.getElementsByAttValue node "class" "subjectCell" true true))]
(doseq [cell subjectCells]
(let [[artist playcount] (get-artist-and-playcount cell)
url (get-url cell)]
(.println out (str "<li><a href='" url "'>" artist "</a>, Plays: " playcount "</li>"))))))
(.println out "</ol></body></html>")))))

;; Main
(defn -main [& args]
(if (< (count args) 1)
(println "Usage: lastfmfetch <output_file>")
(fetch-data (first args))))

I ended up using a library called clj-http to handle the fetching of the URL. It’s a Clojure wrapper for the Apache HTTP Commons library, and was really easy to use. I’m using Leningen, by the way, so including clj-http was just a matter of including a line in the project.clj file. I also used a Java library called HTMLCleaner, that fixes broken HTML and makes it available as a DOM. Since it is also in Maven Central, it was easy to include by adding another line to the project file.

(defproject lastfmfetch "1.0.0-SNAPSHOT"
:description "Fetch chart data from Last.fm"
:dependencies [[org.clojure/clojure "1.3.0"]
[net.sourceforge.htmlcleaner/htmlcleaner "2.2"]
[ clj-http "0.2.6"]]
:main lastfmfetch.core)

The -main function begins on line 38, but all it really does is check that there is a single command-line argument, and exits with a usage message if there is not. It then calls the fetch-data function, which begins on line 20.

On line 21, we declare two locals; one that will contain the results of fetching the web page, and one that is the HTML cleaner. If the fetch of the URL was successful, the status code will be the standard HTTP 200. If we got that, we then open a PrintStream on the filename given, specifying that it should be encoded with UTF-8. (I’ve been working with Java for a very long time, and I always assumed that since Java strings are Unicode, files created with Java would default to UTF-8. That is not the case. That’s why there’s a second argument when creating the PrintStream, and why I’m not using a PrintWriter.) We then print the first part of the output HTML file, set a couple of options to HTML Cleaner that cause it to strip comments, style and script sections from the HTML, and then start doing the real work.

On line 29, we declare a local called node that will contain the output of HTML Cleaner if it successfully parsed and cleaned the HTML. That’s what when-let does; it assigns the local as long as the function returns something truthy and then executes its body. If that function doesn’t return something truthy, the rest of the code is skipped. We then take the first five elements from the HTML that have an attribute called “class” with a value of “subjectCell”. These are table cells. We then loop over them, extracting the artist and playcount value, and the URL. We do these things in two separate functions.

The function called get-artist-and-playcount, starting on line 8, takes the table cell as input. It then gets the attribute called “title” and uses a regular expression to pull out the artist and playcount values. If the playcount is the word “once,” it converts it to a 1, so all the values are numeric. It then returns the two values as a vector.

The function called get-url, starting on line 14, also takes the table cell as input. It then gets all the “a” elements from the cell (there’s only one), and then gets the “href” attribute’s value, which is the URL.

Back at line 34, we take the three values we extracted with the two support functions and concatenates them together into HTML that will be a single line in an ordered list. We then output all the necessary closing tags to make the HTML well-formed, and we’re done.

While the Clojure code is a bit more dense than the Ruby code, it’s actually four lines shorter. And it handles Unicode characters, which makes me happy.

Grails Podcast Mentions My Closure Post

Like other bloggers with an ego, I have Google Alerts set up to let me know when someone mentions me or my blog anywhere that Google knows about. I got an alert yesterday letting me know that I’d been mentioned on the latest episode of the Grails Podcast. How cool is that? Specifically, they mentioned my [cref groovy-sql-closure-examples Groovy Sql Closure Examples] post. Thanks, Glen and Sven, for the podcast love. :-)

I’ve been spending some time with Grails latest and have been really impressed with it. I spent a couple of hours on Saturday playing with it, seeing how much of my Rails knowledge was applicable to Grails. Quite a bit of it, actually. I really like what I’ve seen of Grails, so far. I’d probably have to use it on a real project to really get a feel for it, but it looks like it would be a nice environment to work in.

Why I Love Closures

I’ve been a big fan of closures for years. I was first introduced to them in Smalltalk, where they were just called blocks. Ruby has them and also calls them blocks. Java does not have them, though there are proposals (such as this one) to add them to a future version of the language. Groovy has them now, and while they aren’t as shiny as those in Ruby, they do work.

So what’s so great about closures anyway? They are blocks of code that retain links to things that were in scope when they were created, but they also have access to things that are in scope when they execute. They can be passed around, usually to methods that will execute them at a later time, in a possibly different context. That doesn’t sound all that exciting, but what is exciting is when the language in question lets you pass in a closure to do some work, and then cleans up after you when the work is done. Here’s an example of some Ruby code that I have written in tons of scripts.

count = 0

DBI.connect(*CREDENTIALS) do |con|
  File.open("results.txt", "w") do |file|
    con.select_all(sql) do |row|
      file.puts "... interesting data from the query ..."
      count += 1
    end
  end
end

puts "Total records: #{count}"

What that code does is declare a variable, count, that will be used to keep track of how many records we processed. It then connects to a database, creates a file called “results.txt,” executes a SQL select, writes some of the data from each row to the file and increments the count variable. At the end, we print out the count variable.

There are three closures in that bit of code. They begin on lines 3, 4 and 5, respectively. The first connects to the database. The second creates the output file and the third performs a SQL select, writes the output and bumps the counter. Rather concise code, don’t you think?

Do you notice anything missing from this code? Two very important things are not there: closing the file and closing the database connection. I said these bits are not there, but they really are. The block version of DBI.connect ensures that no matter how events unfold, either successfully or with an exception, the database connection will get closed. Similarly, the File.open ensures the file will get closed. This is one of the most beautiful aspects of languages that support closures.

As I said earlier, Groovy has closure support baked in. While I’m not completely thrilled with the syntax, it’s close enough to Ruby’s that it’s not bad. As I was writing this post, I was surprised to discover that Groovy’s SQL module doesn’t support a closure passed to the connection, which means you still have to worry about closing your connection when you’re done. Anyway, Groovy’s file-writing idiom looks like this

new FileWriter("results.txt").withWriter {writer ->
  writer.write("... interesting data ...")
}

I don’t really like using the -> as the closure parameter delimiter; I’d rather use a | (pipe symbol) like Smalltalk and Ruby do. Regardless of syntax differences, that’s how you write the file, but what about the SQL stuff? Since we can’t use a closure, it’s a bit more involved.

con = Sql.newInstance(url, user, pass, driver)

try
{
  new FileWriter("output.txt").withWriter {writer ->
    con.eachRow("select * from foo") {row ->
      writer.write("Foo: ${row.id}n")
    }
  }
}
finally
{
  con.close()
}

You can see that even though we can’t use a closure to ensure the database connection gets closed, there are still two closures in use. The one beginning on line five opens the file, while the one beginning on line six writes out some of the data from each row returned by the query. Pretty nifty, eh? I’m disappointed that Groovy doesn’t support passing a closure to the database connection, but maybe we’ll get that in a future version.

For comparison, here’s how you would write this program in straight Java.

Connection con = null;
Statement stmt = null;
ResultSet rs = null;

try
{
  Class.forName(driver);
  con = DriverManager.getConnection(url, user, pass);
  stmt = con.createStatement();
	
  rs = stmt.executeQuery("select * from Foo");
	
  PrintWriter writer = new PrintWriter(new FileWriter("output.txt"));

  try
  {
    while (rs.next())
    {
      writer.println("Foo: " + rs.getString("Id"));
    }
  }
  finally
  {
    writer.close();
  }			
}
catch (Exception e)
{
  e.printStackTrace();
}
finally
{
  try
  {
    if (rs != null)
    {
      rs.close();
    }
	
    if (stmt != null)
    {
      stmt.close();
    }
	
    if (con != null)
    {
      con.close();
    }
  }
  catch (Exception e)
  {
    e.printStackTrace();
  }
}

The majority of that code deals with cleaning up when the real work has been finished. I don’t know what Java’s closures will ultimately look like, or if we’ll get them at all. I’m hopeful, though, that we’ll get a decent implementation, and can then jettison code like you see above. (I know that using Hibernate or some other mapping tool can eliminate code like this, but not every situation needs that type of framework.)

I’m Digging Java Again

I first started doing Java back in 1995. That’s quite a long time ago. Once I got going, I wrote Java code every single day, for thirteen years. I co-authored a Java book, gave talks on Java and was an all-around, Java Guy™. And sometime around 2006, I got bored with it. Completely and totally bored. I was a one-man shop at a small company, my code was running just-fine-thanks-very-much, and I didn’t feel like doing anything new with it, at all. I was more interested in Ruby and, to a lesser degree, Rails, so Java changes didn’t really interest me. And thus, I failed to notice some really cool stuff that was going on in Java-land.

In June of this year I joined a new company that is doing some rather advanced Java work. I had to get current, tout de suite, and in so doing, I’ve really gotten interested and engaged again. Spring and Hibernate have really changed from the older versions I was using, and so has JUnit. All for the better, from what I can tell.

And with this renewed interest, I’ve bought my first new Java books in over 3 years. I bought Effective Java (2nd Edition) to replace my first edition and Java Concurrency in Practice, because I heard good things about it. So far, I’ve read about 2/3 of  Effective Java. I used to buy Java books all the time. I have tons of them. But when I got bored, I stopped shelling out the cash on the Java books.

Java-land is still a very nice place to play. Sometimes you have to get an outside perspective to realize that.

Ruby + MySQL + Windows = SUCCESS! (Finally)

If you do a Google for “ruby mysql windows” you will find, among other things, lots of people trying to use Ruby to access MySQL on a Windows system. I’ve been trying for some time, and have finally gotten things going.

About a year ago I found this article in which he fought the same fight. He was able to get a working .so file, using the .Net C++ compiler, and offered his binary, but that won’t work for everyone. Specifically, a file compiled with VC7 isn’t usable by VC6 (at least not that I could see).

Anyway, I followed his advice to a point, and then started experimenting. I opened up irb and started poking MKMF to see what I could accomplish. What I finally ended up with is a hacked extconf.rb that can be edited easily to get the library built on your own system. It will work with both VC6 and VC7, by changing one line in the file. Assuming that you have MySQL 4.1 installed in the default location, and you are using VC6, and Ruby 1.8.2 is installed in the default location, you should be able to do this

    ruby extconf.rb
    nmake
    nmake install

and have everything compile and install. Notice that I said should. You should still take a look at the file, near the top, to see if the settings are appropriate for your system.

Once installed, you can use this module directly, or install DBI for that “standard” interface. DBI will use the library you just built and installed, so all you have to do is download the DBI module, then do this

    cd ruby-dbi-all
    ruby setup.rb config --with=dbi,dbd_mysql
    ruby setup.rb setup
    ruby setup.rb install

If you want to install other DBD drivers, add them after dbd_mysql. That should be it.

Here’s the hacked extconf.rb file.

There are only two lines you might need to change, and they are at the top of the file with comments around them.

I have tested this slightly using the binary I built with VC6 and it seems to work fine. I have not tested the VC7-compiled binary, because I don’t have a Ruby distro that was compiled with VC7. Let me know if these directions don’t work for you. Not that I can do anything about it, but I could put a note here for others.

Also, notice that I used 8.3 pathnames, which is really ugly. I would have been happy to use the pathnames with spaces, but VC6 doesn’t like spaces in pathnames. VC7 can grok them fine, but not 6. Both understand the 8.3, so it was just easier to use that.

One more thing. I don’t believe that installing Visual Studio.Net installs the necessary header and library files for building non-.Net binaries. IOW, you might not have windows.h which will cause things not to work. If you download and install the Platform SDK, then you’ll be fine. My extconf.rb will add the default Platform SDK directory to the necessary paths, so if you have that installed, and try to do a VC7 build, you should be ok.

I Am Bushed…

Today is Day Three of RubyConf2004 and I’m tired. This has been a good conference. Lots of stuff to think about and play with. Lots of code flying around and lots of laptops all in one room. I was up until 02:00 this morning working on an ADO adapter for Active Record in order to talk to SQL Server. I’ve got about 40 failing tests right now (out of 143) so I’m closing in on finishing… It’s been interesting. I only just began looking at Active Record yesterday morning, so I think I’m doing pretty well. Active Record is an OR framework in Ruby that is really simple to work with. Very cool. It’s part of the Rails suite, which is an MVC framework for building websites in Ruby. Very intriguing.