Java Regex APIs and Quoting

I’ve just been digging through the J2SE 1.4 regex stuff, and every time I have to do regex work in Java I keep thinking how much easier it is in other languages. Specifically, I’m talking about the clunkiness of the various regex APIs in Java and the requirement to double every backslash in a pattern. We need a better way. Ruby and Perl both have regex support built into the language, with regex literals, so backslashes can be written as-is. Python, which doesn’t have native regex support (it’s in the library), does have “raw” string quoting, which lets you avoid doubling the backslashes. So what I have to write like this in Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p =
    Pattern.compile("(\\(\\d+\\))?\\s*(\\d{3})\\s*\\-\\s*(\\d{3})");
Matcher m = p.matcher(my_string);
if (m.matches())
{
    // ...
}

or

if (Pattern.matches("(\\(\\d+\\))?\\s*(\\d{3})\\s*\\-\\s*(\\d{3})", my_string))
{
    // ...
}

looks like this in Python:

import re

if re.match(r"(\(\d+\))?\s*(\d{3})\s*-\s*(\d{3})", my_string):
    ...

and could be even more easily written in Ruby thus:

if my_string =~ /(\(\d+\))?\s*(\d{3})\s*-\s*(\d{3})/
    # ...
end

See the difference? The built-in regex support is really nice and the ease of quoting is a beautiful thing. I doubt that we’ll ever see either of these in Java since they would certainly be considered non-trivial to add.
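To make the quoting point concrete, here is a small Python 3 sketch. It shows that a raw string and a Java-style doubled-backslash string spell out exactly the same characters, so the only difference is how much noise you have to type and read. The pattern is along the lines of the examples above; the sample inputs are made up.

```python
import re

# Raw string: the backslashes reach the regex engine exactly as typed.
RAW = r"(\(\d+\))?\s*(\d{3})\s*-\s*(\d{3})"

# Java-style string: each backslash is doubled, because the string parser
# consumes one of each pair before the regex engine ever sees the pattern.
DOUBLED = "(\\(\\d+\\))?\\s*(\\d{3})\\s*-\\s*(\\d{3})"

# Both literals spell out the very same characters...
assert RAW == DOUBLED

# ...so they match the same input and capture the same groups.
for text in ["(42) 123-456", "123 - 456"]:
    print(text, "->", re.match(RAW, text).groups())
# (42) 123-456 -> ('(42)', '123', '456')
# 123 - 456 -> (None, '123', '456')
```

The optional first group comes back as None when it doesn't participate in the match.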

J2SE 1.4.2 + WLS 7.0 + weblogic.ejbc = Problem

I discovered something interesting the other day at the office. We have a guy who has been unable to build a particular jar of entity beans ever since he started working there, and I had given only scant thought to why. I knew that I had had no problems… So he came by my office on Wednesday and I said “OK. Let’s figure this out.” I ran the Ant build file we use specifically for building this jar of entity beans and got the same errors he was reporting. Odd, thought I. The errors reported by ejbc said that the compiler couldn’t resolve symbols with names like Foo$ValueObject, which made it look as though the inner classes defined in each of the entity beans were missing. But looking in the classes directory, those classes were definitely there. What was really odd was that simply dropping back to J2SE 1.3.1, or even 1.4.1, worked.

For reference, we have a group of entity beans that each have a nested class called ValueObject. (I don’t particularly like that, but they’re there nonetheless.) Essentially we have 80 entity interfaces that follow this basic pattern:

import javax.ejb.EJBLocalObject;

public interface Foo extends EJBLocalObject
{
   // ...

   public ValueObject getValueObject();

   public static class ValueObject
   {
       // ...
   }
}

and then 80 entity bean classes that actually implement the getValueObject method.

It was at this point that I started looking at that “$” in the class name. It then hit me that Java source code should use a “.” between the outer and inner class names (Foo.ValueObject), not a “$”; Foo$ValueObject is the name the compiler gives the nested class’s class file. But the code where the error was occurring had been generated by ejbc. I’ve now tried this with WLS 7.0 sp2 and sp3, and the results are identical.

I haven’t been able to find this documented in the J2SE release notes, but it would appear that versions prior to 1.4.2 allowed source code to refer to an inner class using a dollar sign, even though that was not technically correct, and that 1.4.2 has stopped being lenient in this regard. Yes, I know that WLS 7 is not officially supported with 1.4.2… I haven’t tried WLS 8 yet; I would assume that, since it is supported with 1.4.2, they’ve changed the code generation routines inside ejbc.

So the moral of the story: if you find yourself needing to run WLS 7 with J2SE 1.4.2, and you have entity beans that you need to run through ejbc, run the build file for those entities under J2SE 1.4.1 or earlier, and build everything else under 1.4.2.

USAPhotoMaps GPS Software

USAPhotoMaps is one of the coolest pieces of (free!) software I’ve ever seen. It uses images from the Microsoft TerraServer to show a location as satellite photos or topographical maps. It interfaces with several GPS units (such as Garmin, or anything that groks NMEA) and will show your current location, let you send and receive waypoint data to and from the GPS unit, zoom in and out, scroll, create routes to send to your GPS unit, and lots of other things. This thing has it all and it’s free! If you’re into maps or GPS (you don’t need a GPS unit to use it), this is a nice thing to have.

A Game Based on 9-11?

That’s right. Some sick people are creating a computer game based on the horrible events of 9-11-2001. Can you believe that? At the risk of giving them free publicity, the site is here. They have “actual in-game” screenshots of a guy leaping to his death to escape the fires behind him. What kind of person comes up with something like this? I can’t even imagine. And I sure hope I never meet them.

A Really Nice CVS Client

This morning I found out about SmartCVS, a really nice-looking GUI client for CVS. I’ve been using WinCVS for a while now, but (no offense to the developers!) it just didn’t feel right. I can’t explain it, but it just didn’t give me everything I wanted. (Laugh all you want about SourceSafe, but it has a really nice client tool.)

Anyway, I just downloaded SmartCVS a few minutes ago and started playing with it. It’s written in Java so it will run on lots of OSs, and it’s fast too. It looks a lot like IntelliJ IDEA and apparently JetBrains has signed some sort of license with SmartCVS to incorporate it into IDEA. There is a free version and a “pro” version with more features. A single license of the pro version is $45 USD which doesn’t sound too bad. I’m going to keep using the free version for a while to make sure I really like it, but thus far it looks like a great tool.

Now if someone would just create a CVS client that can use a proxy! I’m behind a firewall that blocks most useful ports, one of which is 2401, the CVS pserver port… Sigh…

Lisp Macros Are Very Cool

So I’m playing around with Lisp, reading Successful Lisp and thoroughly enjoying myself. I really like Lisp; I just haven’t gotten to use it on anything other than test stuff yet. One of the things that I find the most interesting, and powerful, is the macro facility. Sure, some languages like C have macros that are processed by a preprocessor, but those are textual substitutions. Lisp’s macros are ordinary Lisp code that transforms Lisp code, which puts them in a league of their own. Consider this code (lifted wholesale from Successful Lisp):

(defmacro def-i/o (writer-name reader-name (&rest vars))
  (let ((file-name (gensym))
        (var (gensym))
        (stream (gensym)))
    `(progn
       (defun ,writer-name (,file-name)
         (with-open-file (,stream ,file-name
                                  :direction :output
                                  :if-exists :supersede)
                         (dolist (,var (list ,@vars))
                           (declare (special ,@vars))
                           (print ,var ,stream))))

       (defun ,reader-name (,file-name)
         (with-open-file (,stream ,file-name
                                  :direction :input
                                  :if-does-not-exist :error)
                         (dolist (,var ',vars)
                           (set ,var (read ,stream)))))
       t)))

What does this mass of parentheses, backquotes, commas and colons do? Lots. Executing the macro thusly

 (def-i/o save-checks load-checks (*checks* *next-check-number* *payees*))

will define two functions, one called save-checks and the other called load-checks, that store and retrieve the global variables *checks*, *next-check-number* and *payees* to and from a given file. These functions could be called thusly

 (save-checks "checks.dat") (load-checks "checks.dat")

This macro could be included in any program for which we needed to have reader and writer functions for marshaling data to and from disk files. This example was for a fictional bank, but let’s say I had a program to process data about the Tour de France and I had buckets for teams, riders, jerseys and sponsors. I could do this

 (def-i/o save-tdf-info restore-tdf-info (*riders* *teams* *jerseys* *sponsors*))

and would get save-tdf-info and restore-tdf-info functions that could be called thusly

 (save-tdf-info "tdf.dat") (restore-tdf-info "tdf.dat")

Maybe I’m just easily impressed, but I think that’s pretty cool.
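Python has no macro system, but a closure can produce something similar to the pair of functions def-i/o generates. The sketch below is a rough analog, not a translation: make_io is a hypothetical name, it builds the functions at run time rather than at macro-expansion time, and it uses repr() and ast.literal_eval() in place of Lisp’s print and read, so it only round-trips literal data such as numbers, strings, lists and dicts.

```python
import ast


def make_io(namespace, names):
    """Build a (writer, reader) pair that saves and restores the variables
    listed in `names`, looked up in the mapping `namespace` (for example a
    module's globals()). Each value is written on its own line with repr()
    and read back with ast.literal_eval()."""
    names = list(names)  # fix the set of variables, like the macro's vars

    def writer(file_name):
        # Mode "w" replaces an existing file, like :if-exists :supersede.
        with open(file_name, "w") as stream:
            for name in names:
                stream.write(repr(namespace[name]) + "\n")

    def reader(file_name):
        # open() raises FileNotFoundError for a missing file, much like
        # :if-does-not-exist :error.
        with open(file_name) as stream:
            for name in names:
                namespace[name] = ast.literal_eval(stream.readline())

    return writer, reader


if __name__ == "__main__":
    # Hypothetical globals standing in for *checks* and friends.
    checks = [101, 102]
    next_check_number = 103
    payees = {101: "Grocer", 102: "Landlord"}

    save_checks, load_checks = make_io(
        globals(), ["checks", "next_check_number", "payees"])
    save_checks("checks.dat")
    checks, next_check_number, payees = [], 0, {}
    load_checks("checks.dat")
    print(checks, next_check_number, payees)
```

The Lisp version wins on generality: read and print handle any readable Lisp object, and the macro defines real named functions rather than returning anonymous ones.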

Eclipse 3.0 M2 Is Out

I just got the notice that Eclipse 3.0 M2 is now available. I’ve already been using 3.0, but judging by the release notes there’s a lot of chewy Eclipse goodness loaded in this release. I’ve already downloaded it and “installed” it, such as it is. I’m upgrading my JDK to 1.4.2 and will fire the new Eclipse up shortly.

Update: OK, I’ve got M2 and JDK 1.4.2 installed. While it does monopolize my CPU on startup, once it’s up it is fast! And there are lots of spiffy new features. My favorites so far are the Javadoc and Declaration views. The Javadoc view shows the javadoc for the selected method, and the Declaration view shows its declaration. Clicking on the “println” part of System.out.println shows either the javadoc or the source code, depending on which view is selected. Another feature, which mimics something in the new IntelliJ, is a change indicator in the gutter on lines that you’ve changed in this editing session. Hovering over the marker shows what the line looked like before and allows you to revert changes. Very nice, indeed!

Kata 6 In Lisp

I got bored tonight and had a go at writing Dave Thomas’ Kata 6 in Lisp. It just seemed like a good thing to do. The code is below. I’m not a Lisp wizard by any stretch, so I welcome any comments from Lisp mavens. It’s interesting to note that this version comes up with 2,531 matches, while my Ruby version only found 2,506. Dave says you should find 2,530. Also note that all I did was the finding. I didn’t implement the largest set, longest word, etc. from the original kata.

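(defvar anagrams (make-hash-table))   ; canonical key -> list of words
(defvar word-count 0)                 ; not COUNT, a standard CL symbol

(defun canon (word)
  (let (norm-word canon-word)
    (setq norm-word (string-downcase word))
    ;; Sort a copy so the original string is left untouched.
    (setq canon-word (sort (copy-seq norm-word) #'char-lessp))
    ;; Intern the sorted string so matching keys are the same symbol,
    ;; which the default EQL hash-table test can compare.
    (setq canon-word (intern canon-word))
    (push norm-word (gethash canon-word anagrams))))

(with-open-file (stream "wordlist.txt")
  (do ((line (read-line stream nil)
             (read-line stream nil)))
      ((null line))
    (incf word-count)
    (canon line)))

;; A key with only one word isn't an anagram. Removing the entry
;; currently being visited is allowed inside MAPHASH.
(maphash #'(lambda (key val)
             (if (= (length val) 1)
                 (remhash key anagrams)))
         anagrams)

(format t "Total words: ~D; Total anagrams: ~D~%" word-count
        (hash-table-count anagrams))

(maphash #'(lambda (key val)
             (declare (ignore key))
             (print val))
         anagrams)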

Update: I discovered today that instead of interning the string I could have created the hash table with an EQUAL test, which compares strings by their contents, like so

 (setq anagrams (make-hash-table :test #'equal)) 

and then removed this line

 (setq canon-word (intern canon-word)) 
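For comparison, here is a minimal Python sketch of the same finding step. A Python dict compares string keys by value, much like a hash table created with :test #'equal, so the sorted-letters key can be used directly with no interning. find_anagrams is a hypothetical helper name. One deliberate difference from the Lisp code’s push/cons: words are collected in a set, so two spellings that lowercase to the same word (or a duplicated line) count once instead of forming an “anagram” set of their own.

```python
from collections import defaultdict


def find_anagrams(words):
    """Group words by their sorted, lowercased letters and return only the
    groups that contain more than one distinct word."""
    groups = defaultdict(set)
    for word in words:
        norm = word.strip().lower()      # like string-downcase, minus newline
        if not norm:
            continue                     # ignore blank lines
        key = "".join(sorted(norm))      # canonical form: letters in order
        groups[key].add(norm)            # a plain str key needs no interning
    # Like the remhash pass: drop keys that only one word maps to.
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    sample = ["listen", "Silent", "enlist", "google", "stone", "Tones", "onset"]
    for key, found in sorted(find_anagrams(sample).items()):
        print(key, found)
```

To run it against the kata’s word list, pass an open file object instead of the sample list; iterating a file yields one line, and so one word, at a time.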

Simplicity and Consistency

Mike Clark this morning gave Rael a bit of a nudge to try Ruby. Mike makes the following statement, which I completely agree with:

The beauty of Ruby is its simplicity and consistency. With Ruby, I find myself writing code to get the job done rather than to appease the compiler.

So true! Since Ruby is a dynamic language, there are no variable types to declare, no static checking; variables are just slots. The number of lines of Ruby code to do something is far less than the equivalent Java code, and I would argue more readable. You don’t have to jump through hoops to make the compiler happy, you just write your code to do what you need done. That’s it. It’s a beautiful thing.

The fact that regular expressions are baked right into the language is also a giant plus. This is how Perl does it, and Matz basically lifted this approach when he created Ruby. Python’s regex support is not nearly as nice since you have to create a regex and call methods on it instead of using a regex literal and using special variables to get the groups, etc. Where having baked-in regex support really shines is in not having to escape backslashed atoms in the regex. Regexen in Java are even more difficult to read than usual because every backslash is doubled to keep the Java string parser from barfing on unknown escapes.
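Here is a small Python sketch of the object-based style described above: compile a pattern, call search() on it, then ask the returned match object for each group by number. The date format and the parse_date helper are made up for the example; the Ruby equivalent appears only as a comment.

```python
import re

# Python: build a pattern object, call a method on it, then pull each
# group off the match object by number.
# (In Ruby this would be: if text =~ /(\d{4})-(\d{2})-(\d{2})/
#  and then read the groups from $1, $2 and $3.)
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_date(text):
    """Return (year, month, day) for the first YYYY-MM-DD date found in
    text, or None if there is no such date."""
    match = ISO_DATE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), int(match.group(3))


if __name__ == "__main__":
    print(parse_date("backup finished 2003-07-21 at 02:00"))
```

Note the raw string: it keeps the pattern readable, but the method-call ceremony around it remains.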

Kata 6

I took a swipe at implementing Dave Thomas’ Kata 6 which is an assignment dealing with anagrams. The goal is to parse a list of 45000-ish words, finding all the words that are anagrams of other words in the file. Dave claims there are 2,530 sets of anagrams, but I only got 2,506. I’m not sure where the disconnect is, but here’s my solution. I welcome any comments and critiques.

words = IO.readlines("wordlist.txt")

anagrams = Hash.new([])

words.each do |word|
    base = Array.new
    word = word.chomp.downcase

    word.each_byte do |byte|
        base << byte.chr
    end

    base.sort!

    anagrams[base.join] |= [word]
end

# Find the anagrams by eliminating those with only one word
anagrams.reject! {|k, v| v.length == 1}

values = anagrams.values.sort do |a, b|
    b.length <=> a.length
end

File.open("anagrams.txt", "w") do |file|
    values.each do |line|
        file.puts(line.join(", "))
    end
end

largest = anagrams.keys.max do |a, b|
    a.length <=> b.length
end

puts "Total: #{anagrams.length}"
# puts "Largest set of anagrams: #{values[0].inspect}"
# print "Longest anagram: #{anagrams[largest].inspect} "
# puts "at #{largest.length} characters each"

Update: Of course, 10 seconds after uploading the code, I see something I could change. Instead of sorting the anagram hash descending by array length, I could have done the following:

longest = anagrams.to_a.max do |a, b|
    a[1].length <=> b[1].length
end

This finds the largest set in a single pass, without sorting. Each element of to_a is a [key, words] pair: the key is in element 0 and the interesting array is in element 1.
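The same one-pass idea can be sketched in Python: max() with a key function scans the items once and keeps the biggest, with no sort involved. This sketch assumes the anagram groups are already in a dict mapping each sorted-letter key to its list of words; the function names and sample data are hypothetical.

```python
def largest_set(anagrams):
    """Return the (key, words) pair whose word list is longest."""
    # Each item is a (key, words) pair, so pair[1] is the list of words,
    # like a[1] in the Ruby version.
    return max(anagrams.items(), key=lambda pair: len(pair[1]))


def longest_anagrams(anagrams):
    """Return the words whose key, and therefore each word, is longest."""
    return anagrams[max(anagrams, key=len)]


if __name__ == "__main__":
    sample = {
        "eilnst": ["enlist", "listen", "silent"],
        "opst": ["opts", "post", "pots", "spot", "stop", "tops"],
        "acert": ["caret", "cater", "crate", "react", "trace"],
    }
    key, words = largest_set(sample)
    print("Largest set of anagrams:", words)
    print("Longest anagrams:", longest_anagrams(sample))
```

When two groups tie, max() returns the first one it encounters.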