Showing posts with label java. Show all posts
Showing posts with label java. Show all posts

Monday, August 2, 2010

On Removing Java Checked Exceptions By Means of Perversion

/*

What do you do in Java when you need to throw a checked exception in a context where it's not allowed, for instance when implementing a third party interface with no declared throws clause? If you're smart and its allowed you just write the code in another language. Otherwise you probably just wrap the bastard checked exception in an unchecked exception and be on your way.

But the wrapping solution a) requires an extra object allocation and b) unwrapping the original exception is a bit messy.

No, no, those justifications for what I'm about to do are horse shit rationalizations. The fact is I just want to pervert Java. I make it my job to pervert languages for their own good. Or at least, my own amusement.

Instead of "throw" meet "chuck" which turns checked exceptions into unchecked chucked exceptions. This post is an executable Java program.

Edit: This is an idea that can be traced back to Java Puzzlers by Bloch and Gafter and perhaps further. I've just refined it a bit with a bottom return and a way to catch the exception.

*/

import java.io.IOException;

public class Unchecked {

/*

The key to today's lesson in programming aberration is the fact that Java allows you to cast to a generic type parameter but can't actually check the cast at runtime due to type erasure. The following method does an unchecked cast from a Throwable to some generically specified subtype of Throwable and then immediately throws it. Neither the static nor dynamic type system ever have a chance to detect any deviant behavior such as casting to a "wrong" type. The code also lets the caller specify any return type. A previous article explains why that's useful.

*/

   @SuppressWarnings("unchecked")
   private static <T extends Throwable, A> 
     A pervertException(Throwable x) throws T {
      throw (T) x;
   }

/*

Then chuck puts some lipstick on that kinky little pig by telling it to act like it's throwing a RuntimeException. Et voilà, no need to declare the exception in a "throws" clause.

*/

   public static <A> A chuck(Throwable t) {
      return Unchecked.
        <RuntimeException, A>pervertException(t);
   }

/*

Some sample code shows how to use chuck.

*/

   public static int testChuck() {

/*

We can't throw an IOException because it's checked

*/

      // throw new IOException("checked, oh noes");

/*

But we can chuck an IOException

*/

      return chuck(new IOException("unchecked, hellsyeah"));

/*

And, just like with a throw the next line won't compile because it's unreachable.

*/

      // System.out.println("dead code");
   }

/*

If you never want to catch the exception or you want to handle it with a top level "catch Exception" then that's all there is to it. But if you want to catch and handle the IOException specifically then there's one tiny flaw in the system. Sadly, the following won't compile because IOException isn't statically known to be thrown by testChuck() and Java wouldn't want you to catch something that can't be thrown now would it?

*/


//   public static int wontCompile() {
//      try {
//         testChuck();
//      } catch (IOException e) {
//         System.out.println("Why did you leave me with" + 
//                            " the sad clown of life?");
//      }
//   }

/*

The fix is to add a way to tell the compiler what exceptions we might chuck, kinda like the "throws" keyword but this one chucks. Totally different. The chucks method threatens to throw a T but in a sudden fit of shame it does nothing - per a tip from a commenter it takes a class argument to avoid Java's hideous generic method invocation syntax.

*/

   public static <T extends Throwable> 
     void chucks(Class<T> clazz) throws T {}

/*

And here we go, we can catch our chucked exception.

*/

   public static void main(String[] args) {
      try {
         chucks(IOException.class);

         testChuck();         
      } catch (IOException e) {
         System.out.println("Caught chucked exception " + 
                             e + ".");
      }
   }
}

/*

Running that should display "Caught chucked exception java.io.IOException: unchecked, hellsyeah."

My depraved perversion work is done. Now I may rest. I leave you with this imponderable: how many unchecked checked exceptions will Unchecked.chuck() chuck?

*/

Tuesday, May 25, 2010

Anatomy of an Annoyance

The warning "Type safety : A generic array of T is created for a varargs parameter" is so irritating it makes me want to beat Java with a snoracle, an implement that allows defunct tech companies to breath while underwater. This article will explain what the warning is, where it comes from, and why it is pretty boneheaded. The warning also shows just how hard it is to extend an existing language without creating pain for your users.

My sample use case: anytime you build a Map in Java it takes approximately 6 metric tons of boilerplate.

Map<Integer, String> numbers = 
   new HashMap<Integer, String>(3);
numbers.put(1, "one");
numbers.put(2, "two");
numbers.put(3, "three");
// etc

It would sure be nice if that could be done in one line of code as in most sane languages. So let's fix it! First, we need the ubiquitous Pair class[1].

public static class Pair<A, B> {
   private A first;
   private B second;

   public Pair(A first, B second) {
      this.first = first;
      this.second = second;
   }

   public A getFirst() {
      return first;
   }

   public B getSecond() {
      return second;
   }
  
   /* a real Pair class would also have equals and 
     hashCode, but for this code I don't need them */
}

And then, to make it more pleasant to use, we write a static function that can take advantage of Java's half-assed type inference.

public static <A, B> Pair<A, B> pair(A first, B second) {
   return new Pair<A, B>(first, second);
}

Finally, the pièce de résistance

public static <A, B> Map<A, B> map(
      Pair<A, B>... pairs) {
   Map<A, B> map = 
      new HashMap<A, B>(pairs.length);
   for(Pair<A, B> pair : pairs) {
      map.put(pair.getFirst(), pair.getSecond());
   }
   return map;
}

And so, with faces smiling as the sunoracle shines down upon us, we use it

Map<Integer, String> numbers = 
   map(pair(1, "one"), pair(2, "two"), pair(3, "three"));

Ah, sweet sweet beauty[2]. Except one nasty, tiny, shoot-me-now-and-bury-my-carcass-in-downtown-Santa-Clara annoyance. That single lovely line of code that so innocently tried to use all our wonderful machinery is tagged with the warning "Type safety : A generic array of T is created for a varargs parameter." And the only way to "fix" it is to nuke the whole area from orbit with '@SuppressWarnings("unchecked"),' possibly hiding real problems in the same method. Gah.

History, Part The First

To understand the problem we have to go way back in history, at least as far as Java 1.0. Maybe even to Oak or to the egg and sperm that fused to make James Gosling. Whatever. In designing Java there was a question: since String is a subtype of Object shouldn't String[] be a subtype of Object[]. After all sometimes that's useful: if I have an array of Strings and you want to write a function that iterates through an array of Objects, aren't we good if I send you my Strings?

Well, no, we're not good. We're very bad. If String[] is a subtype of Object[] then Bad Things Happen™


void badThing(Object[] objects) {
   objects[0] = new Integer(42);
}

void test() {
   String[] strings = {"a", "b", "c"};
   badThing(strings);

   // length of an integer?
   System.out.println(strings[0].length());
}

If the above were allowed then we'd be trying to find the length of an Integer and that, um, won't work. Reading is okay but writing is bad. So to solve the problem the original Java designers made a pact with Satan. Instead of statically preventing code like that they added a dynamic check: anytime you store into a non-primitive array Java does an instanceOf-like check to make sure it's legit. In my sample code, the assignment is checked in "objects[0] = new Integer(42)". Failure will give an ArrayStoreException. That means every array store has to do expensive extra work and, more importantly to this article, at time of construction an array must be associated with a dynamic type tag. Like all pacts with Satan the negative consequences of this deal weren't obvious until much later.

History, Part the Second

With Java 1.5, Java programmers were finally able to move to the forefront of 1970's era technology with parameteric polymorphism, er, excuse me, generics. But see, there was this huge collection library and 3rd party libraries that extended it and...how the hell are we going to make the future safe for the past?

The decision was to compile with "type erasure" - dynamic type tags no longer carry all the information in the static type. Now, there's nothing evil about type erasure. Some of my best friends are languages that compile with even more type erasure than Java. It's just that Java has "instanceof" and other similar things that don't play well with it. Relevant to this article we have the aforementioned pact with Satan where arrays need a dynamic type tag or Bad Things Happen™.

Concerned with Bad Things, the Java designers started spackling over some holes. The upshot: this code fails to compile with the static error "Cannot create a generic array of T"

public <T> T[] arrayOfT(T a1, T a2, T a3) {
   return new T[]{a1, a2, a3}; // bzzt static error!
}

Same deal with this, "Cannot create a generic array of Pair<A, B>".

public <A, B> Pair<A, B>[] twoPairs(
      A a1, B b1, A a2, B b2) {
   return new Pair<A, B>[]{pair(a1, b2), pair(a2, b2)};
}

But this is a very shallow spackling indeed. Keep reading.

History, Part the Third

Also with Java 1.5, the designers added varargs: the ability to pass an arbitray number of arguments of the same static type to a method. So convenient! But, uh, apparently the varargs guys and the generics guys didn't talk to each other until late in the game. See, the varargs folks decided to implement varargs as sugar over arrays rather than something that's actually type safe.

public void whatClassIsIt(Object... things) {
   System.out.println(things.getClass());
}

whatClassIsIt(pair(1, "one"), 
   pair(2, "two"), pair(3, "three"));

There's our old buddy erasure. The above code prints "class [LPair;" - array of Pairs. But note it's not dynamically known to be an array of Pair<Integer, String>.

Making Bad Things Happen™

Now we've got enough knowledge to screw things up nicely. We've got statically unsafe arrays that depend on runtime type tags to provide a reasonably early error. And we have an ability to create arrays that don't carry all the information needed to do the runtime check.

public static <A, B> Pair<A, B>[] pairs(
      Pair<A, B>... pairs) {
  return pairs;
}

Pair<Integer, String>[] pairs = 
  pairs(pair(1, "one"), pair(2, "two"), pair(3, "three"));
Object[] objects = pairs;
// no error in this assignment, statically or dynamically!
objects[0] = pair('c', 2.0); 

// arrgg ClassCastException
System.out.println(pairs[0].getSecond().length()); 

The code doesn't have an error at the point of assigment. The error does finally occur later when I attempt to use a value. That means that in real code the error could be very far from the real cause. The only diagnostic that prevents this Bad Thing™ from happening is the warning on the first line; our lovely "Type safety : A generic array of T is created for a varargs parameter".[3] And that warning has to occur in a place that doesn't know whether the use of the array is actually bad or not. Hence, our code's beauty (or at least what passes for beauty in Java) is shot.

Conclusion

Ah, if only. If only Java didn't have covariant arrays. Or, since it does, if only the designers had used something other than arrays to carry varargs. Woulda coulda shoulda. This article shows just how hard it can be to extend an existing language without shooting yourself and your users in the ass. Runtime checked array assignment + varargs in arrays + type erasure == pain.

Now you know why Scala's arrays are invariant.

Footnotes

  1. Q: What makes Java so manly? A: It forces every programmer to grow a Pair.
  2. Please don't rag me about how I wrote 17 lines of code to avoid writing 3 - presumably I'd reuse this infrastructure a lot.
  3. For an even nastier problem with the erasure scheme and arrays see this example of trying to write higher order code using generics and varargs.

Thursday, July 23, 2009

Void vs Unit

I suspect most people who come to one of the statically typed functional languages like SML, OCaml, Haskell, F#, or Scala will have only seen static typing in one of the popular C derived languages like C++, Java, or C#. Some differences in type system become clear fast. One particular spot is the relationship and difference between void and unit.

So imagine your favorite C derived language. Consider your language's equivalent of the following function definition [1]

void printHello(String name) { printf("hello %s", name); }

What does the word "void" mean? To any C, C++, Java, C# programmer it means that printHello doesn't return anything. Simple.

But a functional programmer[2] sees it quite differently. This article will explain the different point of view and why it's useful.

What's a Function?

It turns out to be very hard to write a single definition of "function" that covers all the ways the word has been used. But to a functional programmer at the very least a function is something that maps inputs to outputs. In the impure functional languages that function may also have an effect like printing "hello." I'll stick with the impure definition in this article.

Now, when a functional programmer talks about mapping inputs to outputs he or she doesn't mean taking a string as an input and outputting it on the console. Instead he or she is talking about arguments and returns of a function. If something is to have any pretense at being a function, or at least a function that returns normally, then it must return something.

printHello clearly does return normally so a functional programmer insists that it returns something. But a functional programmer also must recognize that it doesn't return anything "meaningful."

Thus rather than the concept "void," a functional programmer thinks of unit. In some languages the unit type is written as (), but for clarity I'm going to stick with "unit." Now, unit is a type, conceptually not much different from say "int". But unit, as its name somewhat implies, has exactly one value conventionally represented as an empty pair of parentheses ().

In OO terms () is a "singleton". There's only one instance of it. It's also immutable. In a very real sense it carries no information. You can't write a program that behaves differently based on what a function of type unit returns - it always returns the same ().

A Performance Diversion

At this point a little guy in the back of your head might be asking "doesn't returning something meaningless just waste cycles?" Well, it's important to distinguish between abstraction and implementation. The unit value, (), is an abstraction that the implementation is free to optimize away. If you write in Scala

def printHello : Unit = {
  println("hello")
  ()
}

Then the compiler will generate the equivalent of the original void based code. If in Scala you write

print(printHello)

Then the compiler might generate something like

printHello
print( () )

Because () is a singleton it doesn't matter at the machinecode/bytecode level that it's not "actually" returned. The compiler can figure it out with no performance hit at all.

Making It Useful

Okay, blah, blah, blah, functional think this, unit that. Whatever. What's the point? () is a value with no information, unit is a type with only that value...what's the practical use?

Well, it turns out that its useful for writing generic[3] code. Java and C++, somewhat famously, do not have first class functions. But they can be faked. Here's the Java

interface Function<Arg, Return> {
   Return apply(Arg arg);
}
Function<String, Integer> stringLength = new Function<String,Integer> {
  Integer apply(String arg) {
    return arg.length();
  }
}

In C++ you can do it essentially the same way with a template and operator() overloading[4]

template <typename Arg, typename Result>
struct Function {
  virtual Result operator()(Arg arg) = 0;
};
struct StringLength : Function<string, int> {
  int operator()(string arg) {
     return string.length();
  }
}
Function<int, string> *stringLength = new StringLength();

C# has a function type and lambdas so it's very easy to write.

Func<string, int> stringLength = s => s.Length;

So, if stringLength is an object of type Function<String, int> / Function<int, string> * / Func<string, int> then I can write the following

int result = stringLength.apply("hello"); // Java
int result = (*stringLength)("hello"); // C++
int reulst = stringLength("hello"); // C#

So here's the question: how would I convert the printHello function earlier into a Function<String,X>/Function<string, X> */Funct<string, X>. What type is X?

If you've been playing along so far you recognize that it "should be" void, but Java and C# don't allow it. It's invalid to write Function<String, void> in either language.

Let's pretend you could. After all, C++ will let you write a class PrintHello : Function<std::string, void>. But there's still a problem even in C++. Generic code might expect a result and none of these languages let you denote a value of type void. For instance (in Java) imagine thise simple bit of reusable code.

public static <A,B> List<B> map(List<A> in, Function<A,B> f) {
  List<B> out = new ArrayList<B>(in.size());
  for (A a : in) {
     out.add(f.apply(a));
  }
  return out;
}

C++ has std::transform and C# has IEnumerable#Map, which are more or less the same idea.

Anyway, given something like the above calling map/transform/Map with a list of strings and the printHello object should result in a List<void> of the same length as the original list. But what's in the list? The answer, in a functional language, is a bunch of references to (). Tada, the meaningless is useful for writing generic code.

C++ will allow a Function<std::string, void> but any attempt to call something like map with it will fail to compile. C++ doesn't have a () value so our generic code has become slightly less generic.

The Work Arounds

Java, C++, and C# programmers do solve this problem all the time. One common option is to not use Function<string, void> but some other concept like "Action<string>" to mean a function that takes a string, performs a side effect, and returns nothing useful. Then you essentially duplicate "map" into something called perhaps "foreach" that expects an Action<T> and returns void. That probably makes sense in this case since a list of references to a meaningless value is perhaps silly, but it also means a great deal of code duplication if this kind of higher order programming is common. For instance, you can't write just one compose function that creates a function from two other functions, you also have to write a compose that takes an action and a function to create an action.

Another answer is to make the printHello function object have type Function<String,Object>/Function<string,void*>/Func<string,object> and return a null. That's pretty disgusting.

Perhaps the best option is to create your own Unit enumeration with only a single value. That does leave Unit nullable in Java, but railing against nullability in Java is another blog post.

enum Unit { unit }; // C++, Java, or C#

As for good old C, well, the type system doesn't do parametric polymorphism so generic code in C is problematic no matter what you do. Still, with function pointers it's conceivable you might find a use for a unit value. Typically a C programmer would use 0.

A Quick Aside About Purity

For the most part I've casually used effects everywhere. Most functional languages like SML, OCaml, F#, and Scala are impure and let the programmer get away with that. Some functional languages, e.g. Haskell without unsafe* extensions, do not. They must track effects using the type system. But even with this caveat, the unit type is still useful to, for instance, distinguish from a function that fetches a string from the keyboard which would have might have type IO String vs one that prints to the console which might have type String -> IO Unit.

Conclusion

Void and unit are essentially the same concept. They're used to indicate that a function is called only for its side effects. The difference is that unit actually has a value. That means that unit functions can be used generically wherever a generic first class function is required. The C derived languages miss out by treating void functions as functions that don't return anything instead of as functions that don't return anything meaningful.

Next time I'll talk about functions that really, truly return nothing.

Postscript on Aesthtics and Tuples

Mostly I've talked about the practical value of the unit type in this article, but I do love it when convenient practice and beautiful theory line up. So here's an attack from a different direction.

Many functional languages have tuples. A tuple is an ordered fixed size collection of values of (possibly) different types[5]. Tuple types are usually written with parens. For instance, (Foo, Bar) is the type of pairs (2-tuples) with one Foo, one Bar. The same type is sometimes written as Foo × Bar, which makes sense too. You can see the type as a set that consists of a kind of Cartesian product of the Foo and Bar sets. For instance if Foo has only two possible values {f1, f2} and Bar has only two possible values {b1, b2} then the type Foo × Bar has 4 possible values, {(f1, b1), (f1, b2), (f2, b1), and (f2, b2)}.

You can have tuples that with higher numbers of members like triples and 4 tuples and such.

What about a 1-tuple? Well, the type Foo can be seen as a 1-tuple. There's would be no useful difference between the type Foo and the type (Foo).

The question to ask is, given that we have 3-tuples, 2-tuples, and 1-tuples does it make sense to have a 0 tuple? The answer, dear reader, is unit. The unit value is a tuple with no members. That's why Haskell calls both the type and the value "()" - it's like the type (A,B) without the A,B. And just as an empty list and empty set aren't quite "nothing" neither is the empty tuple.

And here's another interesting thing, type theorists will also call the unit type "1" (not to be confused with the value 1). To see why, look at the type (Foo, unit) which can also be written as Foo × 1. If Foo only has 2 possible values {f1, f2} then the entire set of possible values for the type Foo × 1 is {(f1, ()), (f2, ())}. It's the same 2 values extended with a little bit of "meaningless" information that can be stripped off or added back in as desired. Formally speaking it's isomorphic to the type Foo. To abuse notation slightly, Foo × 1 = Foo. So unit is not only the type that has only 1 value, it's the type that behaves like a 1 at the type level.

Footnotes

  1. In C of course it would be char *name, in C++ it might be char * or I might use the string and iostream libraries. Whatever. The differences aren't that relevant to this discussion.
  2. To save myself a lot of time I'm using phrases like "functional language" to mean the statically typed functional languages that have been influenced to a large extent by ML. Some of what I'm talking about does translate into some dynamically typed languages as well, although they often "pun" things so that there isn't a unit value that's distinct from nil, null, false or some other singleton value.
  3. I use the phrase "generic" here to mean what the functional community means by "parametrically polymorphic" or just "polymorphic." However, my target audience is more likely to associate polymorphism with OO dispatch, so "generic" it is.
  4. In C++ you can also do it without the pure virtual base class and just use template structural typing based on operator(), but shortly it'll be easier for me to describe a problem if I use a base class to give the concept a name. I could also allocate on the stack but that slightly confuses my point. I'm going this route to keep things symmetrical among the languages. Besides, this isn't a tutorial on why C++ isn't Java. On a related note it's a shame C++0x won't have concepts.
  5. This is quite different from a Python tuple which is really just a list.

Friday, April 24, 2009

Java Has Type Inference and Refinement Types (But With Strange Restrictions)

When enthusiasts talk about Scala they frequently start with "it's a lot like Java but with all this other cool stuff like type inference and structural types and..." I guess it's okay to introduce Scala that way. I've done it myself. But it does give the impression that Scala has somehow heaped complexity onto the Java. Oddly enough, it hasn't, at least not in the areas of type inference and structural types. Java has both. Sorta. They're just not very useful.

In this short article I'm going to try to show that while type inference and structural types appear to be additions to Scala over and above Java, they can also be explained as the removal of peculiar restrictions from Java. The goal is not just to illustrate these two points, but to make a broad statement: Scala isn't a just bunch of features heaped on top of Java. To a significant degree it's a generalization of things already found in Java. The result is that many aspects of Scala are both simpler and more regular than Java.

Type Inference

Java folk get excited about Scala's type inference. Usually positively, but sometimes negatively. But here's the thing: every practical statically typed language has some type inference. Every one. Even Java has type inference. It's just very limited. Let me show you by starting with a simple class

class Foo {
  public Foo bar() { return this; }
  public String toString() { return "Foo!"; }
}

With that I can write

System.out.println(new Foo().bar().bar().bar());

It's silly code but it illustrates my point. Java doesn't need me to explicitly annotate the return type of each call to bar(). Java infers the type resulting from new Foo().bar() as being a Foo, which in turn can be sent a bar() message resulting in a Foo, etc. If I change things just a little the type system complains

System.out.println(new Foo().bar().bar().baz()); // bzzt error!

In Java I "only" have to annotate types when I want to assign to a variable or return from a method. Which, come to think of it, is an odd restriction. The compiler obviously knows the type, so why do I have to tell it yet again?

Foo f = new Foo().bar().bar().bar();

Thus, rather than saying Scala adds type inference we can equally well say that it removes a silly restriction from Java's type inference[1].

Structural Refinement Types

The fact that you don't have to annotate types when building expressions in Java is well known even if people don't usually call it type inference. What's not as well known is that Java has refinement types. So I'll have to explain, this time starting with Scala.

In Scala you can say method doSomething takes any object with a bar method, like so

def doSomething(x : {def bar : Unit}) = ...

That's called structural typing - the doSomething method puts a constraint on the structure of objects that may be passed to it. But actually, in Scala, that's just shorthand for the following refinement type

def doSomething(x : AnyRef {def bar : Unit}) = ...

That says that x must be a subtype of AnyRef and must have a bar method. The type of x is a "refinement" of AnyRef with an additional structural constraint. Scala's refinement types aren't limited to just refining AnyRef, but that's really not important to my point here. My point is that Java has refinement types, too.

If you're a Java programmer, you're probably scratching your head thinking I've lost my sanity. But I promise I haven't. I ask you, in Java what exactly is the type of this expression?

new Object() { 
  public void bar() {
     System.out.println("bar!"); 
  } 
}

If you said "Object!" think again. In Java it's actually something more sophisticated. Go paste this into your favorite Java IDE and run it. I'll wait

public class Test {
  public static void main(String[] args) {
    (new Object() { 
      public void bar() { 
        System.out.println("bar!"); 
     } 
    }).bar();
  }
}

Surprised? Remember my point about limited type inference above? Here, Java has clearly inferred that the result of the expression is an Object refined with a bar method[2]. And so it lets you send a bar() message to the object. Rename either the method or the call to "baz" and you get a type error. But if you want to do anything interesting with the (new Object {public void bar() {...}};) object, like assign it to a variable, return it from a method, or apply it as an argument to a method then Java has a problem. It has no way to denote the refinement. So the refinement gets "lost" from the static type system. Poof. Gone. It's part of the object, you can find it via reflection, but the static type system won't let you get at it. The static type system only lets you get at a refinement "immediately" in a surrounding expression. Which is pretty close to useless. When you look at it that way, Scala hasn't so much added refinement types as removed another silly Java restriction.

Conclusion

When Java people first start looking into Scala it's common to think of Scala as being some complicated mishmash. In large part, evangelists (including me) are to blame. It's surprisingly hard to convey a nice feature of language A to users of language B without making it sound like something totally new, hence something that will complicate their lives. Hopefully this article has helped illustrate that in a very real sense some of the things that appear to be "new" in Scala are really "old" but generalized to be made more useful.

So this article is a call to all Scala pundits out there: when comparing Scala to another language, especially Java, go ahead and get excited about a "new" feature...but then see if you can't explain it as a generalization of an "old" one. You might be surprised how often you can and your audience might come away less intimidated.

Foot Notes

  1. Technically, Scala's type inference is quite a bit more sophisticated than Java's. But I won't let inconvenient facts like that get in the way of making a good point.
  2. Technically, I think the Java spec says that the type isn't structural, but some anonymous nominative type that the compiler can infer and use over a very limited scope. Then again, maybe not. I'm too lazy to go re-read the spec. Also, as already established, I won't let inconvenient facts like that get in the way of making a good point.

Friday, July 18, 2008

Java Is Too Academic

Occasionally you'll hear people praising Java because it's incredibly readable; there's only one way to do things; and other languages are too academic. But I have proof that they're wrong. After you view the code below I think you'll agree with me that with Java:

  1. It's far too easy to write code that's illegible to most programmers.
  2. There are far too many ways to do things.
  3. It's far too academic.

I have reason to believe other popular languages like Ruby and Python allow this sort of thing, too. When will we finally stop inventing such academic languages and let real world working programmers get some real world work done?

The proof that Java is too academic and obscure? Right here: factorial using a fixed point operation in an applicative order lambda calculus. "Obscure" and "academic" don't even begin to describe this code*.

public interface Lambda {
    public Object apply(Object x);
}

public class Factorial {
 public static void main(String[] args) {
  final int x = new Integer(args[0]);
  
  System.out.println(x + "! = " + 
  ((Lambda)(new Lambda() {
   public Object apply(final Object f) {
    return new Lambda() {
     public Object apply(final Object x) {
      return ((Lambda)f).apply(new Lambda(){
       public Object apply(final Object y) {
        return ((Lambda)((Lambda)x).apply(x)).apply(y);
       }       
      });
     }     
    }.apply(new Lambda() {
     public Object apply(final Object x) {
      return ((Lambda)f).apply(new Lambda(){
       public Object apply(final Object y) {
        return ((Lambda)((Lambda)x).apply(x)).apply(y);
       }       
      });
     }     
    });    
   }   
  }.apply(new Lambda(){
   public Object apply(final Object f) {
    return new Lambda() {
     public Object apply(final Object x1) {
      final int x = (Integer)x1;
      return x == 0 ? 1 : x * (Integer)((Lambda)f).apply(x -1);      
     }
    };
   }   
  }))).apply(x));
 }
}
* "Obscure" and "academic" may not begin to describe it, but "perverse" starts getting close

Sunday, August 26, 2007

Martians vs Monads: Null Considered Harmful

I hate null. More specifically I hate NullPointerExceptions(NPEs). Scala has an alternative, but let me explain why I hate NPEs. They're like evil, sneaky muppets. Remember the Martians on Sesame Street? They'd ask each other if a phone was a cow and then say "nope, nope, nope, nope." To my eye, NPE looks a lot like "NoPE."

Me: Is my unit test working?
Computer: NoPE
Me: Um...okay, fixed that problem...how 'bout now?
Computer: NoPE, NoPE, NoPE, NoPE
Me: ARGGHRHGR! You're a Sesame Street Martian!
Computer: Yip, yip, yip, yip1

Does all this code/test/fix/Martians/yelling have to be part of the game? Can't a static typing system save me from this? Can't we stop the Muppets from chanting in my head?

The answer in Java and C# is "No! Well, sorta! Um, no." It is possible to implement a pattern whereby you should never get NoPEs. It's pretty painful to do and the "should" qualifier means it's not really worth it, but it does help illuminate a solution that's available in Scala.2

Some Craptacular Code

First, what's the big deal? Can't I just check for null and deal with it? Here's Java to demonstrate. Any resemblance to a C# code is purely coincidental.


public interface Spaceship {
  public Martian getMartian();
}

// the main method...
Spaceship spaceship = getSpaceshipSomehow();
Martian m = spaceship.getMartian()
if (m != null) {
  m.dieDieDie();
  System.out.println(m.toString() + 
    " is toast.");
} else
  System.out.println(
    "No Martians here!");
}  

The idea here is that some spaceships have Martians while others don't. If there is a Martian, that little bastard will never NoPE me again. Checking for null certainly works. But there's a problem - how do I know for certain that getMartian can return a null? If I didn't write Spaceship, what made me decide to do the null check? And how do I know that getSpaceshipSomehow() won't ever give me a null if I didn't wite it? The answer is I don't, that's how. It might be in the documentation or might not. The docs might be right or they might not. I might have access to the source code or I might not. I might remember what I wrote last week or I might not. The only answer would seem to be to wait for the NoPE to show up in testing or check for null defensively everywhere. Hmmm...stupid Martians.

Okay, let's try creating a holder that I'll call "Option" for reasons we'll see later. Once again, here's some Java (C#ers please consult your magic decoder rings).


public class Option<T> {
  private T value;

  public Option<T>(T value) {
    this.value = value;
  }
}

public interface Spaceship {
  public Option<Martian> getMartian();
}

// the main method...
Spaceship spaceship = getSpaceshipSomehow();
Option<Martian> o = spaceship.getMartian();
if (!o.value() == null) {
  Martian m = o.value();
  m.dieDieDie();
  System.out.println("I'm a 733T Martian h4x0r."
    + m.toString() + " has been pwned.");
} else
  System.out.println("W00T");
}

The idea is that API builders indicate nullibility by saying methods return an Option. getSpaceshipSomehow promises that it will return a Spaceship while getMartian may or may not return a Martian. It's okay as far as it goes, but it's entirely too easy to call o.value() without checking for null first. I'd like the type checker to help me here - maybe with subclasses. Here's the attempt in Java. The equivalent C# should be...um...equivalent.

public abstract class Option<T> {}
public class None<T> extends Option<T>{}
public class Some<T> extends Option<T>{
   private T value;

   public Some<T>(T value) {
     this.value = value;
   }
   public T value() {
     return value;
   }
}

public interface Spaceship {
  public Option<Martian> getMartian();
}

Spaceship spaceship = ...
Option<Martian> o = spaceship.getMartian();
if (o instanceof Some<Martian>) {
  Martian m = ((Some<Martian>)o).value();
  m.dieDieDie();
  System.out.println(m.toString() + 
    " has been laterally motivated.");
} else
  System.out.println("Thinking outside the " +
    "box leverages synergies.");
}

I've got an abstract base class, Option, that's really just a marker that a value might be "null" or not. None means "null", and Some means it's a real value. Code depending on this structure uses instanceof checks to determine which way to cast.

Unfortunately, this code has three holes and one major (huge) annoyance. The first hole is that the spaceship designer has to ensure getMartian always returns an Option and never a null. If that gets blown, we get NoPEd. The second hole is that there's no way in Java to prevent somebody else from creating a subclass of Option - and that would break the logic. The third hole is that I could just cast an Option to a Some without doing an instanceof check, turning a NoPE into an equally irritating ClassCastException . Finally, the major (huge) annoyance is that I have to write all that crappy code every time I deal with this construct. Frankly, I wouldn't blame you if you'd rather deal with NoPE.

Making it Optional in Scala

In Scala, equivalents to Option/Some/None already exist3 - but they're far, far easier to use and they're "sealed" so no logic breaking subclasses are allowed. Here's the last Java example translated into Scala

trait Spaceship {
  def getMartian: Option[Martian]
}

val spaceship = getSpaceshipSomehow
spaceship getMartian match {
  case Some(m) =gt; {
    m dieDieDie;
    println(m + " was terminated.")
  }
  case _ =gt; println("I'll be back.")
}

The Spaceship trait requires all spaceships to implement a getMartian method which may return Some[Martian] or Nothing. Once we have a spaceship we do a match to see if we got a Martian or not, and if we did then it's...retired.

The first thing you might notice is less annotation of types. Let me assure you that the code is just as statically typed as the Java version, it's just that the Scala compiler can figure out the types of "spaceship" and "m" just fine, thank you very much. You may also notice a lot fewer dots, parens, semicolons etc. Scala's syntax allows you to type all that line noise but generally only requires it when things get ambiguous. Finally, in Scala, square brackets work more or less like angle brackets in Java/C# for denoting "generics."

More importantly notice in the "Some" case how cleanly the Martian, m, is "pulled out". The code is easily as clear as the Java version that checked for null. But unlike the null checking Java code it's clear in this case that I have to do the check. If the check wasn't required the code wouldn't even compile. Unlike the Java versions of Option, the Scala version ensures that I've done my checking right.

Confessions of A Leaky Abstraction

Now a bit of a confession. You see, one of Scala's design goals is to interoperate with Java and .Net seamlessly. That's a great source of strength since the libraries for those platforms are huge, the underlying VMs are heavily optimized, and targeting those environments may be a way for Scala to sneak in through the back door of corporate America.

It's also a source of holes. In order to be truly seamless Scala has to have both null and casting. As a result, Scala's Option ends up with some of the same holes as we found in the Java subclassing code. If Scala had tried to plug the null and casting holes, the result would have been an impedance mismatch between Scala and the target VM environments.

Still, the Scala libraries use Option whenever a "null" could be used and Scala projects are following that pattern. As long as you never ever create a cast or a null, you'll only have to worry about NoPE when dealing with existing Java or .Net code. Which is to say, a lot <sigh>. So even though it's not the static guarantee that I want, Scala's Option is incredibly useful since it's just as easy to use as null and far more expressive 5

A Closing Note With Some Rabble Rousing

In Scala, Option is a class and that means it can have methods. One of those methods is a nifty shorthand for getting Some value is available or getting a default value if None. Imagine my Muppet bloodlust (stuffinglust) is so high that if the spaceship doesn't have a Martian, I'll create a new one just to kill it.


val m = spaceship.getMartian.getOrElse(new Martian)
m.dieDieDie

Due to another feature of Scala, optional laziness, a Martian is only constructed when needed. Think of it the way an expression might or might not be evaluated in Java/C# when you use logical or. For example, in Java/C# "m = a() || b()" the b() method won't be called if a() is true. Laziness just generalizes this concept.

Option can also be chained with other Option returning expressions using orElse. If I have 3 ships, s1, s2, s3 and I want to get the first available Martian (if any) then

val o=s1.getMartian.orElse(s2.getMartian.orElse(s3.getMartian))

Again, laziness ensures that we only keep looking for Martians until we find one.

Finally, in Scala, Option is a monad. I won't explain all about monads in this article, but I do want to show that Option brings far more power than what I've shown so far. Imagine that from a spaceship I want to get a Martian, get the Martian's ray gun, see who the ray gun's target is, and tell the target to duck. In other words, in Java/C# I want spacehip.getMartian().getRayGun().getTarget().duck(). A nice, easy, one liner. Cool, but now imagine the Martian, the ray gun, and the target are each optional and my rule is that if any of them are null then nobody needs to duck and for God's sake no NoPEs. Urk! In Java/C# you could chain together a bunch of logical ands


if (spacehips.getMartian() != null
  && spacehip.getMartian().getRayGun() != null
  && ...

Or you could pile up a series of ifs

RayGun r = null;
Target t = null;
Martian m = spacehip.getMartian();
if (m != null) {
  r = m.getRayGun();
}
if (r != null) {
  ...

Either way, it ain't pretty. Because Option is a Monad in Scala the code looks like this

for (
  m <- spaceship getMartian;
  r <- m getRayGun;
  t <- r getTarget
) t duck

It might not be clear just yet how the "for" keyword is doing this magic, but you have to admit the code is eminently readable. Far more readable than the Java/C# alternatives.

Hopefully this has convinced you that null really should be considered harmful. Just as it's become accepted that there are better alternatives to "goto" there really are better alternatives to "null."

So go spread the word! End the NoPE madness! Monads are the keys to defeating our Martian overlords!

Yip, yip, yip, yip.

Footnotes

1. My apologies to Sesame Street. Realy, sorry, Muppets.

2. There are several languages with the same solution including most notably Haskell. But don't be scared - I won't use the word "monad." Oops, I just did. Also I used it in the title and in the closing section. Oh, and in the next footnote.

3. In Haskell, the Option monad is called Maybe. 4

4. Monad. Just wanted to say "monad" again.

5. Haskell and OCaml don't have these holes since they have neither null nor casting.

Thursday, August 16, 2007

The Kingdom of Nerbs

In his famous rant Execution in the Kingdom of Nouns, Steve Yegge hilariously excoriates Java for being so "noun oriented" that it's difficult to express ideas that are simple to express in functional programming languages. Scala is a language for the JVM that attempts to unify the funcional and object oriented worlds using first class functions, pattern matching, and what I'm going to call nerbs.

First, a short attention span theater version of Steve's post.1

Once Upon A Time In The Land of JVM

Nouns(together): You there, verbs, get encapsulated at once!
Verbs: Oh please set us free.
Noun1 (implements an interface with only one verb): I am a verb. I am a functor!
Nouns(together): Hahahaha, what a clown.
Citizens of functional lands: Look how much power our verbs have.
Verbs: Free, free, set us free.2

So what was to be done for the poor denizens of the JVM?

To Make a Nerb

There are a bunch of language implementations that support functional programming on the JVM to one extent or another (Cal, Jaskell, Kawa, SISC, Jython, JRuby, Groovy, Rhino, ABCL, hey guys, 'sup?). Unfortunately, none of them are named Java.

For this post I'm going to pretend that Scala is the One True Answer(TM). Other answers will require their own fairy tales er blog entries. Scala is interesting because it not only supports "mere" first class functions, it also supports nerbs.

So what is a nerb?3 Good question, I'm so glad you asked. A nerb is a noun that has been verbed. If you don't believe me, go Google it. But before you do, realize that Google is a publicly traded corporation that makes oodles of cash from selling paid advertising using free web search as a loss leader. But in saying "Google it" I'm not referring to the noun that offers free donuts to its programmers, I'm referring to the verb that this noun is most well known for enabling. Hence the ever so slightly tautological definitions "Google: (verb) to search the web using a web search engine provided by Google - (noun) the company that lets you Google(verb)." 4

Before I continue, let me say that the word "nerb" does not from my close scrutiny seem to appear in any portion of the Scala language specification. But it should.

Back to my butchering of Steve's fairy tale. The verbs are clearly functions and the nouns are clearly objects.5 So a nerb would be an object that's been turned into a function. But in my version of the fairy tale, the verbs all laughed at the noun um object pretending to be a function. Why? Well, because while such beasts work they're just so, so, ... verbose and awkward. Here's one in Java:

public interface Transformer<P, R> {
  public R execute(P param)
}

public class NumberToFrenchStringTransformer 
    implements Transformer<Integer, String> {
  public String execute(Integer param) {
    switch (param.intValue()) {
      case 0: return "zero";
      case 1: return "un";
      case 2: return "deux";
      case 3: return "trois";
      default: throw new 
        RuntimeException("Je ne sais quoi");
    }
  }
}

public class Test {
  void countToThree(Transformer<Integer, String> 
      numberToStringTransformer) {
    for (int i=1;i<4;i++)
      System.out.println(
        numberToStringTransformer.execute(
          new Integer(i)));
  }
  
  public static void main(String[] args) {
    new Test().countToThree(
      new NumberToFrenchStringTransformer());
  }
}

Transformer is a generic interface that requires two types to fully specify: a parameter and a return type. The countToThree method requires a Transformer from Integer to String. The goal is that another Transformer could allow counToThree to speak in some other language, but this code only implements French. The main method hooks everything together.

For a toy example this may not seem so bad. But consider: every time you want to turn an existing function into an object you recreate a significant amount of boilerplate. Imagine writing a significant portion of a real application this way. I could have made it slightly more concise by using an anonymous class, but that would have been a minor change. Let's clean things up for the Scala interpreter.6

def numberToGermanString(n: Int) = n match {
  case 0 => "null"
  case 1 => "ein"
  case 2 => "zwei"
  case 3 => "drei"
  case _ => throw new RuntimeException(
    "Ich bin nicht ein Berliner")
}

def countToThree(
    numberToString: Int => String) =
  for (n <- List.range(1,4)) 
    println(numberToString(n))

countToThree(numberToGermanString)

The play by play: numberToGermanString is a function from integers to strings, countToThree just happens to require such a function, and the last line makes everybody go to town. It's concise and exactly to the point. In a lot of ways it's similar to any other language with first class functions. Again, I could have used an anonymous function but that would have been a minor change. Still we haven't created a nerb - at least not obviously. First class functions are more like gerunds - verbs turned into nouns.

To motivate the desire for a nerb, I ask you this: what else, when given an integer can return a string? Well, how 'bout an array of strings? I can hear you now, "yeah, but an array is a noun not a v.... hey...wait a minute..."

def countToThree(numberToString: Int => String) =
  for (n <- List.range(1,4))
    println(numberToString(n))
  
val words = 
  Array("none", "one", "a couple", "many")
  
countToThree(words)

Yup, in Scala an Array is a nerb. Clearly, an Array is a noun. But we didn't change countToThree a bit and it took the array - the array was verbed. Cool, eh? In scala, getting the 3rd element from the words array just do words(3).

So you might be thinking that arrays must get some special treatment, then, some special syntactic sugar all their own. Nope. You can sprinkle that sugar anywhere you want and turn any of your classes into nerbs.

How? Well, much like you would in Java but you use interfaces (well, traits) that Scala provides.

object NumberToSpanishStringTransformer() 
    extends Function1[Int, String] {
  def apply(n: Int) = n match {
    case 0 => "cero"
    case 1 => "uno"
    case 2 => "dos"
    case 3 => "tres"
    case _ => throw new 
      RuntimeException("Soy un sustantivo")
  }
}

def countToThree(
    numberToString: Int => String) =
  for (n <- List.range(1,4)) 
    println(numberToString(n))
  
countToThree(NumberToSpanishStringTransformer)

Obviously in this case it would be cleaner if I just used a function instead of creating a nerb. But the point here is that I CAN create a nerb - which is nice when I have a more complex class like Array. The "object" keyword means I'm creating a singleton. Function1 is an interface (trait in Scala terms) that is parameterized by two types. Think of it like a generic in Java or a template class in C++ but using square brackets instead of angle braces. The trait requires me to implement one method: apply. In other words, it's pretty much the same as the Java implementation above, except that I was able to treat my object as a function.

The Nerbs United Shall Never be Divided

Earlier I said of our first class function that we had not created a nerb "at least not obviously." Here's the punch line to my cryptic hint: whenever you use a function in a first class manner Scala implicitly creates a nerb. So this code

def numberToGermanString(n:Int) = ... 

def countToThree(numberToString: Int => String) =
  for (n <- List.range(1,4)) 
    println(numberToString(n))
  
countToThree(numberToGermanString)

translates into something along the lines of

def numberToGermanString(n: Int) = ... 

def countToThree(
    numberToString: Function1[Int, String]) =
  for (n <- List.range(1,4)) 
    println(numberToString.apply(n))
  
val $anonymousNerb = new Function1[Int, String]{
  def apply(n: Int) = numberToGermanString(n)
}

countToThree($anonymousNerb)

The byte code generated won't be exactly that but the the gist will be the same: underneath the hood Scala turns first class function creation into nerb creation. This might just seem like just an implementation detail, but it turns out that the Function1 trait provides additional methods for creating function compositions and those methods can be used anytime you create a first class unary function. It's also how Java code can interact with Scala first class functions.

The Grand Unification

In Scala, classes and objects can be turned into nerbs using the object oriented technique of inheritance. Functions and methods can be turned into gerunds using typical functional programming techniques but the gerunds turn out to be nerbs.

Nerbs are a key part to how Scala unifies the object oriented and functional paradigms. There's much more to the unification story including closures and pattern matching. I'll save those for another time.

Once Upon a Time in the Land of Scala

Unlike the land of Java the land of Scala was all peace and harmony. Verbs could be nouned and nouns could be verbed as needed. Together they held hands and sang the songs of nerbiness.

Footnotes

1. My apologies to Steve. Really, sorry dude.

2. My apologies to The Police. Really, sorry dudes.

3. Due credit - I stole the word "nerb" from Nancy Allison's "Every noun can be...".

4. To be madly self referential I would say the word "Google" has been nerbed.

5. Actually, the fairy tale draws the analogy of nouns being both objects and classes but Steve didn't make much distinction and I didn't either. I also won't make the distinction between functions and methods. Please just read on and pretend I know what I'm analogizing.

6. For compiled Scala we just need to wrap all the code in an object that extends Application. I know, I know, it's ironic after all my talk about setting verbs free, but please bear with me until I can talk about singletons.