Java just won’t manage Non-Primitive Numbers


I just stumbled over something really funny; something that would fit into the Wat lightning talk by Gary Bernhardt. As you may know, Java is yet another language that borrowed from a really great programming language by the visionary Alan Kay, named Smalltalk. In Smalltalk everything is an object (yes, really, really everything! Even classes are objects, described by meta-classes; even creating a subclass is done by sending the superclass a message, to which it replies with a subclass!) and all programming is done by objects sending and receiving messages and answering them. This is why in OOP you don’t have functions but methods. A function can be called: it can be applied to values, so the caller defines what is done. In OOP, an object decides how to properly react to a message. It does so by looking up, among its methods, the one that handles the received message. How it reacts is totally up to the object; many different objects may even reply to the same message using different methods, whereas a function is unique. The key idea is that in imperative programming you see and apply the function, while in object-oriented programming the method is a black box to you (unless you programmed it yourself).

But I digress. Coming back to Java: when it was developed, it wanted to implement OOP, but it also wanted to keep up with the speed and popularity of C. So not only did the syntax become more C-like, but a lot of things that work in a purely object-oriented way in Smalltalk were implemented as they are in C. This also applies to numbers: they are primitives, values that lie in RAM and to which functions are applied. Now, this may sound totally normal to us; we don’t think of a number as an object that we ask to do something and then see whether it does it or not. But in Smalltalk, that’s exactly how it’s done:

1 class
=> SmallInteger

1 respondsTo: #+
=> true

Other pure object-oriented languages, such as Objective-C or Ruby (example below) behave in a similar way:

1.class
=> Fixnum

1.respond_to?(:+)
=> true

So how does it work? + is a method like any other, with the exception that it is written in a special way so that it can be used in a more human-readable infix notation. #+ and :+ are symbols in Smalltalk and Ruby respectively, which are used to identify the method. Say you have a method called println(); the symbol would be :println in Ruby or #println in Smalltalk. Ruby allows us to send a message to an object not only by naming the method, but also by using the send() method, which every object understands: its first argument is the symbol of the method and the following arguments are the arguments accepted by that method. So here it should become obvious that + is actually a method and the second number (another object) an argument:

1.send(:+, 2)
=> 3

The answer to the message is actually a new object. Not so in Java, where primitives are non-object values:

int i = 1;
System.out.println(i.getClass());

=> Unresolved compilation problem:
Cannot invoke getClass() on the primitive type int

This of course has the speed advantages already mentioned. On the other hand, mixing objects and primitives also leads to problems: objects are not primitives and vice versa. Java, for instance, ran into problems when implementing collections and introducing generics: LinkedList<int> list = new LinkedList<int>(); is not possible, as generics only apply to objects. LinkedList<Object> list = new LinkedList<Object>(); on the other hand wouldn’t work for primitives, as a primitive is not an object. Now, in many cases you’d really want to use a collection, as those are the only data structures in Java that allow array-like structures to change size dynamically at runtime. This is why Java has wrapper classes and, since Java 1.5, autoboxing. Wrapper classes, such as Integer, contain a primitive in an object:

Integer a = new Integer(7);
System.out.println(a.getClass());

=> class java.lang.Integer

So now niceties such as sending a number a message so that it prints its value as binary or hexadecimal are possible too; this was quite a voyage before. Still, it’s a bit icky to always have to wrap the values into objects manually, which is where autoboxing comes in. So instead of writing the above we could also write

Integer a = 7;

Now a is of type Integer, and 7 is a primitive, but as this primitive is to be stored in a, 7 is “autoboxed” into an anonymous Integer object; thus an Integer object with value 7 is assigned to the Integer variable a, and the expression is valid. A core idea is that Integer objects should behave the same as int primitives.
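To make the boxing and unboxing steps a bit more tangible, here is a minimal sketch. The class name BoxingDemo and the method sumOf are made up for this illustration; the point is only that int values can go into a LinkedList<Integer> because the compiler boxes them, and come back out as int because it unboxes them again.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class, not part of the original post.
final class BoxingDemo {

    // Sums a list of boxed Integers; each element is unboxed to int by the loop.
    static int sumOf(List<Integer> numbers) {
        int sum = 0;
        for (int n : numbers) {    // auto-unboxing: Integer -> int
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<Integer>();
        list.add(7);               // autoboxing: int -> Integer
        list.add(35);

        Integer a = 7;             // autoboxed on assignment
        int b = a;                 // unboxed on assignment

        System.out.println(list.get(0).getClass());    // class java.lang.Integer
        System.out.println(sumOf(list));               // 42
        // Static helper on the wrapper class, applied to the unboxed value.
        System.out.println(Integer.toBinaryString(b)); // 111
    }
}
```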

But now, let’s finally have some fun. Executing this code:

int a = 127;
int b = 127;

System.out.println(a <= b);
System.out.println(a >= b);
System.out.println(a == b);

As you would expect, you’ll get true printed three times to your standard out. The same should happen if we do the same using Integer objects:

Integer c = 127;
Integer d = 127;

System.out.println(c <= d);
System.out.println(c >= d);
System.out.println(c == d);

And it does. Now what happens if I instead of 127 (what a portentous value!) use 128?

Integer e = 128;
Integer f = 128;

System.out.println(e <= f);
System.out.println(e >= f);
System.out.println(e == f);

You should probably have gotten: true, true, false.

Wat?!

Now we know from Maths:

(a ≥ b) ∧ (a ≤ b) ⇔ a = b

So why is Java saying they are not equal? Of course! You may have heard of the equality vs. identity problem. If you do object-oriented programming, an object can be referenced by many variables. One would say the values of the variables are identical, they all point to the one object. This object is not copied, anything done to it by using the first variable will be reflected when inspecting the object through the second one. So this is what’s called identity. But what happens if you have two objects, that are created, but that have exactly the same values? In this case they are equal, but not identical.

If you have a twin, you may look totally alike: your hair color, eye color, your height, your weight, your birthmarks, your birthday, maybe even your likes and dislikes and your personalities might all be equal to each other. Yet each of you has its own identity, which is different from the other’s, as you are not the same person. An actor, on the other hand, may play different roles, but each role is filled by the same person; it’s the one identity. And if said actor plays two different roles on two different sets at the same time, and he breaks his leg on set 1 (in reality, not as part of his role), his leg will still be broken when he afterwards goes to the second set and plays his other role.

For numbers it makes no sense to separate identity from equality. 1 is 1; there may be many different things, each of the same quantity, but then again it’s as if the number of apples points to the value 1, as does the number of pears, while the number of grapes points to the value 40. In a mathematical sense there is just one of each number. The equals sign might say that two different functions or formulas evaluate to the same value, but said value is identical to itself.

In Smalltalk and Ruby numbers are unique objects as well, which means they are immutable and every occurrence of a given number has the same ID; 1 cannot be changed to 2 (because there’s already a 2). In Java’s primitive world this is true as well. But when it comes to Java’s object-oriented world, it is not! That is why the variables e and f are equal but not identical, and in Java comparing two objects with == checks for identity, not for equality! For primitives that makes no difference, but for objects it does. Check it out yourself, by testing equality with equals():

System.out.println(e.equals(f));

This code, checking for equality instead of identity, will produce what you expected, i.e. the third true. And the following code will show you that they are actually different objects:

System.out.println(System.identityHashCode(e));
System.out.println(System.identityHashCode(f));
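The asymmetry between the three comparisons comes from how the compiler treats the operators: <= and >= are only defined for numeric types, so both Integer operands are unboxed to int first, whereas == between two Integer references compares the references themselves. A small sketch of the three kinds of comparison (the class and method names are hypothetical, chosen only for this example):

```java
// Hypothetical helper class, not part of the original post.
final class CompareDemo {

    // Relational operators unbox both operands, so this compares int values.
    static boolean lessOrEqual(Integer x, Integer y) {
        return x <= y;
    }

    // equals() compares the wrapped int values.
    static boolean sameValue(Integer x, Integer y) {
        return x.equals(y);
    }

    // == on two Integer references compares identity, not value.
    static boolean sameObject(Integer x, Integer y) {
        return x == y;
    }

    public static void main(String[] args) {
        Integer e = 128;
        Integer f = 128;
        System.out.println(lessOrEqual(e, f) && lessOrEqual(f, e)); // true
        System.out.println(sameValue(e, f));                        // true
        // No expected output given here: the result depends on the running VM
        // (it is false for 128 with default settings).
        System.out.println(sameObject(e, f));
        // Unboxing one side by hand turns == back into a value comparison.
        System.out.println(e.intValue() == f);                      // true
    }
}
```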

Now the thing that should really deeply bother you is why this only happens to one bunch of numbers, while the other bunch (actually all numbers between -128 and 127) work just like their primitive counterparts do!

So I did some digging, and what I came up with is that Java actually pre-caches these values in order to:

(…) yield significantly better space and time performance by caching frequently requested values.

To make use of these pre-cached values, Java suggests that you use the valueOf() method of the Integer class rather than the actual constructor to create new Integer objects. Wtf?!? This is like totally against Java’s own convention of using constructors! One always uses the constructor! So let’s see what would have happened if we had used the constructor instead of the autoboxing feature:

Integer g = new Integer(127);
Integer h = new Integer(127);

System.out.println(g <= h);
System.out.println(g >= h);
System.out.println(g == h);

This will actually not behave as expected (because it differs from primitive behaviour), but it is consistent with the object-oriented world: in accordance with the examples with numbers above 127, we get true, true, false, because now we are not accessing pre-cached values but creating new objects that aren’t identical.

Here is the original Java code, that you’ll find in your JDK Libs, under java.lang.Integer, that implements the valueOf()-Method:

public static Integer valueOf(int i) {
	assert IntegerCache.high >= 127;
	if (i >= IntegerCache.low && i <= IntegerCache.high)
		return IntegerCache.cache[i + (-IntegerCache.low)];
	return new Integer(i);
}

IntegerCache is defined as a static nested class of java.lang.Integer:

private static class IntegerCache {
	static final int low = -128;
	static final int high;
	static final Integer cache[];

	static {
		// high value may be configured by property
		int h = 127;
		String integerCacheHighPropValue =
			sun.misc.VM.getSavedProperty("java.lang.Integer.IntegerCache.high");
		if (integerCacheHighPropValue != null) {
			int i = parseInt(integerCacheHighPropValue);
			i = Math.max(i, 127);
			// Maximum array size is Integer.MAX_VALUE
			h = Math.min(i, Integer.MAX_VALUE - (-low));
		}
		high = h;

		cache = new Integer[(high - low) + 1];
		int j = low;
		for(int k = 0; k < cache.length; k++)
			cache[k] = new Integer(j++);
	}
	private IntegerCache() {}
}

Now this makes it even more interesting. I can even change the upper bound of the range, so the behaviour actually differs from VM to VM. Of course, you’ve already guessed it: Java uses valueOf() when autoboxing numbers, and thus Integer i = 128 is not the same as Integer i = new Integer(128), even though that’s what Java promised.
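The link between autoboxing and valueOf() can be checked with a few lines. This is only a sketch; the class name CacheProbe and its two methods are invented for the example. For values from -128 to 127 both routes hand back the same cached object. For other values the result depends on the cache size of the running VM, so the example prints it without claiming an expected output.

```java
// Hypothetical probe class, not part of the original post.
final class CacheProbe {

    // Boxes the same int twice via valueOf() and reports whether
    // both calls returned the very same object.
    static boolean sharedByValueOf(int i) {
        return Integer.valueOf(i) == Integer.valueOf(i);
    }

    // Does the same via autoboxing.
    static boolean sharedByAutoboxing(int i) {
        Integer first = i;
        Integer second = i;
        return first == second;
    }

    public static void main(String[] args) {
        int[] cached = {-128, 0, 127};
        for (int i : cached) {
            // Prints "true true" for each value inside the default cache range.
            System.out.println(sharedByValueOf(i) + " " + sharedByAutoboxing(i));
        }
        // Outside the default range: outcome depends on the VM's cache size.
        System.out.println(sharedByValueOf(128) + " " + sharedByAutoboxing(128));
    }
}
```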

Caching the values might be good for performance, but doing it by pre-caching only certain values, leading to different behaviour of actual objects of the same class, is really nasty. If I’m not sure whether something works the way I suspect, I’ll normally just write up two lines to check it. So I would have just written 1 == 1 to see if I can check for identity or if I really have to use the longer 1.equals(1), and as 1 == 1 is natural for the primitive, I’d expect it to work for the object as well. I might have started writing some code using an array of ints, looping through them and doing some comparisons. Only later would I realize that I actually need a dynamically growing data structure and switch to a collection class. Because the behaviour is inconsistent, the code will work for small examples; if I just have an easy test case, everything will be fine, and maybe this bug won’t even show up until extreme cases occur at runtime. By furthermore basing the cache size on a VM parameter, this problem may occur only on some machines, making debugging even harder. Just try it out yourself. All you need is to add -XX:AutoBoxCacheMax=300 to your VM options, e.g. by calling it in the console via

java -XX:AutoBoxCacheMax="300" -cp . Test

and the problem won’t occur until you autobox 301. And what would you do if your code didn’t run? Check the standard language libraries provided by Oracle? You wouldn’t. You’d probably debug the crap out of your own code.
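For the array-to-collection scenario described above, one way to stay out of trouble is to never compare boxed values with == in the first place. The following is a minimal sketch under that assumption; DuplicateFinder and hasAdjacentDuplicate are hypothetical names, and the list is assumed to contain no null elements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original post.
final class DuplicateFinder {

    // Returns true if two neighbouring elements hold the same number.
    // Uses equals(), so the answer does not depend on the Integer cache.
    // Assumes the list contains no null elements.
    static boolean hasAdjacentDuplicate(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1).equals(values.get(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<Integer>();
        values.add(5);
        values.add(1000);
        values.add(1000);   // boxed separately from the previous 1000
        System.out.println(hasAdjacentDuplicate(values)); // true
    }
}
```

Had the comparison been written as values.get(i - 1) == values.get(i), the result for 1000 would depend on the cache size of the VM, which is exactly the machine-dependent behaviour complained about above.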

Since Java 1.5 many people have been criticising different things in Java; to me this is by far the biggest downside, and yet another proof that pure object-oriented programming languages, such as Ruby or Objective C, are much more worth pursuing than Java. A good language has clear and consistent language features; conceptually Java never had them, having abandoned Smalltalk’s ingenious idea of everything being an object. It ran into problems then, and by trying to fix them it just runs deeper into trouble…
