[Ruby] Equal is not always equal


The reason for this blog article is a question about the different ways of checking equality in Ruby, or more specifically about the so-called “threequals” method. You might have come across it: it is the triple equals sign ===, a very Ruby-specific thing. Even though everybody calls it an operator, I think it is crucial for understanding to be specific here: “threequals” is a method (of an object), not an operator. I stress this because it is not true for other object-oriented languages such as Java, in which all infix operators (such as ==) are simply part of the language, i.e. their definition exists outside the object world.
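
To make the "method, not operator" point concrete, here is a small sketch. It only uses plain strings, and the variable names are made up for illustration:

```ruby
# === is an ordinary method, so it can be called with explicit method syntax.
infix    = ("test" === "test")                 # the familiar operator-like spelling
explicit = "test".===("test")                  # the same call written as a method call
dynamic  = "test".public_send(:===, "test")    # the same call dispatched by name

puts infix, explicit, dynamic                  # prints true three times
puts "test".respond_to?(:===)                  # true: it is looked up like any other method
```

All three spellings end up in String#===, which is why a class can redefine it.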

After that question I struggled to find any good explanation of the different methods Ruby provides for comparing things, which is why I decided to write down what I told him, and additionally to place it into the context of all four methods Ruby offers. Yes, there are four ways of comparison, which is twice as many as languages such as Java or Smalltalk offer (and think of C++, which knows just the == comparison). To fully appreciate the differences, let me start off with the first and typical stumbling block every novice programmer encounters: value equality vs. reference equality (skip this if you are familiar with the concept).

Identity vs. Equality

Let me start off with Java, because there we only have two operations, and it will become clear why Ruby does some things differently.

When you look at non-object-oriented languages, there is just one comparison and you do not really need to worry about equality vs. identity. If you write 1 == 1, then there is just one number; each number exists just once. There are no two 1s that have different identities and might even differ from each other. The same goes for any literal, and as we are in a non-object-oriented world, we just have literals whose values we save into memory and have variables point at them. Numbers are immutable: a 1 will always stay a 1 and never become a 2. If we want a 2, we delete the 1. If we have more than one variable whose value is 1, they could basically all point to the same value in memory, because that value is immutable, and changing the value of just one variable can be implemented by changing the memory cell to which that variable points while all others still point at the 1.

This allows for fast and memory-efficient code, which was especially important in the old days, when memory and cache capacities were much smaller and accessing them was much slower. Java on some level adopted that philosophy, and therefore literals and numbers are treated as immutable, unique entities, while everything you build and some data types (such as Strings) are (also) objects. And that led to an interesting problem, which is shown in this code example:

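1 == 1;                          // true
"test" == "test";                // true (identical literals share one object)
/* BUT */
String a = new String("test");   // explicitly construct two separate objects
String b = new String("test");
a == b;                          // false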

So Java's comparison operator, which it inherited from the C language, does not really check for equality, because obviously string a and string b are equal. Java's == should rather (and always) be read as “a is identical to b”. For the first two examples this is true, because all numbers are unique, immutable entities in Java, and the same goes for all literals; in the last example, however, we explicitly construct two objects. And for any two objects it holds that even if they are alike in every regard, they are like monozygotic twins: they look identical but have different identities.

As it turns out, Java's == actually has nothing to do with equality, even though it uses the equals sign. To check equality, Java provides the equals(Object) method, which is defined in Object, so every class you write inherits it. A class overrides equals(Object) to check whether two objects with different identities are actually alike, by comparing the relevant values of each object.

In Ruby Double-Equals Equals Equality

Now that we have freshened up on equality vs. identity, let's look at Ruby. Here, == actually means equality. So:

1 == 1                  # true
"test" == "test"        # true
# AND
a = String.new("test")
b = String.new("test")
a == b                  # true

Identity is checked in another way (see below). Personally I find this neater than the Java way, because when you see the = sign, you automatically think about equality, not identity. Even experienced Java programmers will say “a equals b” when they read a == b, which strictly speaking is totally wrong.

It is sensible for another reason: in most cases you are actually not interested in identity, whereas equality is something you often need. So a language that uses a long method call such as a.equals(b) instead of a == b forces you to type much more for something fairly common. Java gets away with it because most of the time you compare a variable to a literal or you compare numbers, and for them it does not matter whether you check identity or equality (and in fact you cannot call equals on primitive values, as in Java they are not objects and therefore have no equals(Object) method).

Threequals, Triple Equals, or Case Equals Method

I struggled to explain this method myself yesterday and had to look some things up, which is how I came to write this article: the reasons and explanations you find on the internet for this method, which is unique to Ruby, are mostly horrible.

First off, a common misconception about the infamous === “threequals” method (this seems to be the catchiest and best-known name for it) is that it probably checks identity, since == is equality. One could not be more wrong. If you look up the API documentation for different Ruby classes, you will see that in most cases threequals simply behaves like == (e.g. String#===). So if it does exactly what the == described above does, why the extra method? What do you or Ruby need it for? The name case equals actually tells you what Ruby needs it for: it is the method that Ruby internally uses for comparison in the switch-statement (which Ruby spells case ... when). To freshen up: switch-statements are essentially syntactic sugar for if-then-else-statements (they are a bit more, as some compilers may optimize the case lines, for example with lookup tables, but let us not worry about that right now). Take this Java example:

if(current_month == 1) {
     current_month_str = "January";
} else if(current_month == 2) {
     current_month_str = "February";
} else if(current_month == 3) {
     current_month_str = "March";
...
} else {
     current_month_str = "Invalid month";
}

This can be written in a more expressive switch-statement:

switch(current_month) {
     case 1: current_month_str = "January";
             break;
     case 2: current_month_str = "February";
             break;
     case 3: current_month_str = "March";
             break;
...
     default: current_month_str = "Invalid month";
}

With Java, case <nr> internally translates to if (current_month == <nr>). In Ruby it translates to if <nr> === current_month: the value after when is the receiver of ===, and the value after case is its argument. The reason is that a class can now redefine how its objects compare inside a switch-statement so that it makes more sense there, without changing the equality of those objects when == is used explicitly.
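
The direction of that call can be checked with a small experiment. The LoggingPattern class below is hypothetical and exists only to record which object receives === and what it is handed:

```ruby
# A hypothetical pattern class that records every === call it receives.
class LoggingPattern
  attr_reader :seen

  def initialize(expected)
    @expected = expected
    @seen = []
  end

  def ===(candidate)
    @seen << candidate            # remember what the case statement passed in
    @expected == candidate
  end
end

one = LoggingPattern.new(1)
two = LoggingPattern.new(2)
current_month = 2

name = case current_month
       when one then "January"    # evaluated as one === current_month
       when two then "February"   # evaluated as two === current_month
       else "Invalid month"
       end

puts name                         # February
p one.seen                        # [2]
p two.seen                        # [2]
```

Both patterns were asked about the value 2, so the object after when is the receiver and the object after case is the argument.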

If you are not interested in the details, this is all you need to know. 99% of the time you will never need to call === manually, and when it comes to switch-statements, everything is handled internally. So if you are not interested, just skip the rest and read on at eql? Stops Equal from Equalizing. Yet I believe that the threequals stuff, once understood, is actually pretty fun and neat, and it will enable you to use switch-statements in a more powerful way than in other languages.

To really understand ===, one should first forget everything about equality, because you will subconsciously try to fit the threequals method into your understanding of equality and assume properties that threequals does not have. When you think of equality, you assume that it is reflexive (comparing any object with itself yields true, e.g. 1 == 1), that it is symmetric, or commutative (if a == b, then b == a), and that it is transitive (if a == b and b == c, then a == c). However, this is not true for threequals (reflexivity and symmetry in particular are not a given), which is why a bunch of people suggest calling it the “subsumption” operator instead of using equals in the name at all.
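
A few one-liners illustrate the missing properties. This is only a sketch using core classes; the names digits and range are made up for the example:

```ruby
digits = /\A\d+\z/
range  = (1..10)

p digits === "42"        # true:  the string matches the pattern
p "42" === digits        # false: String#=== compares like ==, so no symmetry
p range === 5            # true:  5 lies within the range
p 5 === range            # false: Integer#=== compares like ==
p Integer === 5          # true:  5 is an instance of Integer
p Integer === Integer    # false: the class is not an instance of itself, so no reflexivity
p digits === digits      # false: a Regexp only matches strings and symbols
```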

Subsumption method

Although I would have chosen something different, subsumption is catching on, so I will use it as well. However, I’ll keep insisting on calling it a method, because the power of subsumption (or threequals, or whatever you want to call it) actually comes from the fact that it can be overridden in each class that inherits it (the same goes for == in Ruby, but you shouldn’t mess with that lightly). We have already established that for most objects it is the same as == anyway. But you can also apply it to other objects and get interesting results. Let’s just take a quick look at the String class:

"test" === "test"          # true
String === "test"          # true
"test" === String          # false
String === String          # false

This might look strange at first. The first line is clear, as for String objects === behaves like ==. The rest is best described by the approach of Jörg W Mittag, who explains x === y this way:

If x described a set, would y be a member of that set?

This is also what subsumption describes: a term from classification meaning that one category is positioned underneath another category.

Subsumption for Classes

So now that subsumption is clear, the above code should make more sense: any object that is an instance of class String is in a category beneath the category String. Or to put it in Mr. Mittag's words: if the String class described a set, all String objects would be members of that set. That set does not include the String class itself. And here it also becomes clear that the subsumption method cannot be symmetric: while the “test” object is an element of the String set, the String class is not an element of a “test” set.

But what is gained by that? There are other ways to find out the class of a String object:

"test".instance_of?(String)     # true
"test".class                    # String
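
One difference between these checks is worth a quick sketch: Class#=== also accepts instances of subclasses (it answers like is_a?), whereas instance_of? demands the exact class. The Shout subclass is hypothetical and only serves the illustration:

```ruby
class Shout < String; end      # hypothetical subclass, only for illustration

loud = Shout.new("hey")

p String === loud              # true:  instances of subclasses are members too
p loud.is_a?(String)           # true:  same answer as String === loud
p loud.instance_of?(String)    # false: instance_of? demands the exact class
p Comparable === "test"        # true:  modules included by the class count as well
```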

However, let's say you want to react differently depending on the class an object is an instance of. Without subsumption, the only way of doing it with a switch-statement would be:

# Fixnum and Bignum were merged into Integer in Ruby 2.4
case x.class.to_s
when "Integer"
  puts "It's an integer"
when "Float"
  puts "It's a float"
when "String"
  puts "It's a string"
end

So instead of testing what x actually is, you work around it by taking the class of x, converting it to a string, and comparing that to the name of every class. This is the way to go if you were writing this in Java. But as you now know, you can write it more elegantly this way:

# Fixnum and Bignum were merged into Integer in Ruby 2.4
case x
when Integer
  puts "It's an integer"
when Float
  puts "It's a float"
when String
  puts "It's a string"
end

I have to confess that, though it is somewhat shorter, I wasn’t so psyched either: quite a lot of machinery for pretty little return, if you ask me. But now comes the fun stuff.

Subsumption for Ranges

Let me start with an example that in Java could not be written as a switch-statement. Say you want to have different temperature regions: -5°C to 5°C is freezing, 5°C to 15°C is cold, 15°C to 20°C is cozy, 20°C to 25°C is warm, and 25°C to 35°C is hot.

Normally the only way to solve this is as follows:

if(temp >= -5 && temp < 5) { 
     feels = "freezing"; 
} else if(temp >= 5 && temp < 15) { 
     feels = "cold"; 
} else if(temp >= 15 && temp < 20) {
     feels = "cozy";
}
...

To do it in a switch-statement, one would actually have to check for each number in each range. However, in Ruby we have range objects to express ranges, and the subsumption method is defined in a reasonable way, so whenever a number is inside a range, the method will return true.

case temp
when -5...5       # three dots exclude the upper bound, like temp < 5
  feels = "freezing"
when 5...15
  feels = "cold"
when 15...20
  feels = "cozy"
...

So, the implementation of subsumption that you can read out of that code is: (m..n) === x # => true, if x is in the range. See how much more elegant and expressive this code is?
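
Spelled out for all five regions, a sketch could look like the following. The method name feels_like and the "off the scale" fallback are assumptions added for the example. The three-dot ranges exclude their upper bound, so each boundary temperature lands in exactly one region:

```ruby
# Hypothetical helper mapping a temperature in °C to a description.
def feels_like(temp)
  case temp
  when -5...5  then "freezing"
  when 5...15  then "cold"
  when 15...20 then "cozy"
  when 20...25 then "warm"
  when 25..35  then "hot"        # two dots: 35 itself still counts as hot
  else "off the scale"           # assumed label for values outside all regions
  end
end

puts feels_like(-5)      # freezing
puts feels_like(5)       # cold
puts feels_like(17.5)    # cozy
puts feels_like(35)      # hot
puts feels_like(40)      # off the scale
```

Range#=== also works for the float 17.5, which checking "each number in each range" could never cover.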

Subsumption for Regular Expressions

Another cool thing in Ruby is that regular expressions are actually objects. This alone makes Ruby extremely powerful when it comes to String processing. And they redefine the subsumption method, so that any matching string returns true. Use that in a switch-statement and you get something extremely powerful:

case number_in_words
when /two$/
  puts "The number's last digit is a 2"
when /three$/
  puts "The number's last digit is a 3"
when /four$/
  puts "The number's last digit is a 4"
...
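
As a complete, runnable variant, the same idea can be wrapped in a small method. The name last_digit and the nil fallback are assumptions for this sketch, and only three digits are handled:

```ruby
# Hypothetical helper: derive the last digit from a number written in words.
def last_digit(number_in_words)
  case number_in_words
  when /two\z/   then 2      # Regexp#=== is true when the string matches
  when /three\z/ then 3
  when /four\z/  then 4
  else nil                   # assumed fallback for anything not listed
  end
end

p last_digit("twenty-two")   # 2
p last_digit("forty-three")  # 3
p last_digit("four")         # 4
p last_digit("ten")          # nil
```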

Subsumption for Procs

This is pretty cool as well. If you have some fancy predicate, you can write it down as a proc. Procs are objects representing (anonymous) functions, and Proc defines the subsumption method to call the proc with the value being tested, so the branch matches whenever the proc returns a true value. Here is an example:

case x
when lambda {|x| x < 0} 
  puts "It's negative" 
when 0 
  puts "It's zero" 
when lambda {|x| x > 0}
  puts "It's positive"
end

This way any other predicate for comparison can be used in a switch-statement, making the switch-statement as powerful as the if-statement. However, I wouldn’t recommend it as a regular practice, because in such cases the if-statement is more expressive:

if x < 0
   puts "It's negative"
elsif x > 0
   puts "It's positive"
else
   puts "It's zero"
end

However, if most of the time you’d check for simple equality or subsumption and only in a few of these cases you’d like to be able to compare with another predicate, proc subsumption will show its powers.
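
A sketch of such a mixed case could look like this. The constant EVEN and the method classify are hypothetical, and the labels assume non-negative integers as input:

```ruby
EVEN = ->(n) { n.even? }       # Proc#=== calls the lambda with the case value

# Hypothetical classifier mixing three kinds of subsumption in one statement.
def classify(n)
  case n
  when 0    then "zero"              # plain equality via Integer#===
  when 1..9 then "single digit"      # Range#===
  when EVEN then "large and even"    # Proc#===
  else           "large and odd"
  end
end

puts classify(0)     # zero
puts classify(7)     # single digit
puts classify(12)    # large and even
puts classify(13)    # large and odd
```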

Overwrite subsumption for your own Classes!

The above are just some examples where the Ruby library shows us that separating equality from the “equality” used in case statements can be pretty neat for creating compact yet expressive code. Even though most objects simply redirect to equality, there are plenty of special situations where you might want your own subsumption method, so just override it. Here is a (rather bad) example:

class Employee
  def initialize(employees=[])
    @supervisor_of = employees
  end

  # true if the given employee is supervised by this one
  def ===(employee)
    @supervisor_of.include? employee
  end
end

Say you have a class representing employees at a company, and some of these employees are supervisors and hold a list of the employees they supervise. The subsumption method returns true if the given employee is supervised by the receiver. So a switch-statement could look like this:

case employee
when richard
  supervisor = "Richard Hendricks"
when erlich
  supervisor = "Erlich Bachman"
when gilfoyle
  supervisor = "Bertram Gilfoyle"
when dinesh
  supervisor = "Dinesh Chugtai"
end
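
Put together into something runnable, this could look as follows. The name attribute and who supervises whom are invented purely for this sketch:

```ruby
class Employee
  attr_reader :name

  def initialize(name, supervised = [])
    @name = name
    @supervisor_of = supervised
  end

  # True if the given employee is supervised by this one.
  def ===(employee)
    @supervisor_of.include?(employee)
  end
end

# Assumed org chart, only for illustration.
dinesh   = Employee.new("Dinesh Chugtai")
gilfoyle = Employee.new("Bertram Gilfoyle")
richard  = Employee.new("Richard Hendricks", [dinesh, gilfoyle])
erlich   = Employee.new("Erlich Bachman", [richard])

supervisor = case dinesh
             when richard then richard.name   # richard === dinesh
             when erlich  then erlich.name    # erlich === dinesh
             else "nobody"
             end

puts supervisor   # Richard Hendricks
```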

eql? Stops Equal from Equalizing

eql? is a tricky method that generally behaves just like ==, so again we have another equality method. However, eql? is to be understood as a stricter version of ==. What this means is best shown in this example:

1 == 1.0    # true
1.eql? 1.0  # false

Why is that? 1 == 1.0 does check equality, and even though one might argue that 1 and 1.0 are different (because one of them is a floating-point number and the other one is not), the values are actually the same. Ruby understands that, and therefore the equality check evaluates to true. The stricter eql?, however, does not convert between types, so an Integer is never eql? to a Float. This goes hand in hand with the hash method every object has: objects that are eql? must return the same hash value, and 1.0 produces a different hash value than 1 does.
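
A few more lines make the relationship visible. This is just a sketch with core objects; it shows that eql? is strict about the type and that objects which are eql? also agree on their hash value:

```ruby
p 1 == 1.0                 # true:  == converts between numeric types
p 1.eql?(1.0)              # false: eql? does not
p 1.eql?(1)                # true
p 1.0.eql?(1.0)            # true
p "a".eql?("a")            # true:  strings are compared by content
p 1.hash == 1.hash         # true:  eql? objects return the same hash value
p "a".hash == "a".hash     # true
```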

Numbers are not the only objects where eql? matters. Another case is keys. In Ruby a key is part of a so-called Hash, which other programming languages call a dictionary or an associative array: a store of key-value pairs where keys (unlike normal arrays, which only have numbers as indices) can be anything. Keys of a Hash are compared with eql? (so if you had two keys, one being 1 and the other 1.0, they would be different keys referring to different values). If you write your own class that should be usable as a key, make sure its eql? and hash methods are implemented accordingly.

class MyKey
  attr_reader :key

  def initialize(key)
    @key = key
  end

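  def eql?(comparison_key)
    # Guard the type so comparing against a non-MyKey returns false instead of raising
    comparison_key.is_a?(MyKey) && key == comparison_key.key
  end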

  def hash
    key.hash    # Call hash method of type that MyKey wraps as key
  end
end

If you left out the definition of eql? or hash, then the following would happen, rendering your MyKey class useless as a key for hashes.

a = MyKey.new('key')
b = MyKey.new('key')

dict = { a => 'value' }
dict.has_key? a          # true
dict.has_key? b          # false, even though a and b are equal in value!
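
To see both behaviours side by side, here is a sketch with two hypothetical wrapper classes, PlainKey (no eql? or hash) and HashableKey (both defined). It starts with the 1 versus 1.0 case mentioned earlier:

```ruby
numbers = { 1 => "integer one", 1.0 => "float one" }
p numbers.size           # 2: the keys are == but not eql?, so both are kept
p numbers[1]             # "integer one"
p numbers[1.0]           # "float one"

# Hypothetical wrapper that does NOT define eql? or hash.
class PlainKey
  attr_reader :key

  def initialize(key)
    @key = key
  end
end

# The same wrapper with both methods defined consistently.
class HashableKey < PlainKey
  def eql?(other)
    other.is_a?(HashableKey) && key == other.key
  end

  def hash
    key.hash
  end
end

plain = { PlainKey.new("key") => "value" }
good  = { HashableKey.new("key") => "value" }

p plain.key?(PlainKey.new("key"))      # false: falls back to object identity
p good.key?(HashableKey.new("key"))    # true:  same hash value and eql?
p good[HashableKey.new("key")]         # "value"
```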

equal? is Not Equal

Up to now I’ve been praising Ruby a lot. I actually really like the subsumption method, and I think it is not that bad to have a more general equality and a stricter one to choose from (though I would have preferred a method name along the lines of hash_equals?, which is more expressive than eql?). Now comes what is, in my eyes, a really huge disappointment.

If you’ve counted up to now, we have three equality functions and no identity function, so the last one should be an identity function. And yes, it is. And it does, what you would expect it to do – it returns true only in the case of two variables pointing to exactly the same object in memory.

However, it is called equal?. And that is really, really straight-out-of-hell awful. First, because there is no way you would conclude from the name alone what this method actually does: the name strongly suggests checking for equality. Second, any Java programmer switching to Ruby is screwed, because Java's == and .equals and Ruby's == and .equal? look almost the same and mean the exact opposite. I was a Java programmer for a long time and have used a lot of Ruby for some years now, but I still regularly struggle here.

A sensible name would have been identical_to? or something along those lines.

Anyhow it is, what it is, and maybe “equal is not equal” might help you remember this, if you are one that is also struggling with this.
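
For reference, the identity check is spelled Object#equal? in Ruby. A short sketch with made-up variable names:

```ruby
a = String.new("test")
b = String.new("test")
c = a                              # a second name for the very same object

p a == b                           # true:  same content
p a.equal?(b)                      # false: two different objects
p a.equal?(c)                      # true:  one object, two variables
p a.object_id == c.object_id       # true:  identity is also visible in object_id
p :ruby.equal?(:ruby)              # true:  a symbol exists only once
```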

Summary

To wrap it up, Ruby – unlike other programming languages – comes with three equality and one identity method:

  • == is equality. Fairly reasonable name/symbol!
  • === is subsumption. Similar to equality (and often the same), but in a logical sense not an equality relation! Yet extremely powerful for expressive switch-statements, and therefore also known as case equality
  • .eql? is strict equality without type conversion, consistent with an object's hash; it is what Hash uses to compare keys. Poor choice of naming; think of it as hash_equals?
  • .equal? is identity. One of Ruby's biggest and most awful naming choices!
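
The four methods can be compared side by side with a small helper. The method name compare is an assumption for this sketch; it returns the results in the order ==, ===, eql?, equal?:

```ruby
# Hypothetical helper: results in the order ==, ===, eql?, equal?
def compare(a, b)
  [a == b, a === b, a.eql?(b), a.equal?(b)]
end

text = String.new("a")

p compare(1, 1.0)                    # [true, true, false, false]
p compare(text, String.new("a"))     # [true, true, true, false]
p compare(text, text)                # [true, true, true, true]
p compare(String, text)              # [false, true, false, false]
```

Only the last row differs between == and ===, and only the third row reaches identity.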