Skip to main content

Why using XOR might not be a good hash code implementation?


Using XOR for computing hash codes works great for most of the cases specially when order of computation does not matter. It also has the following benefits:
  • XOR has the best bit shuffling properties of all bit-operations and provides better distributions of hash values.
  • It is a quick single cycle operation in most computer 
  • Order of computation does not matter. i.e. a^b = b^a
However, if ordering of elements matter then it is often not a good choice.

Example
For simplicity consider you have a class with two string properties named Prop1 and Prop2 and your GetHashCode returns the xor of their hash code.

It will work fine for most of the cases except cases where same values are assigned to different properties. It will generate same hash-code i.e. collision in that case as can be seen in the below example.


However, using the modified approach as recommenced by Joshua Bloch's in Effective Java which uses prime multiplication and hash chaining provides more uniform distribution and a different hash-code.

It is a kind of mystery why the above approach provide better distribution and performance. According to Joshua, he tested some sample of prime numbers and found that 31 gave the best distribution over all possible Strings from Merriam-Webster's 2nd Int'l Unabridged Dictionary.

However, I think by multiplying using a prime number (which are unique by definition) and then combining the result we are increasing the probability of our hash code being unique and hence reducing the probability of a collision. Less collision means less elements in a bucket and hence efficient search. Also, I like the following comment from  Erickson which provides a better explanation of why choosing 31 performs better:

By multiplying, bits are shifted to the left and thus uses more of the available space of hash codes reducing the collision. Also, by not using a power of two, the lower order rightmost bits are populated as well, to be mixed with the next piece of data going into the hash.
The expression n * 31 is equivalent to (n << 5) - n. and can be computed efficiently

It is important to note that, modern computer are very efficient with big multiplayer and you might get better result by using a large prime number compared to 31 depending on your input size.


Reference
  1. Jon Skeet's blog
  2. Why does Java's hashCode() in String use 31 as a multiplier? [link]




Comments

Popular posts from this blog

Creating dynamic email templates using C# and Office Outlook

It is quite common for many applications to send automated email notifications. Couple of months ago, I have worked on improving our old email template format to make it more user friendly.
In this tutorial I will walk you though regarding how I took advantage of Microsoft Outlook to quickly generate custom email template and later using the html template for building an automated custom email application using C#.

Steps:Creating Templates: Using the rich text editor support  in Outlook create a nicely formatted email. Use placeholder text for the values you like to change dynamically based on your task completion status.

To keep this tutorial simple, I have created a  simple table with placeholder text inside the third bracket [place holder text]. However, you can use anything supported by outlook editor.Getting HTML code: Send the created email to your own address. After that, open the sent email and right click to view source. It will display the HTML source of the email body with …

Book Review - The Phoenix Project

Here at Skybox Labs, we do regular lunch and learn session where a fellow colleague present on topics ranging from clean code, continuous integration, game development, machine learning to almost any areas where there are reasonable interests.

One of the very recent lunch and learn series that I have attended was focusing on DevOps which got me interested to learn more about the topic. I was looking for recommended books and the lead presenter highly recommended that I start with 'The Phoenix Project' by Gene Kim

Being inspired by the stellar reviews in Amazon, I have decided to get a copy and read it over the weekend.
A fantastic book that contains a wealth of information and delivers it in an intelligent and interesting way; a story. The book successfully captures the events and struggles of most people who work in IT Operations and gives a very good explanation on why these problems exist, and how you can solve them. It portrays a very effective way of thinking in applying …

Persian Music - Homayoon - Nemitonam English lyrics

I love this song. I have requested one of my Persian friend to translate it for me and she did really nice job.. I am sure you will love it..
"

"The person who was the only person I had,
was the only refuge of my lonely heart,
Left me alone and went from my side
I am restless form the pain of her separation

I thought she stays with me
sings love song for me
I thought she understand my words
I didn’t know she’s unkind

Though gone, but still
I am full of her love
Her thought is always with me
Wherever I go, she is in front of my eyes, in front of my eyes
I want to stand out
find a way to reduce my pain of her separation
But it is not possible, There is no way
I cannot bear
I cannot bear

The person who was the only person I had,
was the only refuge of my lonely heart,
Left me alone and went from my side
I am restless form the pain of her separation

I thought she stays with me
sings love song for me
I thought she understand my words
I didn’t know she’s unkind

Though gone, but still
I …