Using XOR for computing hash codes works great for most of the cases specially when order of computation does not matter. It also has the following benefits:
- XOR has the best bit shuffling properties of all bit-operations and provides better distributions of hash values.
- It is a quick single cycle operation in most computer
- Order of computation does not matter. i.e. a^b = b^a
Example
For simplicity consider you have a class with two string properties named Prop1 and Prop2 and your GetHashCode returns the xor of their hash code.
It will work fine for most of the cases except cases where same values are assigned to different properties. It will generate same hash-code i.e. collision in that case as can be seen in the below example.
However, using the modified approach as recommenced by Joshua Bloch's in Effective Java which uses prime multiplication and hash chaining provides more uniform distribution and a different hash-code.
It is a kind of mystery why the above approach provide better distribution and performance. According to Joshua, he tested some sample of prime numbers and found that 31 gave the best distribution over all possible Strings from Merriam-Webster's 2nd Int'l Unabridged Dictionary.
However, I think by multiplying using a prime number (which are unique by definition) and then combining the result we are increasing the probability of our hash code being unique and hence reducing the probability of a collision. Less collision means less elements in a bucket and hence efficient search. Also, I like the following comment from Erickson which provides a better explanation of why choosing 31 performs better:
By multiplying, bits are shifted to the left and thus uses more of the available space of hash codes reducing the collision. Also, by not using a power of two, the lower order rightmost bits are populated as well, to be mixed with the next piece of data going into the hash.
The expression n * 31 is equivalent to (n << 5) - n. and can be computed efficiently
It is important to note that, modern computer are very efficient with big multiplayer and you might get better result by using a large prime number compared to 31 depending on your input size.
Reference
Comments