What is the best algorithm for overriding GetHashCode?

1618

In .NET, the GetHashCode method is used in a lot of places throughout the .NET base class libraries. Implementing it properly is especially important to find items quickly in a collection or when determining equality.

Is there a standard algorithm or best practice on how to implement GetHashCode for my custom classes so I don’t degrade performance?

7

  • 45

    After reading this question and the article below, I was able to implement an override of GetHashCode. I hope it helps others: Guidelines and rules for GetHashCode, written by Eric Lippert

    – rene

    Mar 22, 2012 at 21:59

  • 4

    “or to determine equality”: no! Two objects with the same hashcode are not necessarily equal.

    Sep 2, 2015 at 22:03

  • 3

    @ThomasLevesque You are right, two objects with the same hash code are not necessarily equal. But GetHashCode() is still used in many implementations of Equals(). That’s what I meant by that statement. GetHashCode() inside Equals() is often used as a shortcut to determine inequality: if two objects have different hash codes, they cannot be equal, and the rest of the equality check doesn’t have to be executed.

    – bitbonk

    Sep 2, 2015 at 22:27

  • 5

    @bitbonk Usually, both GetHashCode() and Equals() need to look at all fields of both objects (Equals has to do this if the hash codes are equal or not checked). Because of this, a call to GetHashCode() inside Equals() is often redundant and could reduce performance. Equals() may also be able to short-circuit, making it much faster. However, in some cases the hash codes may be cached, making the GetHashCode() check faster and so worthwhile. See this question for more.

    Apr 2, 2017 at 3:52

  • 8

    UPDATE JAN 2020: Eric Lippert’s blog located at: docs.microsoft.com/en-us/archive/blogs/ericlippert/…

    Jan 15, 2020 at 14:06

1751

I usually go with something like the implementation given in Josh Bloch’s fabulous Effective Java. It’s fast and creates a pretty good hash which is unlikely to cause collisions. Pick two different prime numbers, e.g. 17 and 23, and do:

public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}
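
As a minimal sketch of the "suitable nullity checks" mentioned in the comment above, the following hypothetical Person class (the type and its fields are invented for illustration) treats a null field as hash code 0 and keeps Equals consistent with GetHashCode:

```csharp
using System;

// Hypothetical type, for illustration only.
public sealed class Person : IEquatable<Person>
{
    private readonly string firstName; // may be null
    private readonly string lastName;  // may be null
    private readonly int age;

    public Person(string firstName, string lastName, int age)
    {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }

    // Equality looks at exactly the same fields as GetHashCode.
    public bool Equals(Person other) =>
        other != null &&
        firstName == other.firstName &&
        lastName == other.lastName &&
        age == other.age;

    public override bool Equals(object obj) => Equals(obj as Person);

    public override int GetHashCode()
    {
        unchecked // Overflow is fine, just wrap
        {
            int hash = 17;
            // A null field contributes 0 rather than being skipped,
            // so every field still takes part in the multiply step.
            hash = hash * 23 + (firstName?.GetHashCode() ?? 0);
            hash = hash * 23 + (lastName?.GetHashCode() ?? 0);
            hash = hash * 23 + age.GetHashCode();
            return hash;
        }
    }
}
```

Using 0 for null is a convention rather than a requirement; any fixed value would do, as long as equal objects still produce equal hash codes.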

As noted in comments, you may find it’s better to pick a large prime to multiply by instead. Apparently 486187739 is good… and although most examples I’ve seen with small numbers tend to use primes, there are at least similar algorithms where non-prime numbers are often used. In the not-quite-FNV example later, for example, I’ve used numbers which apparently work well – but the initial value isn’t a prime. (The multiplication constant is prime though. I don’t know quite how important that is.)

This is better than the common practice of XORing hashcodes for two main reasons. Suppose we have a type with two int fields:

XorHash(x, x) == XorHash(y, y) == 0 for all x, y
XorHash(x, y) == XorHash(y, x) for all x, y
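
The two weaknesses can be checked directly. The sketch below uses hypothetical helper names (XorDemo, XorHash, MulAddHash) and relies on the fact that an int is its own hash code, so the field values are combined as-is:

```csharp
using System;

public static class XorDemo
{
    // Naive combiner: XOR of the two field hashes.
    public static int XorHash(int x, int y) => x ^ y;

    // The multiply-and-add combiner described above, for two int fields.
    public static int MulAddHash(int x, int y)
    {
        unchecked // Overflow is fine, just wrap
        {
            int hash = 17;
            hash = hash * 23 + x;
            hash = hash * 23 + y;
            return hash;
        }
    }
}
```

With these helpers, XorHash(1, 1) and XorHash(7, 7) are both 0, and XorHash(1, 2) equals XorHash(2, 1). The multiply-and-add version gives 9017 and 9161 for the first pair of calls, and 9018 and 9040 for the second, so neither pattern collides.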

By the way, the earlier algorithm is the one currently used by the C# compiler for anonymous types.

This page gives quite a few options. I think for most cases the above is “good enough” and it’s incredibly easy to remember and get right. The FNV alternative is similarly simple, but uses different constants and XOR instead of ADD as a combining operation. It looks something like the code below, but the normal FNV algorithm operates on individual bytes, so this would require modifying to perform one iteration per byte, instead of per 32-bit hash value. FNV is also designed for variable lengths of data, whereas the way we’re using it here is always for the same number of field values. Comments on this answer suggest that the code here doesn’t actually work as well (in the sample case tested) as the addition approach above.

// Note: Not quite FNV!
public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = (int) 2166136261;
        // Suitable nullity checks etc, of course :)
        hash = (hash * 16777619) ^ field1.GetHashCode();
        hash = (hash * 16777619) ^ field2.GetHashCode();
        hash = (hash * 16777619) ^ field3.GetHashCode();
        return hash;
    }
}
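
For comparison, here is a rough sketch of the byte-wise form that the text above says normal FNV uses. It follows the same multiply-then-XOR order as the code above (the FNV-1 variant) but performs one iteration per byte. The class and method names are hypothetical, and the caller decides how to turn fields into bytes:

```csharp
using System;

public static class Fnv
{
    private const uint OffsetBasis = 2166136261u;
    private const uint Prime = 16777619u;

    // 32-bit FNV-1: for each byte, multiply by the FNV prime, then XOR the byte in.
    public static int Fnv1(byte[] data)
    {
        unchecked // Overflow is fine, just wrap
        {
            uint hash = OffsetBasis;
            foreach (byte b in data)
            {
                hash = (hash * Prime) ^ b;
            }
            return (int)hash;
        }
    }
}
```

For the single byte 0x61 (ASCII "a") this produces 0x050C5D7E, and for an empty array it returns the offset basis reinterpreted as a signed int.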

One thing to be aware of: ideally, you should prevent your equality-sensitive (and thus hashcode-sensitive) state from changing after adding an object to a collection that depends on its hash code.

As per the documentation:

You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:

  • You can compute the hash code from fields that are not mutable; or
  • You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.
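
To illustrate why this matters, here is a small sketch with a hypothetical MutableKey type whose hash code depends on a settable property. Once the object is mutated while it sits in a HashSet, the set can no longer find it:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable type whose hash code depends on mutable state.
public sealed class MutableKey
{
    public int Id { get; set; }

    public override bool Equals(object obj) => obj is MutableKey other && other.Id == Id;

    public override int GetHashCode() => Id;
}

public static class MutationDemo
{
    // Returns whether the set can still find the key after it was mutated in place.
    public static bool FoundAfterMutation()
    {
        var key = new MutableKey { Id = 1 };
        var set = new HashSet<MutableKey> { key };

        key.Id = 2; // changes the hash code while the object is in the set

        // The set stored the key under its old hash code, so this lookup misses.
        return set.Contains(key);
    }
}
```

The object is still physically in the set (its Count stays 1), but lookups by the mutated key fail because the hash code it was stored under no longer matches.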

The link to the FNV article is broken but here is a copy in the Internet Archive: Eternally Confuzzled – The Art of Hashing

87

  • 8

    The algorithm described in the book you mention is in fact a little more detailed; in particular, it describes what to do for different data types of the fields. E.g. for a field f of type long, use (int)(f ^ (f >>> 32)) instead of simply calling GetHashCode. Is long.GetHashCode implemented that way?

    – bitbonk

    Nov 4, 2008 at 21:44

  • 14

    Yup, Int64.GetHashCode does exactly that. In Java that would require boxing, of course. That reminds me – time to add a link to the book…

    – Jon Skeet

    Nov 4, 2008 at 21:51

  • 81

    23 is not a good choice, since (as of .NET 3.5 SP1) Dictionary<TKey,TValue> assumes good distribution modulo certain primes, and 23 is one of them. So if you have a dictionary with capacity 23, only the last contribution to GetHashCode influences the compound hash code. I’d rather use 29 instead of 23.

    Nov 21, 2010 at 22:41

  • 25

    @CodeInChaos: Only the last contribution influences the bucket – so it might, at worst, have to look through all 23 entries in the dictionary. It’s still going to check the actual hash code of each entry, which will be cheap. If you’ve got a dictionary that small, it’s unlikely to matter much.

    – Jon Skeet

    Nov 21, 2010 at 23:14

  • 24

    @Vajda: I usually use 0 as the effective hash code for null – which isn’t the same as ignoring the field.

    – Jon Skeet

    Jan 22, 2013 at 16:49

518

ValueTuple – Update for C# 7

As @cactuaroid mentions in the comments, a value tuple can be used. This saves a few keystrokes and more importantly executes purely on the stack (no Garbage):

(PropA, PropB, PropC, PropD).GetHashCode();

(Note: The original technique using anonymous types seems to create an object on the heap, i.e. garbage, since anonymous types are implemented as classes, though this might be optimized out by the compiler. It would be interesting to benchmark these options, but the tuple option should be superior.)
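
As a rough sketch of how the tuple approach might look in a complete type, the hypothetical Measurement class below (name and properties invented for illustration) builds both Equals and GetHashCode from the same value tuple, which keeps the two consistent:

```csharp
using System;

// Hypothetical type, for illustration only.
public sealed class Measurement : IEquatable<Measurement>
{
    public string Unit { get; }
    public double Value { get; }
    public int Precision { get; }

    public Measurement(string unit, double value, int precision)
    {
        Unit = unit;
        Value = value;
        Precision = precision;
    }

    // Compare the same tuple of members that GetHashCode hashes.
    public bool Equals(Measurement other) =>
        other != null &&
        (Unit, Value, Precision).Equals((other.Unit, other.Value, other.Precision));

    public override bool Equals(object obj) => Equals(obj as Measurement);

    public override int GetHashCode() => (Unit, Value, Precision).GetHashCode();
}
```

Listing the members once in each method makes it harder for Equals and GetHashCode to drift apart when a property is added later.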

Anonymous Type (Original Answer)

Microsoft already provides a good generic HashCode generator: Just copy your property/field values to an anonymous type and hash it:

new { PropA, PropB, PropC, PropD }.GetHashCode();

This will work for any number of properties. It does not use boxing. It just uses the algorithm already implemented in the framework for anonymous types.

16

  • 90

    Yes, the anonymous type’s GetHashCode implementation is very effective (BTW it’s the same as the one in Jon Skeet’s answer), but the only problem with this solution is that you create a new instance on every GetHashCode call. That overhead can add up, particularly with intensive access to big hashed collections…

    – digEmAll

    Jan 8, 2011 at 9:50


  • 5

    @digEmAll Good point, I didn’t think about the overhead of creating a new object. Jon Skeet’s answer is the most efficient and won’t use boxing. (@Kumba To work around the lack of unchecked in VB, just use an Int64 (Long) and truncate it after the calculations.)

    – Rick Love

    Apr 2, 2011 at 17:30


  • 19

    VB.NET must use Key in anonymous type creation: New With {Key PropA}.GetHashCode() Otherwise GetHashCode will not return the same hashcode for different objects with the same ‘identifying’ properties.

    Aug 20, 2014 at 15:58


  • 4

    @Keith In that case, I would consider saving the IEnumerable as a list value somewhere instead of enumerating it each time the hash code is calculated. Calling ToList each time inside GetHashCode could hurt performance in many situations.

    – Rick Love

    Oct 20, 2015 at 20:40

  • 7

    For those who like this, (PropA, PropB, PropC, PropD).GetHashCode() is now available in C# 7 without the GC pressure that concerned @digEmAll. Quick and Simple Hash Code Combinations

    Aug 16, 2018 at 11:59


119

Using System.HashCode

If you are using .NET Standard 2.1 or above, you can use the System.HashCode struct. On earlier frameworks it is available from the Microsoft.Bcl.HashCode package. There are two methods of using it:

HashCode.Combine

The Combine method can be used to create a hash code, given up to eight objects.

public override int GetHashCode() => HashCode.Combine(this.object1, this.object2);

HashCode.Add

The Add method helps you to deal with collections:

public override int GetHashCode()
{
    var hashCode = new HashCode();
    hashCode.Add(this.object1);
    foreach (var item in this.collection)
    {
        hashCode.Add(item);
    }
    return hashCode.ToHashCode();
}
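
GetHashCode only behaves well when Equals agrees with it. The following is a minimal sketch of the Add approach paired with a matching Equals, using a hypothetical Playlist type and assuming System.HashCode is available (either built in or via the package mentioned above):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type, for illustration only.
public sealed class Playlist : IEquatable<Playlist>
{
    public string Name { get; }
    public IReadOnlyList<string> Tracks { get; }

    public Playlist(string name, IEnumerable<string> tracks)
    {
        Name = name;
        Tracks = tracks.ToList(); // take a private copy of the items
    }

    // Element-by-element, order-sensitive comparison, mirroring GetHashCode.
    public bool Equals(Playlist other) =>
        other != null && Name == other.Name && Tracks.SequenceEqual(other.Tracks);

    public override bool Equals(object obj) => Equals(obj as Playlist);

    public override int GetHashCode()
    {
        var hashCode = new HashCode();
        hashCode.Add(Name);
        foreach (var track in Tracks)
        {
            hashCode.Add(track);
        }
        return hashCode.ToHashCode();
    }
}
```

Because items are added in sequence, the resulting hash depends on their order, which is why Equals uses SequenceEqual here rather than a set comparison.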

GetHashCode Made Easy

This is an alternative to System.HashCode that is very easy to use while still being fast. You can read the full blog post ‘GetHashCode Made Easy’ for more details and comments.

Usage Example

public class SuperHero
{
    public int Age { get; set; }
    public string Name { get; set; }
    public List<string> Powers { get; set; }

    public override int GetHashCode() =>
        HashCode.Of(this.Name).And(this.Age).AndEach(this.Powers);
}

Implementation

public struct HashCode : IEquatable<HashCode>
{
    private const int EmptyCollectionPrimeNumber = 19;
    private readonly int value;

    private HashCode(int value) => this.value = value;

    public static implicit operator int(HashCode hashCode) => hashCode.value;

    public static bool operator ==(HashCode left, HashCode right) => left.Equals(right);

    public static bool operator !=(HashCode left, HashCode right) => !(left == right);

    public static HashCode Of<T>(T item) => new HashCode(GetHashCode(item));

    public static HashCode OfEach<T>(IEnumerable<T> items) =>
        items == null ? new HashCode(0) : new HashCode(GetHashCode(items, 0));

    public HashCode And<T>(T item) => 
        new HashCode(CombineHashCodes(this.value, GetHashCode(item)));

    public HashCode AndEach<T>(IEnumerable<T> items)
    {
        if (items == null)
        {
            return new HashCode(this.value);
        }

        return new HashCode(GetHashCode(items, this.value));
    }

    public bool Equals(HashCode other) => this.value.Equals(other.value);

    public override bool Equals(object obj)
    {
        if (obj is HashCode)
        {
            return this.Equals((HashCode)obj);
        }

        return false;
    }

    public override int GetHashCode() => this.value.GetHashCode();

    private static int CombineHashCodes(int h1, int h2)
    {
        unchecked
        {
            // Code copied from System.Tuple; a good way to combine hashes.
            return ((h1 << 5) + h1) ^ h2;
        }
    }

    private static int GetHashCode<T>(T item) => item?.GetHashCode() ?? 0;

    private static int GetHashCode<T>(IEnumerable<T> items, int startHashCode)
    {
        var temp = startHashCode;

        // Dispose the enumerator when done; some enumerators hold resources.
        using (var enumerator = items.GetEnumerator())
        {
            if (enumerator.MoveNext())
            {
                temp = CombineHashCodes(temp, GetHashCode(enumerator.Current));

                while (enumerator.MoveNext())
                {
                    temp = CombineHashCodes(temp, GetHashCode(enumerator.Current));
                }
            }
            else
            {
                // An empty collection still contributes a fixed value to the hash.
                temp = CombineHashCodes(temp, EmptyCollectionPrimeNumber);
            }
        }

        return temp;
    }
}

What Makes a Good Algorithm?

Performance

The algorithm that calculates a hash code needs to be fast. A simple algorithm is usually going to be a faster one. One that does not allocate extra memory will also reduce need for garbage collection, which will in turn also improve performance.

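In C# hash functions specifically, you often use the unchecked keyword. It disables overflow checking, so the arithmetic wraps around instead of throwing an OverflowException when the project is compiled with checked arithmetic, and it avoids the cost of those checks.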
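
The difference is easy to see with a single combining step. In this sketch (the OverflowDemo class and its methods are hypothetical), the same expression wraps in an unchecked context and throws in a checked one:

```csharp
using System;

public static class OverflowDemo
{
    // One combining step with wrapping arithmetic.
    public static int Wrapping(int hash, int next)
    {
        unchecked
        {
            return hash * 23 + next;
        }
    }

    // The same step with overflow checking; returns false if it overflowed.
    public static bool TryChecked(int hash, int next, out int result)
    {
        try
        {
            result = checked(hash * 23 + next);
            return true;
        }
        catch (OverflowException)
        {
            result = 0;
            return false;
        }
    }
}
```

For example, Wrapping(int.MaxValue, 0) returns 2147483625, while TryChecked(int.MaxValue, 0, out _) reports an overflow. For a hash code the wrapped value is perfectly usable, which is why hash functions opt out of the checks.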

Deterministic

The hashing algorithm needs to be deterministic i.e. given the same input it must always produce the same output.

Reduce Collisions

The algorithm that calculates a hash code needs to keep hash collisions to a minimum. A hash collision occurs when two calls to GetHashCode on two different objects produce identical hash codes. Note that collisions are allowed (a common misconception is that they are not), but they should be kept to a minimum.
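
To show that collisions affect speed rather than correctness, this sketch uses a deliberately bad, hypothetical comparer that gives every string the same hash code. A Dictionary built with it still returns the right values, because Equals resolves the collisions:

```csharp
using System;
using System.Collections.Generic;

// Deliberately terrible comparer: every string collides with every other.
public sealed class ConstantHashComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => string.Equals(x, y);

    public int GetHashCode(string obj) => 42;
}

public static class CollisionDemo
{
    public static Dictionary<string, int> Build()
    {
        var map = new Dictionary<string, int>(new ConstantHashComparer());
        map["one"] = 1;
        map["two"] = 2;
        map["three"] = 3;
        return map;
    }
}
```

Lookups such as CollisionDemo.Build()["two"] still return 2, but every entry lands in the same bucket, so each lookup degrades to a linear scan.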

A lot of hash functions contain magic numbers like 17 or 23. These are usually prime numbers, chosen because multiplying by a prime tends to mix the input values in a way that produces fewer collisions than a non-prime multiplier would.

Hash Uniformity

A good hash function should map the expected inputs as evenly as possible over its output range, i.e. its hash codes should be spread across the full range of possible values rather than clustered together. This property is called hash uniformity.

Prevents DoS

In .NET Core each time you restart an application you will get different hash codes. This is a security feature to prevent Denial of Service attacks (DoS). For .NET Framework you should enable this feature by adding the following App.config file:

<?xml version ="1.0"?>  
<configuration>  
   <runtime>  
      <UseRandomizedStringHashAlgorithm enabled="1" />  
   </runtime>  
</configuration>

Because of this feature, hash codes should never be used outside of the application domain in which they were created, they should never be used as the key itself to retrieve an object from a keyed collection (a hash code is not a unique identifier), and they should never be persisted.

Read more about this here.

Cryptographically Secure?

The algorithm does not have to be a cryptographic hash function, meaning it does not have to satisfy the following conditions:

  • It is infeasible to generate a message that yields a given hash value.
  • It is infeasible to find two different messages with the same hash value.
  • A small change to a message should change the hash value so extensively that the new hash value appears uncorrelated with the old hash value (avalanche effect).

5

  • 4

    This is a very good answer. As an addition, you could consider changing “speed” to “performance” and adding the property of being allocation-free. The built-in HashCode type satisfies that too.

    – Timo

    Jul 10, 2020 at 15:22

  • How does this compare to the ValueTuple.GetHashCode() answer recently updated by @ricklove above?

    Feb 18, 2021 at 3:10

  • 2

    The HashCode.Combine is a static method which will not allocate anything, while ValueTuple will start with allocating on the stack.

    Feb 18, 2021 at 8:35

  • 2

    HashCode.Of(this.Name).And(this.Age).AndEach(this.Powers) – that is nice syntax 🙂

    – Amos Egel

    Mar 9, 2021 at 8:14

  • “they should never be used as key fields in a collection”: isn’t that the whole point of hash codes, though? And of the existence of hash tables, hash sets, and dictionaries?

    Dec 30, 2021 at 20:54