NHibernate Forge
The official new home for the NHibernate for .NET community

Identity Field, Equality and Hash Code

In this article I'll describe a possible base class for domain entities that implements a surrogate key as its identity field and provides matching Equals and GetHashCode implementations.

Introduction

Martin Fowler writes in his book Patterns of Enterprise Application Architecture (PoEAA): "The identity field saves a database ID field in an object to maintain identity between an in-memory object and a database row."

And further he states: "The first concern is whether to use meaningful or meaningless keys. A meaningful key is something like the U.S. Social Security Number... A meaningless key is essentially a random number the database dreams up that's never intended for human use."

There are many reasons why meaningful keys are often NOT good candidates for an identity field. Primarily, they are often neither immutable (people make data-entry errors that later need correcting) nor truly unique. Hence Martin Fowler concludes: "... As a result, meaningful keys should be distrusted. ..."

Having provided some background on the ongoing debate about what makes a good identity field, I'll now state my choice: I always use meaningless keys as identity fields. Such fields are often called surrogate keys. Importantly, "The surrogate key is not derived from application data."

My favorite type of surrogate key is a GUID (globally unique identifier). The algorithm used to generate a new GUID is such that it is practically impossible to generate the same value twice (the probability of a collision is negligibly small).

NHibernate supports GUID as one possible type for the identity field.
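To make the GUID discussion concrete, here is a minimal sketch that uses only the BCL's System.Guid type. Guid.NewGuid() generates a new value, and an unassigned Guid field holds default(Guid), which equals Guid.Empty; that is why Guid.Empty can act as a "no identifier assigned yet" marker. The class name GuidDemo is arbitrary.

```csharp
using System;

public static class GuidDemo
{
    public static void Main()
    {
        // Each call produces a newly generated GUID.
        Guid first = Guid.NewGuid();
        Guid second = Guid.NewGuid();
        Console.WriteLine(first == second);          // False (a collision is practically impossible)

        // An unassigned Guid (e.g. a fresh entity's _id field) equals Guid.Empty.
        Guid unassigned = default(Guid);
        Console.WriteLine(unassigned == Guid.Empty); // True
        Console.WriteLine(Guid.Empty);               // 00000000-0000-0000-0000-000000000000
    }
}
```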

Problem Description

When working with NHibernate one often uses a special type of collection known as a set. A set is a collection that contains no duplicate elements. More formally, a set contains no pair of elements e1 and e2 such that e1.Equals(e2), and at most one null element. Because the .NET Framework originally provided no set type, NHibernate uses the Iesi.Collections library, which contains a set implementation.

In the definition above, the predicate that decides whether two elements are the same is the Equals method. By default, Equals on a reference type performs a reference comparison: if two variables e1 and e2 refer to two different instances of a class, Equals always returns false, even when both instances represent the same database record. But we want the identity field to be the relevant part of the comparison: if two different instances have the same identity field, they are equal (that is, they refer to the same database record).
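The default behaviour can be seen with a plain class that does not override Equals. In this sketch PlainEntity is a hypothetical class, and the BCL's HashSet<T> stands in for the Iesi set; like the Iesi set, it relies on Equals and GetHashCode to detect duplicates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity that does NOT override Equals or GetHashCode.
public class PlainEntity
{
    public Guid Id { get; set; }
}

public static class DefaultEqualityDemo
{
    public static void Main()
    {
        Guid id = Guid.NewGuid();

        // Two distinct instances that represent the same database row.
        var a = new PlainEntity { Id = id };
        var b = new PlainEntity { Id = id };

        // object.Equals on a reference type compares references, not Ids.
        Console.WriteLine(a.Equals(b));   // False
        Console.WriteLine(a.Equals(a));   // True

        // Consequently the set keeps both instances as separate elements.
        var set = new HashSet<PlainEntity> { a, b };
        Console.WriteLine(set.Count);     // 2
    }
}
```

Making the Id the deciding factor therefore requires overriding Equals (and, with it, GetHashCode).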

Implementation

The default implementation of Equals lives in the System.Object class, from which every .NET class inherits, implicitly or explicitly. Fortunately, Equals is virtual, so we can override it. But whenever we override Equals we must also override GetHashCode, because two objects that are equal must return the same hash code.

Assuming we use a GUID property called Id as the identity field, we can define the following base class, from which all our domain classes will inherit directly or indirectly:

public class IdentityFieldProvider<T>
    where T : IdentityFieldProvider<T>
{
    private Guid _id;
 
    public virtual Guid Id
    {
        get { return _id; }
        set { _id = value; }
    }
}

Now let's override the Equals method. One possible solution is:

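public override bool Equals(object obj)
{
    // obj is null or not a T (e.g. a different entity type): never equal
    T other = obj as T;
    if (ReferenceEquals(other, null))
        return false;
 
    // handle the case of comparing two NEW (transient) objects:
    // they are equal only if they are the very same instance
    bool otherIsTransient = other.Id == Guid.Empty;
    bool thisIsTransient = Id == Guid.Empty;
    if (otherIsTransient && thisIsTransient)
        return ReferenceEquals(other, this);
 
    // otherwise the identity field decides
    return other.Id.Equals(Id);
}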

We have to distinguish three cases. In the first, the object being compared is null or of a different type; this case is trivial, and the answer is ALWAYS "not equal". In the second, both objects are new (also called transient), i.e. neither has been assigned an Id yet; they are considered equal only if both references point to the same instance. In the third case, we simply rely on the Equals implementation of the Guid type to compare the two Ids.
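The three cases can be exercised with a small sketch. The base class below is condensed from the article's code (same logic, auto-property instead of a backing field); Customer and Order are hypothetical entity types used only for this demonstration.

```csharp
using System;

public class IdentityFieldProvider<T>
    where T : IdentityFieldProvider<T>
{
    private int? _oldHashCode;

    public virtual Guid Id { get; set; }

    public override bool Equals(object obj)
    {
        T other = obj as T;
        if (ReferenceEquals(other, null))
            return false;                          // null or a different type
        if (other.Id == Guid.Empty && Id == Guid.Empty)
            return ReferenceEquals(other, this);   // two transient objects
        return other.Id.Equals(Id);                // compare identity fields
    }

    public override int GetHashCode()
    {
        if (_oldHashCode.HasValue)
            return _oldHashCode.Value;             // never change a handed-out hash code
        if (Id == Guid.Empty)
        {
            _oldHashCode = base.GetHashCode();     // transient: reference-based, cached
            return _oldHashCode.Value;
        }
        return Id.GetHashCode();                   // persistent: derived from the Id
    }
}

// Hypothetical entities.
public class Customer : IdentityFieldProvider<Customer> { }
public class Order : IdentityFieldProvider<Order> { }

public static class EqualsCasesDemo
{
    public static void Main()
    {
        // Case 1: a different type (even with the same Guid) or null is never equal.
        var customer = new Customer { Id = Guid.NewGuid() };
        var order = new Order { Id = customer.Id };
        Console.WriteLine(customer.Equals(order));   // False
        Console.WriteLine(customer.Equals(null));    // False

        // Case 2: two transient objects are equal only if they are the same instance.
        var t1 = new Customer();
        var t2 = new Customer();
        Console.WriteLine(t1.Equals(t2));            // False
        Console.WriteLine(t1.Equals(t1));            // True

        // Case 3: otherwise the Ids decide.
        var sameRow = new Customer { Id = customer.Id };
        Console.WriteLine(customer.Equals(sameRow)); // True
    }
}
```

Because the type check is done with `obj as T`, a Customer and an Order never compare equal, even if their Guids happen to match.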

Now we must also override GetHashCode, which is likewise inherited from System.Object.

private int? _oldHashCode;
 
public override int GetHashCode()
{
    // Once we have a hash code we'll never change it
    if (_oldHashCode.HasValue)
        return _oldHashCode.Value;
 
    bool thisIsTransient = Id == Guid.Empty;
    
    // When this instance is transient, we use the base GetHashCode()
    // and remember it, so an instance can NEVER change its hash code.
    if (thisIsTransient)
    {
        _oldHashCode = base.GetHashCode();
        return _oldHashCode.Value;
    }

    // Persistent instance: derive the hash code from the identity field
    return Id.GetHashCode();
}

Why this kind of code, you might ask? Well, an object should never change its hash code during its lifetime, that is, from the moment it is instantiated until it is discarded; in particular, it must not change while the object is stored in a hash-based collection such as a set. If an object is loaded from the database there is no problem, since every existing database record has a well-defined and unique identity field. Thus we can derive the hash code from the Id field. This is done in the last line of the code snippet above.

Things are a little more problematic when a new object is created in memory: its identity field is then undefined (the object has not been saved to the database yet and is therefore considered transient). In our case, undefined means that the Id field has the value Guid.Empty. Here we use the default implementation of GetHashCode (from System.Object) to generate a hash code, but we store it in an instance variable for later use.

Later in its life cycle the instance may be persisted to the database (while still remaining in memory). At that moment NHibernate assigns a new unique value to the instance's Id field. The object is no longer transient, but the first two lines of the method prevent its hash code from changing. It is still the same object as before; it has just been made persistent.
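The life cycle described above can be simulated without a database. In this sketch, assigning Id by hand stands in for NHibernate's guid generator assigning the identifier on save (an assumption made only to keep the example self-contained); Customer is a hypothetical entity, HashSet<T> stands in for the Iesi set, and the base class is condensed from the article's code.

```csharp
using System;
using System.Collections.Generic;

public class IdentityFieldProvider<T>
    where T : IdentityFieldProvider<T>
{
    private int? _oldHashCode;

    public virtual Guid Id { get; set; }

    public override bool Equals(object obj)
    {
        T other = obj as T;
        if (ReferenceEquals(other, null))
            return false;
        if (other.Id == Guid.Empty && Id == Guid.Empty)
            return ReferenceEquals(other, this);
        return other.Id.Equals(Id);
    }

    public override int GetHashCode()
    {
        if (_oldHashCode.HasValue)
            return _oldHashCode.Value;
        if (Id == Guid.Empty)
        {
            _oldHashCode = base.GetHashCode();
            return _oldHashCode.Value;
        }
        return Id.GetHashCode();
    }
}

// Hypothetical entity.
public class Customer : IdentityFieldProvider<Customer> { }

public static class HashCodeLifecycleDemo
{
    public static void Main()
    {
        // A new, transient object: Id is still Guid.Empty.
        var customer = new Customer();
        var set = new HashSet<Customer> { customer };   // Add() computes and caches the hash code
        int before = customer.GetHashCode();

        // Simulates NHibernate assigning the identifier when the object is saved.
        customer.Id = Guid.NewGuid();

        // The cached hash code survives, so the set still finds the object.
        Console.WriteLine(customer.GetHashCode() == before); // True
        Console.WriteLine(set.Contains(customer));           // True

        // A second instance for the same row compares Equal, but its hash code
        // comes from its Id rather than from the cached value.
        var sameRow = new Customer { Id = customer.Id };
        Console.WriteLine(customer.Equals(sameRow));         // True
        Console.WriteLine(set.Contains(sameRow));            // not reliably True: hash codes differ in origin
    }
}
```

The last two lines show the trade-off raised in the comments below: two instances that compare Equal can still report different hash codes, so the approach relies on NHibernate's identity map handing out a single instance per row within a session.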

Finally, we can also overload the '==' and '!=' operators so that two instances can be compared with these operators instead of only with the Equals method.

public static bool operator ==(IdentityFieldProvider<T> x, IdentityFieldProvider<T> y)
{
    return Equals(x, y);
}
 
public static bool operator !=(IdentityFieldProvider<T> x, IdentityFieldProvider<T> y)
{
    return !(x == y);
}
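Here is a short sketch of the overloaded operators in use, including comparisons with null. The base class is condensed from the article's code; Customer is a hypothetical entity type.

```csharp
using System;

public class IdentityFieldProvider<T>
    where T : IdentityFieldProvider<T>
{
    private int? _oldHashCode;

    public virtual Guid Id { get; set; }

    public override bool Equals(object obj)
    {
        T other = obj as T;
        if (ReferenceEquals(other, null))
            return false;
        if (other.Id == Guid.Empty && Id == Guid.Empty)
            return ReferenceEquals(other, this);
        return other.Id.Equals(Id);
    }

    public override int GetHashCode()
    {
        if (_oldHashCode.HasValue)
            return _oldHashCode.Value;
        if (Id == Guid.Empty)
        {
            _oldHashCode = base.GetHashCode();
            return _oldHashCode.Value;
        }
        return Id.GetHashCode();
    }

    // Static object.Equals(x, y) handles nulls, then calls x.Equals(y).
    public static bool operator ==(IdentityFieldProvider<T> x, IdentityFieldProvider<T> y)
    {
        return Equals(x, y);
    }

    public static bool operator !=(IdentityFieldProvider<T> x, IdentityFieldProvider<T> y)
    {
        return !(x == y);
    }
}

// Hypothetical entity.
public class Customer : IdentityFieldProvider<Customer> { }

public static class OperatorDemo
{
    public static void Main()
    {
        Guid id = Guid.NewGuid();
        var a = new Customer { Id = id };
        var b = new Customer { Id = id };

        Console.WriteLine(a == b);                       // True: same Id, different instances
        Console.WriteLine(a != b);                       // False
        Console.WriteLine(object.ReferenceEquals(a, b)); // False

        Customer missing = null;
        Console.WriteLine(a == missing);                 // False
        Console.WriteLine(missing == null);              // True
    }
}
```

When a true reference comparison is needed, object.ReferenceEquals remains available.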

That's it. You can now use this class as the base for every entity class in your domain and never ever have to think about the identity field and the equality of objects. It just happens...

Recent Comments

By: MickyD Posted on 10-03-2011 1:19

I'm not sure your implementation of GetHashCode() is correct. For one, perhaps it should not be locked down to just the Id property; it ignores other properties in derived classes. Most importantly, it seems to ignore the compare-as-equal behaviour.

MSDN:

If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.

The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

For the best performance, a hash function must generate a random distribution for all input.

By: patearl Posted on 08-07-2011 3:49

The problem of objects coming from the database not being the same as the transient objects that were just saved can be solved by using a new id generator that copies from a pre-determined GUID in the transient instance. For example, every object generates a unique GUID when it's created. This is stored in a separate field from the Id. The hash code always uses this unique GUID for transient objects and the regular Id for persistent objects. When the object is saved, the GUID is copied into the Id field by a custom generator. I haven't actually tried this, but it seems doable.

[Edit: Oh... looks like people already wrote about approximately the same thing.]

By: nestor Posted on 10-01-2010 18:58

Hi,

I agree with many of you that it is NOT recommended to override Equals and GetHashCode based on the identifier. I think if you override them you could have two instances pointing to the same record in the database, and the Identity Map gets broken.

By: SzymonPobiega Posted on 07-21-2009 8:56

MSDN documentation states that 'The default implementation of Equals supports reference equality for reference types, and bitwise equality for value types', not that hashcode is taken into account.

NHibernate has a built-in Identity Map, so it is guaranteed that two persistent instances with the same identity field will be equal by reference, i.e. they will point to the same address on the managed heap. Furthermore, two not-yet-persisted objects, or one persisted and one not, will never be equal because they point to different addresses on the heap.

Ergo, I see no point in implementing such a sophisticated mechanism when the default .NET and NHibernate behavior is enough for most cases.

By: Stefan Steinegger Posted on 03-04-2009 10:50

I don't recommend writing Equals so that it depends on the id provided by the database. The business identity starts when the object is created and should not change when it is stored (because storing does not change the object). It could be based on unique properties. We are using GUIDs that are initialized in the constructor.

You could have bugs that only appear when the object is stored at a certain point (before or after a comparison). I wouldn't like to hunt for such a bug.

I don't recommend writing such == and != operators for regular entities. When you call those operators you normally expect a reference comparison. You don't want to look into the code of the entity to see whether they have been overridden and do something sophisticated (which you need to know when you call them). So this kind of operator implementation just makes things more confusing. I would only do this for value types.

Have you ever had the same object in more than one instance? One from the database and one from the client? You need to find out that the object is identical (Equals), but not the same (==).
