NHibernate Forge
The official new home for the NHibernate for .NET community

Identity Field, Equality and Hash Code

Blog Signature Gabriel

In this post I'll describe a possible base class for domain entities that uses a surrogate key as its identity field and implements equality and hash code based on it.

Introduction

Martin Fowler writes in his PoEAA book: "The identity field saves a database ID field in an object to maintain identity between an in-memory object and a database row."

And further he states: "The first concern is whether to use meaningful or meaningless keys. A meaningful key is something like the U.S. Social Security Number... A meaningless key is essentially a random number the database dreams up that's never intended for human use."

There are many reasons why meaningful keys are often NOT good candidates for an identity field. Primarily, they are often not immutable (human errors have to be corrected later) and not unique. Thus Martin Fowler states: "... As a result, meaningful keys should be distrusted. ..."

Having provided some background on the ongoing dispute about what makes a good candidate for an identity field, I'll now make my choice. I always choose meaningless keys as identity fields. Such fields are often called surrogate keys. Important: "The surrogate key is not derived from application data."

My favorite type of surrogate key is a GUID (globally unique identifier). The algorithm used to generate a new GUID makes it practically impossible to generate the same value twice (the probability of a collision is vanishingly small).

NHibernate supports GUID as one possible type for the identity field.
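Two properties of the .NET Guid type matter for the rest of this post: Guid.NewGuid() returns a fresh value on every call, and a Guid that has never been assigned holds Guid.Empty (all zeros), which the base class below treats as "not yet saved". A minimal C# sketch of both; UnsavedThing and GuidBasics are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical entity, used only to show the default value of a Guid field.
public class UnsavedThing
{
    public Guid Id; // deliberately never assigned
}

public static class GuidBasics
{
    // A Guid that has never been assigned is all zeros, i.e. Guid.Empty.
    public static bool FreshIdIsEmpty()
    {
        return new UnsavedThing().Id == Guid.Empty;
    }

    // Each call to Guid.NewGuid() returns a new value that is not Guid.Empty
    // (a collision is possible in theory, but astronomically unlikely).
    public static bool NewGuidsDiffer()
    {
        Guid a = Guid.NewGuid();
        Guid b = Guid.NewGuid();
        return a != b && a != Guid.Empty && b != Guid.Empty;
    }
}
```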

Problem Description

When working with NHibernate one often uses a special type of collection known as a Set. A set is a collection that contains no duplicate elements. More formally, a set contains no pair of elements e1 and e2 such that e1.Equals(e2), and at most one null element. As the .NET Framework version targeted by NHibernate does not provide a set, NHibernate uses the Iesi.Collections library, which contains a set implementation.

The definition above names the predicate that decides whether two elements are the same: the Equals method. By default (that is, as inherited from System.Object) Equals compares references, not contents: it returns true only if both variables refer to the very same instance. So if two variables e1 and e2 refer to two different instances of a class, Equals always returns false, even if both instances were loaded from the same database row. But we want the identity field to be the relevant part of the comparison: if two different instances have the same identity field, then they are equal (that is, they refer to the same database record).
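The following C# sketch illustrates the problem with a hypothetical PlainCustomer class that does not override Equals. It uses HashSet<T> from System.Collections.Generic as a stand-in for the Iesi set, since both are hash-based and rely on GetHashCode and Equals to detect duplicates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity that does NOT override Equals or GetHashCode,
// so it inherits reference equality from System.Object.
public class PlainCustomer
{
    public Guid Id { get; set; }
}

public static class DefaultEqualityDemo
{
    // Builds two distinct instances that stand for the same database row.
    // Returns a.Equals(b) and reports how many elements a set of both holds.
    public static bool SameRowIsEqual(out int setCount)
    {
        Guid id = Guid.NewGuid();
        var a = new PlainCustomer { Id = id };
        var b = new PlainCustomer { Id = id };

        // HashSet<T> stands in for the Iesi set here.
        var set = new HashSet<PlainCustomer> { a, b };
        setCount = set.Count;

        // object.Equals compares references, so this is false.
        return a.Equals(b);
    }
}
```

Both checks treat a and b as different objects, so the set ends up holding two entries for what is a single database row.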

Implementation

The default implementation of Equals lives in System.Object, from which every class in .NET inherits, directly or indirectly. Fortunately Equals is virtual, so we can override it. But when we override Equals we also have to override GetHashCode, because two objects that are equal must return the same hash code.
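To see why both overrides are needed, recall how a hash-based collection such as a set works: it first uses GetHashCode to narrow the search down to a few candidates and only then calls Equals on them. The sketch below contrasts two hypothetical classes, EqualsOnlyCustomer and ConsistentCustomer, using HashSet<T> as the set. It is only an illustration and not the base class developed in this post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity that overrides Equals but NOT GetHashCode
// (the C# compiler warns about exactly this).
public class EqualsOnlyCustomer
{
    public Guid Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as EqualsOnlyCustomer;
        return other != null && other.Id == Id;
    }
}

// Hypothetical entity that overrides both members consistently.
// (It ignores new, unsaved objects; the post deals with those later.)
public class ConsistentCustomer
{
    public Guid Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as ConsistentCustomer;
        return other != null && other.Id == Id;
    }

    // Same Id, same hash code, as the contract requires.
    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}

public static class HashContractDemo
{
    // True: Equals alone says the two instances are the same row.
    public static bool EqualsOnlyAgrees(Guid id)
    {
        var a = new EqualsOnlyCustomer { Id = id };
        var b = new EqualsOnlyCustomer { Id = id };
        return a.Equals(b);
    }

    // Typically false: the probe's reference-based hash code differs from
    // the stored instance's, so the set never gets as far as calling Equals.
    public static bool EqualsOnlySetFinds(Guid id)
    {
        var set = new HashSet<EqualsOnlyCustomer> { new EqualsOnlyCustomer { Id = id } };
        return set.Contains(new EqualsOnlyCustomer { Id = id });
    }

    // True: equal hash codes lead the set to the stored instance.
    public static bool ConsistentSetFinds(Guid id)
    {
        var set = new HashSet<ConsistentCustomer> { new ConsistentCustomer { Id = id } };
        return set.Contains(new ConsistentCustomer { Id = id });
    }
}
```

With EqualsOnlyCustomer, Equals and the set disagree: Equals reports a match while the set lookup typically misses it.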

Assuming that we use a GUID called Id as the identity field, we can define the following base class, from which all our domain classes will inherit, directly or indirectly:

public class IdentityFieldProvider<T>
    where T : IdentityFieldProvider<T>
{
    private Guid _id;
 
    public virtual Guid Id
    {
        get { return _id; }
        set { _id = value; }
    }
}

Now let's override the Equals method. A possible solution is:

public override bool Equals(object obj)
{
    T other = obj as T;
    if (other == null)
        return false;
 
    // handle the case of comparing two NEW objects
    bool otherIsTransient = Equals(other.Id, Guid.Empty);
    bool thisIsTransient = Equals(Id, Guid.Empty);
    if (otherIsTransient && thisIsTransient)
        return ReferenceEquals(other, this);
 
    return other.Id.Equals(Id);
}

We have to distinguish three cases. In the first, obj is null or not of type T (for example, someone compares a customer with an order); this case is trivial and the answer is ALWAYS "not equal". In the second, both objects are new (also called transient), that is, neither has an Id yet; then they are equal only if both references point to the same instance. In the third, we simply use the Equals method of the Guid type to compare the two Ids.

Next we have to override the GetHashCode method, which is also inherited from System.Object.

private int? _oldHashCode;
 
public override int GetHashCode()
{
    // Once we have a hash code we'll never change it
    if (_oldHashCode.HasValue)
        return _oldHashCode.Value;
 
    bool thisIsTransient = Equals(Id, Guid.Empty);
    
    // When this instance is transient, we use the base GetHashCode()
    // and remember it, so an instance can NEVER change its hash code.
    if (thisIsTransient)
    {
        _oldHashCode = base.GetHashCode();
        return _oldHashCode.Value;
    }
    return Id.GetHashCode();
}

Now, why this kind of code, you might ask? Well, an object should never ever change its hash code during its lifetime, that is, from the moment it is instantiated until it is garbage collected. Otherwise a hash-based collection such as a set that already holds the object would look for it under the wrong hash code and no longer find it. If an object is restored from the database there is no problem, since every existing database record has a well-defined and unique identity field. Thus we can derive the hash code from this Id field. This is done in the last line of code in the snippet above.

It is a bit more problematic when a new object is created in memory: then its identity field is undefined (the object has not been saved to the database yet and is thus considered transient). In our case undefined means that the Id field has the value Guid.Empty. In this case we use the default implementation of GetHashCode (from System.Object) to generate a hash code, but we store it in an instance variable for later reference.

Later in its life cycle the instance may be persisted to the database (while it continues to live in memory). At this moment NHibernate assigns a new unique value to the Id field of the instance. Now the object isn't transient any more, but the first two lines of the method (the check of _oldHashCode) prevent its hash code from changing. It is still the same object as before; it has just been made persistent.
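The effect of caching the hash code shows up as soon as a transient object sits in a set when it is saved. The sketch below assumes two hypothetical classes: NaiveEntity, which always derives its hash code from Id, and CachingEntity, which follows the approach described above. HashSet<T> stands in for the NHibernate set, and assigning Id by hand simulates NHibernate assigning the identifier on save; the fixed GUID is only there to make the run repeatable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity whose hash code is always derived from its Id,
// so the hash code changes when the Id is assigned.
public class NaiveEntity
{
    public Guid Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as NaiveEntity;
        if (other == null) return false;
        if (Id == Guid.Empty && other.Id == Guid.Empty)
            return ReferenceEquals(this, other);
        return other.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}

// Hypothetical entity that remembers the hash code it had while transient,
// following the approach described in the post.
public class CachingEntity
{
    private int? _oldHashCode;

    public Guid Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as CachingEntity;
        if (other == null) return false;
        if (Id == Guid.Empty && other.Id == Guid.Empty)
            return ReferenceEquals(this, other);
        return other.Id == Id;
    }

    public override int GetHashCode()
    {
        if (_oldHashCode.HasValue)
            return _oldHashCode.Value;
        if (Id == Guid.Empty)
        {
            _oldHashCode = base.GetHashCode();
            return _oldHashCode.Value;
        }
        return Id.GetHashCode();
    }
}

public static class PersistDemo
{
    // Stand-in for the Id NHibernate would assign on save (fixed for repeatability).
    private static readonly Guid AssignedId = new Guid("00000000-0000-0000-0000-000000000001");

    public static bool NaiveStillFound()
    {
        var entity = new NaiveEntity();                 // transient
        var set = new HashSet<NaiveEntity> { entity };  // filed under hash of Guid.Empty
        entity.Id = AssignedId;                         // "persisted"
        return set.Contains(entity);                    // false: hash code moved
    }

    public static bool CachingStillFound()
    {
        var entity = new CachingEntity();
        var set = new HashSet<CachingEntity> { entity };
        entity.Id = AssignedId;
        return set.Contains(entity);                    // true: hash code kept
    }
}
```

The naive entity gets lost inside its own set once it is persisted: it was filed under the hash code of Guid.Empty, but lookups now use the hash code of the new Id. The caching entity is still found because its hash code never changed.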

Finally, we can also overload the two operators '==' and '!=', so that two instances can be compared with those operators and not only with the Equals method.

public static bool operator ==(IdentityFieldProvider<T> x, IdentityFieldProvider<T> y)
{
    return Equals(x, y);
}
 
public static bool operator !=(IdentityFieldProvider<T> x, IdentityFieldProvider<T> y)
{
    return !(x == y);
}
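For reference, here is a sketch of how the snippets above might be combined into a single compilable C# file. The member bodies follow the post, with two small deviations: the transient checks use Id == Guid.Empty instead of Equals(Id, Guid.Empty), which avoids boxing, and the null check uses ReferenceEquals so that Equals does not route through the overloaded == operator. Customer and Order are hypothetical entities; each passes its own type as T, which is what allows Equals to write obj as T.

```csharp
using System;

// The post's snippets combined into one class (a sketch, with the two
// small deviations described in the text).
public class IdentityFieldProvider<T>
    where T : IdentityFieldProvider<T>
{
    private Guid _id;
    private int? _oldHashCode;

    public virtual Guid Id
    {
        get { return _id; }
        set { _id = value; }
    }

    public override bool Equals(object obj)
    {
        // Case 1: null or not a T, never equal.
        T other = obj as T;
        if (ReferenceEquals(other, null))
            return false;

        // Case 2: both transient, equal only if it is the same instance.
        bool otherIsTransient = other.Id == Guid.Empty;
        bool thisIsTransient = Id == Guid.Empty;
        if (otherIsTransient && thisIsTransient)
            return ReferenceEquals(other, this);

        // Case 3: compare the identity fields.
        return other.Id.Equals(Id);
    }

    public override int GetHashCode()
    {
        // Once we have a hash code we never change it.
        if (_oldHashCode.HasValue)
            return _oldHashCode.Value;

        // Transient: remember the reference-based hash code.
        if (Id == Guid.Empty)
        {
            _oldHashCode = base.GetHashCode();
            return _oldHashCode.Value;
        }
        return Id.GetHashCode();
    }

    // Resolves to the static object.Equals(object, object), which handles
    // nulls itself and otherwise calls x.Equals(y).
    public static bool operator ==(IdentityFieldProvider<T> x, IdentityFieldProvider<T> y)
    {
        return Equals(x, y);
    }

    public static bool operator !=(IdentityFieldProvider<T> x, IdentityFieldProvider<T> y)
    {
        return !(x == y);
    }
}

// Hypothetical entities; each one passes its own type as T.
public class Customer : IdentityFieldProvider<Customer> { }
public class Order : IdentityFieldProvider<Order> { }
```

With these types, two Customer instances that share an Id compare equal with == and return the same hash code, two new Customer instances are equal only to themselves, and a Customer never equals an Order. Comparing a null Customer reference with null returns true without throwing, because object.Equals(object, object) deals with nulls before calling the instance Equals. Note that customer == order does not compile, because the two operator overloads belong to unrelated base types; use Equals for that comparison.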

That's it. You can now use this class as the base for every entity class in your domain and never ever have to think about the identity field and the equality of objects. It just happens...

Enjoy



Posted sep 06 2008, 10:02 p.m. by gabriel.schenker

Comments

Felix Gartsman wrote re: Identity Field, Equality and Hash Code
on 09-13-2009 14:16

According to msdn.microsoft.com/.../system.object.gethashcode.aspx this implementation does not meet consistency requirements.

The requirement is: If two objects compare as equal, the GetHashCode method for each object must return the same value.

Consider an object A that was transient and saved its hash code. It was then persisted and assigned Id=X. Afterwards we load an object B with identity X. Now we get A==B, but their hash codes differ!

According to this "The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method." you actually can change the hashcode, though it'll probably mess things up.

gabriel.schenker wrote re: Identity Field, Equality and Hash Code
on 09-13-2009 18:41

In the article I explicitly write that this is a very specific implementation of equality. The equality is based on several conventions:

a) for a given id only one instance can/shall exist at the same time (first level cache!)

b) ONLY the id is relevant for comparison

If you are talking about "value type" (in the DDD sense) comparison, then your statement is valid!

Stefan Steinegger wrote re: Identity Field, Equality and Hash Code
on 08-29-2011 10:50

@Felix Gartsman: This shouldn't ever be a problem. There can't be two objects with the same id in the same session, unless one is the proxy of the other. But in the case of a proxy, it is not possible that one is persistent and the other is not. Additionally, the proxy calls the object's GetHashCode and Equals methods, so they return the same values.

Whatever you do to get two objects with the same id (you need to take it from another session), you'll get a problem. I don't think that an Equals or GetHashCode method could ever avoid this problem. And I can't think of any scenario where it would make sense.
