NHibernate Forge
The official new home for the NHibernate for .NET community

Get unique results from joined queries

Attachments

Page Details

First published by:
Ken Egozi
on 09-12-2008
Last revision by:
Woil
on 09-18-2008
9 people found this article useful.
Article
Comments (7)
History (4)
90% of people found this useful

Get unique results from joined queries

Filed under: [Edit Tags]

Problem:

A recurring issue that new NHibernate users keep asking about, is the fact that they sometimes get duplicate results from their queries

For example, take a look at the next model:

where Comment.Name refer to the name of the person who wrote the comment.

If you'd want to issue a query on Post that joins to Comment, say "all of the posts that were commented by 'Ken Egozi'", the next hql query should do the trick:

   select  p
   from    Post p
   join    p.Comments c
   where   c.Name = 'Ken Egozi'";
 
So, you go ahead and
   var hql = @"
      select  p
      from    Post p
      join    p.Comments c
      where   c.Name = 'Ken Egozi'";
   var posts = session.CreateQuery(hql)
      .List<post>();

Assuming the next data is in the DB:

  Post#1
Comment by 'Ken Egozi' Comment by 'Fabio Maulo' Comment by 'Ken Egozi' Post#2 Comment by 'Fabio Maulo' Post#3 Comment by 'Fabio Maulo' Comment by 'Ken Egozi'

you'd expect the list 'posts' to include two posts: Post#1 and Post#3.
However, if you'd look into posts, you'll see that it has three entries, two entries pointing to Post#1 and one entry pointing to Post#3

Note that NHibernate is smart enough not to create two separate instances for Post#1, but rather a single instance will be created, and the two entries in the list will point to the same instance in memory.

Explanation:

The said hql query (or an equivalent Criteria) is translated to the joined SELECT clause

   SELECT  {p.*}
   FROM    Posts p
     JOIN  Comments c ON c.PostId = p.Id
   WHERE   c.[Name] = 'Ken Egozi'
 

where {p.*} is substituted by the list of fields needed to satisfy a Post object. Running this SQL query directly on the data detailed above, will result with three lines in the resultset.

Solutions

There are three solutions to that in SQL world:

  1. The DISTINCT keyword, when added on the fields of Posts, will ensure that there will be no duplicate lines. there are a few problems with that. Some RDBMS systems will not allow DISTINCT over certain data types, like the blob types in SQL Server. And if you also need to retrieve the joined data, then the DISTINCT keyword will be useless.
  2. Using a subselect
       SELECT *
       FROM   Posts p
       WHERE EXISTS
       (
          SELECT 1
          FROM   Comments c
          WHERE  c.PostId = p.Id
            AND  c.[Name] = 'Ken Egozi'
       )
  3. Remove duplicates in the client code.

The solutions in NHibernate's world, in accordance to the SQL solutions:

  1. Use the 'distinct' keyword.
    The problems that exists in the SQL world still apply, and it can only be used in HQL, not it Criteria API.
  2. Use subselects.
    The syntax in HQL:
       from   Post p
       where exists
       (
          from Comment c
          where c.Post = p
          and c.Name = 'Ken Egozi'
       )
    and in Criteria API:
     var commentsByKenEgozi = DetachedCriteria.For<comment>()
         .Add(Restrictions.Eq("Name", "Ken Egozi"))
         .Add(Restrictions.EqProperty("Post.Id", "p.Id"))
         .SetProjection(NHibernate.Criterion.Projections.Id());
     
      var posts = session.CreateCriteria(typeof (Post), "p")
         .Add(Subqueries.Exists(commentsByKenEgozi))
         .List<post>();</comment>
  3. Remove duplicates in client code.
    That's actually pretty easy with NHibernate, thanks to the result transformers. all you need to do is add .SetResultTransformer(new DistinctRootEntityResultTransformer()) to either your query, or criteria, and NHibernate will remove duplicate entries from the resulted list.
    so, using the same hql as in the first example, you can write
     var hql = @"
          select  p
          from    Post p
          join    p.Comments c
          where   c.Name = 'Ken Egozi'";
       var posts = session.CreateQuery(hql)
          .SetResultTransformer(new DistinctRootEntityResultTransformer())
          .List<post>();

Recent Comments

By: nabeelfarid Posted on 06-30-2010 8:26

Hi jbs,

I am having the same issue with nest child objects. Did you manage to solve this?

Awaiting

Nabeel

By: wuxsh Posted on 06-23-2009 6:44

Cool!

By: jbs Posted on 06-15-2009 16:30

Sorry, forgot to add....the case of loading is through the Person (and lazy loading turned off)

By: jbs Posted on 06-15-2009 16:29

Any idea how one can handle this case with nested objects with join queries where its not the root entity being affected?

e.g.

Person.Business.Addresses  where there are 1+ addresses, I get back dups on the Business.

Any help would be gr8.

Thanks

By: canton Posted on 06-12-2009 3:30

Just a minor point to note here.

Using Result Transformer with query cache has to be very careful. Executing 2 identical queries with different result transformer *instances* makes Query Cache not working (as I expected).

IQuery query1 = IQuery.GetCacheEnabledQuery().SetResultTransformer(new DistinctRootEntityResultTransformer()).List();

IQuery query2 = IQuery.GetCacheEnabledQuery().SetResultTransformer(new DistinctRootEntityResultTransformer()).List();

query2 can't use the cache from query1 just because the result transofrmer *instances* are different.

IQuery query3 = IQuery.GetCacheEnabledQuery().SetResultTransformer(CriteriaSpecification.DistinctRootEntity).List();

IQuery query4 = IQuery.GetCacheEnabledQuery().SetResultTransformer(CriteriaSpecification.DistinctRootEntity).List();

query4 will get the result from the cache resulted in query3.

View All
Powered by Community Server (Commercial Edition), by Telligent Systems