How should we handle the JPQL DISTINCT keyword

3 messages Options
Embed this post
Permalink
Michael Dick

How should we handle the JPQL DISTINCT keyword

Reply Threaded More More options
Print post
Permalink
Hi all,

Currently we handle the JPQL DISTINCT keyword as a 1:1 mapping to the SQL
DISTINCT keyword. So the following
JPQL:  "SELECT DISTINCT b FROM Book b WHERE b.title = 'Gone Sailing'"
may result in the following SQL
SQL : "SELECT DISTINCT t0.id, t0.dueDate, t0.title, . . . FROM Book t0 WHERE
t0.title = 'Gone Sailing'"

This works fine for most queries, but when I was looking into OPENJPA-894 I
noticed a problem with some relationships and the JOIN FETCH clause. A JOIN
FETCH looking like this :
JPQL: "SELECT b from Book b JOIN FETCH b.subjects WHERE b.title = 'Gone
Sailing'"
Has two effects :
   1. It eagerly loads b.subjects
   2. It returns (potentially) multiple references to the same book. One
reference to Book(id=1) for every subject associated with Book(id=1).

The resulting SQL may look like this (in this case Book is MxM with Subject)
:
SQL "SELECT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID, t2.oid, t2.name FROM
LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid = t1.BOOKS_OID INNER
JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE t0.title = 'Gone
Fishing'"

Lets say that the Book with title = 'Gone Sailing' has two subjects :
Outdoors and Sportsman. In that case the SQL will return two rows that look
something like this :

OID    DUEDATE    TITLE          BOOKS_OID  OID  NAME
----   ---------- ---------      ---------- ---- -----------
3      2009-11-11 Gone Fishing   3          12   Outdoors
3      2009-11-11 Gone Fishing   3          13   Sportsman

The fix for OPENJPA-894 generates a result list with two references to the
same Book(id=3).

If you only wanted eager fetching of b.subjects, and didn't want duplicates
a good first guess would be to add the DISTINCT keyword (I'm finally getting
back to the subject)

JPQL : "SELECT DISTINCT b from Book b JOIN FETCH b.subjects WHERE b.title =
'Gone Sailing'"
SQL :  "SELECT DISTINCT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID, t2.oid,
t2.name FROM LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid =
t1.BOOKS_OID INNER JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE
t0.title = 'Gone Fishing'"

We'll still get the same two rows because the SQL DISTINCT keyword is
applied to all permutations of the columns - not just the oid (and if it did
only apply to the oid we wouldn't be eagerly loading b.subjects anyway). As
a result I think we'll have to use a SetBackedList (or otherwise enforce
distinct results after getting rows from SQL) as our ResultList.

I have a patch that does this, and resolves the use case I described above
(it's in the TestLibService unit test), but I'm open to any other ideas for
ways to resolve the problem.

Thanks,
-mike
Pinaki Poddar

Re: How should we handle the JPQL DISTINCT keyword

Reply Threaded More More options
Print post
Permalink
Hi Mike,
  Very good description of the problem.
  JOIN FETCH is one place where the assumption that a object-oriented query is same as a row-based query shows its strain. DISTINCT in JPQL referred to b, while DISTINCT in SQL referred to the row it selects -- and b is not a row but the root of an object graph!

  SetBackedList is indeed a good idea.

  Other option is to drop the JOIN FETCH clauses altogether from part of the query. But to add them to the FetchPlan. Then query  
JPQL: "SELECT b from Book b JOIN FETCH b.subjects WHERE b.title = 'Gone Sailing'"
is effectively
  "SELECT b from Book b WHERE b.title = 'Gone Sailing'"
  fetch.add(Book.class, "subjects");

  Do not how much trouble to tweak OpenJPA that way though!



Michael Dick wrote:
Hi all,

Currently we handle the JPQL DISTINCT keyword as a 1:1 mapping to the SQL
DISTINCT keyword. So the following
JPQL:  "SELECT DISTINCT b FROM Book b WHERE b.title = 'Gone Sailing'"
may result in the following SQL
SQL : "SELECT DISTINCT t0.id, t0.dueDate, t0.title, . . . FROM Book t0 WHERE
t0.title = 'Gone Sailing'"

This works fine for most queries, but when I was looking into OPENJPA-894 I
noticed a problem with some relationships and the JOIN FETCH clause. A JOIN
FETCH looking like this :
JPQL: "SELECT b from Book b JOIN FETCH b.subjects WHERE b.title = 'Gone
Sailing'"
Has two effects :
   1. It eagerly loads b.subjects
   2. It returns (potentially) multiple references to the same book. One
reference to Book(id=1) for every subject associated with Book(id=1).

The resulting SQL may look like this (in this case Book is MxM with Subject)
:
SQL "SELECT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID, t2.oid, t2.name FROM
LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid = t1.BOOKS_OID INNER
JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE t0.title = 'Gone
Fishing'"

Lets say that the Book with title = 'Gone Sailing' has two subjects :
Outdoors and Sportsman. In that case the SQL will return two rows that look
something like this :

OID    DUEDATE    TITLE          BOOKS_OID  OID  NAME
----   ---------- ---------      ---------- ---- -----------
3      2009-11-11 Gone Fishing   3          12   Outdoors
3      2009-11-11 Gone Fishing   3          13   Sportsman

The fix for OPENJPA-894 generates a result list with two references to the
same Book(id=3).

If you only wanted eager fetching of b.subjects, and didn't want duplicates
a good first guess would be to add the DISTINCT keyword (I'm finally getting
back to the subject)

JPQL : "SELECT DISTINCT b from Book b JOIN FETCH b.subjects WHERE b.title =
'Gone Sailing'"
SQL :  "SELECT DISTINCT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID, t2.oid,
t2.name FROM LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid =
t1.BOOKS_OID INNER JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE
t0.title = 'Gone Fishing'"

We'll still get the same two rows because the SQL DISTINCT keyword is
applied to all permutations of the columns - not just the oid (and if it did
only apply to the oid we wouldn't be eagerly loading b.subjects anyway). As
a result I think we'll have to use a SetBackedList (or otherwise enforce
distinct results after getting rows from SQL) as our ResultList.

I have a patch that does this, and resolves the use case I described above
(it's in the TestLibService unit test), but I'm open to any other ideas for
ways to resolve the problem.

Thanks,
-mike
Pinaki
Michael Dick

Re: How should we handle the JPQL DISTINCT keyword

Reply Threaded More More options
Print post
Permalink
Hi Pinaki,

Thanks for reading through my verbose problem description and for your
comments!

This particular issue turned up when working with a WebSphere customer and
they have been reluctant to use vendor specific options like FetchPlans in
the past. I can propose FetchPlans as a solution again, but I want to have a
vendor neutral answer ready if they push back.

As a developer I appreciate the flexibility of FetchPlans, but I've found
that using them in this manner results in a additional SQL statements. With
the simple example I posted above FetchPlans generate 4 SQL statements and
take slightly longer than the JOIN FETCH's 2 statements. One developer's
laptop does not make a valid benchmark though :-)

-mike

On Wed, Oct 28, 2009 at 9:11 PM, Pinaki Poddar <[hidden email]> wrote:

>
> Hi Mike,
>  Very good description of the problem.
>  JOIN FETCH is one place where the assumption that a object-oriented query
> is same as a row-based query shows its strain. DISTINCT in JPQL referred to
> b, while DISTINCT in SQL referred to the row it selects -- and b is not a
> row but the root of an object graph!
>
>  SetBackedList is indeed a good idea.
>
>  Other option is to drop the JOIN FETCH clauses altogether from part of the
> query. But to add them to the FetchPlan. Then query
> JPQL: "SELECT b from Book b JOIN FETCH b.subjects WHERE b.title = 'Gone
> Sailing'"
> is effectively
>  "SELECT b from Book b WHERE b.title = 'Gone Sailing'"
>  fetch.add(Book.class, "subjects");
>
>  Do not how much trouble to tweak OpenJPA that way though!
>
>
>
>
> Michael Dick wrote:
> >
> > Hi all,
> >
> > Currently we handle the JPQL DISTINCT keyword as a 1:1 mapping to the SQL
> > DISTINCT keyword. So the following
> > JPQL:  "SELECT DISTINCT b FROM Book b WHERE b.title = 'Gone Sailing'"
> > may result in the following SQL
> > SQL : "SELECT DISTINCT t0.id, t0.dueDate, t0.title, . . . FROM Book t0
> > WHERE
> > t0.title = 'Gone Sailing'"
> >
> > This works fine for most queries, but when I was looking into OPENJPA-894
> > I
> > noticed a problem with some relationships and the JOIN FETCH clause. A
> > JOIN
> > FETCH looking like this :
> > JPQL: "SELECT b from Book b JOIN FETCH b.subjects WHERE b.title = 'Gone
> > Sailing'"
> > Has two effects :
> >    1. It eagerly loads b.subjects
> >    2. It returns (potentially) multiple references to the same book. One
> > reference to Book(id=1) for every subject associated with Book(id=1).
> >
> > The resulting SQL may look like this (in this case Book is MxM with
> > Subject)
> > :
> > SQL "SELECT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID, t2.oid, t2.name
> > FROM
> > LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid = t1.BOOKS_OID
> INNER
> > JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE t0.title = 'Gone
> > Fishing'"
> >
> > Lets say that the Book with title = 'Gone Sailing' has two subjects :
> > Outdoors and Sportsman. In that case the SQL will return two rows that
> > look
> > something like this :
> >
> > OID    DUEDATE    TITLE          BOOKS_OID  OID  NAME
> > ----   ---------- ---------      ---------- ---- -----------
> > 3      2009-11-11 Gone Fishing   3          12   Outdoors
> > 3      2009-11-11 Gone Fishing   3          13   Sportsman
> >
> > The fix for OPENJPA-894 generates a result list with two references to
> the
> > same Book(id=3).
> >
> > If you only wanted eager fetching of b.subjects, and didn't want
> > duplicates
> > a good first guess would be to add the DISTINCT keyword (I'm finally
> > getting
> > back to the subject)
> >
> > JPQL : "SELECT DISTINCT b from Book b JOIN FETCH b.subjects WHERE b.title
> > =
> > 'Gone Sailing'"
> > SQL :  "SELECT DISTINCT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID,
> > t2.oid,
> > t2.name FROM LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid =
> > t1.BOOKS_OID INNER JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE
> > t0.title = 'Gone Fishing'"
> >
> > We'll still get the same two rows because the SQL DISTINCT keyword is
> > applied to all permutations of the columns - not just the oid (and if it
> > did
> > only apply to the oid we wouldn't be eagerly loading b.subjects anyway).
> > As
> > a result I think we'll have to use a SetBackedList (or otherwise enforce
> > distinct results after getting rows from SQL) as our ResultList.
> >
> > I have a patch that does this, and resolves the use case I described
> above
> > (it's in the TestLibService unit test), but I'm open to any other ideas
> > for
> > ways to resolve the problem.
> >
> > Thanks,
> > -mike
> >
> >
>
>
> -----
> Pinaki
> --
> View this message in context:
> http://n2.nabble.com/How-should-we-handle-the-JPQL-DISTINCT-keyword-tp3908400p3909427.html
> Sent from the OpenJPA Developers mailing list archive at Nabble.com.
>