Re: Bug in prefetch subqueries?

2 messages Options
Embed this post
Permalink
Peter Rabbitson-2

Re: Bug in prefetch subqueries?

Reply Threaded More More options
Print post
Permalink
Rob Kinyon wrote:

> ribasushi -
>
>     In a DBIC list message, you said the following:
>
>> [1] The reason the double WHERE is needed is as follows:
>>
>> SELECT artist.*, cd.*
>>    FROM (
>>        SELECT artist.*
>>            FROM artist
>>            JOIN cd ON <some condition>
>>        WHERE XXX
>>    ) artist
>>    JOIN cd ON <some condition>
>> WHERE XXX
>>
>> In the above XXX could contain
>> 1) only artist column restrictions
>> 2) only cd column restrictions
>> 3) both
>>
>> If I could reliable determine this I would (in each of the cases):
>>
>> 1) Keep only the inner WHERE
>> 2) Keep only the outer WHERE (and maybe even remove the inner join
>> if it was of type LEFT)
>> 3) Separate the cd from the artist conditions and apply th where on
>> the relevant sides
>
> I have found what seems to be a bug. Basically, it comes down to this:
>
> artists:
> { id => 1, name => 'Metallica' }
> { id => 2, name => 'Linkin Park' }
>
> cds:
> { id => 1, artist_id => 1, name => 'Fade to Black' }
> { id => 2, artist_id => 1, name => 'Kill em All' }
> { id => 3, artist_id => 2, name => 'Something' }
>
> my $artist = $artist->search({ 'cd.name' => 'Fade to Black' }, { join
> => 'cd' })->first;
>
> my $artist_pref = $artist->search({ 'cd.col1' => 'Fade to Black' }, {
> join => 'cd', prefetch => 'cd' })->first;
>
> So, with $artist, when I do $artist->cds, I get two results back.
> $artist_pref->cds will only return 1 result given the SQL you put in
> the message to the list. Is this desired behavior? I would think that
> my search is asking for all the artists who have a CD with a name of
> 'Fade to Black'. Then, when I do $artist->cds, I'm asking for all the
> CDs ever released by that artist. How I got $artist is now irrelevant.
>
> $foo->bar should return identical results with or without prefetch -
> it should just work faster with prefetch.
>

If $foo was a resultset - I would agree. However in row context this
isn't the case. You use the variable $artist for both resultsets and
rows above (also you use .name and .col1 which makes no sense), so I
will try to rewrite what I think you are trying to say:

We start with:

$f2b_artist_rs = $schema->resultset ('Artist')
                ->search (
                  { 'cd.name' => 'Fade to Black' },
                  { join => 'cd' },
                );

The count:
$f2b_artist_rs->search_related ('cd')->count
should be 1, as the (only) query will contain the cd.name condition.

The same count will be available via:
$f2b_artist_rs->search ({}, { prefetch => 'cd' })
                ->search_related ('cd')
                  ->count
as there is still only 1 query.

Now if we get an artist, all we have is the artist id, and we lose
all kinds of conditions. I.e.:

$f2b_artist_rs->first->cds->count will return 2, as there will be 2
queries executed:
1) get me the first artist that has a cd named f2b
2) get me the count of all cds which have a FK equal to the artists
PK

And now we come to your semi-bug which reads:
$f2b_artist_rs->search ({}, { prefetch => 'cd' })
                ->first
                  ->cds
                    ->count

The reason this returns 1, is because we execute only *one* query
(the whole point of prefetch => ). The result of the query is a
single row of artist/cd data, which is used to pre-populate all the
slots for ->has_many_accessor. In order to have the above return 2
we would have to re-query the database, effectively nullifying the
effect of prefetch. Now, if you are hell-bent on making this
correct and still maintain the prefetch == single query paradigm,
you would have to alter the prefetch mechanism to detect if there
are *any* right-side join conditions, and if this is the case do
something like:

SELECT artist.*, cd2.*
  FROM artist artist
  LEFT JOIN cd cd ON artist.id = cd.artistid
  LEFT JOIN cd cd2 ON artist.id = cd2.artistid
WHERE cd.name = ?

Effectively we pull an additional "shadow" join that will be
related to the artist rs limited by the initial join, and which
will be used for the actual object inflation. Of course in order to
make this happen we need both introspectable SQLA, and sane multi-
has-many collapse (actually the collapse may not be ncessary, as we
do not select anything from the first join). And even if we did, I
am still not sure if going to such great lengths is in fact
justified. Prefetched rs == non-prefetched rs is a flawed concept
imo, as it is not quite compatible with prefetch == one query only,
which as far as I am concerned *is* the selling point of prefetch
(db-trip latencies are the most significant bottleneck I have
encountered during my work).

Thoughts?


_______________________________________________
List: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/dbix-class
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/DBIx-Class/
Searchable Archive: http://www.grokbase.com/group/dbix-class@...
Rob Kinyon

Re: Bug in prefetch subqueries?

Reply Threaded More More options
Print post
Permalink
On Thu, Jul 30, 2009 at 08:17, Peter Rabbitson<[hidden email]> wrote:

> And now we come to your semi-bug which reads:
> $f2b_artist_rs->search ({}, { prefetch => 'cd' })
>                ->first
>                  ->cds
>                    ->count
>
> The reason this returns 1, is because we execute only *one* query
> (the whole point of prefetch => ). The result of the query is a
> single row of artist/cd data, which is used to pre-populate all the
> slots for ->has_many_accessor. In order to have the above return 2
> we would have to re-query the database, effectively nullifying the
> effect of prefetch. Now, if you are hell-bent on making this
> correct and still maintain the prefetch == single query paradigm,
> you would have to alter the prefetch mechanism to detect if there
> are *any* right-side join conditions, and if this is the case do
> something like:

That's my point. The result of the query should be -two- rows of artist/cd data.

> SELECT artist.*, cd2.*
>  FROM artist artist
>  LEFT JOIN cd cd ON artist.id = cd.artistid
>  LEFT JOIN cd cd2 ON artist.id = cd2.artistid
> WHERE cd.name = ?

Or this:
SELECT artist.*, cd.*
FROM (
SELECT artist.*
FROM artist artist
LEFT JOIN cd cd ON artist.id = cd.artistid
WHERE cd.name = ?
) artist
LEFT JOIN cd cd ON artist.id = cd.artistid

> Effectively we pull an additional "shadow" join that will be
> related to the artist rs limited by the initial join, and which
> will be used for the actual object inflation. Of course in order to
> make this happen we need both introspectable SQLA, and sane multi-
> has-many collapse (actually the collapse may not be ncessary, as we
> do not select anything from the first join). And even if we did, I
> am still not sure if going to such great lengths is in fact
> justified. Prefetched rs == non-prefetched rs is a flawed concept
> imo, as it is not quite compatible with prefetch == one query only,
> which as far as I am concerned *is* the selling point of prefetch
> (db-trip latencies are the most significant bottleneck I have
> encountered during my work).

We don't need introspectable SQLA and multi-has-many collapse is an
orthogonal issue. We just need to stop conflating the prefetched
results with the main resultset.

And I didn't say "Prefetched rs == non-prefetched rs". I said that the
rows from a search should behave identically regardless of whether:
    * the rows in question came from a prefetched search or not
    * the relation being accessed was prefetched or not

And the current implementation of prefetch violates that. Consider the
following:

----------------
sub print_cds {
    my $artist = shift;

    return sprintf "%s", join( ", ", sort map { $_->name } $artist->cds ), $/;
}

my $f2b_rs = Artist->search({
    'cd.name' => 'Fade to Black'
});

my $cds = print_cds( $f2b_rs->first );

my $f2b_pref_rs = $f2b_rs->search( {}, { prefetch => 'cds' } );

my $pref_cds = print_cds( $f2b_pref_rs->first );

is( $cds, $pref_cds );
----------------

That test will currently fail and it shouldn't. Prefetch is purely an
optimization. Optimizations should never change behavior.

Rob

_______________________________________________
List: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/dbix-class
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/DBIx-Class/
Searchable Archive: http://www.grokbase.com/group/dbix-class@...