forking within a single DBIx::Class transaction -- possible?

Leandro Hermida

forking within a single DBIx::Class transaction -- possible?

Hi everyone,

It's been a long time since I've posted on this list, but I've been using DBIx::Class for a couple of years now and love it... great software.

Anyway, I've written this code, which does parallel processing (using Parallel::Forker) within a single DBIx::Class transaction.  Something is not working, as it throws lock wait timeout errors.  I want to know: is it possible to use fork() in general within a single DBIx::Class transaction?  Each of my child processes works on different data in the database, but I want to roll back everything if something fails in any child.

thanks for any insight,
leandro


use My::Schema;
use Parallel::Forker;

eval {

    my $schema = My::Schema->connect(...);
    $schema->txn_do(sub {

        # ... do some database stuff here before forking ...

        my $forker = Parallel::Forker->new(use_sig_child => 1, max_proc => $num_procs);

        # reap finished children, and pass TERM on to the whole process tree
        $SIG{CHLD} = sub { Parallel::Forker::sig_child($forker); };
        $SIG{TERM} = sub { $forker->kill_tree_all('TERM') if $forker && $forker->in_parent; };
        my @studies = $schema->resultset('Study')->all();
        for my $study (@studies) {
            $forker->schedule(run_on_start => sub {

                # ... here in the child process, do some heavy processing and
                # then database inserts and deletes using $schema ...

            })->ready();
        }
        # wait for all remaining child processes to finish
        $forker->wait_all();
    });
};
if ($@) {
    my $message = "Database transaction failed";
    $message .= " and ROLLBACK FAILED" if $@ =~ /rollback failed/i;
    die "\n\n$message: $@\n\n";
}



_______________________________________________
List: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/dbix-class
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/DBIx-Class/
Searchable Archive: http://www.grokbase.com/group/dbix-class@...
Toby Corkindale

Re: forking within a single DBIx::Class transaction -- possible?

Hi Leandro,
I'm afraid it really won't work; the database connection is not designed
to be arbitrarily multiplexed like that.
At best it won't work, and at worst you'll get horrible data corruption.

Also, even if it worked, the database performance will drop if you have
multiple simultaneous queries. (Since the DB has to make sure your
queries are not interfering with each other.)

So unless your processing is particularly CPU intensive, and you have a
multi-core system, then I recommend doing it all in a single process.

If your processing *does* meet those requirements, then look into a
different methodology. Try retrieving the data, giving the raw
data to your children to process, then taking back the results,
aggregating them, and storing them to the DB in the parent.
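
A minimal sketch of that fan-out/fan-in shape, using only core Perl modules (fork, pipe, Storable) rather than Parallel::Forker. The names process_study and process_in_children and the sample data are hypothetical, and the final database write is left as a comment: the point is only that the children receive raw data and never touch a database handle.

```perl
use strict;
use warnings;
use POSIX ();
use Storable qw(freeze thaw);

# Hypothetical stand-in for the heavy per-study computation.
sub process_study {
    my ($study) = @_;
    my $sum = 0;
    $sum += $_ for @{ $study->{values} };
    return { id => $study->{id}, total => $sum };
}

# Fork one child per study. Each child gets plain data only and sends its
# result back to the parent through a pipe.
sub process_in_children {
    my (@studies) = @_;
    my @children;
    for my $study (@studies) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: compute, serialize, write, then leave without
            # running the parent's END blocks or destructors.
            close $reader;
            my $ok = eval {
                binmode $writer;
                print {$writer} freeze(process_study($study));
                close $writer;
                1;
            };
            POSIX::_exit($ok ? 0 : 1);
        }
        close $writer;    # parent keeps only the read end
        push @children, { pid => $pid, reader => $reader };
    }

    my @results;
    for my $child (@children) {
        binmode $child->{reader};
        my $frozen = do { local $/; readline($child->{reader}) };
        close $child->{reader};
        waitpid($child->{pid}, 0);
        die "child $child->{pid} failed\n" if $? != 0;
        push @results, thaw($frozen);
    }
    return @results;
}

my @studies = (
    { id => 1, values => [ 1, 2, 3 ] },
    { id => 2, values => [ 10, 20 ] },
);
my @results   = process_in_children(@studies);
my %total_for = map { $_->{id} => $_->{total} } @results;
print "study $_: $total_for{$_}\n" for sort keys %total_for;

# Only the parent would now write to the database, for example inside a
# single $schema->txn_do(sub { ... }) that stores %total_for.
```

Because the parent dies as soon as any child reports failure, nothing reaches the database unless every child succeeded.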


If you're doing the sort of large dataset processing that needs that
behaviour, then you may want to look into using the Apache Hadoop framework.


Cheers,
Toby

Leandro Hermida

Re: forking within a single DBIx::Class transaction -- possible?

Hi Toby,

Thank you. After writing my post yesterday I read further and saw that
multiprocessing and multithreading need a new DBI connection for each
process/thread. DBIx::Class is nice in that it automatically notices a
new process/thread context, sets $dbh->{InactiveDestroy}, drops the
inherited handle, and reconnects with a new dbh.  As you said, though,
the requirement to scope everything in one db transaction is
unfortunately not possible, no matter what.

I started refactoring the code just as you recommended: doing all of
the intensive processing in parallel, storing the child process
results in a shared memory data structure (Cache::FastMmap, for
example), and then, only after all children have finished, creating a new
DBIx::Class schema and opening a transaction to do the database stuff.
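
A rough sketch of that all-or-nothing gate, assuming Storable files in a temporary directory as a stand-in for the shared store (Cache::FastMmap is not used here) and a hypothetical helper named run_all_or_nothing. The database step appears only as a comment; the sketch shows the parent refusing to proceed unless every child exited cleanly.

```perl
use strict;
use warnings;
use POSIX ();
use File::Temp qw(tempdir);
use Storable qw(store retrieve);

# Run $work->($item) in one child per item. $work must return a reference,
# since Storable's store() only accepts references. Returns all results,
# or dies if any child exits non-zero, so the caller never sees partial data.
sub run_all_or_nothing {
    my ($work, @items) = @_;
    my $dir = tempdir(CLEANUP => 1);
    my %index_for;    # pid => index of the item that child handles

    for my $i (0 .. $#items) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: compute, save the result to its own file, then exit
            # immediately without running the parent's cleanup code.
            my $ok = eval { store($work->($items[$i]), "$dir/$i.sto"); 1 };
            POSIX::_exit($ok ? 0 : 1);
        }
        $index_for{$pid} = $i;
    }

    # Wait for every child before deciding anything.
    my @failed;
    while ((my $pid = wait()) != -1) {
        next unless exists $index_for{$pid};
        push @failed, $index_for{$pid} if $? != 0;
    }
    @failed = sort { $a <=> $b } @failed;
    die "children failed for items: @failed\n" if @failed;

    return map { retrieve("$dir/$_.sto") } 0 .. $#items;
}

# Hypothetical workload: squares its input, rejects negative numbers.
my $square = sub {
    my ($n) = @_;
    die "bad input\n" if $n < 0;
    return +{ n => $n, sq => $n * $n };
};

my @ok = run_all_or_nothing($square, 1, 2, 3);
print join(',', map { $_->{sq} } @ok), "\n";
# Here the parent would connect and do all writes in one transaction, e.g.
#   $schema->txn_do(sub { ... store @ok ... });

my $err = eval { run_all_or_nothing($square, 1, -2, 3); 1 } ? '' : $@;
print "nothing to store: $err";
```

The rollback requirement is met by ordering rather than by a shared transaction: the single transaction is opened only once all results are known to be good.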

leandro


On Wed, Oct 14, 2009 at 6:59 AM, Toby Corkindale
<[hidden email]> wrote:

> If your processing *does* meet those requirements, then look into a
> different methodology. Try retrieving the data, giving the raw data to
> your children to process, then taking back the results, aggregating them,
> and storing them to the DB in the parent.