Nikos Alexandris
In reply to a post by Roger Bivand
(Cc to Even Rouault; apologies to Even, since he is not subscribed to the list.)

Roger:
> >> Three minutes instead of thirty+ suggests that the OGR
> >> plugin has trouble with SQLite as the DB format. So maybe
> >> the default for plugin= should be FALSE, not NULL and automatic
> >> use if present?
--%<--
> Could you, Nikos,
> make a script generating a similar table in spearfish, and two small
> scripts exercising the problem (export to R with the plugin, and with
> the temporary shapefile)?

* The "problem" also exists with the default DBF back-end. I created 1000 random points, filled less than half of the records with random numbers, and readVECT6("x", plugin=TRUE) again takes far too long. I interrupted the process after it had been running for more than 20 minutes.

* A script is pasted at the bottom. It has a small "bug" (details below) :-)

First, some results for 1000 rows by 500 columns:

> system.time(random_points <- readVECT6("random_points_1000", plugin=TRUE))
OGR data source with driver: GRASS
Source: "/geo/grassdb/spearfish60/user1/vector/random_points_1000/head", layer: "1"
with 1000 rows and 501 columns
^C
### This was running for more than 10 hours !!! ###

> system.time(random_points <- readVECT6("random_points_1000", plugin=FALSE))
Exporting 1000 points/lines...
100%
1000 features written
OGR data source with driver: ESRI Shapefile
Source: "/geo/grassdb/spearfish60/user1/.tmp/vertical", layer: "random_p"
with 1000 rows and 501 columns
Feature type: wkbPoint with 2 dimensions
   user  system elapsed
 62.515   9.256  74.013

> system.time(random_points <- read.csv("random_points_1000_table.csv"))
   user  system elapsed
  0.192   0.000   0.192

* The script generates "some" random points and adds columns. It also includes R code to load the result with readVECT6(plugin = TRUE), readVECT6(plugin = FALSE) and read.csv.

* The "bug": the assignment NUMBER="$[ ( $RANDOM % 100 ) + 1 ]" works on the command line, but not from within the bash script!? So I've commented that line out and used a fixed number instead.
--%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<-
#!/bin/bash

# example showing that readVECT6(x, plugin = TRUE) is too slow,
# (also) when using the default DBF driver
# first enter spearfish60/user1

# try with a different back-end?
# db.connect driver=sqlite database=

# set numbers here:
RANDOM_POINTS=100 ; RANDOM_POINTS_CATS=100 ; NUMBER=111

# create RANDOM_POINTS random points
v.random --o output=random_points_${RANDOM_POINTS} n=${RANDOM_POINTS}

# add an attribute table
v.db.addtable random_points_${RANDOM_POINTS}

# add ${RANDOM_POINTS} columns
printf '\n* Adding %s columns\n' "${RANDOM_POINTS}"
for x in `seq 1 ${RANDOM_POINTS}` ; do
    v.db.addcol random_points_${RANDOM_POINTS} column="col_${x} integer"
done
printf '\n* %s columns added\n' "${RANDOM_POINTS}"

# check if columns are added
v.info -c random_points_${RANDOM_POINTS}

## WARNING: double loop below takes too long!
# --%<--
# It is simpler and faster to use a single loop with a fixed value instead, e.g.:
#for COL in `seq 1 5 ${RANDOM_POINTS}` ; do
#    v.db.update random_points_${RANDOM_POINTS} column="col_${COL}" value=222
#done
# --%<--

# fill some columns/cats with random numbers between 1 and 100
# alter sequence as desired ; more numbers = more time to load in R
for COL in `seq 1 10 ${RANDOM_POINTS}` ; do
    for CAT in `seq 1 10 ${RANDOM_POINTS_CATS}` ; do
        # this is ok in the command line but NOT when running the script?
        #NUMBER="$[ ( $RANDOM % 100 ) + 1 ]"
        v.db.update random_points_${RANDOM_POINTS} column="col_${COL}" value=${NUMBER} where="cat=${CAT}"
    done
done

# [optional] fill in some "-999" values to use as NAs in R?
#NAN=-999
#for COL in `seq 1 5 ${RANDOM_POINTS}` ; do
#    for CAT in `seq 1 5 ${RANDOM_POINTS_CATS}` ; do
#        v.db.update random_points_${RANDOM_POINTS} column="col_${COL}" value=${NAN} where="cat=${CAT}"
#    done
#done

# check with v.db.select
# v.db.select random_points_${RANDOM_POINTS} | head -25

# export table as .csv file
db.out.ogr in=random_points_${RANDOM_POINTS} format=CSV dsn=/geo/grassdb/spearfish60/random_points_csv_files db_table=random_points_${RANDOM_POINTS}.csv

### end of bash script ###

## launch R
R
### R code

# load in R with:
library(spgrass6) ; G <- gmeta6()

#a. plugin=FALSE
system.time(random_points <- readVECT6("random_points_100", plugin = FALSE))

#b. plugin=TRUE
system.time(random_points <- readVECT6("random_points_100", plugin = TRUE))

#c. as a csv table
# adjust as required
setwd("/geo/grassdb/spearfish60/random_points_csv_files")
table_to_read <- dir(pattern = "^random.*\\.csv$")
system.time(random_points <- read.csv(table_to_read))
str(random_points)
--%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<-

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
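One possible explanation for the "bug" is offered here as a guess, not a confirmed diagnosis. `$RANDOM` and the old `$[ ... ]` arithmetic form are bash features. If the script is started as `sh script.sh` on a system where /bin/sh is not bash (dash, for example), the `#!/bin/bash` line is ignored, and that assignment stops working. The sketch below uses the documented `$(( ... ))` arithmetic expansion. It also refuses to run under a non-bash shell. The helper name `random_1_100` is made up for illustration.

```shell
#!/bin/bash
# Stop early if this is not bash: RANDOM is a bash extension and is
# not available in POSIX sh implementations such as dash.
if [ -z "${BASH_VERSION:-}" ]; then
    echo "Please run this script with bash (e.g. 'bash script.sh')." >&2
    exit 1
fi

# Hypothetical helper: print a random integer between 1 and 100.
# RANDOM yields 0..32767; $(( ... )) replaces the deprecated $[ ... ] form.
random_1_100() {
    echo $(( RANDOM % 100 + 1 ))
}

NUMBER=$(random_1_100)
echo "NUMBER=${NUMBER}"
```

Running the script explicitly with `bash script.sh`, or making it executable and calling `./script.sh`, ensures the bash features are available.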
Nikos Alexandris
On Tue, 2009-05-26 at 12:48 +0200, Nikos Alexandris wrote:
> First some results for 1000 rows by 500 columns:
>
> > system.time(random_points <- readVECT6("random_points_1000", plugin=TRUE))
> OGR data source with driver: GRASS
> Source: "/geo/grassdb/spearfish60/user1/vector/random_points_1000/head",
> layer: "1"
> with 1000 rows and 501 columns
> ^C
> ### This was running for more than 10 hours !!! ###

This was NOT running for more than 10 hours!! It was the double loop, generating the 1000 by 500 random numbers, that ran for more than 10 hours.

Anyway, the readVECT6() call above ran for at least 20 minutes, while readVECT6(plugin=FALSE) and read.csv do the job in acceptable times.

Apologies again,
Nikos

P.S. The script is attached as a .sh file, with lots of empty lines between the code to make it easy to read.
Roger Bivand
In reply to a post by Nikos Alexandris
On Tue, 26 May 2009, Nikos Alexandris wrote:
> (Cc to Even Rouault ; Apologies to Even since he is not subscribed in the
> list)
>
> Roger:
>>>> Three minutes instead of thirty+ suggests that the OGR
>>>> plugin has trouble with SQLite as the DB format. So maybe
>>>> the default for plugin= should be FALSE, not NULL and automatic
>>>> use if present?
>
> --%<--
>> Could you, Nikos,
>> make a script generating a similar table in spearfish, and two small
>> scripts exercising the problem (export to R with the plugin, and with
>> the temporary shapefile.
>
> * The "problem" exists also with the default DBF as a back-end. I
> created 1000 random points, filled less than half of the records with
> random numbers and readVECT6("x", plugin=TRUE) takes again too much. I
> broke the process since it was running for more than 20 mins.

OK. With 250 rows and 250 columns, I see an order of magnitude saving with plugin=FALSE. In plugin=FALSE, the times are split equally between writing the temporary file from GRASS with v.out.ogr, and reading it into R with readOGR(), as one might expect (that is all readVECT6(..., plugin=FALSE) is doing). Even on a small vector (bugsites, 90 points, 2 attribute columns), plugin=FALSE is faster than plugin=TRUE by about 0.75 : 1.35, not quite twice. Which way does the problem scale: in numbers of features, numbers of attribute columns, or both?

Next script in R generating increasing NR and NC cases through writeVECT6() to test plugin=FALSE/plugin=TRUE ratios?

Roger

> --%<--

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [hidden email]
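Roger's question is whether the cost grows with the number of features, the number of attribute columns, or both. A small timing grid can help answer it. Roger proposes driving this from R through writeVECT6(). The bash sketch below swaps that for GRASS commands and is only a harness under stated assumptions. The helper names `cols`, `elapsed`, `make_case` and `read_case` are hypothetical. `make_case` assumes GRASS 6 syntax, and that `v.db.addcol columns=` accepts a comma-separated list. `read_case` times `readVECT6()` through `Rscript`. The GRASS/R part runs only inside a GRASS session (when `GISBASE` is set). It prints one CSV line per case, so the plugin=FALSE/plugin=TRUE ratio can be compared across rows.

```shell
#!/bin/bash
# Timing-grid sketch: how does readVECT6() scale with rows (NR) and
# columns (NC), with plugin=FALSE vs plugin=TRUE?
# All helper names are hypothetical; assumes no bench_* maps exist yet.

NR_LIST="50 100 250"   # numbers of points (rows)
NC_LIST="10 50 250"    # numbers of integer attribute columns

# Print "col_1 integer,col_2 integer,..." for NC columns.
cols() {
    local nc=$1 i out=""
    for i in $(seq 1 "$nc"); do
        out="${out:+${out},}col_${i} integer"
    done
    echo "$out"
}

# Print the wall-clock seconds a command takes (bash 'time' keyword).
elapsed() {
    local TIMEFORMAT=%R
    { time "$@" >/dev/null 2>&1 ; } 2>&1
}

# Build a map with NR random points and NC integer columns (GRASS 6 syntax).
make_case() {
    local nr=$1 nc=$2 map="bench_${1}x${2}"
    v.random --o output="$map" n="$nr"
    v.db.addtable map="$map"
    v.db.addcol map="$map" columns="$(cols "$nc")"
}

# Load the map in R with the given plugin setting (TRUE or FALSE).
read_case() {
    Rscript -e "suppressMessages(library(spgrass6)); invisible(readVECT6('$1', plugin = $2))"
}

if [ -n "${GISBASE:-}" ]; then
    echo "NR,NC,plugin_FALSE,plugin_TRUE"
    for nr in $NR_LIST; do
        for nc in $NC_LIST; do
            make_case "$nr" "$nc" >/dev/null 2>&1
            t_false=$(elapsed read_case "bench_${nr}x${nc}" FALSE)
            t_true=$(elapsed read_case "bench_${nr}x${nc}" TRUE)
            echo "${nr},${nc},${t_false},${t_true}"
        done
    done
else
    echo "GISBASE is not set: start a GRASS session before running the benchmark." >&2
fi
```

Holding NC fixed while NR grows, and vice versa, should show which dimension dominates the plugin=TRUE time.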
Roger Bivand
On Wed, 27 May 2009, Roger Bivand wrote:
> On Tue, 26 May 2009, Nikos Alexandris wrote:
>
>> --%<--
>
> OK. With 250 rows and 250 columns, I see an order of magnitude saving with
> plugin=FALSE. In plugin=FALSE, the times are split equally between writing
> the temporary file from GRASS with v.out.ogr, and reading it into R with
> readOGR(), as one might expect (that is all readVECT6(..., plugin=FALSE) is
> doing). Even on a small vector (bugsites, 90 points, 2 attribute columns),
> plugin=FALSE is faster than plugin=TRUE by about 0.75 : 1.35, not quite
> twice. Which way does the problem scale, in numbers of features, numbers of
> attribute columns, or both?
>
> Next script in R generating increasing NR and NC cases through writeVECT6()
> to test plugin=FALSE/plugin=TRUE ratios?

And we also need to check whether the same applies to use of the plugin in other settings; I'm pretty certain this isn't just differential behaviour in readOGR() between the OGR shapefile driver and the OGR GRASS vector driver. Could someone test v.out.ogr against ogr2ogr using the plugin?

Roger
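The bash sketch below shows one way to run the test Roger asks for, under a few assumptions. It uses GRASS 6 syntax for `v.out.ogr` (`input=`, `type=`, `dsn=`, `format=ESRI_Shapefile`). It requires an `ogr2ogr` built with the GRASS OGR plugin. It also takes the plugin's data source to be the `vector/<map>/head` path with layer `"1"`, as reported by readOGR() earlier in the thread. The map name and output directory are placeholders, and `grass_head_path` and `elapsed` are made-up helpers.

```shell
#!/bin/bash
# Sketch: time v.out.ogr (GRASS -> shapefile) against ogr2ogr reading the
# same map through the GRASS OGR plugin (plugin -> shapefile).

MAP="${MAP:-random_points_250}"      # placeholder map name
OUT="${TMPDIR:-/tmp}/plugin_test"    # placeholder output directory

# Path opened by the plugin: <GISDBASE>/<LOCATION>/<MAPSET>/vector/<map>/head
grass_head_path() {
    echo "$1/$2/$3/vector/$4/head"
}

# Print the wall-clock seconds a command takes (bash 'time' keyword).
elapsed() {
    local TIMEFORMAT=%R
    { time "$@" >/dev/null 2>&1 ; } 2>&1
}

if [ -n "${GISBASE:-}" ]; then
    HEAD=$(grass_head_path "$(g.gisenv get=GISDBASE)" \
        "$(g.gisenv get=LOCATION_NAME)" "$(g.gisenv get=MAPSET)" "$MAP")
    rm -rf "$OUT"; mkdir -p "$OUT"
    echo "v.out.ogr: $(elapsed v.out.ogr input="$MAP" type=point dsn="$OUT/vout" format=ESRI_Shapefile) s"
    echo "ogr2ogr:   $(elapsed ogr2ogr -f "ESRI Shapefile" "$OUT/ogr2ogr" "$HEAD" 1) s"
else
    echo "GISBASE is not set: run this from inside a GRASS session." >&2
fi
```

If both timings come out small, the slowdown is more likely to sit in the R-side reading code than in the plugin itself.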
hamish-2
In reply to a post by Nikos Alexandris
Roger wrote:
> Next script in R generating increasing NR and NC cases through
> writeVECT6() to test plugin=FALSE/plugin=TRUE ratios?

Does R have any built-in profiling tools? As GRASS is just a collection of small C programs, the normal ones work fine with it:

http://grass.osgeo.org/wiki/Bugs#Using_a_profiling_tool

Or use a profiling tool at the command line while running ogr2ogr with GRASS input (through the plugin) and shapefile output?

Nikos, wrt your 10hr script: v.db.update must open and close the DB every time you call it. That is very slow and inefficient. It is better to write all SQL update commands to a file (end each line with a ';'), then use that file as input to db.execute, so that it opens the DB, updates all fields, and closes the DB again in a single step. See the db.execute man page and the v.in.garmin script for an example (where it made a huge improvement).

Hamish
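Hamish's suggestion can be sketched in bash. The script writes every UPDATE to one SQL file, one statement per line ending in ';', and then calls `db.execute` once. This rests on a few assumptions. The attribute table is assumed to carry the map's name, which is the default for a layer-1 table created by v.db.addtable. `db.execute` is assumed to read the file via `input=`. The columns col_1..col_N are assumed to exist already. The SQL file location is a placeholder. The random values use `$(( RANDOM % 100 + 1 ))`.

```shell
#!/bin/bash
# Sketch: batch all attribute updates into one SQL file and run them with a
# single db.execute call, instead of one v.db.update call per cell.

RANDOM_POINTS=100        # columns col_1..col_N assumed to exist already
RANDOM_POINTS_CATS=100
TABLE="random_points_${RANDOM_POINTS}"             # assumed: table named after the map
SQLFILE="${TMPDIR:-/tmp}/${TABLE}_updates.sql"     # placeholder location

# One UPDATE statement per line, each terminated by ';'.
for COL in $(seq 1 10 "${RANDOM_POINTS}"); do
    for CAT in $(seq 1 10 "${RANDOM_POINTS_CATS}"); do
        NUMBER=$(( RANDOM % 100 + 1 ))
        echo "UPDATE ${TABLE} SET col_${COL} = ${NUMBER} WHERE cat = ${CAT};"
    done
done > "${SQLFILE}"

echo "$(grep -c '' "${SQLFILE}") statements written to ${SQLFILE}"

# Open the database once, run every statement, close it again.
# Only attempted inside a GRASS session, where db.execute is available.
if command -v db.execute >/dev/null 2>&1; then
    db.execute input="${SQLFILE}"
fi
```

With the sequences above, this writes 10 × 10 = 100 statements, but the database is opened only once instead of 100 times.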
Roger Bivand
On Wed, 27 May 2009, Hamish wrote:
> Roger wrote:
>> Next script in R generating increasing NR and NC cases through
>> writeVECT6() to test plugin=FALSE/plugin=TRUE ratios?
>
> Does R have any built in profiling tools? as grass is just a collection
> of small C programs the normal ones work fine with it:

Yes, at the R level, so they won't help here. readVECT6() branches on plugin: if TRUE, it just calls readOGR() on the GRASS driver; if FALSE, it does (something like) v.out.ogr with the shapefile driver to a temporary file and readOGR() on the shapefile. The C/C++ level code is in the (same) GDAL shared object for v.out.ogr and readOGR(), so I think the only difference is whether or not the plugin is used.

My second post (testing v.out.ogr to shapefile against ogr2ogr from the GRASS plugin to shapefile for a many-column vector) should reveal where the problem is. My present feeling is that the plugin and v.out.ogr handle access to the GRASS vector and its attribute data differently in one way or another.

Roger
Roger Bivand
On Wed, 27 May 2009, Roger Bivand wrote:
> On Wed, 27 May 2009, Hamish wrote:
>
> --%<--
>
> My second post (testing v.out.ogr to shapefile against ogr2ogr from GRASS
> plugin to shapefile for a many-column vector) should reveal where the problem
> is - my present feeling is that the plugin and v.out.ogr handle access to the
> GRASS vector and its attribute data differently in one way or another.

The outcome was that for the 250 by 250 case, v.out.ogr ran in about 1 sec, and ogr2ogr using the plugin in 0.2 secs. In readVECT6(), which takes 20 secs on the same data, the culprit is the C/C++ ogrDataFrame() function called by readOGR() in the rgdal package. It takes almost all of the 20 secs with the GRASS driver, but < 1 sec with the shapefile driver on the same data.

I'll try to investigate further; there is an interaction that I don't understand. It is possible that ogrDataFrame() is inefficient in that it reads by column, sending the driver out to fetch every feature again for each field. I'll look at alternatives.

Roger
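Here is a back-of-the-envelope count for Roger's hypothesis. It assumes, without verification, that ogrDataFrame() loops over fields and, for each field, fetches every feature from the driver again. With $N_R$ features and $N_C$ attribute fields, the number of feature fetches $F$ compares as follows:

```latex
F_{\text{by column}} = N_C \times N_R, \qquad
F_{\text{by row}} = N_R, \qquad
\frac{F_{\text{by column}}}{F_{\text{by row}}} = N_C

% the 250 by 250 case:
N_R = N_C = 250: \quad F_{\text{by column}} = 62\,500, \qquad F_{\text{by row}} = 250
```

Under this assumption, the extra work grows with the number of columns. Whether that explains the 20 secs depends on how expensive a single feature fetch is through the GRASS driver compared with the shapefile driver.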
Nikos Alexandris
In reply to a post by hamish-2
Hamish:
> Nikos wrt your 10hr script:
> v.db.update must open and close the DB for every time you call it.
> That is very slow and inefficient.
> Better is to write all SQL update commands to a file (end each line
> with a ';') then use that file as input to db.execute so it opens the
> DB, updates all fields, closes DB again in a single step. see the
> db.execute man page and v.in.garmin script for an example (where it
> made a huge improvement).

Thanks a lot, Hamish :-) I'll check it out.
Nikos Alexandris
In reply to a post by Roger Bivand
Roger:
> >> Next script in R generating increasing NR and NC cases through
> >> writeVECT6() to test plugin=FALSE/plugin=TRUE ratios?

Hamish:
> > Does R have any built in profiling tools? as grass is just a collection
> > of small C programs the normal ones work fine with it:

Roger:
> Yes, at the R level, so they won't help here. readVECT6() branches on
> plugin - if TRUE, it just calls readOGR() on the GRASS driver, if
> FALSE, it does (something like) v.out.ogr with shapefile driver to a
> temporary file and readOGR() on the shapefile. The C/C++ level code is
> in the (same) GDAL shared object for v.out.ogr and readOGR(), so I
> think the only difference is in the use or not of the plugin.
> My second post (testing v.out.ogr to shapefile against ogr2ogr from
> GRASS plugin to shapefile for a many-column vector) should reveal
> where the problem is - my present feeling is that the plugin and
> v.out.ogr handle access to the GRASS vector and its attribute data
> differently in one way or another.

Good (and bad :-p) to see that there is room for improvement. C/C++ code is unknown territory at the end-user level (= it requires time to explore... within this short lifetime ;-) ).

Kindest regards,
Nikos