Loading a point-vector table with 466 columns

Nikos Alexandris

Re: Loading a point-vector table with 466 columns

In reply to this post by Roger Bivand
(Cc to Even Roualt ; Apologies to Even since he is not subscribed in the
list)

Roger:
> >> Three minutes instead of thirty+ suggests that the OGR
> >> plugin has trouble with SQLite as the DB format. So maybe
> >> the default for plugin= should be FALSE, not NULL and automatic
> >> use if present?

--%<--
> Could you, Nikos,
> make a script generating a similar table in spearfish, and two small
> scripts exercising the problem (export to R with the plugin, and with
> the temporary shapefile.

* The "problem" also exists with the default DBF as a back-end. I
created 1000 random points, filled less than half of the records with
random numbers, and readVECT6("x", plugin=TRUE) again takes too long. I
interrupted the process since it had been running for more than 20 mins.

* A script is pasted at the bottom; it has a small "bug" (details
below) :-)


First some results for 1000 rows by 500 columns:

> system.time(random_points <- readVECT6("random_points_1000",
plugin=TRUE))
OGR data source with driver: GRASS
Source: "/geo/grassdb/spearfish60/user1/vector/random_points_1000/head",
layer: "1"
with  1000  rows and  501  columns
^C
### This was running for more than 10 hours !!! ###


> system.time(random_points <- readVECT6("random_points_1000",
plugin=FALSE))
Exporting 1000 points/lines...
 100%
1000 features written
OGR data source with driver: ESRI Shapefile
Source: "/geo/grassdb/spearfish60/user1/.tmp/vertical", layer:
"random_p"
with  1000  rows and  501  columns
Feature type: wkbPoint with 2 dimensions
   user  system elapsed
 62.515   9.256  74.013


> system.time(random_points <- read.csv("random_points_1000_table.csv"))
   user  system elapsed
  0.192   0.000   0.192



* A script to generate "some" random points, add columns and some R-code
to load with readVECT6( plugin = TRUE ), readVECT6( plugin = FALSE ) and
read.csv.

* The "bug" is that while the assignment NUMBER="$[ ( $RANDOM % 100 ) +
1 ]" runs OK at the command line, it doesn't work from within the bash
script!? So I've commented out the respective line and use a fixed number
instead.
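
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: `$[ ... ]` is an old, deprecated bash form of arithmetic expansion, and `$RANDOM` is a bash feature. If the script is started with `sh script.sh` on a system where `sh` is not bash, both can fail even though the same line works in an interactive bash session. A minimal sketch using the standard `$(( ... ))` form, with a hypothetical helper name `random_1_to_100`:

```shell
#!/bin/bash
# Draw a pseudo-random integer between 1 and 100.
# $(( ... )) is the standard arithmetic expansion; RANDOM is provided by bash,
# so the script should be run with bash (e.g. "bash script.sh"), not plain sh.
random_1_to_100() {
  echo $(( RANDOM % 100 + 1 ))
}

NUMBER=$(random_1_to_100)
echo "NUMBER=${NUMBER}"
```

Calling `random_1_to_100` inside the inner loop would give a fresh value for every `v.db.update` call.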

--%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<-
#!/bin/bash

# example that  readVECT6 ( x , plugin = TRUE )  is too slow
# (also) using the default DBF driver
# first enter in spearfish60/user1

# try with a different back-end?
# db.connect driver=sqlite database=

# set numbers here:
RANDOM_POINTS=100 ; RANDOM_POINTS_CATS=100 ; NUMBER=111

# create RANDOM_POINTS random points
v.random --o output=random_points_${RANDOM_POINTS} n=${RANDOM_POINTS}

# add in database
v.db.addtable random_points_${RANDOM_POINTS}


# add ${RANDOM_POINTS} columns
# (echo -e is needed so that bash interprets the \n escape)
echo -e "\n* Adding ${RANDOM_POINTS} columns"
for x in `seq 1 ${RANDOM_POINTS}` ; do
 v.db.addcol random_points_${RANDOM_POINTS} column="col_${x} integer"
done ; echo -e "\n* ${RANDOM_POINTS} columns added"


# check if columns are added
v.info -c random_points_${RANDOM_POINTS}



## WARNING: double loop below takes too long!
# --%<--
# It is simpler and faster to use a single loop with a fixed value
# instead, e.g.:
  #for COL in `seq 1 5 ${RANDOM_POINTS}` ; do
  # v.db.update random_points_${RANDOM_POINTS} column="col_${COL}" value=222
  #done
# --%<--


# fill some columns/cats with random numbers between 1 and 100
# alter sequence as desired ; more numbers = more time to load in R
for COL in `seq 1 10 ${RANDOM_POINTS}` ; do
 for CAT in `seq 1 10 ${RANDOM_POINTS_CATS}` ; do
  # this is ok in the command line but NOT when running the script?
  #NUMBER="$[ ( $RANDOM % 100 ) + 1 ]"
  v.db.update random_points_${RANDOM_POINTS} column="col_${COL}" value=${NUMBER} where="cat=${CAT}"
 done
done


# [optional] fill in some "-999" values to use as NAs in R?
#NAN=-999
#for COL in `seq 1 5 ${RANDOM_POINTS}` ; do
# for CAT in `seq 1 5 ${RANDOM_POINTS_CATS}` ; do
#  v.db.update random_points_${RANDOM_POINTS} column="col_${COL}" value=${NAN} where="cat=${CAT}"
# done
#done

# check with v.db.select
# v.db.select random_points_${RANDOM_POINTS} | head -25

# export table as .csv file
db.out.ogr in=random_points_${RANDOM_POINTS} format=CSV \
 dsn=/geo/grassdb/spearfish60/random_points_csv_files \
 db_table=random_points_${RANDOM_POINTS}.csv

### end of bash script ###


## launch R
R
### R code

# load in R with:
library(spgrass6) ; G <- gmeta6()

 #a. readVECT6()
 system.time ( random_points <- readVECT6 ( "random_points_100" , plugin
= FALSE ) )

 #b. plugin=TRUE
 system.time ( random_points <- readVECT6 ( "random_points_100" , plugin
= TRUE ) )

 #c. as a csv table
 # adjust as required
 setwd("/geo/grassdb/spearfish60/random_points_csv_files")
 table_to_read <- dir ( pattern = "^random.*\\.csv$" )
 system.time ( random_points <- read.csv ( table_to_read ) )
 str(random_points)
--%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<---%<-

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Nikos Alexandris

Re: Loading a point-vector table with 466 columns

On Tue, 2009-05-26 at 12:48 +0200, Nikos Alexandris wrote:

> First some results for 1000 rows by 500 columns:
>
> > system.time(random_points <- readVECT6("random_points_1000",
> plugin=TRUE))
> OGR data source with driver: GRASS
> Source:
> "/geo/grassdb/spearfish60/user1/vector/random_points_1000/head",
> layer: "1"
> with  1000  rows and  501  columns
> ^C
Correction:
> ### This was running for more than 10 hours !!! ###
This was NOT running for more than 10 hours!! It was the double loop
that was running for more than 10 hours to generate 1000 by 500 random
numbers.

Anyway, the readVECT6(plugin=TRUE) call above ran for at least 20
minutes before I interrupted it, while readVECT6(plugin=FALSE) and
read.csv do the job within acceptable times.

Apologies again,
Nikos

P.S. Script attached as a .sh file with lots of empty lines between code
to make it easy to read.



random_vector_points_for_spgrass6.sh (2K) Download Attachment
Roger Bivand

Re: Loading a point-vector table with 466 columns

In reply to this post by Nikos Alexandris
On Tue, 26 May 2009, Nikos Alexandris wrote:

> (Cc to Even Roualt ; Apologies to Even since he is not subscribed in the
> list)
>
> Roger:
>>>> Three minutes instead of thirty+ suggests that the OGR
>>>> plugin has trouble with SQLite as the DB format. So maybe
>>>> the default for plugin= should be FALSE, not NULL and automatic
>>>> use if present?
>
> --%<--
>> Could you, Nikos,
>> make a script generating a similar table in spearfish, and two small
>> scripts exercising the problem (export to R with the plugin, and with
>> the temporary shapefile.
>
> * The "problem" also exists with the default DBF as a back-end. I
> created 1000 random points, filled less than half of the records with
> random numbers, and readVECT6("x", plugin=TRUE) again takes too long. I
> interrupted the process since it had been running for more than 20 mins.

OK. With 250 rows and 250 columns, I see an order of magnitude saving with
plugin=FALSE. In plugin=FALSE, the times are split equally between writing
the temporary file from GRASS with v.out.ogr, and reading it into R with
readOGR(), as one might expect (that is all readVECT6(..., plugin=FALSE)
is doing). Even on a small vector (bugsites, 90 points, 2 attribute
columns), plugin=FALSE is faster than plugin=TRUE by about 0.75 : 1.35,
not quite twice. Which way does the problem scale, in numbers of features,
numbers of attribute columns, or both?

Next script in R generating increasing NR and NC cases through
writeVECT6() to test plugin=FALSE/plugin=TRUE ratios?

Roger


--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [hidden email]

Roger Bivand

Re: Loading a point-vector table with 466 columns

On Wed, 27 May 2009, Roger Bivand wrote:

> OK. With 250 rows and 250 columns, I see an order of magnitude saving with
> plugin=FALSE. In plugin=FALSE, the times are split equally between writing
> the temporary file from GRASS with v.out.ogr, and reading it into R with
> readOGR(), as one might expect (that is all readVECT6(..., plugin=FALSE) is
> doing). Even on a small vector (bugsites, 90 points, 2 attribute columns),
> plugin=FALSE is faster than plugin=TRUE by about 0.75 : 1.35, not quite
> twice. Which way does the problem scale, in numbers of features, numbers of
> attribute columns, or both?
>
> Next script in R generating increasing NR and NC cases through writeVECT6()
> to test plugin=FALSE/plugin=TRUE ratios?

And we also need to check whether the same applies to use of the plugin in
other settings - I'm pretty certain this isn't just differential behaviour
in readOGR() between the OGR shapefile driver and the OGR GRASS vector
driver. Could someone test v.out.ogr against ogr2ogr using the plugin?

Roger
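
The comparison asked for here could be timed roughly as follows. This is only a sketch under stated assumptions: the map name, the GRASS database path (patterned on the "Source:" line printed by readOGR earlier in the thread), the output directories, and the helper name `elapsed` are all hypothetical, and the `v.out.ogr` parameters follow GRASS 6 usage. It has to be run inside a GRASS session with the GDAL-GRASS plugin installed. The helper reports whole seconds only, so it suits larger tables better than sub-second cases.

```shell
#!/bin/bash
# Sketch only: map name, GRASS database path and output directory are assumptions.
MAP="random_points_100"
GRASS_SRC="/geo/grassdb/spearfish60/user1/vector/${MAP}/head"
OUT_DIR="$(mktemp -d)"

# Run a command, discard its output, and print the elapsed wall-clock seconds.
elapsed() {
  local start end
  start=$(date +%s)
  "$@" > /dev/null 2>&1
  end=$(date +%s)
  echo $(( end - start ))
}

if command -v v.out.ogr > /dev/null 2>&1 && command -v ogr2ogr > /dev/null 2>&1; then
  # GRASS's own exporter writing a shapefile
  echo "v.out.ogr: $(elapsed v.out.ogr input="${MAP}" type=point dsn="${OUT_DIR}/via_vout" olayer="${MAP}") s"
  # ogr2ogr reading the GRASS vector through the OGR GRASS plugin
  echo "ogr2ogr:   $(elapsed ogr2ogr -f "ESRI Shapefile" "${OUT_DIR}/via_plugin" "${GRASS_SRC}") s"
else
  echo "v.out.ogr and/or ogr2ogr not found; run this inside a GRASS session"
fi
```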

hamish-2

Re: Loading a point-vector table with 466 columns

In reply to this post by Nikos Alexandris

Roger wrote:
> Next script in R generating increasing NR and NC cases through
> writeVECT6() to test plugin=FALSE/plugin=TRUE ratios?


Does R have any built in profiling tools? as grass is just a collection
of small C programs the normal ones work fine with it:

  http://grass.osgeo.org/wiki/Bugs#Using_a_profiling_tool

or use a profiling tool at the command line while running ogr2ogr with
input=grass and output=shapefile?


Nikos wrt your 10hr script:
v.db.update must open and close the DB every time you call it. That
is very slow and inefficient. It is better to write all SQL update commands
to a file (end each line with a ';'), then use that file as input to
db.execute so it opens the DB, updates all fields, and closes the DB again in
a single step. See the db.execute man page and the v.in.garmin script for an
example (where it made a huge improvement).



Hamish
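
A minimal sketch of the batching approach described above, applied to the double loop from the script earlier in the thread. Assumptions: the attribute table has the same name as the vector map (`random_points_100`), the columns are named `col_N` as in that script, and the temporary file location is arbitrary. The `db.execute` call is only attempted when the command is available, i.e. inside a GRASS session.

```shell
#!/bin/bash
# Sketch: collect all UPDATE statements in one file, then run them in one go.
TABLE="random_points_100"    # assumed: table name equals the vector map name
N_COLS=100 ; N_CATS=100      # same sizes as in the original script
SQL_FILE="$(mktemp)"         # hypothetical temporary file for the statements

# Same column/category sequence as the original double loop,
# but each iteration only writes one line of SQL (ending with ';').
for COL in $(seq 1 10 "$N_COLS"); do
  for CAT in $(seq 1 10 "$N_CATS"); do
    NUMBER=$(( RANDOM % 100 + 1 ))
    echo "UPDATE ${TABLE} SET col_${COL}=${NUMBER} WHERE cat=${CAT};"
  done
done > "$SQL_FILE"

# A single db.execute call opens the DB once, applies every statement,
# and closes it again.
if command -v db.execute > /dev/null 2>&1; then
  db.execute input="$SQL_FILE"
fi
```

Inspecting `$SQL_FILE` with `head` before running `db.execute` is a cheap way to check the generated statements.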



     

Roger Bivand

Re: Loading a point-vector table with 466 columns

On Wed, 27 May 2009, Hamish wrote:

>
> Roger wrote:
>> Next script in R generating increasing NR and NC cases through
>> writeVECT6() to test plugin=FALSE/plugin=TRUE ratios?
>
>
> Does R have any built in profiling tools? as grass is just a collection
> of small C programs the normal ones work fine with it:

Yes, at the R level, so they won't help here. readVECT6() branches on
plugin - if TRUE, it just calls readOGR() on the GRASS driver, if FALSE,
it does (something like) v.out.ogr with shapefile driver to a temporary
file and readOGR() on the shapefile. The C/C++ level code is in the
(same) GDAL shared object for v.out.ogr and readOGR(), so I think the only
difference is in the use or not of the plugin.

My second post (testing v.out.ogr to shapefile against ogr2ogr from GRASS
plugin to shapefile for a many-column vector) should reveal where the
problem is - my present feeling is that the plugin and v.out.ogr handle
access to the GRASS vector and its attribute data differently in one way
or another.

Roger

Roger Bivand

Re: Loading a point-vector table with 466 columns

On Wed, 27 May 2009, Roger Bivand wrote:

> On Wed, 27 May 2009, Hamish wrote:
>
>>
>> Roger wrote:
>>> Next script in R generating increasing NR and NC cases through
>>> writeVECT6() to test plugin=FALSE/plugin=TRUE ratios?
>>
>>
>> Does R have any built in profiling tools? as grass is just a collection
>> of small C programs the normal ones work fine with it:
>
> Yes, at the R level, so they won't help here. readVECT6() branches on plugin -
> if TRUE, it just calls readOGR() on the GRASS driver, if FALSE, it does
> (something like) v.out.ogr with shapefile driver to a temporary file and
> readOGR() on the shapefile. The C/C++ level code is in the (same) GDAL shared
> object for v.out.ogr and readOGR(), so I think the only difference is in the
> use or not of the plugin.
>
> My second post (testing v.out.ogr to shapefile against ogr2ogr from GRASS
> plugin to shapefile for a many-column vector) should reveal where the problem
> is - my present feeling is that the plugin and v.out.ogr handle access to the
> GRASS vector and its attribute data differently in one way or another.

The outcome was that for the 250 by 250 case, v.out.ogr ran at about 1
sec, and ogr2ogr using the plugin at 0.2 secs. In readVECT6() - running at
20 secs on the same data, the culprit is the C/C++ ogrDataFrame() function
called by readOGR() in the rgdal package, which takes almost all of the 20
secs with the GRASS driver, but < 1 sec with the shapefile driver on the
same data. I'll try to investigate further - there is an interaction that
I don't understand. It is possible that ogrDataFrame() is inefficient in
that it reads by column, sending the driver out by feature for each field.
I'll look at alternatives.

Roger
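
To give a feel for why column-wise reading could be costly, here is a small count of driver feature fetches under the hypothesis stated above. It is arithmetic only, not a measurement, and the two function names are hypothetical. It assumes that column-wise reading visits every feature once per field, while row-wise reading visits each feature once.

```shell
#!/bin/bash
# Count driver feature fetches under two hypothetical access patterns.
# Arguments: number of rows (features), number of columns (fields).
fetches_by_column() { echo $(( $1 * $2 )); }   # every field re-visits every feature
fetches_by_row()    { echo "$1"; }             # each feature is visited once

# The 250 x 250 test case and the 1000 x 501 table from this thread.
for CASE in "250 250" "1000 501"; do
  set -- $CASE
  echo "rows=$1 cols=$2 by_column=$(fetches_by_column "$1" "$2") by_row=$(fetches_by_row "$1")"
done
```

If each fetch through the GRASS driver is much more expensive than through the shapefile driver, a factor equal to the number of columns would be consistent with the gap reported here, but that remains to be checked.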

Nikos Alexandris

Re: Loading a point-vector table with 466 columns

In reply to this post by hamish-2

Hamish:
> Nikos wrt your 10hr script:
> v.db.update must open and close the DB every time you call it.
> That is very slow and inefficient.

> Better is to write all SQL update commands to a file (end each line
> with a ';') then use that file as input to db.execute so it opens the
> DB, updates all fields, closes DB again in a single step. see the
> db.execute man page and v.in.garmin script for an example (where it
> made a huge improvement).

Thanks a lot, Hamish :-) I'll check it out.

Nikos Alexandris

Re: Loading a point-vector table with 466 columns

In reply to this post by Roger Bivand

Roger:
> >> Next script in R generating increasing NR and NC cases through
> >> writeVECT6() to test plugin=FALSE/plugin=TRUE ratios?

Hamish:
> > Does R have any built in profiling tools? as grass is just a collection
> > of small C programs the normal ones work fine with it:

Roger:
> Yes, at the R level, so they won't help here. readVECT6() branches on
> plugin - if TRUE, it just calls readOGR() on the GRASS driver, if
> FALSE, it does (something like) v.out.ogr with shapefile driver to a
> temporary file and readOGR() on the shapefile. The C/C++ level code is
> in the (same) GDAL shared object for v.out.ogr and readOGR(), so I
> think the only difference is in the use or not of the plugin.

> My second post (testing v.out.ogr to shapefile against ogr2ogr from
> GRASS plugin to shapefile for a many-column vector) should reveal
> where the problem is - my present feeling is that the plugin and
> v.out.ogr handle access to the GRASS vector and its attribute data
> differently in one way or another.

Good (and bad :-p) to see that there is room for improvement. C/C++ code
is unknown territory at the end-user level (= it requires time to
explore... within this short life-time ;-) ).

Kindest regards, Nikos
