[GRASS-stats] Re: [R-sig-Geo] writing shapefiles / DBF files when input data contains NA

Dylan Beaudette-2

[GRASS-stats] Re: [R-sig-Geo] writing shapefiles / DBF files when input data contains NA

On Tuesday 07 October 2008, Roger Bivand wrote:

> On Mon, 6 Oct 2008, Dylan Beaudette wrote:
> > Hi,
> >
> > I have noticed that saving data to files that include a DBF results in
> > bogus data where there were NA values. Using the write.dbf() function from
> > the foreign package seems to work a little better, but I still get odd
> > results in numeric columns. Writing to GRASS with the methods in the
> > spgrass6 package results in something that looks like this:
>
> Dylan,
>
> I'm afraid that there is no good solution for this at all. DBF does not
> seem to have a clear and uniform NA treatment (or even !is.finite()
> treatment). The only work-around is to preprocess the data.frame in the
> output object to insert known NODATA values, and to replace those flags
> manually on the GRASS side. This could possibly be written as a wrapper
> around writeVECT6(). The help page does say:
>
>      "Please note that the OGR drivers used may not handle missing data
>       gracefully, and be prepared to have to correct for this manually.
>       For example use of the 'readOGR' PostGIS driver directly may
>       perform better than moving the data through the DBF driver used in
>       this function - or a PostgreSQL driver used directly or through
>       ODBC may be a solution. Do not rely on missing values of vector
>       data moving smoothly across the interface."
>
> I did try to look at the SQLite driver on the GRASS side, which might be
> more robust, but did not see how to proceed.
>
> One possibility is not to recode, but to build an NA mask on the R side,
> and then loop over fields on the GRASS side for the chosen driver
> inserting NAs in the correct rows (whatever the syntax for that might be).
> Would this be db.execute with an insertion of SQL NULL?
>
> Can we redirect this discussion to the statgrass list, because GRASS
> developers follow that list?
>
> Best wishes,
>
> Roger
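
Roger's idea above of restoring NULLs on the GRASS side could be sketched in shell roughly as follows. This is only an illustration under stated assumptions: the sentinel -9999 does not appear anywhere in this thread, the table and column names are placeholders, and the helper name null_update_sql is made up. The function only prints SQL; whether the chosen GRASS database driver accepts "SET col = NULL" still has to be checked.

```shell
#!/bin/sh
# Print one UPDATE statement per column that turns a sentinel back into NULL.
# Usage: null_update_sql TABLE SENTINEL COLUMN [COLUMN...]
# The helper name, table, columns and sentinel are illustrative assumptions,
# not GRASS defaults.
null_update_sql() {
    table=$1
    sentinel=$2
    shift 2
    for col in "$@"; do
        printf 'UPDATE %s SET %s = NULL WHERE %s = %s;\n' \
            "$table" "$col" "$col" "$sentinel"
    done
}

# Example: statements for two numeric columns flagged with -9999 on the R side.
null_update_sql a_temp -9999 xyz abc
```

In GRASS 6 the printed statements could presumably be piped into db.execute, which reads SQL from standard input when no input file is given, for example "null_update_sql a_temp -9999 xyz abc | db.execute". This has not been tested against the DBF driver.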

Sorry for the cross-posting. Wanted to clarify where this thread is
going/went.

Hi Roger--

It looks like the limiting factor in this equation is the code used in
v.out.ogr.

From the GRASS-dev + Frank W's help:

> > Sounds good :)
> > Does anyone know how to fix
> >  vector/v.out.ogr/main.c
> > to support NULLs? I see db_set_value_null() in
> >  lib/db/dbmi_base/value.c
> > which might be relevant.
>
> Markus,
>
> Once you establish which GRASS attributes are NULL, you can ensure they
> are pushed out to OGR as null by just skipping the step that sets them.
> Perhaps that will help a bit.


So, once v.out.ogr is fixed, this should clear up several issues:

1. import of vector data into R via spgrass6 methods
2. better compatibility of vector data exported from GRASS

I still do not know why writeOGR() does not create correct DBF files... it may
be related to the code in v.out.ogr....

Cheers,

Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341
_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
hamish-2

Re: [GRASS-stats] Re: [R-sig-Geo] writing shapefiles / DBF files when input data contains NA

Dylan:
> It looks like the limiting factor in this equation is the
> code used in v.out.ogr.

maybe a silly question, but is a 3rd party format even needed here?

$ ogrinfo --formats | grep -i grass
  -> "GRASS" (readonly)

at least in the one direction.


?

Hamish



     

Dylan Beaudette-2

Re: [GRASS-stats] Re: [R-sig-Geo] writing shapefiles / DBF files when input data contains NA

On Tue, Oct 7, 2008 at 6:26 PM, Hamish <[hidden email]> wrote:

> Dylan:
>> It looks like the limiting factor in this equation is the
>> code used in v.out.ogr.
>
> maybe a silly question, but is a 3rd party format even needed here?
>
> $ ogrinfo --formats | grep -i grass
>  -> "GRASS" (readonly)
>
> at least in the one direction.
>

Excellent question. I had also wondered about this. It looks like
there is a new argument in readVECT6():

# read in directly with GDAL/OGR -- no intermediate file:
x <- readVECT6('xxx', plugin=TRUE)

This is quite fast and depends on the GDAL-GRASS plugin... However,
NULL data in a GRASS table is not imported correctly-- character
fields are imported as '', and numeric fields as 0.

So... Is the error in GDAL itself?

I tried inspecting a vector from GRASS with NULL data in some of the
columns from the table, using:

ogrinfo -al location/mapset/vector/xxx/head

OGRFeature(1):23
  cat (Integer) = 24
  cat_ (Integer) = 24
  str1 (String) = (null)
  xyz (Real) = (null)
  abc (Integer) = (null)
  POINT (591583 4925280 0)

... which seems to correctly report the NULL values.
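
To repeat this check over many features, the "(null)" markers in the ogrinfo text output can simply be counted. A rough sketch, assuming the output layout shown above; the function name count_null_fields is made up for illustration, and it reads from standard input so that it does not depend on a GRASS location being available.

```shell
#!/bin/sh
# Count attribute lines reported as "(null)" in `ogrinfo -al` text output.
# Reads from standard input; assumes lines such as "  name (Type) = (null)".
# The function name is a hypothetical helper, not part of GDAL or GRASS.
count_null_fields() {
    grep -c '= (null)'
}

# Example using the feature printed above.
count_null_fields <<'EOF'
OGRFeature(1):23
  cat (Integer) = 24
  cat_ (Integer) = 24
  str1 (String) = (null)
  xyz (Real) = (null)
  abc (Integer) = (null)
  POINT (591583 4925280 0)
EOF
```

With a real map the input would come from something like "ogrinfo -al location/mapset/vector/xxx/head | count_null_fields".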

This leads me to suspect that something in readOGR() and writeOGR() is
at fault in the handling of NULL values.

Unfortunately looking at the rgdal source code wasn't very productive
(my fault).

Cheers,

Dylan
Roger Bivand

Re: [GRASS-stats] Re: [R-sig-Geo] writing shapefiles / DBF files when input data contains NA

On Tue, 7 Oct 2008, Dylan Beaudette wrote:

> On Tue, Oct 7, 2008 at 6:26 PM, Hamish <[hidden email]> wrote:
>> Dylan:
>>> It looks like the limiting factor in this equation is the
>>> code used in v.out.ogr.

Following Dylan's posting on the GDAL list, and Frank's response, I
suggest the following simple patch to vector/v.out.ogr/main.c (here 6.3):

$ diff -u main.c_old main.c
--- main.c_old  2008-10-09 10:54:19.000000000 +0200
+++ main.c      2008-10-09 11:30:34.000000000 +0200
@@ -625,7 +625,9 @@
                         colsqltype = db_get_column_sqltype(Column);
                         colctype = db_sqltype_to_Ctype ( colsqltype );
                         G_debug (2, "  colctype = %d", colctype );
-                       switch ( colctype ) {
+/* RSB 081009 emit unset fields */
+                        if (!db_test_value_isnull(Value)) {
+                         switch ( colctype ) {
                              case DB_C_TYPE_INT:
                                 OGR_F_SetFieldInteger( Ogr_feature, j, db_get_value_int(Value) );
                                 break;
@@ -639,7 +641,8 @@
                                 db_convert_column_value_to_string (Column, &dbstring);
                                 OGR_F_SetFieldString( Ogr_feature, j, db_get_string (&dbstring) );
                                 break;
-                       }
+                         }
+                        } /* RSB */
                     }
                 }
             }

In 6.4 this is after line 717. Essentially it just uses
db_test_value_isnull() not to set values in OGR fields if the DB field
value is NULL, and follows Frank's suggestion.

This matches code near line 939 in OGRGRASSLayer::SetAttributes() in
gdal/ogr/ogrsf_frmts/grass/ogrgrasslayer.cpp in the vector plugin, which
uses:

if ( !db_test_value_isnull(value) )

I'm sure the patch needs checking, but with changes in the R rgdal package
to support vector null data correctly, it ought to improve the interface.

Best wishes,

Roger


--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [hidden email]

Roger Bivand

Re: [GRASS-stats] Re: [R-sig-Geo] writing shapefiles / DBF files when input data contains NA

In reply to this post by hamish-2
On Tue, 7 Oct 2008, Hamish wrote:

> Dylan:
>> It looks like the limiting factor in this equation is the
>> code used in v.out.ogr.
>
> maybe a silly question, but is a 3rd party format even needed here?
>
> $ ogrinfo --formats | grep -i grass
>  -> "GRASS" (readonly)
>
> at least in the one direction.
>

Right - if the user has the plugin installed, it handles null vector
values correctly. However, up to rgdal 0.5-26, null vector values were
broken both for reading and writing OGR, and v.out.ogr in 6.3 and 6.4
doesn't handle them properly either. The plugin is autodetected and used
if present in spgrass6, so from rgdal > 0.5-26, it will be the default for
reading from GRASS.

Roger


--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [hidden email]

Dylan Beaudette-2

Re: [GRASS-stats] Re: [R-sig-Geo] writing shapefiles / DBF files when input data contains NA

In reply to this post by Roger Bivand
On Thursday 09 October 2008, Roger Bivand wrote:

> On Tue, 7 Oct 2008, Dylan Beaudette wrote:
> > On Tue, Oct 7, 2008 at 6:26 PM, Hamish <[hidden email]> wrote:
> >> Dylan:
> >>> It looks like the limiting factor in this equation is the
> >>> code used in v.out.ogr.
>
> Following Dylan's posting on the GDAL list, and Frank's response, I
> suggest the following simple patch to vector/v.out.ogr/main.c (here 6.3):
>
> $ diff -u main.c_old main.c
> --- main.c_old  2008-10-09 10:54:19.000000000 +0200
> +++ main.c      2008-10-09 11:30:34.000000000 +0200
> @@ -625,7 +625,9 @@
>                          colsqltype = db_get_column_sqltype(Column);
>                          colctype = db_sqltype_to_Ctype ( colsqltype );
>                          G_debug (2, "  colctype = %d", colctype );
> -                       switch ( colctype ) {
> +/* RSB 081009 emit unset fields */
> +                        if (!db_test_value_isnull(Value)) {
> +                         switch ( colctype ) {
>                               case DB_C_TYPE_INT:
>                                  OGR_F_SetFieldInteger( Ogr_feature, j, db_get_value_int(Value) );
>                                  break;
> @@ -639,7 +641,8 @@
>                                  db_convert_column_value_to_string (Column, &dbstring);
>                                  OGR_F_SetFieldString( Ogr_feature, j, db_get_string (&dbstring) );
>                                  break;
> -                       }
> +                         }
> +                        } /* RSB */
>                      }
>                  }
>              }
>
> In 6.4 this is after line 717. Essentially it just uses
> db_test_value_isnull() not to set values in OGR fields if the DB field
> value is NULL, and follows Frank's suggestion.

OK-- I am using GRASS 6.4 SVN. Since this is the current (stable) developer
release it might be good to stick with it. I am using r33792.

Before making the suggested changes, I have tried exporting some point data
with v.out.ogr:

# known working GRASS file with categories
v.out.ogr -e -c in=a_temp dsn=v.out.ogr_example layer=a_temp type=point

Exporting 25 points/lines...
 100%
0 features written
WARNING: 25 features found without category skip
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This looks like a new problem (?)... I know that these points have categories,
so why is v.out.ogr ignoring them?


Do these lines have anything to do with this:

v.out.ogr:434
            Vect_cat_get(Cats, field, &cat);
            if (cat < 0 && !donocat) { /* Do not export not labeled */
                nocatskip++;
                continue;
            }

.... NO!
There is some kind of logic-related bug:

# does not export any features,
# complaining that they don't have categories
v.out.ogr -c in=xxx dsn=some_folder layer=new_shpfile

# exported expected features
v.out.ogr -c in=xxx dsn=new_shpfile


OK- posted here:
http://trac.osgeo.org/grass/ticket/327

The suggested fix appears to result in the creation of a DBF file (in the case
of shapefile) with properly encoded NULL values.


Attached is a patch that applies this change to the develbranch-64 tree.

Should I make a ticket for this patch?

Cheers,

Dylan



--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

[attachment removed]