Python scripts for the Pareto Boundary, tested using spearfish60 data [How to merge multiple python scripts in one?]

4 messages Options
Embed this post
Permalink
Nikos Alexandris

Python scripts for the Pareto Boundary, tested using spearfish60 data [How to merge multiple python scripts in one?]

Reply Threaded More More options
Print post
Permalink
Dear grass-users and developers,

I was hesitant to send this on-line since I am a lousy scripter and by
no means a programmer. I posted this off-list to some grass-devs. I
apologise to them for this action.

I need some advice with respect to python scripting (within grass) and
if the code is working, naturally, I would like to share the script
(...maybe my first useful script that I can share) on the grass-wiki.

Ultimately, I wish to see in the future a (native if possible) grass
program (written by a real programmer) that will implement the "Pareto
Boundary", useful for the accuracy assessment of low resolution thematic
maps. I have filed a trac ticket: http://trac.osgeo.org/grass/ticket/804



About:

I wrote some code (for me it was/is a lot of code) in order to get
omission and commission errors out of a series of low resolution
dichotomic (classification derived from MODIS imagery) maps based on a
high(-er) resolution reference map (Landsat7) and plot then (using R
with a custom function) the "famous" Pareto Boundary [1] to compare and
judge which classification map is best (and successively which
classification method from the ones used performs better).



Question:

 How is merging/ calling the several scripts in/ from within just one
script done best? What strategy is best? I really have trouble to get
this done.



What is problematic:

  (a) Well, due to my ignorance and inexperience I have had some hard
time to decide which of the stuff can be hardcoded and which should not.
I have several hardcoded work-arounds since I don't know (yet) how to
deal with them properly.

  (b) The part where vector points are counted takes TOO much. I stole
the idea described in grass-wiki that uses v.what.vect and, if I am not
wrong, the bottleneck is v.distance (as already discussed in the list
some times in the past).

  (c) I am sure there are more problems... but I can't see them :-)



Working code:

I have 8 python scripts + 2 R which I have tested with small samples
from the spearfish60 dataset [*]. I have no idea how I should proceed:

  - make it one single file (it's a bit hard since I still learn python
stuff);

  - keep the separate and join them as modules in one module-script;

  - source variables, (grass) options and flags, etc. from a
"pareto_options_and_flags.py" module in which variables and the rest can
be set and sourced later to the module that will do the job



Attachement in grass-trac [2] contains:

### I took quite some time to comment (almost) everything so the code is
self-explained. Hopefully it really is like that - not sure since I am
no programmer. ###

  - the 8 scripts [in 2] (step - by - step to extract the
omission/commission errors and export them as csv);

  - R code to plot the boundary and the Oe,Ce of classification(s) [also
in 2]
(but I doubt my code is efficient/ smart).



Dr. Nikos Koutsias helped me understand the Pareto Boundary. Whether I
really did or not is, of course, my responsibility but I wouldn't have
done anything without his help.

Best regards, Nikos



-------------------------------------------------------------
[*] Test using spearfish60 data:
-------------------------------------------------------------

# testing with spearfish60 data
grass64 /geo/grassdb/spearfish60/user1/

# set region to
g.region s=4925000 e=593500 n=4927000 w=590000 res=30 -p

# use sqlite db-backend
db.connect driver=sqlite
database=/geo/grassdb/spearfish60/PERMANENT/sqlite.db
db.connect -p


### prepare input files

# high resolution reference raster map will be
g.copy rast=landcover.30m,landcover_ref

# high redolution reference Class of Interest (to be assessed) will be
r.mapcalc "landcover_refcoi = if(landcover.30m == 51 || landcover.30m ==
71 || landcover.30m == 81 || landcover.30m == 92, 2, null())"

# low resolution classification map 1 (to be assessed for accuracy) will
be
g.region res=100 -pa
r.mapcalc "pareto_classification_rangeland_1 = if (vegcover == 2, 2,
null())"

# prepare a 2nd classification map
g.copy
rast=pareto_classification_rangeland_1,pareto_classification_rangeland_2

# edit/ change 2nd map
d.rast.edit pareto_classification_rangeland_2
out=pareto_classification_rangeland_2_edited
g.remove pareto_classification_rangeland_2
g.rename
rast=pareto_classification_rangeland_2_edited,pareto_classification_rangeland_2


### extract omission and commission errors for pareto-optimal maps and
classifications

#### --- optionally SKIP steps and use the "ONE-LINER" below --- ####
#
# step 1
#python pareto_1_vectorise_rasters.py reference_raster=landcover_ref
reference_coi_rasters=landcover_refcoi
classification_rasters=pareto_classification_rangeland_1,pareto_classification_rangeland_2 --o

# step 2
#python pareto_2_create_lowres_vector_grid.py highres=30 lowres=100 --o
&& python

# step 3
#pareto_3_count_pixels_within_gridcells.py --v && python

# step 4
#pareto_4_calculate_coi_percentages.py --v && python

# step 5
#pareto_5_populate_thresholds_and_classifications.py lowres=100 --v

# step 6
#python pareto_6_populate_pareto_errors.py --v

# step 7
#python pareto_7_populate_classification_errors.py --v

# step 8
#python pareto_8_export_csv___spearfish60.py --o
#
#### --- optionally SKIP steps and use the "ONE-LINER" below --- ####


# ONE-LINER (copy-paste carefully)
python pareto_1_vectorise_rasters.py reference_raster=landcover_ref
reference_coi_rasters=landcover_refcoi
classification_rasters=pareto_classification_rangeland_1,pareto_classification_rangeland_2 --o --q    &&    python pareto_2_create_lowres_vector_grid.py highres=30 lowres=100 --o --q    &&    python pareto_3_count_pixels_within_gridcells.py --q     &&     python pareto_4_calculate_coi_percentages.py --q    &&    python pareto_5_populate_thresholds_and_classifications.py lowres=100 --q    &&    python pareto_6_populate_pareto_errors.py --q    &&    python pareto_7_populate_classification_errors.py --q    &&    python pareto_8_export_csv___spearfish60.py --q
# to remove pareto related vector files later execute "g.mremove
vect=pareto* -f"


### use epoxrted csv files to plot the Pareto Boundary (using the R
scripts within R)
-------------------------------------------------------------



P.S.

In my case I used the scripts for burned area maps. But this "boundary"
could be useful for all kinds of dichotomic maps and theoretically even
maps with multiple classes. Probably there are better/ smarter ways to
implement all this.

This was the first contact with python :-). All of it is related
directly with my questions about python scripting posted in grass-user
last month.


---
[1] Analysis of the conflict between omission and commission in low
spatial resolution dichotomic thematic products: The Pareto Boundary
Luigi Boschetti, Stephane P. Flasse, Pietro A. Brivio; published in
Remote Sensing of Environment 91 (2004) 280 – 292


My notes from the paper:

  • A set of Pareto-optimal classified images, with the properties
defined in the previous paragraph, could be obtained starting from the
high-resolution reference map (our ‘true’ data).

  • Each optimal classification will have a pair of error rates (Oe,Ce),
that uniquely identify a point in the omission/commission
Oe/Ce={(Oe,Ce)aR Â ROea[0,1], Cea[0,1]} bi-dimensional space (henceforth
noted Oe/Ce).

  • The line joining all these points is the Pareto Boundary related to
the specific high-resolution reference map and to a specific
low-resolution pixel size.

  • As all the points belonging to the Pareto Boundary represent the
error level of efficient classification, by the definition of efficiency
given in the previous paragraph, it follows that the Pareto Boundary
divides the Oe/Ce in two regions.

  • Any possible low-resolution thematic products can be associated only
to the points in the region above or, at the most, on the Pareto
Boundary.

  • The point (0,0), with neither omission nor commission error, will
unfortunately lie in the unreachable region!

  • The terms ‘efficient’ or ‘optimal’ will be used as synonyms and will
always mean efficient (optimal) under Pareto’s definition.



[2] Python scripts and R code:
http://trac.osgeo.org/grass/attachment/ticket/804/pareto_boundary_grass_python_R_scripts.tar.gz

python scritps:
- pareto_1_vectorise_rasters.py
- pareto_2_create_lowres_vector_grid.py
- pareto_3_count_pixels_within_gridcells.py
- pareto_4_calculate_coi_percentages.py
- pareto_5_populate_thresholds_and_classifications.py
- pareto_6_populate_pareto_errors.py
- pareto_7_populate_classification_errors.py
- pareto_8_export_csv__spearfish60.py

The steps implemented in the scripts are (with respect to my burned area
maps):

  1 Prepare low resolution grids
  2 Count burned (or Class of Interest) pixels in each low resolution
grid-cell
  3 Convert cells to points ( gridcell -> centroids -> point)
  4 Calculate percentages of burned (or Class of Interest) pixels for
each low resolution
grid-cell
  5 Add and populate threshold columns (txx) based on burned pixels
percentage
  6 Add classification results (Update from classification maps)
  7 Add and update columns with Oe, Ce (Omission error, Commission
error)
  8 Export Oe,Ce pairs as CSV

R code [The 1st code is supposed to be ready to become a function.
Currently all to be used "manually"!]
- pareto_boundary.R
- plot_pareto_boundary_plus_domination_regions.R

_______________________________________________
grass-user mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-user
Glynn Clements

Re: Python scripts for the Pareto Boundary, tested using spearfish60 data [How to merge multiple python scripts in one?]

Reply Threaded More More options
Print post
Permalink
Νίκος   Αλεξανδρής wrote:

>  How is merging/ calling the several scripts in/ from within just one
> script done best? What strategy is best? I really have trouble to get
> this done.

It's hard to say without understanding likely workflows.

However, if you want to make most of the code into Python modules, one
suggestion is to have the body of the script (the code in the
"if __name__ == '__main__':" block) read the various fields from the
"options" and "flags" variables and pass them to the functions as
parameters, rather than having the functions read global variables
(Python doesn't actually have global variables; "global" variables are
local to a module).

BTW, this is wrong:

            flags = "-o",\

It happens to work at present, but the correct method to pass the
"--o" flag is to use "overwrite = True" (analogous to "quiet = True"
for the "--q" flag and "verbose = True" for the "--v" flag).

If you want to enable these flags throughout the script, you can use:

        os.environ['GRASS_OVERWRITE'] = '1' # --o
        os.environ['GRASS_VERBOSE'] = '0' # --q
        os.environ['GRASS_VERBOSE'] = '3' # --v

In general, it's best to leave these settings to the user. If the user
runs the script with --o, --q, and/or --v, the corresponding
environment variables will be set by grass.parser(). If the user has
those variables set in their environment, the values will be inherited
by the script.

If you want to examine the settings use the overwrite() and/or
verbosity() functions in grass.script.

Also, for writing log messages, string.Template is preferable when
there are multiple substitutions. E.g. rather than:

        print """\n     + Vector grid of size %d rows x %d ( %s cells) columns created"""\
        % (rows, cols, cells)

using:
        print string.Template(
                """\n     + Vector grid of size ${rows} rows x ${cols} (${cells} cells) columns created"""
                ).substitute(rows = rows, cols = cols, cells = cells)

makes it easier to correlate the fields with the values. It also makes
it easier to localise the message, as the fields can be re-ordered if
needed.

If you do a lot of this, consider a utility function, e.g.:

        def log(msg, **kwargs):
                grass.message(string.Template(msg).substitute(**kwargs))

I'm wondering whether grass.message() etc should be extended in this
manner.

--
Glynn Clements <[hidden email]>
_______________________________________________
grass-user mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-user
Nikos Alexandris

Re: Python scripts for the Pareto Boundary, tested using spearfish60 data [How to merge multiple python scripts in one?]

Reply Threaded More More options
Print post
Permalink
Νίκος Αλεξανδρής wrote:
> >  How is merging/ calling the several scripts in/ from within just one
> > script done best? What strategy is best? I really have trouble to get
> > this done.

On Sun, 2009-11-08 at 08:19 +0000, Glynn Clements wrote:
> It's hard to say without understanding likely workflows.
>
> However, if you want to make most of the code into Python modules, one
> suggestion is to have the body of the script (the code in the
> "if __name__ == '__main__':" block) read the various fields from the
> "options" and "flags" variables and pass them to the functions as
> parameters, rather than having the functions read global variables
> (Python doesn't actually have global variables; "global" variables are
> local to a module).

OK.


> BTW, this is wrong:
>
>             flags = "-o",\
>
> It happens to work at present, but the correct method to pass the
> "--o" flag is to use "overwrite = True" (analogous to "quiet = True"
> for the "--q" flag and "verbose = True" for the "--v" flag).

OK, thanks! Maybe this can/ should be added in the wiki(?).


> If you want to enable these flags throughout the script, you can use:
>
> os.environ['GRASS_OVERWRITE'] = '1' # --o
> os.environ['GRASS_VERBOSE'] = '0' # --q
> os.environ['GRASS_VERBOSE'] = '3' # --v
>
> In general, it's best to leave these settings to the user.

Sometimes it is required to use them and prevent the user from _not_
using e.g. the "--o" flag (I think... ?).


> If the user
> runs the script with --o, --q, and/or --v, the corresponding
> environment variables will be set by grass.parser(). If the user has
> those variables set in their environment, the values will be inherited
> by the script.
>
> If you want to examine the settings use the overwrite() and/or
> verbosity() functions in grass.script.

# note-to-self:
# check functions overwrite(), verbosity() in grass.script


> Also, for writing log messages, string.Template is preferable when
> there are multiple substitutions. E.g. rather than:
>
> print """\n     + Vector grid of size %d rows x %d ( %s cells) columns created"""\
> % (rows, cols, cells)
>
> using:
> print string.Template(
> """\n     + Vector grid of size ${rows} rows x ${cols} (${cells} cells) columns created"""
> ).substitute(rows = rows, cols = cols, cells = cells)
>
> makes it easier to correlate the fields with the values. It also makes
> it easier to localise the message, as the fields can be re-ordered if
> needed.


That is (very) true. Cool!

> If you do a lot of this, consider a utility function, e.g.:
>
> def log(msg, **kwargs):
> grass.message(string.Template(msg).substitute(**kwargs))

This is supposed to reduce typing effort, right?

> I'm wondering whether grass.message() etc should be extended in this
> manner.

Thank you for your time Glynn, Nikos

_______________________________________________
grass-user mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-user
Glynn Clements

Re: Python scripts for the Pareto Boundary, tested using spearfish60 data [How to merge multiple python scripts in one?]

Reply Threaded More More options
Print post
Permalink

Νίκος   Αλεξανδρής wrote:

> > If you want to enable these flags throughout the script, you can use:
> >
> > os.environ['GRASS_OVERWRITE'] = '1' # --o
> > os.environ['GRASS_VERBOSE'] = '0' # --q
> > os.environ['GRASS_VERBOSE'] = '3' # --v
> >
> > In general, it's best to leave these settings to the user.
>
> Sometimes it is required to use them and prevent the user from _not_
> using e.g. the "--o" flag (I think... ?).

The presence or absence of --o should definitely be honoured.

Ideally, a script should try to use unique names for any temporary
maps it creates, so that it doesn't inadvertently overwrite the user's
maps (using "tmp" and the PID in the mapname is sufficient, e.g.
mapname.tmp.<pid>; if the user gives their own maps such names,
tough).

Using overwrite=True to overwrite temporary maps created within the
script is okay, but the final output maps should honour the --o and
$GRASS_OVERWRITE settings, i.e. don't complain about overwriting maps
if --o is used, and don't overwrite maps if it wasn't.

The script shouldn't need to handle this explicitly. If the --o flag
was given, GRASS_OVERWRITE will have been set to "1" by grass.parser(),
and this will automatically be inherited by any child processes. The
only thing that the script needs to do is to *not* explicitly use
overwrite=True on any commands which create the final output maps, but
allow their overwrite behaviour be controlled by $GRASS_OVERWRITE.

> > If the user
> > runs the script with --o, --q, and/or --v, the corresponding
> > environment variables will be set by grass.parser(). If the user has
> > those variables set in their environment, the values will be inherited
> > by the script.
> >
> > If you want to examine the settings use the overwrite() and/or
> > verbosity() functions in grass.script.
>
> # note-to-self:
> # check functions overwrite(), verbosity() in grass.script

It's seldom necessary to check these explicitly. The settings will be
inherited by child processes, and if you use grass.message(),
grass.verbose(), etc for messages, whether or not the messages are
displayed depends upon the $GRASS_VERSBOSE setting.

> > If you do a lot of this, consider a utility function, e.g.:
> >
> > def log(msg, **kwargs):
> > grass.message(string.Template(msg).substitute(**kwargs))
>
> This is supposed to reduce typing effort, right?

Yes, although it may also make the code easier to read and modify.

--
Glynn Clements <[hidden email]>
_______________________________________________
grass-user mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-user