Scatterplot "thinning" (points reduction)?

Markus Neteler

Scatterplot "thinning" (points reduction)?

Hi,

I am plotting elevation against temperature, and including all points
makes the graphs heavy and slow to draw. Reducing the raster resolution
is not a solution, since it does not preserve the characteristics of the
graph (GRASS resamples with nearest neighbor).

Since in many cases I am plotting one point almost on top of another,
a reduction should be reasonable. The question is how to do that.

I am speaking about 3 plots in one graph, say "original", "interim"
and "final" (so, one plot() and two points()).

Any advice welcome,
Markus
_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Dylan Beaudette-2

Re: Scatterplot "thinning" (points reduction)?

Hi Markus,

Can you sample the original maps with v.random + v.what.rast first?
Or, after reading in the raster data into R, only compare every nth
cell? You can generate a sequence to index every nth cell like this:

s <- seq(from=1, to=n, by=10)   # n = number of cells, e.g. nrow(spdf@data)

Then, subset your spatial data frame like this:

spdf[s,]
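
A minimal sketch of the index-based idea on a plain data frame, with a random subset shown next to the every-nth-row version. The data frame `d`, its columns `dem` and `raw`, and the thinning factor of 10 are made-up stand-ins, not the real raster values.

```r
# Synthetic stand-in for the raster attribute table (all names are assumptions)
set.seed(42)
d <- data.frame(dem = runif(100000, min = 1000, max = 2000))
d$raw <- 30 - 0.006 * d$dem + rnorm(nrow(d))

# Systematic thinning: keep every 10th row
s <- seq(from = 1, to = nrow(d), by = 10)
d_sys <- d[s, ]

# Random thinning: keep the same number of rows, drawn without replacement
r <- sample(nrow(d), size = length(s))
d_rnd <- d[r, ]

# Plot the thinned data instead of the full table
plot(dem ~ raw, data = d_sys, pch = ".")
points(dem ~ raw, data = d_rnd, pch = ".", col = "red")
```

Regular striding can interact with the row layout of a raster, so the random subset may be the safer default when in doubt.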

Another approach would be to use a density estimate in parameter space:

library(MASS)
x <- rnorm(1000)
y <- rnorm(1000)
dd <- kde2d(x,y)
contour(dd)
persp(dd, theta=-30, phi=30, d=5)
image(dd)
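
For more than one point cloud, the density estimates could be drawn as overlaid contours rather than as points. This is only a sketch under assumptions: `raw`, `filt` and `dem` are synthetic vectors, and the grid size `n = 50` is arbitrary.

```r
library(MASS)

set.seed(1)
n <- 5000
dem  <- runif(n, 1000, 2000)
raw  <- 30 - 0.006 * dem + rnorm(n, sd = 2)   # hypothetical "original" series
filt <- 30 - 0.006 * dem + rnorm(n, sd = 1)   # hypothetical "final" series

# Common limits so both density grids cover the same extent
lims <- c(range(c(raw, filt)), range(dem))

dd_raw  <- kde2d(raw,  dem, n = 50, lims = lims)
dd_filt <- kde2d(filt, dem, n = 50, lims = lims)

# The first call sets up the plot, the second adds contours to it
contour(dd_raw, xlab = "LST value", ylab = "elevation [m]")
contour(dd_filt, add = TRUE, col = "green")
```

The drawing cost then depends on the 50 x 50 grid, not on the number of input points.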


Cheers,

Dylan

Markus Neteler

Re: Scatterplot "thinning" (points reduction)?

Hi Dylan,

On Sun, Aug 16, 2009 at 8:02 PM, Dylan Beaudette<[hidden email]> wrote:
> Hi Markus,
>
> Can you sample the original maps with v.random + v.what.rast first?

This gave me the idea: spsample() - I just read page 118 in "Applied
spatial data analysis with R" (great book, http://www.asdar-book.org/ !).

But apparently I am missing something... (I am certainly reading too fast):

# Spearfish example
> mydata <- readRAST6(c("elevation.dem","soils"))
> subdata <- spsample(mydata, 500, type= "random")

Looks good but...

> str(subdata)
Formal class 'SpatialPoints' [package "sp"] with 3 slots
  ..@ coords     : num [1:497, 1:2] 600572 596804 599446 597732 600336 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "x" "y"
  ..@ bbox       : num [1:2, 1:2] 590022 4914167 608953 4927981
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "x" "y"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
  .. .. ..@ projargs: chr " +proj=utm +zone=13 +a=6378206.4 +rf=294.9786982 +no_defs +nadgrids=/home/neteler/grass65/dist.x86_64-unknown-linux-gnu/etc/nad"| __truncated__

... where are the data?

The original looks like:
> str(mydata)
Formal class 'SpatialGridDataFrame' [package "sp"] with 6 slots
  ..@ data       :'data.frame': 2654802 obs. of  2 variables:
  .. ..$ elevation.dem: int [1:2654802] NA NA NA NA NA NA NA NA NA NA ...
  .. ..$ soils        : int [1:2654802] NA NA NA NA NA NA NA NA NA NA ...
  ..@ grid       :Formal class 'GridTopology' [package "sp"] with 3 slots
  .. .. ..@ cellcentre.offset: Named num [1:2] 590015 4914025
  .. .. .. ..- attr(*, "names")= chr [1:2] "x" "y"
  .. .. ..@ cellsize         : num [1:2] 10 10
  .. .. ..@ cells.dim        : int [1:2] 1899 1398
  ..@ grid.index : int(0)
  ..@ coords     : num [1:2, 1:2] 590015 608995 4914025 4927995
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "x" "y"
  ..@ bbox       : num [1:2, 1:2] 590010 4914020 609000 4928000
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "x" "y"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
  .. .. ..@ projargs: chr " +proj=utm +zone=13 +a=6378206.4 +rf=294.9786982 +no_defs +nadgrids=/home/neteler/grass65/dist.x86_64-unknown-linux-gnu/etc/nad"| __truncated__

According to
showMethods("spsample")
grids are supported... I guess I am missing something trivial; it might be
the late hour here...
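
The str() output above shows that spsample() returns a plain SpatialPoints object, i.e. locations without attribute columns. One possible way to pick up the cell values at those locations is sketched below, assuming a version of sp that provides over(). The grid `g` and its column `elev` are synthetic stand-ins for the Spearfish maps.

```r
library(sp)

set.seed(1)
# Small synthetic grid standing in for the GRASS raster (names are assumptions)
gt <- GridTopology(cellcentre.offset = c(5, 5),
                   cellsize = c(10, 10),
                   cells.dim = c(50, 40))
g <- SpatialGridDataFrame(gt,
                          data = data.frame(elev = runif(50 * 40, 1000, 2000)))

# Random sample of locations: coordinates only, no attributes
pts <- spsample(g, 500, type = "random")

# Look up the grid values under each sampled point; the result is a data.frame
# with one row per point and one column per grid attribute
vals <- over(pts, g)
summary(vals$elev)
```

The resulting `vals` can be passed to plot() like any other data frame.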

thanks
Markus
Roger Bivand

Re: Scatterplot "thinning" (points reduction)?

On Sun, 16 Aug 2009, Markus Neteler wrote:

> Hi,
>
> I am plotting elevation against temperature and have the problem that
> including all points leads to heavy slow graphs... Reducing the raster
> resolution is not a solution since it does not maintain the characteristics
> of the graph (since GRASS is using nearest neighbor).

One point initially. I'm assuming that you are using a Linux platform - on
this platform, there is an order of magnitude speedup if you plot on
screen without "cairo", the default x11 type= - try using type="Xlib",
which is much faster but not so refined.

Given that, consider the cex= argument for varying symbol size, and maybe
the pch="." possibility for using a single pt. point. They still all get
drawn, so there is no time saving, but they may be more visible.

For very large data sets, consider hexbin() in the hexbin package - I'm
not sure how best to display three data sets. For single scatterplots, it
is very powerful. Maybe contours of 2D densities of the extra data sets
could be overlaid over a base hexbin plot? There is an informative
vignette in hexbin.
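
A short sketch of the hexbin() route for a single scatterplot, assuming the hexbin package is installed. The vectors `raw` and `dem` are synthetic, and `xbins = 60` is an arbitrary choice.

```r
library(hexbin)

set.seed(1)
n <- 100000
dem <- runif(n, 1000, 2000)
raw <- 30 - 0.006 * dem + rnorm(n, sd = 2)

# Bin the points into hexagonal cells; the plot draws one hexagon per
# occupied cell, shaded by the number of points that fell into it
hb <- hexbin(raw, dem, xbins = 60)
plot(hb, xlab = "LST value", ylab = "elevation [m]")
```

The number of drawn objects is the number of occupied cells, which stays small however many points go in.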

Hope this helps,

Roger


--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [hidden email]

Markus Neteler

Re: Scatterplot "thinning" (points reduction)?

On Mon, Aug 17, 2009 at 9:33 AM, Roger Bivand<[hidden email]> wrote:

> On Sun, 16 Aug 2009, Markus Neteler wrote:
>
>> Hi,
>>
>> I am plotting elevation against temperature and have the problem that
>> including all points leads to heavy slow graphs... Reducing the raster
>> resolution is not a solution since it does not maintain the
>> characteristics
>> of the graph (since GRASS is using nearest neighbor).
>
> One point initially. I'm assuming that you are using a Linux platform - on
> this platform, there is an order of magnitude speedup if you plot on screen
> without "cairo", the default x11 type= - try using type="Xlib", which is
> much faster but not so refined.

(yes, Linux)
I have searched around but I am not entirely sure to which function
this type parameter belongs.

> Given that, consider the cex= argument for varying symbol size, and maybe
> the pch="." possibility for using a single pt. point. They still all get
> drawn, so there is no time saving, but they may be more visible.

I am currently plotting like this:
plot(data$dem ~ data$raw)
points(data$dem ~ data$filt2, col="yellow", cex=0.5, pch=3)
points(data$dem ~ data$rst, col="green", xlab="LST value [°C]",
ylab="elevation [m]", pch=2)
abline(lm(data$dem ~ data$raw))
abline(lm(data$dem ~ data$filt2), col="yellow")
abline(lm(data$dem ~ data$rst), col="green", xlab="LST value [°C]",
ylab="elevation [m]")

So the background (largest) cloud comes in black circles,
the interim (smaller) one in yellow crosses, many of them inside the circles,
and the upper point cloud (smallest) in green triangles.
I guess the real problem is the 826896 * 3 points in the plot.
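
Since many points fall almost on top of each other, one possible reduction (not something proposed elsewhere in this thread) is to snap the values to a coarse grid in plot space and keep each occupied position once. The tolerances of 0.1 on the x axis and 5 m on the y axis are assumptions, as are the synthetic `raw` and `dem` vectors.

```r
set.seed(1)
n <- 200000
dem <- runif(n, 1000, 2000)
raw <- 30 - 0.006 * dem + rnorm(n)

# Round both axes to the chosen tolerance, then drop repeated positions
snapped <- data.frame(raw = round(raw / 0.1) * 0.1,
                      dem = round(dem / 5) * 5)
thin <- unique(snapped)

# Far fewer points to draw, with the same visible footprint at this tolerance
plot(dem ~ raw, data = thin, pch = ".")
```

This keeps the outline of the cloud but discards density information, so it suits "where are the points" questions more than "how many are there" ones.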

> For very large data sets, consider hexbin() in the hexbin package - I'm not
> sure how best to display three data sets. For single scatterplots, it is
> very powerful. Maybe contours of 2D densities of the extra data sets could
> be overlaid over a base hexbin plot? There is an informative vignette in
> hexbin.

Oh, this is interesting! Thanks,
Markus
Roger Bivand

Re: Scatterplot "thinning" (points reduction)?

On Mon, 17 Aug 2009, Markus Neteler wrote:

> On Mon, Aug 17, 2009 at 9:33 AM, Roger Bivand<[hidden email]> wrote:
>> On Sun, 16 Aug 2009, Markus Neteler wrote:
>>
>>> Hi,
>>>
>>> I am plotting elevation against temperature and have the problem that
>>> including all points leads to heavy slow graphs... Reducing the raster
>>> resolution is not a solution since it does not maintain the
>>> characteristics
>>> of the graph (since GRASS is using nearest neighbor).
>>
>> One point initially. I'm assuming that you are using a Linux platform - on
>> this platform, there is an order of magnitude speedup if you plot on screen
>> without "cairo", the default x11 type= - try using type="Xlib", which is
>> much faster but not so refined.
>
> (yes, Linux)
> I have searched around but I am not entirely sure to which function
> this type parameter belongs.
In x11() to open the screen graphics device - by default it opens by
itself with type="cairo" when needed, so you have to open it manually
with the non-default type, or use X11.options() to have the
automatically opened devices use "Xlib". Generally, "cairo" is
preferable, but slower. I'd probably leave "cairo", and use hexbin()
instead.
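
The two options can be sketched as follows. The guard is an addition for safety: it assumes a Unix-alike build of R with X11 support and an interactive session, and does nothing otherwise.

```r
# Only meaningful in an interactive session on a Unix-alike R with X11 support
can_x11 <- interactive() &&
  .Platform$OS.type == "unix" &&
  capabilities("X11")

if (can_x11) {
  # Option 1: open one device by hand with the faster, less refined backend
  x11(type = "Xlib")
  plot(rnorm(10000), rnorm(10000), pch = ".")

  # Option 2: make "Xlib" the default for devices that open automatically later
  X11.options(type = "Xlib")
}
```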

Roger

Dylan Beaudette

Re: Scatterplot "thinning" (points reduction)?

On Monday 17 August 2009, Markus Neteler wrote:

> On Mon, Aug 17, 2009 at 9:33 AM, Roger Bivand<[hidden email]> wrote:
> > On Sun, 16 Aug 2009, Markus Neteler wrote:
> >> Hi,
> >>
> >> I am plotting elevation against temperature and have the problem that
> >> including all points leads to heavy slow graphs... Reducing the raster
> >> resolution is not a solution since it does not maintain the
> >> characteristics
> >> of the graph (since GRASS is using nearest neighbor).
> >
> > One point initially. I'm assuming that you are using a Linux platform -
> > on this platform, there is an order of magnitude speedup if you plot on
> > screen without "cairo", the default x11 type= - try using type="Xlib",
> > which is much faster but not so refined.
>
> (yes, Linux)
> I have searched around but I am not entirely sure to which function
> this type parameter belongs.
>
> > Given that, consider the cex= argument for varying symbol size, and maybe
> > the pch="." possibility for using a single pt. point. They still all get
> > drawn, so there is no time saving, but they may be more visible.
>
> I am currently plotting like this:
> plot(data$dem ~ data$raw)
> points(data$dem ~ data$filt2, col="yellow", cex=0.5, pch=3)
> points(data$dem ~ data$rst, col="green", xlab="LST value [°C]",
> ylab="elevation [m]", pch=2)
> abline(lm(data$dem ~ data$raw))
> abline(lm(data$dem ~ data$filt2), col="yellow")
> abline(lm(data$dem ~ data$rst), col="green", xlab="LST value [°C]",
> ylab="elevation [m]")

just a quick note on style:

# simpler notation:
plot(dem ~ raw, data=data, xlab="LST value [°C]", ylab="elevation [m]")
points(dem ~ filt2, data=data, col="yellow", cex=0.5, pch=3)
points(dem ~ rst, data=data, col="green", pch=2)


also note that 'xlab' and related arguments need to go into a
high-level plotting command like 'plot()'; low-level functions such as
points() and abline() do not set axis labels
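
A sketch of the same notation carried through to the regression lines from the earlier message. The data frame here is synthetic; only the column names `dem`, `raw` and `filt2` follow the thread.

```r
set.seed(1)
# Synthetic stand-in for the real table; values are made up
data <- data.frame(dem = runif(2000, 1000, 2000))
data$raw   <- 30 - 0.006 * data$dem + rnorm(2000, sd = 2)
data$filt2 <- 30 - 0.006 * data$dem + rnorm(2000, sd = 1)

# Axis labels go into the high-level call only
plot(dem ~ raw, data = data, pch = ".",
     xlab = "LST value", ylab = "elevation [m]")
points(dem ~ filt2, data = data, col = "yellow", pch = ".")

# Fit once, keep the model objects, then draw the lines
fit_raw   <- lm(dem ~ raw, data = data)
fit_filt2 <- lm(dem ~ filt2, data = data)
abline(fit_raw)
abline(fit_filt2, col = "yellow")
```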

finally, you might be able to make the plot in one command with some
incantation of xyplot() from the lattice package.
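
One possible form of that single command, assuming the series are first stacked into a long table. The data frame `long` and its columns `lst` and `series` are hypothetical names, and the data are synthetic.

```r
library(lattice)

set.seed(1)
n <- 2000
dem <- runif(n, 1000, 2000)

# Long format: one row per (series, point)
long <- data.frame(
  dem    = rep(dem, times = 2),
  lst    = c(30 - 0.006 * dem + rnorm(n, sd = 2),
             30 - 0.006 * dem + rnorm(n, sd = 1)),
  series = factor(rep(c("raw", "filt2"), each = n),
                  levels = c("raw", "filt2"))
)

# One call draws both clouds ("p") plus a regression line per group ("r")
p <- xyplot(dem ~ lst, groups = series, data = long,
            type = c("p", "r"), pch = ".",
            xlab = "LST value", ylab = "elevation [m]",
            auto.key = TRUE)
print(p)
```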

did you have a chance to try the kde2d() function from MASS?

Cheers,
Dylan






--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341
Markus Neteler

Re: Scatterplot "thinning" (points reduction)?

On Mon, Aug 17, 2009 at 6:55 PM, Dylan Beaudette<[hidden email]> wrote:
> On Monday 17 August 2009, Markus Neteler wrote:
>> On Mon, Aug 17, 2009 at 9:33 AM, Roger Bivand<[hidden email]> wrote:
>> > On Sun, 16 Aug 2009, Markus Neteler wrote:
...
> just a quick note on style:
>
> # simpler notation:
> plot(dem ~ raw, data=data, xlab="LST value [°C]", ylab="elevation [m]")
> points(dem ~ filt2, data=data, col="yellow", cex=0.5, pch=3)
> points(dem ~ rst, data=data, col="green", pch=2)

Good point, thanks.

> also note that 'xlab' and other related commands need to be put into a
> high-level plotting command like 'plot()'

Sorry, I messed it up in my previous email: of course it works as you
indicate above...

> finally, you might be able to make the plot in one command with some
> incantation of xyplot() from the lattice package.
>
> did you have a chance to try the kde2d() function from MASS?

Will try!

Best
Markus