Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

40 messages Options
Embed this post
Permalink
1 2
Nikos Alexandris

Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink

Apologies for starting a new thread. My post is long and I thought it's
better to start an extra-testing thread. I did some tests and I present
them below in the following order:

 1. Briefly about PCA
 2. The modules/functions that implement PCA in GRASS & R
 3. My claims (Entitled Comments)
 4. Evidence (=the numbers derived from i.pca, prcomp, princomp,
m.eigensystem using some MODIS surface reflectance bands).


Finally all is clear _but_ one thing: the only "unknown" variable (to
me) is still the Eigenvalue provided by i.pca. I can't nail this one:
*** how is it calculated? *** Looks like it is some _weighted_
variance... !??

The only thing I noticed with some testing (not thoroughly though) is
that the Eigenvalue (=Variance) reported by "i.pca" is ~0.58 of the
variance reported by "prcomp()" (sdev^2) !? That is:

     sqrt(Eigenvalue(i.pca)) / sdev(prcomp) = 0.58xxx


Hamish, if you have the time (or anyone , could you please "translate"
the code in "grass6_dev/imagery/i.pca/main.c", concerning the "eigval"
variable, in some pseudocode. It is really the last thing to clarify ( I
think ).


Kindest regards, Nikos


------------------------
# 1. PCA ### ### ### ###

*. PCA is performed via:

   - method 1 (eigenvectors) : by determining the eigenvalues and
eigenvectors of a given covariance or correlation matrix.

         - covariance matrix -> non-standardised
         - correlation matrix -> standardised

   - method 2 (SVD): based on by a singular value decomposition of the
data matrix. This is a more general solution to PCA than the
"eigenvectors".

     ...The difference between the two methods is explained in [1]...
     ...SVD is used for numerical accuracy... (R Documentation)
     ...Results between the two methos can differ a bit...

     ...Data centering reduces the square mean error of
*approximating* the input data
     ...Data scaling


*. PCA returns:

   - eigenvalues (=variance _or_ sdev^2)
   - eigenvectors (=loadings, rotation, weighting coefficient)



-------------------------------------------
# 2. Implementations of PCA ### ### ### ###

* R's implementation of:

   - method 1 (eigen) is the _princomp()_ function

      -*applies*:
         \data centering
         \data scaling

      -returns:
         \sdev  -  standard deviation of principal components (that is:
the sqrt(Variance)
         \loadings  -  eigenvectors


   - method 2 (SVD) is the _prcomp()_ function

      -options:
         \data centering (center= TRUE | FALSE) [default is TRUE]
         \data scaling (scale=TRUE | FALSE) [default is FALSE]

      -returns:
         \sdev (as above)
         \rotation (same as loadings)



* GRASS' implementation of:

   - method 1 (eigen) is the _m.eigensystem_ module

      -returns:
         E -> eigenvalue -> Variance(E) -> sdev(E)^2
         V -> eigenvectors associated with E
         N -> normalized eigenvectors V --> This is _data centering_.
         W -> N vector multiplied by the square root of the magnitude of
      the eigenvalue (E), in other words: W = N * sdev(E)
         

   - method 2 (SVD) is the   _i.pca_ module

      -returns:
         \Eigenvalues (latest version by Hamish)
         \Eigenvectors



-----------------------------
# 3. Comments ### ### ### ###

# 1: it is (?) meaningless to compare the numerical results
(eigenvectors) of "i.pca" with the results of "m.eigensystem" since they
implement two different algebraic solutions to PCA with different
options (data centering is NOT performed by i.pca). The different
algebraic solutions don't give big differences. Data centering does.


# 2: "i.pca" corresponds to "prcomp( ,center=FALSE,scale=FALSE)" and
"m.eigensystem" corresponds to "princomp()"


# 3: The standard deviations (sdev) reported by prcomp(,center=TRUE,
scale=FALSE) _match_ the variances (=eigenvalues = sdev^2) reported by
"m.eigensystem".

# 4: Obviously, "prcomp()" with center=TRUE and scaling=FALSE by default
gives (almost) same results as "princomp()".



------------------------------------------
# 4. Comparing PCA results ### ### ### ###

# prcomp() #############################################################

### Details for prcomp {stats} from R Documentation
The calculation is done by a singular value decomposition of the
(centered and possibly scaled) data matrix, not by using eigen on the
covariance matrix. This is generally the preferred method for numerical
accuracy.
## more details in [2]
#Default settings of prcomp(): _center=TRUE_ and _scale=FALSE_

### with center=FALSE, scale=FALSE
prcomp(mod07, center=FALSE, scale=FALSE) <<== this corresponds to i.pca

Standard deviations:
[1] 4288.3788  476.8904  114.3971

Rotation:
                                    PC1        PC2        PC3
MOD2007_242_500_sur_refl_b02 -0.6353238  0.7124070 -0.2980602
MOD2007_242_500_sur_refl_b06 -0.6485551 -0.2826985  0.7067234
MOD2007_242_500_sur_refl_b07 -0.4192135 -0.6423066 -0.6416403


### with center=TRUE, scale=FALSE
prcomp(mod07, center=TRUE, scale=FALSE)
Standard deviations:
[1] 857.5749 436.0928 108.5085

Rotation:
                                   PC1         PC2        PC3
MOD2007_242_500_sur_refl_b02 0.4184468  0.83881578 -0.3482677
MOD2007_242_500_sur_refl_b06 0.7249935 -0.07751872  0.6843794
MOD2007_242_500_sur_refl_b07 0.5470710 -0.53886820 -0.6405735


### with center=TRUE, scale=TRUE
prcomp(mod07, center=TRUE, scale=TRUE)
Standard deviations:
[1] 1.5030740 0.8397807 0.1885121

Rotation:
                                   PC1        PC2        PC3
MOD2007_242_500_sur_refl_b02 0.4807060  0.8202848 -0.3099267
MOD2007_242_500_sur_refl_b06 0.6561893 -0.1020540  0.7476634
MOD2007_242_500_sur_refl_b07 0.5816677 -0.5627768 -0.5873201


### with center=FALSE, scale=TRUE
prcomp(mod07, center=FALSE, scale=TRUE)
Standard deviations:
[1] 1.71889117 0.20787871 0.04689961

Rotation:
                                    PC1         PC2        PC3
MOD2007_242_500_sur_refl_b02 -0.5747640  0.74015136 -0.3490305
MOD2007_242_500_sur_refl_b06 -0.5812896 -0.06907305  0.8107597
MOD2007_242_500_sur_refl_b07 -0.5759763 -0.66888331 -0.4699430


# i.pca ##############################################################

# perform pca
i.pca input=MOD07_b02,MOD07_b06,MOD07_b07 output=TEST

Eigen values, (vectors), and [percent importance]:
PC1 6307563.04 ( -0.64 -0.65 -0.42 ) [ 98.71% ]
PC2 78023.63 ( -0.71 0.28 0.64 ) [ 1.22% ]
PC3 4504.60 ( -0.30 0.71 -0.64 ) [ 0.07% ]

### Comment: Comparing with prcomp()'s results it is obvious that
"i.pca" implements the SVD method _without_ centering and _without_
scaling the data prior to the analysis. ###




# princomp() - cov #####################################################

# Details for princomp {stats} from R Documentation
The calculation is done using eigen on the correlation or covariance
matrix


### using _covariance_ matrix
# prints only standard deviations
princomp(mod07)

Call:
princomp(x = mod07)

Standard deviations:
  Comp.1   Comp.2   Comp.3
857.5737 436.0922 108.5083

 3  variables and  350596 observations.


# get loadings (=rotation = eigenvectors)
(princomp(mod07))$loadings

Loadings:
                             Comp.1 Comp.2 Comp.3
MOD2007_242_500_sur_refl_b02 -0.418  0.839  0.348
MOD2007_242_500_sur_refl_b06 -0.725        -0.684
MOD2007_242_500_sur_refl_b07 -0.547 -0.539  0.641

               Comp.1 Comp.2 Comp.3
SS loadings     1.000  1.000  1.000
Proportion Var  0.333  0.333  0.333
Cumulative Var  0.333  0.667  1.000



# m.eigensystem - cov #################################################

# Determine eigen values and vectors of _covariance_ matrix
## Marked with "#" are the _N_'s that correspond to the eigenvectors
derived from R's princomp() ##
(echo 3; r.covar MOD07_b02,MOD07_b06,MOD07_b07)|m.eigensystem

r.covar: complete ...
100%
E    778244.0258462029          .0000000000    79.20
V          .5006581842          .0000000000
V          .8256483300          .0000000000
V          .6155834548          .0000000000
N      #   .4372107421   #      .0000000000
N      #   .7210155161   #      .0000000000
N      #   .5375717557   #      .0000000000
W       385.6991853500          .0000000000
W       636.0664787886          .0000000000
W       474.2358050886          .0000000000
E    192494.5769628266          .0000000000    19.59
V         -.8689798010          .0000000000
V          .0996340298          .0000000000
V          .5731134848          .0000000000
N      #  -.8309940700   #      .0000000000
N      #   .0952787255   #      .0000000000
N      #   .5480609638   #      .0000000000
W      -364.5920328433          .0000000000
W        41.8027823088          .0000000000
W       240.4573848757          .0000000000
E     11876.4548199713          .0000000000     1.21
V          .2872248982          .0000000000
V         -.5731591248          .0000000000
V          .5351449518          .0000000000
N      #   .3439413070   #      .0000000000
N      #  -.6863370819   #      .0000000000
N      #   .6408165005   #      .0000000000
W        37.4824307850          .0000000000
W       -74.7964308085          .0000000000
W        69.8356366100          .0000000000



# princomp() - cov #####################################################

### using _correlation_ matrix
princomp(mod07, cor=TRUE)

Call:
princomp(x = mod07, cor = TRUE)

Standard deviations:
   Comp.1    Comp.2    Comp.3
1.5030740 0.8397807 0.1885121

 3  variables and  350596 observations.


# get loadings
(princomp(mod07, cor=TRUE))$loadings

Loadings:
                             Comp.1 Comp.2 Comp.3
MOD2007_242_500_sur_refl_b02 -0.481  0.820  0.310
MOD2007_242_500_sur_refl_b06 -0.656 -0.102 -0.748
MOD2007_242_500_sur_refl_b07 -0.582 -0.563  0.587

               Comp.1 Comp.2 Comp.3
SS loadings     1.000  1.000  1.000
Proportion Var  0.333  0.333  0.333
Cumulative Var  0.333  0.667  1.000




# m.eigensystem - corr #################################################

# Determine eigen values and vectors of _correlation_ matrix
## Marked with "#" are the _N_'s that correspond to the eigenvectors
derived from R's princomp() ##
(echo 3; r.covar -r MOD07_b02,MOD07_b06,MOD07_b07)|m.eigensystem

r.covar: complete ...
 100%
E         2.2915877718          .0000000000    76.39
V         -.5755655569          .0000000000
V         -.7660355041          .0000000000
V         -.6809380186          .0000000000
 N     #   -.4896413269   #      .0000000000
 N     #   -.6516766616   #      .0000000000
 N     #   -.5792830912   #      .0000000000
W         -.7412186091          .0000000000
W         -.9865075560          .0000000000
W         -.8769182329          .0000000000
E          .6740687010          .0000000000    22.47
V          .8667178982          .0000000000
V         -.1116525720          .0000000000
V         -.6069908335          .0000000000
 N      #   .8145815825   #      .0000000000
 N      #  -.1049362531   #      .0000000000
 N      #  -.5704780699   #      .0000000000
W          .6687852213          .0000000000
W         -.0861544341          .0000000000
W         -.4683721194          .0000000000
E          .0343435272          .0000000000     1.14
V          .2486404469          .0000000000
V         -.6006166822          .0000000000
V          .4655120098          .0000000000
 N      #   .3109794470   #      .0000000000
 N      #  -.7512029762   #      .0000000000
 N      #   .5822249325   #      .0000000000
W          .0576307320          .0000000000
W         -.1392129859          .0000000000
W          .1078979635          .0000000000


[1] http://www.snl.salk.edu/~shlens/pub/notes/pca.pdf

[2] Copy-pasted from R Documentation

### prcomp() returns the following:

   1. sdev
          the standard deviations of the principal components (i.e.,
          the square roots of the eigenvalues of the cov./cor. matrix,
          though the calculation is actually done with the singular
          values of the data matrix).

   2. rotation
          the matrix of variable loadings (i.e., a matrix whose
          columns contain the eigenvectors). The function princomp
          returns this in the element loadings.

   * Note

          The signs of the columns of the rotation matrix are
          arbitrary, and so may differ between different programs
          for PCA, and even between different builds of R.

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Nikos Alexandris

Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
On Tue, 2009-03-03 at 07:27 +0100, Nikos Alexandris wrote:
> # 2. Implementations of PCA ### ### ### ###
>
> * R's implementation of:
>
>    - method 1 (eigen) is the _princomp()_ function
>
>       -*applies*:
>          \data centering
>          \data scaling

Hmm... sorry, this should be _only_ data centering.

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
hamish-2

Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
In reply to this post by Nikos Alexandris

Nikos:

> Finally all is clear _but_ one thing: the only "unknown" variable (to
> me) is still the Eigenvalue provided by i.pca. I can't nail this one:
> *** how is it calculated? *** Looks like it is some _weighted_
> variance... !??
....
> Hamish, if you have the time (or anyone , could you please "translate"
> the code in "grass6_dev/imagery/i.pca/main.c", concerning the "eigval"
> variable, in some pseudocode. It is really the last thing to clarify ( I
> think ).

I don't really have the time, I am just about to leave the office for a
few weeks.

but quickly:

for i.pca they are calculated by the eigen() function in lib/gmath/eigen.c
see also lib/gmath/eigen_tools.c  

unfortunately the programmer decided to use all one letter variables
there and no comments.. but maybe the math is clear enough to tell which
method is used.


it will be really good to finally have all this documented.


Hamish



     

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Markus Neteler

Re: Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
On Tue, Mar 3, 2009 at 12:26 PM, Hamish <[hidden email]> wrote:
...
>
> it will be really good to finally have all this documented.

I find it hard to follow these long mails. Why not enjoying the
GRASS Wiki to stabilize the documentation and comparisons?

Markus
_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Nikos Alexandris

Re: Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
On Tue, 2009-03-03 at 13:18 +0100, Markus Neteler wrote:
> On Tue, Mar 3, 2009 at 12:26 PM, Hamish <[hidden email]> wrote:
> ...
> >
> > it will be really good to finally have all this documented.
>
> I find it hard to follow these long mails. Why not enjoying the
> GRASS Wiki to stabilize the documentation and comparisons?
>
> Markus

Yes, why not? Hmmm, you mean to put these stuff already in the Wiki
before we are sure about the one last and very important variable
(=eigenvalues reported by  i.pca)?

Kind regards, Nikos

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Nikos Alexandris

Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
In reply to this post by hamish-2

Nikos:
> Eigenvalue provided by i.pca. I can't nail this one:
> > *** how is it calculated? *** Looks like it is some _weighted_
> > variance... !??
> ....
> > Hamish, if you have the time (or anyone , could you please "translate"
> > the code in "grass6_dev/imagery/i.pca/main.c", concerning the "eigval"
> > variable, in some pseudocode. It is really the last thing to clarify ( I
> > think ).

Hamish:
> for i.pca they are calculated by the eigen() function in lib/gmath/eigen.c
> see also lib/gmath/eigen_tools.c  


I am trying to get a grip on eigen.c. The variables that refer to the
eigenvalues are symbolised as:

   double *lambda  OR  lambda

which is the

   double d[]  as defined  IN  G_tqli()  IN  eigen_tools.c

And then there is this "fabs" in the way :-) I suppose it returns the
absolute value of a signed number... It took me so long to figure this
simple thing out. Well, there are so many variables... it's
hard(=time-consuming) for the unexperienced eye to trace down the real
math.

As suggested by Markus, I will try to move this in the Wiki.
Apologies for the long posts again, Nikos

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Nikos Alexandris

Re: Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
In reply to this post by Nikos Alexandris
On Tue, 2009-03-03 at 14:26 +0100, Nikos Alexandris wrote:

> On Tue, 2009-03-03 at 13:18 +0100, Markus Neteler wrote:
> > On Tue, Mar 3, 2009 at 12:26 PM, Hamish <[hidden email]> wrote:
> > ...
> > >
> > > it will be really good to finally have all this documented.
> >
> > I find it hard to follow these long mails. Why not enjoying the
> > GRASS Wiki to stabilize the documentation and comparisons?
> >
> > Markus
>
> Yes, why not? Hmmm, you mean to put these stuff already in the Wiki
> before we are sure about the one last and very important variable
> (=eigenvalues reported by  i.pca)?
>
> Kind regards, Nikos

I've started the PCA grass-wiki page. I am trying to build it... step by
step. Expert advice is always welcome and highly appreciated.

http://grass.osgeo.org/wiki/Principal_Component_Analysis

Kind regards, Nikos

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Roger Bivand

Re: Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
On Tue, 10 Mar 2009, Nikos Alexandris wrote:

> On Tue, 2009-03-03 at 14:26 +0100, Nikos Alexandris wrote:
>> On Tue, 2009-03-03 at 13:18 +0100, Markus Neteler wrote:
>>> On Tue, Mar 3, 2009 at 12:26 PM, Hamish <[hidden email]> wrote:
>>> ...
>>>>
>>>> it will be really good to finally have all this documented.
>>>
>>> I find it hard to follow these long mails. Why not enjoying the
>>> GRASS Wiki to stabilize the documentation and comparisons?
>>>
>>> Markus
>>
>> Yes, why not? Hmmm, you mean to put these stuff already in the Wiki
>> before we are sure about the one last and very important variable
>> (=eigenvalues reported by  i.pca)?
>>
>> Kind regards, Nikos
>
> I've started the PCA grass-wiki page. I am trying to build it... step by
> step. Expert advice is always welcome and highly appreciated.
>
> http://grass.osgeo.org/wiki/Principal_Component_Analysis

Good, thanks. There you say that you are using "some MODIS surface
reflectance products". I guess it will be easier to check things if the
data (GRASS location) are available, so that others can try the same
calculations. Would it be possible to make one or more test sets
available, and link them from thw Wiki?

Roger

>
> Kind regards, Nikos
>
> _______________________________________________
> grass-stats mailing list
> [hidden email]
> http://lists.osgeo.org/mailman/listinfo/grass-stats
>

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [hidden email]

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Roger Bivand
Economic Geography Section
Department of Economics
Norwegian School of Economics and Business Administration
Helleveien 30
N-5045 Bergen, Norway
Nikos Alexandris

Re: Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink

 Nikos:
> > I've started the PCA grass-wiki page. I am trying to build it... step by
> > step. Expert advice is always welcome and highly appreciated.
> > http://grass.osgeo.org/wiki/Principal_Component_Analysis


 Roger:
> Good, thanks. There you say that you are using "some MODIS surface
> reflectance products". I guess it will be easier to check things if the
> data (GRASS location) are available, so that others can try the same
> calculations. Would it be possible to make one or more test sets
> available, and link them from thw Wiki?


Absolutely! I need some time though (=some days?, currently
overloaded :-).

Kindest regards, Nikos

P.S. I was informed today that I have to wait another 2 weeks to get the
book "Applied Spatial Data Analysis with R" :-(

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Nikos Alexandris

Re: Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
Roger:
> > Good, thanks. There you say that you are using "some MODIS surface
> > reflectance products". I guess it will be easier to check things if the
> > data (GRASS location) are available, so that others can try the same
> > calculations. Would it be possible to make one or more test sets
> > available, and link them from thw Wiki?

Finally,

I am, sort of, free again :-). I have prepaired a grass-location
containing the 3 modis bands I've used for all PCA related examples. I
will eventually upload them in gregis [1] but if there is another place
(GRASS-wiki?) to upload them would be also ok.

Nikos:
> P.S. I was informed today that I have to wait another 2 weeks to get the
> book "Applied Spatial Data Analysis with R" :-(

I have received a copy of the book. Just excellent. I would like to see
more books like this one :-). Congratulations to the authors.

Cheers, Nikos
---

[1] http://gregis.berlios.de

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Nikos Alexandris

Re: Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink

Roger:
> > > Good, thanks. There you say that you are using "some MODIS surface
> > > reflectance products". I guess it will be easier to check things if the
> > > data (GRASS location) are available, so that others can try the same
> > > calculations. Would it be possible to make one or more test sets
> > > available, and link them from thw Wiki?

Nikos:
> Finally,
> I am, sort of, free again :-).

Sorry for being late. Couldn't make it earlier. I've updated the
wiki-page [1].


> I have prepaired a grass-location
> containing the 3 modis bands I've used for all PCA related examples. I
> will eventually upload them in gregis [check previous post] but if there is another place
> (GRASS-wiki?) to upload them would be also ok.

I have currently some access problem in gregis this why I can't upload
the MODIS bands there. Is there any other place to put them for now?


Now, I am more confused than before. Or maybe not...

There seems to be no problem with the SPOT images, that is, the results
of i.pca are (almost) identical with the results of R's prcomp(x,
center=TRUE) *with* centering. This is completely different than with
the example using MODIS data.

Perhaps it depends on the range of the input data?! The SPOT bands are
up to max. 255. The MODIS bands are:

# band 2
r.info -r mod_b2

min=0
max=5504

# band 6
r.info -r mod_b6

min=93
max=5488

# band 7
r.info -r mod_b7

min=0
max=4133


I ran the same PCA tests (in R) with the _rescaled_ as like with the
"original" MODIS bands (using r.rescale) but no luck, I get the same
contradictory results: _i.pca_ matches with _prcomp(x, center=FALSE)_.

I suspect that the range of the input data is the reason... but I am not
sure.  This is why I tried to repeat the test with _rescaled_ MODIS
bands. Given that the "r.rescale" products are kind of a "rules" file
under the "cats" directory, is "r.rescale" safe? Should I instead
rescale the data with r.mapcalc or is there no difference between
r.rescale and r.mapcalc?

All explained in the *under construction* wiki. Kind regards, Nikos
---

[1] http://grass.osgeo.org/wiki/Principal_Component_Analysis

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Edzer Pebesma-2

Re: Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
I tried to improve the wiki page, but never got access.
--
Edzer


>
> Roger:
>> > > Good, thanks. There you say that you are using "some MODIS surface
>> > > reflectance products". I guess it will be easier to check things if
>> the
>> > > data (GRASS location) are available, so that others can try the same
>> > > calculations. Would it be possible to make one or more test sets
>> > > available, and link them from thw Wiki?
>
> Nikos:
>> Finally,
>> I am, sort of, free again :-).
>
> Sorry for being late. Couldn't make it earlier. I've updated the
> wiki-page [1].
>
>
>> I have prepaired a grass-location
>> containing the 3 modis bands I've used for all PCA related examples. I
>> will eventually upload them in gregis [check previous post] but if there
>> is another place
>> (GRASS-wiki?) to upload them would be also ok.
>
> I have currently some access problem in gregis this why I can't upload
> the MODIS bands there. Is there any other place to put them for now?
>
>
> Now, I am more confused than before. Or maybe not...
>
> There seems to be no problem with the SPOT images, that is, the results
> of i.pca are (almost) identical with the results of R's prcomp(x,
> center=TRUE) *with* centering. This is completely different than with
> the example using MODIS data.
>
> Perhaps it depends on the range of the input data?! The SPOT bands are
> up to max. 255. The MODIS bands are:
>
> # band 2
> r.info -r mod_b2
>
> min=0
> max=5504
>
> # band 6
> r.info -r mod_b6
>
> min=93
> max=5488
>
> # band 7
> r.info -r mod_b7
>
> min=0
> max=4133
>
>
> I ran the same PCA tests (in R) with the _rescaled_ as like with the
> "original" MODIS bands (using r.rescale) but no luck, I get the same
> contradictory results: _i.pca_ matches with _prcomp(x, center=FALSE)_.
>
> I suspect that the range of the input data is the reason... but I am not
> sure.  This is why I tried to repeat the test with _rescaled_ MODIS
> bands. Given that the "r.rescale" products are kind of a "rules" file
> under the "cats" directory, is "r.rescale" safe? Should I instead
> rescale the data with r.mapcalc or is there no difference between
> r.rescale and r.mapcalc?
>
> All explained in the *under construction* wiki. Kind regards, Nikos
> ---
>
> [1] http://grass.osgeo.org/wiki/Principal_Component_Analysis
>
> _______________________________________________
> grass-stats mailing list
> [hidden email]
> http://lists.osgeo.org/mailman/listinfo/grass-stats
>


_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Nikos Alexandris

Re: Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
In reply to this post by Nikos Alexandris
Agustin:
> Nikos, I'm probably missing a message (I do not see which is
> exactly the difference between the spot and the modis example),

Hi Agus. The difference is:

Using SPOT (range up to max. 255):
i.pca == prcomp(x, center=TRUE, scale=FALSE)

Using MODIS (range varries between bands, up to max. ~5500):
i.pca == prcomp(x, center=FALSE, scale=FALSE)


>  but Perhaps you must use r.mapcalc to rescale and not r.rescale, or
> if using r.rescale you must use r.mapcalc afterwards to rewrite the
> output raster, as r.rescale is modifying the categories only and not
> the actual raster values which is what i.pca is probably using. Maybe,
> as i.pca is quite old, it considers the categories only up to 255. In
> any case, using r.mapcalc you know better what you are doing.

That's the confirmation I needed. I think rescaling the MODIS data to
0~255 will confirm that:

IF "r.info -r" =< max. 255
THEN i.pca == prcomp(x, center=TRUE, scale=FALSE)


> Also, note that you must rescale
> using the mean, not max min values. In other words
> r.mapcalc "band1c = band1 - mean(band1)"

Thanks for the tip. Maybe this can be added in the grass-wiki FAQ !?

Cheers, Nikos

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Nikos Alexandris

Re: Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
In reply to this post by Edzer Pebesma-2
On Tue, 2009-03-31 at 12:56 +0200, Edzer Pebesma wrote:
> I tried to improve the wiki page, but never got access.
> --
> Edzer

Edzer, it definitely needs to be improved. Hopefully you'll get access
while the thing is boiling.

@admins: Please, we need access ;-)
Kindest regards, Nikos

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Markus Metz-2

Re: Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
In reply to this post by Nikos Alexandris

Nikos Alexandris wrote:

> Agustin:
>  
>> Nikos, I'm probably missing a message (I do not see which is
>> exactly the difference between the spot and the modis example),
>>    
>
> Hi Agus. The difference is:
>
> Using SPOT (range up to max. 255):
> i.pca == prcomp(x, center=TRUE, scale=FALSE)
>
> Using MODIS (range varries between bands, up to max. ~5500):
>  
Sure? Valid range: -100 - 16000 for MODIS e.g. Surface Reflectance 8-Day
L3 Global 500m.
As I would use the max possible for SPOT (255) I would also use the max
possible for other sensors, not the observed max, otherwise statistics
that don't do proper normalization on the fly are not comparable between
different sensors.
But any PCA should do normalization beforehand, not sure what R or i.pca
are doing.
Please excuse my random comments :-)

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Edzer Pebesma-2

Re: Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
In reply to this post by Nikos Alexandris
With Markus' help I got access, and corrected/extended quite a few things,
including why R sometimes doesn't print loadings (when they're close to
0).

I do worry about the wiki that states results are "similar" between R and
grass, when the differences appear e.g. in the third or second digit of a
loading (eigenvector value). Clearly something is is going on, here.
--
Edzer

> On Tue, 2009-03-31 at 12:56 +0200, Edzer Pebesma wrote:
>> I tried to improve the wiki page, but never got access.
>> --
>> Edzer
>
> Edzer, it definitely needs to be improved. Hopefully you'll get access
> while the thing is boiling.
>
> @admins: Please, we need access ;-)
> Kindest regards, Nikos
>
> _______________________________________________
> grass-stats mailing list
> [hidden email]
> http://lists.osgeo.org/mailman/listinfo/grass-stats
>


_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Nikos Alexandris

Re: Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
In reply to this post by Markus Metz-2

Agustin:  
> >> Nikos, I'm probably missing a message (I do not see which is
> >> exactly the difference between the spot and the modis example),
> >>    

Nikos:
> > Hi Agus. The difference is:
> > Using SPOT (range up to max. 255):
> > i.pca == prcomp(x, center=TRUE, scale=FALSE)
> > Using MODIS (range varries between bands, up to max. ~5500):
[...]

Markus M:
> Sure? Valid range: -100 - 16000 for MODIS e.g. Surface Reflectance 8-Day
> L3 Global 500m.

I am referring to the range of the specific MODIS bands that I've used,
that is:
# band 2
r.info -r mod_b2

min=0
max=5504

# band 6
r.info -r mod_b6

min=93
max=5488

# band 7
r.info -r mod_b7

min=0
max=4133


> As I would use the max possible for SPOT (255) I would also use the max
> possible for other sensors, not the observed max, otherwise statistics
> that don't do proper normalization on the fly are not comparable between
> different sensors.

> But any PCA should do normalization beforehand, not sure what R or i.pca
> are doing.

That's the question _now_ (if I am right with my tests): why isn't
normalisation (=data centering in this case) performed with the MODIS
bands I use?


> Please excuse my random comments :-)

Markus, your comments are required :-)
Cheers, Nikos

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Markus Neteler

Re: Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
In reply to this post by Markus Metz-2
On Tue, Mar 31, 2009 at 5:00 PM, Markus Metz
<[hidden email]> wrote:
> Nikos Alexandris wrote:
>> Using MODIS (range varries between bands, up to max. ~5500):
>>
>
> Sure? Valid range: -100 - 16000 for MODIS e.g. Surface Reflectance 8-Day L3
> Global 500m.

Remember that you have to rescale these data first:

https://lpdaac.usgs.gov/lpdaac/products/modis_products_table
-> MOD09Q1   Terra   Surface Reflectance Bands 1–2   Tile   250m   8 Day
    https://lpdaac.usgs.gov/lpdaac/products/modis_products_table/surface_reflectance/8_day_l3_global_250m/v5/terra
   -> layers

... MULTIPLY BY SCALE FACTOR ...

best,
Markus
_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Nikos Alexandris

Re: Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
On Tue, 2009-03-31 at 17:13 +0200, Markus Neteler wrote:

> On Tue, Mar 31, 2009 at 5:00 PM, Markus Metz
> <[hidden email]> wrote:
> > Nikos Alexandris wrote:
> >> Using MODIS (range varries between bands, up to max. ~5500):
> >>
> >
> > Sure? Valid range: -100 - 16000 for MODIS e.g. Surface Reflectance 8-Day L3
> > Global 500m.
>
> Remember that you have to rescale these data first:
>
> https://lpdaac.usgs.gov/lpdaac/products/modis_products_table
> -> MOD09Q1   Terra   Surface Reflectance Bands 1–2   Tile   250m   8 Day
>     https://lpdaac.usgs.gov/lpdaac/products/modis_products_table/surface_reflectance/8_day_l3_global_250m/v5/terra
>    -> layers
>
> ... MULTIPLY BY SCALE FACTOR ...

> best,
> Markus

Yep, I know that. But why is it wrong to just use them as they are?
Regards, Nikos

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Nikos Alexandris

Re: Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Reply Threaded More More options
Print post
Permalink
In reply to this post by Edzer Pebesma-2
Edzer:
> With Markus' help I got access, and corrected/extended quite a few things,
> including why R sometimes doesn't print loadings (when they're close to
> 0).

Great!


> I do worry about the wiki that states results are "similar" between R and
> grass, when the differences appear e.g. in the third or second digit of a
> loading (eigenvector value). Clearly something is is going on, here.

So you think it's really important? Isn't it a matter of pixel-inclusion
in the various algorithms/programs?

Nikos

_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
1 2