[GRASS-stats] clustering

Jarosław Jasiewicz

[GRASS-stats] clustering

Hi

A question about clustering: I am trying to use R to cluster large
raster data sets (at least 40 000 px, commonly 2 500 000), but from the
cluster package only `clara` works (it works fine and is fast, about
2 seconds on the largest data set); none of the other methods (daisy
first of all) work. The message I receive is `vector is too long`, which
probably means too large...

The question is simple: is there any alternative to the cluster package
for R (I am thinking of fuzzy classification)? There are quite a few
clustering packages for R, but before I start testing them: with data
sets of this size, should I rather look for another tool?


Jarek


_______________________________________________
grass-stats mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-stats
Roger Bivand

Re: [GRASS-stats] clustering

On Wed, 5 Mar 2008, Jarek Jasiewicz wrote:

> Hi
>
> A question about clustering: I am trying to use R to cluster large raster
> data sets (at least 40 000 px, commonly 2 500 000), but from the cluster
> package only `clara` works (it works fine and is fast, about 2 seconds on
> the largest data set); none of the other methods (daisy first of all) work.
> The message I receive is `vector is too long`, which probably means too
> large...
>
> The question is simple: is there any alternative to the cluster package
> for R (I am thinking of fuzzy classification)? There are quite a few
> clustering packages for R, but before I start testing them: with data
> sets of this size, should I rather look for another tool?
>

Have you looked at the Cluster Task View?

http://cran.r-project.org/web/views/Cluster.html

Admittedly, many of the methods in these packages are not written to scale
up, but to illustrate the implementation of methods in principle. If you
try to start a list of methods that don't fail for 40K, and contact the
maintainer of the task view (Bettina Gruen), I expect that she and
Friedrich Leisch would consider including a paragraph on the suitability
of the methods they list for larger data sets. I would also look to see
whether the Bioconductor statistics task view:

http://www.bioconductor.org/packages/release/Statistics.html

which has a clustering subview, is relevant - gene array data are also
typically very large. In general, I think that "cluster" as a word is
often used with "machine learning" and "pattern recognition", so the
references and links may be rather disorganised.
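
For a rough sense of why methods that need a full dissimilarity matrix (daisy and whatever builds on it) give up at these sizes, here is a back-of-the-envelope sketch in Python. It is only an estimate under stated assumptions: one 8-byte double is stored for each pair of pixels, and a single R vector is limited to 2^31 - 1 elements, as in the R versions of this period. The function names are made up for the illustration.

```python
# Back-of-the-envelope check: size of a full pairwise dissimilarity structure.
# Assumptions (not from the thread): one 8-byte double per pair, and a
# per-vector limit of 2**31 - 1 elements, as in R versions of that period.

R_MAX_VECTOR_LENGTH = 2**31 - 1   # assumed element limit for one R vector
BYTES_PER_DOUBLE = 8


def pairwise_count(n):
    """Number of distinct unordered pairs among n observations: n(n-1)/2."""
    return n * (n - 1) // 2


def dissimilarity_footprint(n):
    """Return (pairs, gigabytes, fits_in_one_vector) for n observations."""
    pairs = pairwise_count(n)
    gigabytes = pairs * BYTES_PER_DOUBLE / 1e9
    return pairs, gigabytes, pairs <= R_MAX_VECTOR_LENGTH


for n in (40_000, 2_500_000):
    pairs, gb, fits = dissimilarity_footprint(n)
    print(f"n = {n:>9,}: {pairs:,} pairs, about {gb:,.1f} GB, "
          f"fits in one vector: {fits}")
```

Under these assumptions the 40 000 px case stays below the length limit but already needs about 6.4 GB, and the 2 500 000 px case needs roughly 3.1 x 10^12 values, far beyond either limit. clara works on subsamples of the data rather than on all pairs, which is consistent with it being the only method that ran.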

Hope this helps,

Roger

>
> Jarek

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [hidden email]
