vector libs: file based spatial index

Martin Landa

Re: vector libs: file based spatial index

Hi Markus,

2009/7/7 Markus Metz <[hidden email]>:

[...]

> For the time being, the only reasonable way to deal with these massive
> datasets is to *not* build topology. It's not only the spatial index
> that is getting out of hand, but also topology itself and the category
> index. The grass vector libs must be told that there is nothing special
> about point datasets (to cite Hamish), which means rewriting major parts
> of the vector libs, and that takes time.

BTW, are you planning to commit your changes in sidx to trunk?

Martin

--
Martin Landa <landa.martin gmail.com> * http://gama.fsv.cvut.cz/~landa
_______________________________________________
grass-dev mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-dev
Markus Metz-2

Re: vector libs: file based spatial index

Martin Landa wrote:

> Hi Markus,
>
> 2009/7/7 Markus Metz <[hidden email]>:
>
> [...]
>
>  
>> For the time being, the only reasonable way to deal with these massive
>> datasets is to *not* build topology. It's not only the spatial index
>> that is getting out of hand, but also topology itself and the category
>> index. The grass vector libs must be told that there is nothing special
>> about point datasets (to cite Hamish), which means rewriting major parts
>> of the vector libs, and that takes time.
>>    
>
> BTW, are you planning to commit your changes in sidx to trunk?
>  
Yes, after I'm done with testing. My (probably unrealistic) aim is to
make building the new file-based spatial index as fast as building the
current memory-based index. I still have to implement the new
memory-based version; currently it's all file-based. I have polished the
file-based version with R*-tree methods and speed optimizations (amongst
others a custom quicksort), but adjusting the grass vector libs to use
either the memory-based or the file-based version is really a lot of
work. It will take me at least another week to get it right, e.g. to
decide which tasks should be done by Vlib and which by diglib, and how
the two should work together. I can send you a detailed technical report
if you want, but I'm afraid it will be very technical and potentially
boring unless you are interested in performance differences between
Antonin Guttman's RTree and Norbert Beckmann's R*-tree. I would need
some help to get random file access optimized; it's not too bad in my
tests, but I don't know if it can get better.
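
One well-known difference between the two index types is how a subtree
is chosen for a new entry. Guttman's R-tree picks the entry whose
rectangle needs the least area enlargement; the R*-tree, at the level
just above the leaves, picks the entry with the least overlap
enlargement and only uses area enlargement to break ties. The sketch
below illustrates just that one criterion. It is not the code from the
grass vector libs, and all function names in it are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def overlap(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def enlargement(r: Rect, new: Rect) -> float:
    """Extra area needed for r to also cover new."""
    return area(union(r, new)) - area(r)


def choose_guttman(entries: List[Rect], new: Rect) -> int:
    """Classic R-tree rule: least area enlargement, ties by smallest area."""
    return min(range(len(entries)),
               key=lambda i: (enlargement(entries[i], new), area(entries[i])))


def _overlap_with_others(i: int, rect: Rect, entries: List[Rect]) -> float:
    return sum(overlap(rect, e) for j, e in enumerate(entries) if j != i)


def choose_rstar_above_leaves(entries: List[Rect], new: Rect) -> int:
    """R*-tree rule for nodes whose children are leaves: least overlap
    enlargement, ties by least area enlargement, then by smallest area."""
    def key(i: int):
        grown = union(entries[i], new)
        d_overlap = (_overlap_with_others(i, grown, entries)
                     - _overlap_with_others(i, entries[i], entries))
        return (d_overlap, enlargement(entries[i], new), area(entries[i]))
    return min(range(len(entries)), key=key)


if __name__ == "__main__":
    # Made-up rectangles: growing the first one is cheapest in area,
    # but makes it overlap the third one.
    entries = [(0, 0, 4, 4), (8, 0, 12, 20), (5, 0, 6, 30)]
    point = (7, 2, 7, 2)
    print(choose_guttman(entries, point), choose_rstar_above_leaves(entries, point))
```

The two rules can disagree, as in the made-up rectangles above: the
cheapest rectangle to grow by area is not the one that keeps sibling
overlap lowest, and less overlap means fewer subtrees to visit in a
search.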

Markus M
Markus Metz-2

Re: vector libs: file based spatial index

In reply to this post by Moritz Lennert
Moritz Lennert wrote:

> On 25/06/09 08:51, Markus GRASS wrote:
>> I would suggest that I first implement a new version where the spatial
>> index is always written out when a new or modified vector is closed.
>> Intermediate data are still stored in memory. Opening an old vector in
>> read-only mode would then be faster; opening an old vector in update
>> mode would work as it does currently, with the spatial index loaded
>> into memory. This can then be tested and polished, and once that is
>> stable, an env var could be added to keep the spatial index in file when
>> modifying (Vect_open_new or Vect_open_update). This would only be needed
>> for massive vectors.
>
> +1
Now in trunk r38390, time to make distclean again...

To work with an existing vector in grass7, topology needs to be rebuilt
because a support file is missing, the spatial index. After that
everything is fine and grass6 can read the vector again as it is.

The vector spatial index is now built in memory and written out to file,
like topology and the category index. When opening an old vector, only
the header of the spatial index file is loaded; searches are done in
file. When opening an old vector for update, the spatial index is loaded
from file to memory, modified there and then written out, like topology
and the category index.
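
The read-only case described above (load only the header, search in the
file) can be sketched as follows. This is a toy under loud assumptions:
the file layout, the "SIDX" magic and every function name are made up
for illustration, and the body is a flat list of bounding boxes scanned
from disk rather than the R*-tree that the real sidx file holds.

```python
import os
import struct
import tempfile

# Hypothetical layout, NOT the GRASS sidx format:
HEADER = struct.Struct("<4sII")  # magic, number of records, body offset
RECORD = struct.Struct("<i4d")   # feature id, xmin, ymin, xmax, ymax


def write_index(path, boxes):
    """Build the index in memory (a dict id -> box), then write it out."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(b"SIDX", len(boxes), HEADER.size))
        for fid, box in boxes.items():
            f.write(RECORD.pack(fid, *box))


def open_index(path):
    """Open read-only: only the header is read, the body stays on disk."""
    f = open(path, "rb")
    magic, count, body_offset = HEADER.unpack(f.read(HEADER.size))
    if magic != b"SIDX":
        f.close()
        raise ValueError("not an index file")
    return f, count, body_offset


def search(f, count, body_offset, query):
    """Return ids whose box intersects query, reading records from file."""
    qx0, qy0, qx1, qy1 = query
    f.seek(body_offset)
    hits = []
    for _ in range(count):
        fid, x0, y0, x1, y1 = RECORD.unpack(f.read(RECORD.size))
        if x0 <= qx1 and x1 >= qx0 and y0 <= qy1 and y1 >= qy0:
            hits.append(fid)
    return hits


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sidx_demo")
    write_index(path, {1: (0, 0, 1, 1), 2: (5, 5, 6, 6)})
    f, count, body_offset = open_index(path)
    print(search(f, count, body_offset, (0.5, 0.5, 2, 2)))
    f.close()
```

The point of the split is that opening costs a few bytes of I/O however
large the index is; a tree-structured body would then let a search seek
to only the nodes it needs instead of scanning every record.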

The new spatial index algorithm (R*-tree) is a bit faster than the old
algorithm (RTree). Breaking lines profits from it, and thus so do
v.in.ogr and v.clean.

v.build is now a bit faster overall: sometimes the same speed, sometimes
twice as fast, with generally better performance for more complicated
geometry. v.what is now generally faster by a factor of 6 to 30,
depending on the vector.

The authors of the R*-tree claim that the R*-tree's search performance
is better particularly for massive point datasets. Using
elev_lid792_bepts in nc_spm_08 (not really massive), v.what now takes
about 0.16s here instead of 4.3s, ~25x faster. This is the combined
improvement of the file-based index and the better index algorithm.

Still, for massive point datasets I would recommend not to build
topology, because all three support structures, topology, spatial index
and category index, can become massive. Keeping one in file and loading
the other two to memory doesn't help much.

I hope I didn't mess up too much...

Markus M


Martin Landa

Re: vector libs: file based spatial index

Hi,

2009/7/13 Markus Metz <[hidden email]>:
> To work with an existing vector in grass7, topology needs to be rebuilt
> because a support file is missing, the spatial index. After that
> everything is fine and grass6 can read the vector again as it is.

great!

Just trying to build vector map 'bridges' from nc_spm. The module ends
up with the 'position mismatch' error.

$ v.build bridges
Building topology for vector map <bridges>...
[...]
Number of nodes: 10938
Number of primitives: 10938
Number of points: 10938
Number of lines: 0
Number of boundaries: 0
Number of centroids: 0
Number of areas: 0
Number of isles: 0
ERROR: position mismatch

Martin

--
Martin Landa <landa.martin gmail.com> * http://gama.fsv.cvut.cz/~landa
Martin Landa

Re: vector libs: file based spatial index

Hi,

2009/7/13 Martin Landa <[hidden email]>:

[...]

> Just trying to build vector map 'bridges' from nc_spm. The module ends
> up with the 'position mismatch' error.
>
> $ v.build bridges
> Building topology for vector map <bridges>...
[...]
> ERROR: position mismatch

Some debug info...

D4/5: dig_Wr_P_line() line = 10932
D5/5:     line type  1 -> 1
D4/5: dig_Wr_P_line() line = 10933
D5/5:     line type  1 -> 1
D4/5: dig_Wr_P_line() line = 10934
D5/5:     line type  1 -> 1
D4/5: dig_Wr_P_line() line = 10935
D5/5:     line type  1 -> 1
D4/5: dig_Wr_P_line() line = 10936
D5/5:     line type  1 -> 1
D4/5: dig_Wr_P_line() line = 10937
D5/5:     line type  1 -> 1
D4/5: dig_Wr_P_line() line = 10938
D5/5:     line type  1 -> 1
D2/5: topo body offset 142
D1/5: Vect_save_spatial_index()
D1/5: Open sidx: /home/martin/grassdata/nc_spm_08/PERMANENT/vector/bridges/sidx
D1/5: dig_Wr_spidx()
D3/5: spidx offset node = 0 line = 0, area = 0 isle = 0
D1/5: spidx body offset 113
ERROR: position mismatch

Martin

--
Martin Landa <landa.martin gmail.com> * http://gama.fsv.cvut.cz/~landa
Markus Metz-2

Re: vector libs: file based spatial index

In reply to this post by Martin Landa
Martin Landa wrote:

> Hi,
>
> 2009/7/13 Markus Metz <[hidden email]>:
>  
>> To work with an existing vector in grass7, topology needs to be rebuilt
>> because a support file is missing, the spatial index. After that
>> everything is fine and grass6 can read the vector again as it is.
>>    
>
> great!
>
> Just trying to build vector map 'bridges' from nc_spm. The module ends
> up with the 'position mismatch' error.
>
> $ v.build bridges
> Building topology for vector map <bridges>...
> [...]
> Number of nodes: 10938
> Number of primitives: 10938
> Number of points: 10938
> Number of lines: 0
> Number of boundaries: 0
> Number of centroids: 0
> Number of areas: 0
> Number of isles: 0
> ERROR: position mismatch
>  
Fixed in r38397. No need for you to make distclean again, just recompile
diglib.

Markus M

Martin Landa

Re: vector libs: file based spatial index

Hi,

2009/7/13 Markus Metz <[hidden email]>:

[...]

> Fixed in r38397. No need for you to make distclean again, just recompile
> diglib.

Thanks for the quick fix.

Martin

--
Martin Landa <landa.martin gmail.com> * http://gama.fsv.cvut.cz/~landa
Moritz Lennert

Re: vector libs: file based spatial index

In reply to this post by Markus Metz-2
On 13/07/09 14:44, Markus Metz wrote:
> Now in trunk r38390, time to make distclean again...
>

Some testing:


1) ssbel: 20025 areas, 74674 primitives

time v.what --q -a map=ssbel@mlennert
east_north=213355.121152,112569.565623 distance=3555.157725


GRASS6.4:
real 0m3.401s
user 0m3.316s
sys 0m0.084s

GRASS7:
real 0m0.436s
user 0m0.400s
sys 0m0.040s

2) nuts5: 124658 areas, 481648 primitives

time v.what --q -a map=nuts5@mlennert
east_north=-151298.509242,-196456.483661 distance=71395.131107

GRASS6.4:
real 0m23.595s
user 0m22.297s
sys 0m0.192s

GRASS7:
real 0m1.216s
user 0m0.956s
sys 0m0.120s


3) erm_roads: 1883345 lines, 1883345 primitives

time v.build erm_roads

GRASS6.4:
real 1m54.298s
user 1m49.107s
sys 0m2.888s

GRASS7:
real 2m54.266s
user 2m40.606s
sys 0m6.688s

(Note the fact that here GRASS6.4 is significantly faster !)

time v.what --q map=erm_roads east_north=0,0 distance=10000

GRASS6.4:
real 1m23.439s
user 1m21.389s
sys 0m0.968s

GRASS7:
real 0m5.330s
user 0m2.884s
sys 0m0.468s
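
For comparing the runs, the "real" times above can be turned into
speedup factors with a few lines of Python. This is only a convenience
sketch: the helper names are made up, and it assumes the second,
unlabelled v.what timing for erm_roads is the GRASS7 run.

```python
import re

_STAMP = re.compile(r"(\d+)m([\d.]+)s")


def seconds(stamp):
    """Convert a `time` stamp such as '1m54.298s' to seconds."""
    m = _STAMP.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def speedup(old, new):
    """Factor by which `new` is faster than `old` (below 1 means slower)."""
    return seconds(old) / seconds(new)


# "real" times copied from the runs above: (GRASS6.4, GRASS7)
TIMINGS = {
    "v.what ssbel": ("0m3.401s", "0m0.436s"),
    "v.what nuts5": ("0m23.595s", "0m1.216s"),
    "v.build erm_roads": ("1m54.298s", "2m54.266s"),
    # Assumption: the unlabelled second timing is the GRASS7 run.
    "v.what erm_roads": ("1m23.439s", "0m5.330s"),
}

if __name__ == "__main__":
    for label, (g64, g7) in TIMINGS.items():
        print(f"{label}: {speedup(g64, g7):.2f}x")
```

The three v.what runs come out at roughly 8x, 19x and 16x in favour of
GRASS7, while the v.build run on erm_roads gives a factor below 1, i.e.
GRASS6.4 is the faster one there.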



Thanks a lot !!

Moritz


Markus Metz-2

Re: vector libs: file based spatial index

Moritz Lennert wrote:
>
> Some testing:
[...]

>
>
> 3) erm_roads: 1883345 lines, 1883345 primitives
>
> time v.build erm_roads
>
> GRASS6.4:
> real    1m54.298s
> user    1m49.107s
> sys    0m2.888s
>
> GRASS7:
> real    2m54.266s
> user    2m40.606s
> sys    0m6.688s
>
> (Note the fact that here GRASS6.4 is significantly faster !)
Hmm yes, I noticed that too. v.build in grass70 is sometimes slower,
sometimes faster, sometimes very similar to v.build in grass64. What is
really weird is that this is true only on one of my two test systems;
the other one is always at least as fast as grass64, often faster. The
two systems are Linux 32 and Linux 64, both Fedora 8. On Linux 64,
grass70 is at least as fast as grass64; on Linux 32, grass70 v.build is
sometimes slower, sometimes faster than grass64 v.build.

I have a few handles on the speed of the spatial index and can make the
speed of v.build in grass70 very similar to grass64, but then both the
gain and the loss would disappear. This only applies to v.build; v.what
doesn't show this weird behaviour and is, to my dismay, very robust
against tweaking the new spatial index: different settings give near
identical speed results.

Markus M
