Lucene: Faster indexing

Juan Carlos Méndez

Lucene: Faster indexing

Hi. I'm having some issues with metadata indexing speed:
It takes more than 20 seconds to index a single metadata record in a collection of 33,000 records.
Indexing time seems to degrade as the collection grows, and I need to index 180,000 records.

Is there any way to improve the speed of metadata indexing?

Thanks for your help

Juan Carlos Méndez


------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
GeoNetwork-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork
Heikki Doeleman

Re: Lucene: Faster indexing

hi Juan Carlos,

This is an interesting subject. There is a lot of information on the web about improving Lucene indexing speed (see e.g. http://search-lucene.blogspot.com/2008/08/indexing-speed-factors.html). If you try anything, would you please share the results with us?

Kind regards
Heikki Doeleman



James Wilson

Re: Lucene: Faster indexing

In reply to this post by Juan Carlos Méndez
Is the problem within Lucene, or within the shapefile that GeoNetwork creates using GeoTools? In my (limited) experience with shapefiles and GeoTools, adding to a shapefile using a transaction gets progressively slower as the number of features in the shapefile grows. I believe GeoTools copies the file to a temporary location, then reindexes the whole file. This should scale worse than linearly, but I'm guessing not much worse than N log N.

It might be worth inserting some logging into SearchManager to output timings for the different parts of the indexing process.
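
A minimal sketch of what such timing code could look like. `PhaseTimer` and its methods are hypothetical names for illustration, not part of GeoNetwork or SearchManager; the phase names are placeholders, and the injectable clock exists only to make the helper easy to test.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical helper: accumulates wall-clock time per named indexing phase. */
final class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();
    private final LongSupplier nanoClock;
    private String currentPhase;
    private long phaseStart;

    /** Uses the JVM's monotonic clock. */
    PhaseTimer() {
        this(System::nanoTime);
    }

    /** Accepts any nanosecond clock, e.g. a fake one in tests. */
    PhaseTimer(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    /** Closes the running phase (if any) and starts timing a new one. */
    void begin(String phase) {
        end();
        currentPhase = phase;
        phaseStart = nanoClock.getAsLong();
    }

    /** Closes the running phase and adds its elapsed time to that phase's total. */
    void end() {
        if (currentPhase != null) {
            long elapsed = nanoClock.getAsLong() - phaseStart;
            nanosByPhase.merge(currentPhase, elapsed, Long::sum);
            currentPhase = null;
        }
    }

    /** Total milliseconds recorded for a phase, or 0 if it was never timed. */
    long millis(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L) / 1_000_000L;
    }

    /** One line per phase, in the order the phases were first started. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (String phase : nanosByPhase.keySet()) {
            sb.append(phase).append(": ").append(millis(phase)).append(" ms\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        PhaseTimer timer = new PhaseTimer();
        timer.begin("collect info");
        Thread.sleep(20); // stand-in for the real work of this phase
        timer.begin("lucene indexing");
        Thread.sleep(50);
        timer.begin("spatial indexing");
        Thread.sleep(5);
        timer.end();
        System.out.print(timer.report());
    }
}
```

Calling `begin(...)` at the start of each step and `end()` after the last one gives a per-phase breakdown that can be written to the log for every record.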

Interested to see how you get on with that volume of records.

James

Charlotte Declercq

System configuration save problem

Hello,

I have installed GeoNetwork from trunk, and when I try to save my system
configuration I get this error in Eclipse:

SettingManager: Unable to find Settings row to save system/localrating/enable to.
SettingManager: Unable to find Settings row to save system/clickablehyperlinks/enable to.

Any ideas?

--
Charlotte DECLERCQ

ALKANTE SAS
Ingénieur R&D SIG
1, rue du Chêne Morand
35 510 Cesson-Sévigné
Bur: + 33 (0) 2 99 22 25 70
fax : + 33 (0) 2 99 32 12 76





David Neufeld

Re: Lucene: Faster indexing

In reply to this post by James Wilson
We've had some concerns about the scalability of shapefiles and have
done some successful prototyping of inserting the metadata extents into
Oracle Spatial.  Is anyone else working or thinking along these lines?

Thanks, Dave

David Neufeld
Enterprise Data Systems Group
NOAA, NGDC, CIRES
(303) 497-6507
[hidden email]



Juan Carlos Méndez

Re: Lucene: Faster indexing

In reply to this post by Juan Carlos Méndez
I inserted some debugging code in SearchManager and these are the results.
Total number of metadata records currently indexed: 41489


2009-10-27 14:24:39,392 INFO  [geonetwork.index] - Indexing record (51212)
2009-10-27 14:24:39,407 INFO  [geonetwork.index] - record schema (xxxxxx)
2009-10-27 14:24:39,407 INFO  [geonetwork.index] - record createDate (2003-06-03T00:00:00)
2009-10-27 14:24:39,407 INFO  [geonetwork.index] - Begin - Collect info
2009-10-27 14:24:40,907 INFO  [geonetwork.index] - Begin - Lucene Indexing
2009-10-27 14:25:10,814 INFO  [geonetwork.index] - Begin - Spatial Indexing
2009-10-27 14:25:10,845 INFO  [geonetwork.index] - END

2009-10-27 14:29:16,001 INFO  [geonetwork.index] - - record (51217)
2009-10-27 14:29:16,001 INFO  [geonetwork.index] - Indexing record (51217)
2009-10-27 14:29:16,017 INFO  [geonetwork.index] - record schema (xxxxx)
2009-10-27 14:29:16,017 INFO  [geonetwork.index] - record createDate (2003-06-03T00:00:00)
2009-10-27 14:29:16,032 INFO  [geonetwork.index] - Begin - Collect info
2009-10-27 14:29:17,657 INFO  [geonetwork.index] - Begin - Lucene Indexing
2009-10-27 14:29:54,689 INFO  [geonetwork.index] - Begin - Spatial Indexing
2009-10-27 14:29:54,735 INFO  [geonetwork.index] - END

2009-10-27 14:30:32,048 INFO  [geonetwork.index] - - record (51219)
2009-10-27 14:30:32,048 INFO  [geonetwork.index] - Indexing record (51219)
2009-10-27 14:30:32,142 INFO  [geonetwork.index] - record schema (xxxxx)
2009-10-27 14:30:32,142 INFO  [geonetwork.index] - record createDate (2003-06-03T00:00:00)
2009-10-27 14:30:32,142 INFO  [geonetwork.index] - Begin - Collect info
2009-10-27 14:30:33,970 INFO  [geonetwork.index] - Begin - Lucene Indexing
2009-10-27 14:31:10,751 INFO  [geonetwork.index] - Begin - Spatial Indexing
2009-10-27 14:31:10,782 INFO  [geonetwork.index] - END

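In most cases the Lucene indexing step takes 30 seconds or more per record (about 29.9, 37.0 and 36.8 seconds in the three samples above), while collecting info takes under 2 seconds and spatial indexing less than 50 ms.
Lucene index file size: _1rtg.cfs -> 195 MB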
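
The per-phase durations can be read off the log by subtracting consecutive timestamps. A small sketch of that arithmetic, assuming every log line starts with a 23-character timestamp in the format shown above; the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: computes elapsed time between two log lines. */
final class LogPhaseDurations {
    // Matches timestamps such as "2009-10-27 14:24:40,907".
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    /** Parses the leading 23-character timestamp of a log line. */
    static LocalDateTime timestamp(String logLine) {
        return LocalDateTime.parse(logLine.substring(0, 23), LOG_TIME);
    }

    /** Milliseconds elapsed between the timestamps of two log lines. */
    static long millisBetween(String startLine, String endLine) {
        return Duration.between(timestamp(startLine), timestamp(endLine)).toMillis();
    }

    public static void main(String[] args) {
        // Lines copied from the first sample (record 51212).
        String collect = "2009-10-27 14:24:39,407 INFO  [geonetwork.index] - Begin - Collect info";
        String lucene  = "2009-10-27 14:24:40,907 INFO  [geonetwork.index] - Begin - Lucene Indexing";
        String spatial = "2009-10-27 14:25:10,814 INFO  [geonetwork.index] - Begin - Spatial Indexing";
        String end     = "2009-10-27 14:25:10,845 INFO  [geonetwork.index] - END";

        System.out.println("collect info:     " + millisBetween(collect, lucene) + " ms");
        System.out.println("lucene indexing:  " + millisBetween(lucene, spatial) + " ms");
        System.out.println("spatial indexing: " + millisBetween(spatial, end) + " ms");
    }
}
```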

thanks for your help



Juan Carlos Méndez


Francois Prunayre

Re: Lucene: Faster indexing

In reply to this post by David Neufeld
Hello David,

2009/10/26 David Neufeld <[hidden email]>:
> We've had some concerns about the scalability of shapefiles and have
> done some successful prototyping of inserting the metadata extents into
> Oracle Spatial.  Is anyone else working or thinking along these lines?
Maybe we could keep the shapefile as the default and add an option
to use a database resource when a spatial database is available (e.g. Oracle
or PostGIS)?
Moving from a ShapefileDataStore to another type of DataStore should
not be too difficult. Did you use the OracleDataStore?
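
A rough sketch of how that choice could be expressed, assuming the store is looked up from a GeoTools-style parameter map (`url` for a shapefile; `dbtype`, `host`, `port`, `database`, `user`, `passwd` for PostGIS). The class and method names are hypothetical and not GeoNetwork code, and the actual GeoTools lookup is left out so the example needs nothing beyond the JDK.

```java
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: picks connection parameters for the spatial index store. */
final class SpatialIndexStoreParams {

    /** Parameters for a PostGIS-backed store (keys assumed to follow the GeoTools convention). */
    static Map<String, Object> postgis(String host, int port, String database,
                                       String user, String passwd) {
        Map<String, Object> params = new HashMap<>();
        params.put("dbtype", "postgis");
        params.put("host", host);
        params.put("port", port);
        params.put("database", database);
        params.put("user", user);
        params.put("passwd", passwd);
        return params;
    }

    /** Parameters for the default shapefile-backed store. */
    static Map<String, Object> shapefile(URL shapefileUrl) {
        Map<String, Object> params = new HashMap<>();
        params.put("url", shapefileUrl);
        return params;
    }

    /** Uses the spatial database when one is configured, otherwise falls back to the shapefile. */
    static Map<String, Object> choose(URL shapefileUrl, Map<String, Object> spatialDbParams) {
        return spatialDbParams != null ? spatialDbParams : shapefile(shapefileUrl);
    }

    public static void main(String[] args) throws Exception {
        // File name and connection details are placeholders.
        URL shp = new java.io.File("spatialindex.shp").toURI().toURL();
        System.out.println(choose(shp, null).keySet());
        System.out.println(
                choose(shp, postgis("localhost", 5432, "geonetwork", "gn", "secret")).get("dbtype"));
        // With GeoTools on the classpath, the chosen map would then be handed to
        // DataStoreFinder.getDataStore(params); that call is omitted here.
    }
}
```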

Cheers.

Francois



David Neufeld

Re: Lucene: Faster indexing

Hi Francois,

I started out using the GeoTools OracleDataStore, but had some issues (it's
an unsupported branch) and reverted to working directly with the Oracle
Spatial libraries.  The PostGISDatastore should go smoothly.

The option you describe (shapefile by default, with a spatial database when
one is available) sounds great.

Dave
