Hi.
(I was going to write a bunch on systems integration but
then decided to focus on search. We can talk @ google
about general sys integration.)
We have been doing some work with search integration. So far
we have worked with three search engines: Xapian, Google Search
Appliance (GSA) and SOLR/Lucene.
I will focus on SOLR/Lucene, since the Snow Sprint did some
work on this. We thought about how to do an integration with
SOLR/Lucene, and there are at least three ways:
- Make a new ZCatalog index, i.e.
replace the SearchableText catalog index.
- Override the default portal_catalog w/ SOLR capabilities.
- Use events and leave it to whoever wants to use SOLR
inside of Plone to integrate with the adapters and possibly
register new subscribers, etc.
We decided to focus on the last one. Overriding the semantics of
existing Plone (and all the layers it is built on) is riddled with
decisions that the consultant closest to the customer must make.
So we have a simple event/adapter mechanism. There is still
enough that the "client programmer" must do that it is nowhere near
an 'out of the box' integration. I believe this is the right approach.
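To make the event/adapter idea concrete, here is a minimal sketch in
plain Python. It is only a stand-in for the pattern: it does not use
zope.component and it is not the enfold.indexing code. Every name in
it (ContentModified, EventRegistry, queue_for_search_engine) is
hypothetical.

```python
# Hypothetical stand-in for the event/subscriber pattern; this is not
# enfold.indexing or zope.component code.
from dataclasses import dataclass, field


@dataclass
class ContentModified:
    """Event fired when a content object changes (hypothetical name)."""
    uid: str
    title: str
    text: str


@dataclass
class EventRegistry:
    """Maps event types to the subscribers registered for them."""
    subscribers: dict = field(default_factory=dict)

    def subscribe(self, event_type, handler):
        self.subscribers.setdefault(event_type, []).append(handler)

    def notify(self, event):
        # Call every handler registered for this exact event type.
        for handler in self.subscribers.get(type(event), []):
            handler(event)


# The "client programmer" decides what indexing means for their project.
queued = []


def queue_for_search_engine(event):
    # Adapt the event to the flat document a search engine would expect.
    queued.append({"id": event.uid, "title": event.title, "text": event.text})


registry = EventRegistry()
registry.subscribe(ContentModified, queue_for_search_engine)
registry.notify(ContentModified(uid="doc-1", title="Hello", text="Hello world"))
```

The core only fires events. Which events to listen to, and what to
do with them, stays with whoever registers the subscriber.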
I believe the goal for 'search integration' should NOT be to tightly
couple Plone with an indexing layer, but for us to write an
indexing system that can integrate with many different search
vendors: SOLR, GSA, FAST, Sphinx and Autonomy. Getting
the "content reliably into the search application" is 70% of the battle.
How you search it is up to you.
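A rough sketch of what a vendor-neutral indexing layer could look
like: the indexer only knows about a small backend interface, and
each engine supplies its own serialization. The class names
(SearchBackend, SolrBackend, Indexer) and the send callable are
assumptions made for illustration, not the enfold.indexing API. The
<add><doc><field> message is Solr's XML update format.

```python
# Sketch only: class and method names are hypothetical.
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """What the indexing layer needs from any search engine."""

    @abstractmethod
    def serialize(self, doc: dict) -> str:
        """Turn a flat document into the engine's wire format."""


class SolrBackend(SearchBackend):
    def serialize(self, doc: dict) -> str:
        # Build a Solr XML update message: <add><doc><field name="...">
        add = ET.Element("add")
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
        return ET.tostring(add, encoding="unicode")


class Indexer:
    """Pushes documents to one backend through a delivery callable."""

    def __init__(self, backend: SearchBackend, send):
        self.backend = backend
        self.send = send  # e.g. a function doing an HTTP POST (assumption)

    def index(self, doc: dict) -> None:
        self.send(self.backend.serialize(doc))


# Usage: collect payloads in a list instead of talking to a real server.
sent = []
Indexer(SolrBackend(), sent.append).index({"id": "doc-1", "title": "Hello"})
```

Supporting GSA or another engine would then mean writing one more
SearchBackend subclass, without touching the Plone side.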
Consuming the data is rarely the problem, since it is a matter of
overriding the advanced search page and the search results. And as
we move to more z3 components it will be even easier.
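On the consuming side, a custom search results page mostly needs to
build a query against the engine. A small sketch for SOLR, assuming
the standard /select handler with its q, start, rows and wt
parameters. The base URL and the function name are made up for the
example.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, text, start=0, rows=10):
    """Build a Solr /select query URL (base_url is an assumption)."""
    params = {"q": text, "start": start, "rows": rows, "wt": "json"}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = solr_select_url("http://localhost:8983/solr", "plone search")
```

The results view would fetch this URL and render the hits in its
own template.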
Again, I believe integration should be thought of as "reliably and
scalably performing the task the component/integration
sets out to do, and documenting the problem statement and the
technology," while letting the client developer handle the
realization of the integration in their customer's project.
Other I/R libraries:
We have not gotten around to GSA integration. We have a GSA
API in our public svn, enfold.gsa, but we have not yet
integrated it with enfold.indexing.
Xapian integration should happen via Flax. Flax must grow a
REST API (I have been talking with Lemur Consulting but anyone
can hack on the code - it can be found at flaxcode.code.google.com
and I believe a 'SOLR' compatible API would work nicely).
Why multiple I/R libraries:
Customers *must* reuse their existing infrastructure and skills.
Why support both Xapian and Lucene? I believe this depends
on how they can leverage the underlying libraries. If a customer
is invested in Java, they will probably want SOLR/Lucene. If
a customer does not care and sees that FLAX can add value outside
of the Plone integration, they may want to use Xapian/FLAX.
The point is that we can only reliably set up the infrastructure to
integrate with multiple engines. It is up to the consulting company
to make the call.
Thanks to Tarek && Sprint Team for working on enfold.solr.
cheers
alan
--
Alan Runyan
Enfold Systems, Inc.
http://www.enfoldsystems.com/
phone: +1.713.942.2377x111
fax: +1.832.201.8856
_______________________________________________
Enterprise mailing list
http://lists.plone.org/mailman/listinfo/enterprise