Search Integration

Alan Runyan-3 () Search Integration
Hi.

(I was going to write a bunch on systems integration but
then decided to focus on search.  We can talk @ google
about general sys integration.)

We have been doing some work with search integration.
We have done work with three search engines, so far:
Xapian, Google Search Appliance (GSA) and SOLR/Lucene.

I will focus on SOLR/Lucene, since the Snow Sprint did some
work on this.  We thought about how to do an integration into
SOLR/Lucene, and there are at least three ways:

    - Make a new ZCatalog Index, i.e.
    replace SearchableText Catalog Index.

    - Override the default portal_catalog w/ SOLR capabilities.

    - Use events and leave it to whoever wants to use SOLR
    inside of Plone to integrate with the adapters and possibly
    register new subscribers, etc.

We decided to focus on the last option.  Overriding the semantics of
existing Plone (and all the layers it is built on) is riddled with
decisions that the consultant closest to the customer must make.
So we have a simple event/adapter mechanism.  There is enough
that the "client programmer" must do that it is nowhere near
an 'out of the box' integration.  I believe this is the right approach.
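
To make the event/adapter idea concrete, here is a minimal sketch in plain
Python.  It does not use the real Zope event or component machinery and it is
not the enfold.solr code; every name in it (Document, ObjectModified,
ObjectRemoved, EventBus, to_index_record, RecordingIndexer) is hypothetical
and only illustrates the shape: content events are published, and whoever
wants external indexing registers subscribers that adapt the object and hand
it to an engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for a Plone content object."""
    uid: str
    title: str
    body: str


@dataclass
class ObjectModified:
    """Hypothetical event fired when content is created or changed."""
    obj: Document


@dataclass
class ObjectRemoved:
    """Hypothetical event fired when content is deleted."""
    obj: Document


class EventBus:
    """Tiny subscriber registry keyed by event class."""

    def __init__(self) -> None:
        self._subscribers: Dict[type, List[Callable]] = {}

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._subscribers.setdefault(event_type, []).append(handler)

    def notify(self, event: object) -> None:
        # Call every handler registered for this exact event class.
        for handler in self._subscribers.get(type(event), []):
            handler(event)


def to_index_record(doc: Document) -> dict:
    """Adapter: turn a content object into a flat record for an engine."""
    return {"id": doc.uid, "title": doc.title, "text": doc.body}


class RecordingIndexer:
    """Stands in for a search engine client; it only records operations."""

    def __init__(self) -> None:
        self.operations: List[tuple] = []

    def on_modified(self, event: ObjectModified) -> None:
        self.operations.append(("index", to_index_record(event.obj)))

    def on_removed(self, event: ObjectRemoved) -> None:
        self.operations.append(("unindex", event.obj.uid))


def demo() -> List[tuple]:
    bus = EventBus()
    indexer = RecordingIndexer()
    # The integrator decides which events matter and registers subscribers.
    bus.subscribe(ObjectModified, indexer.on_modified)
    bus.subscribe(ObjectRemoved, indexer.on_removed)
    doc = Document(uid="doc-1", title="Hello", body="Search me")
    bus.notify(ObjectModified(doc))
    bus.notify(ObjectRemoved(doc))
    return indexer.operations
```

Nothing in the sketch touches the catalog: the existing search keeps working,
and the integrator chooses which events and which adapter to wire up.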

I believe the goal for 'search integration' should NOT be to tightly
couple Plone with an indexing layer, but for us to write an
indexing system that can integrate with many different search
vendors: SOLR, GSA, FAST, Sphinx and Autonomy.  Getting
the "content reliably into the search application" is 70% of the battle.
How you search it is up to you.
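
As a rough illustration of an indexing layer that is not tied to one vendor,
the sketch below puts a small interface in front of each engine.  The class
names (SearchBackend, SolrXmlBackend, JsonBackend) and the fan_out helper are
assumptions made for this example, not part of enfold.indexing.  The Solr
backend only builds XML update messages (<add> and <delete>); actually
posting them to a Solr update handler, retrying on failure, and committing
are deliberately left out.

```python
import json
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from typing import Dict, List


class SearchBackend(ABC):
    """Hypothetical vendor-neutral interface; one subclass per engine."""

    @abstractmethod
    def add_message(self, record: Dict[str, str]) -> str:
        """Serialize an 'index this record' request for the engine."""

    @abstractmethod
    def delete_message(self, doc_id: str) -> str:
        """Serialize a 'remove this document' request for the engine."""


class SolrXmlBackend(SearchBackend):
    """Builds Solr-style XML update messages (no HTTP is performed)."""

    def add_message(self, record: Dict[str, str]) -> str:
        add = ET.Element("add")
        doc = ET.SubElement(add, "doc")
        # Sort field names so the output is deterministic.
        for name, value in sorted(record.items()):
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
        return ET.tostring(add, encoding="unicode")

    def delete_message(self, doc_id: str) -> str:
        delete = ET.Element("delete")
        ET.SubElement(delete, "id").text = doc_id
        return ET.tostring(delete, encoding="unicode")


class JsonBackend(SearchBackend):
    """Made-up second engine that accepts JSON, to show the pluggability."""

    def add_message(self, record: Dict[str, str]) -> str:
        return json.dumps({"op": "add", "doc": record}, sort_keys=True)

    def delete_message(self, doc_id: str) -> str:
        return json.dumps({"op": "delete", "id": doc_id}, sort_keys=True)


def fan_out(backends: List[SearchBackend], record: Dict[str, str]) -> List[str]:
    """Produce one add message per configured backend for the same record."""
    return [backend.add_message(record) for backend in backends]
```

The content side produces one flat record; which engines receive it is a
deployment decision made by whoever configures the list of backends.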

Consuming the data is rarely the problem, since it's a matter of
overriding the advanced search page and the search results.  And as
we move to more z3 components it will be even easier.

Again, I believe integration should be thought of as "reliably and
scalably performing the task the component/integration
sets out to do, and documenting the problem statement and the
technology", while letting the client developer handle the realization
of the integration in their customer's project.

Other I/R libraries:

We have not gotten around to GSA integration.  We have a GSA
API in our public svn, enfold.gsa.  But we have not yet
integrated it with enfold.indexing.

Xapian integration should happen via Flax.  Flax must grow a
REST API (I have been talking with Lemur Consulting but anyone
can hack on the code - it can be found at flaxcode.code.google.com
and I believe a 'SOLR' compatible API would work nicely).

Why multiple I/R libraries:

Customers *must* reuse their existing infrastructure and skills.
Why support both Xapian and Lucene?  I believe this depends
on how they can leverage the underlying libraries.  If a customer
is invested in Java, they will probably want SOLR/Lucene.  If
a customer does not care and sees that Flax can add value outside
of the Plone integration, they may want to use Xapian/Flax.

The point is we can only reliably set up the infrastructure to integrate
with multiple engines.  It is up to the consulting company to make
the call.

Thanks to Tarek && Sprint Team for working on enfold.solr.

cheers
alan




--
Alan Runyan
Enfold Systems, Inc.
http://www.enfoldsystems.com/
phone: +1.713.942.2377x111
fax: +1.832.201.8856

_______________________________________________
Enterprise mailing list
[hidden email]
http://lists.plone.org/mailman/listinfo/enterprise
Reinout van Rees () Re: Search Integration
Alan Runyan schreef:

> We have been doing some work with search integration.
> We have done work with three search engines, so far:
> Xapian, Google Search Appliance (GSA) and SOLR/Lucene.

We have a big client that uses GSA on its intranet. They're actually
having the problem that a lot of the information is password-protected.
The two plone sites we've made also have at least 50% of their content
hidden from anonymous view.

I guess there's no real solution to this. Can't hurt to ask: anyone got
a workable tip from their projects?


Reinout

--
Reinout van Rees                   [hidden email]
http://vanrees.org/weblog/            http://zestsoftware.nl/
              I can be googled, therefore I am.


Alan Runyan-3 () Re: Search Integration
> We have a big client that uses GSA on its intranet. They're actually
> having the problem that a lot of the information is password-protected.
> The two plone sites we've made also have at least 50% of their content
> hidden from anonymous view.

Well.  There are various ways to fix this.  Read the GSA documentation.

Some ideas:

  - Build a Plone / GSA Connector (seems quite simple)

  - Write a Zope (PAS?) SAML plug-in (Weblion guys have been looking at this)

    - You *might* get this for free if you are running on Microsoft IIS.  I can't
    tell from the documentation at msdn.

    - This is the more 'generic' approach

> I guess there's no real solution to this. Can't hurt to ask: anyone got
> a workable tip from their projects?

There are *real* solutions to this.  Every major search vendor
supports complex security requirements.  *THIS* is exactly the kinda
search requirements we need to satisfy for us to work in large organizations.
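
One common way to keep protected content out of the wrong results is to
index, with each document, the principals allowed to view it, and to filter
every query by the searcher's own principals.  The sketch below shows that
idea only; it is an assumption-laden illustration, not how GSA or any other
vendor implements security, and the names (make_record, search, INDEX, the
"role:" and "user:" prefixes) are made up for the example.

```python
from typing import Dict, Iterable, List, Set


def make_record(doc_id: str, text: str, allowed: Iterable[str]) -> Dict[str, object]:
    """Store the principals (roles or users) allowed to view the document."""
    return {"id": doc_id, "text": text, "allowed": frozenset(allowed)}


def search(index: List[Dict[str, object]], term: str, principals: Set[str]) -> List[str]:
    """Return ids of matching documents the searcher is allowed to see."""
    hits = []
    for record in index:
        matches = term.lower() in str(record["text"]).lower()
        # Visible if the searcher shares at least one principal with the record.
        visible = bool(principals & record["allowed"])
        if matches and visible:
            hits.append(record["id"])
    return hits


# Hypothetical sample data: one public page and one restricted memo.
INDEX = [
    make_record("public-page", "Annual report summary", ["Anonymous"]),
    make_record("hr-memo", "Annual salary report", ["role:HR", "user:alice"]),
]
```

With this data, an anonymous search for "report" sees only the public page,
while a searcher who also carries "user:alice" sees both documents.  The
trade-off is that permission changes require reindexing the affected
documents.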

Matt Hamilton () Re: Search Integration
Alan Runyan <runyaga@...> writes:

> > I guess there's no real solution to this. Can't hurt to ask: anyone got
> > a workable tip from their projects?
>
> There are *real* solutions to this.  Every major search vendor
> supports complex security requirements.  *THIS* is exactly the kinda
> search requirements we need to satisfy for us to work in large organizations.

I guess we really need to know how these search systems work.  Is there any
standard that they follow for authentication (I'm guessing not)?

At the PSPS we talked about coming up with specific integration stories for
Plone.  This is more marketing talk than actual client requirements, but we
could for instance say 'GSA is the enterprise search solution for Plone' and
then put some effort behind getting a really polished GSA connector for Plone
built (I've no idea what that entails, so could be talking rubbish).  Yes there
are other systems people want to integrate with for search, but if we could pick
one or two market leaders then we would be able to tick quite a few boxes.

-Matt

--
Matt Hamilton                                       [hidden email]
Netsight Internet Solutions, Ltd.        Business Vision on the Internet
http://www.netsight.co.uk                             +44 (0)117 9090901
Web Design | Zope/Plone Development & Consulting | Co-location | Hosting





Alan Runyan-3 () Re: Search Integration
>  I guess we really need to know how these search systems work, is there any
>  standard that they follow for authentication (I'm guessing not)?

No.  GSA seems to follow some sense of standards.  Writing a GSA 'connector',
like the Sharepoint connectors, would cost a bit.  Our estimate was
~240 hours.

>  At the PSPS we talked about coming up with specific integration stories for
>  Plone, this is more talking marketing than actual client requirements, but we
>  could for instance say 'GSA is the enterprise search solution for Plone' and
>  then put some effort behind getting a really polished GSA connector for Plone
>  built (I've no idea what that entails, so could be talking rubbish).  Yes there
>  are other systems people want to integrate with for search, but if we could pick
>  one or two market leaders then we would be able to tick quite a few boxes.

Lucene/SOLR (we have this integration in our public svn) is done.

Xapian/Flax (we are talking with the Lemur guys) has some time to go.  You can
see the 'state of the art' at http://www.mydecor.com/ (tags, facets,
clustering, etc)

The 'market leaders' in search (from most to least) are: Autonomy, FAST and GSA.

The 'FOSS leaders' are (from most to least): Lucene, Xapian, Sphinx.
