TextIndexNG3 - Query on special characters

kees04 () TextIndexNG3 - Query on special characters
Hi,

I have installed TextIndexNG3 v3.2.8, which is working very well. However, I have a question about special characters. When I go to /Plone/portal_catalog/Indexes/SearchableText and browse the words under a letter, say b, I get words with 'special' characters:

búsquedas
bürointerne
būtiskajām

Yet when I click on one of these words, I get an error saying that the 'ascii' codec cannot decode a byte:

Exception Type UnicodeDecodeError
Exception Value 'ascii' codec can't decode byte 0xc5 in position 1: ordinal not in range(128)


Has anyone seen this before? Any advice on how either to access the words with these special characters, or to fix it so the index doesn't use the special characters?

Regards
Kees
kees04 () Re: TextIndexNG3 - Query on special characters
In addition, here is how my searchable text is configured.

TextIndexNG3 at /Plone/portal_catalog/Indexes/SearchableText

# indexed documents:         17794
# indexed words:             1735369
Languages:                   en
Fields:                      SearchableText
Default encoding:            utf-8
Additional characters recognized by the splitter part of a word: _-
Splitter:                    txng.splitters.simple
Stemming:                    False
Autoexpand:                  off
Autoexpand limit:            4
Parser:                      txng.parsers.en
Casefolding:                 True
Storage:                     txng.storages.term_frequencies
Dedicated storages:          False
Ranking:                     True
Normalizer:                  False
Stopwords:                   False
Index unknown languages:     True
ajung () Re: TextIndexNG3 - Query on special characters
In reply to this post by kees04


--On 19. August 2008 01:21:46 -0700 kees04 <[hidden email]> wrote:

>
> Hi,
>
> I have installed TextIndexNG3 v3.2.8, which is working very well. I
> however have a query about special characters. When I go to
> /Plone/portal_catalog/Indexes/SearchableText and look through a letter,
> say b, I get words with 'special' characters
>
> búsquedas
> bürointerne
> būtiskajām
>
> Yet when I click on one of these words I get an error which tells me that
> ascii cannot decode the character.
>
> Exception Type UnicodeDecodeError
> Exception Value 'ascii' codec can't decode byte 0xc5 in position 1:
> ordinal not in range(128)
Provide the full traceback please. Likely only a UI problem. The UI
functionality does not affect the backend functionality.

-aj

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Plone-Users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/plone-users

kees04 () Re: TextIndexNG3 - Query on special characters
Andreas Jung-5 wrote:
Provide the full traceback please. Likely only a UI problem. The UI
functionality does not affect the backend functionality.

-aj
 
Hi AJ,

Here is the traceback log


Traceback (innermost last):
  Module ZPublisher.Publish, line 119, in publish
  Module ZPublisher.mapply, line 88, in mapply
  Module ZPublisher.Publish, line 42, in call_object
  Module Products.Five.browser.metaconfigure, line 417, in __call__
  Module Shared.DC.Scripts.Bindings, line 313, in __call__
  Module Shared.DC.Scripts.Bindings, line 350, in _bindAndExec
  Module Products.PageTemplates.PageTemplateFile, line 129, in _exec
  Module Products.CacheSetup.patch_cmf, line 120, in PT_pt_render
  Module zope.tal.talinterpreter, line 271, in __call__
  Module zope.tal.talinterpreter, line 346, in interpret
  Module zope.tal.talinterpreter, line 855, in do_condition
  Module zope.tal.talinterpreter, line 346, in interpret
  Module zope.tal.talinterpreter, line 536, in do_optTag_tal
  Module zope.tal.talinterpreter, line 521, in do_optTag
  Module zope.tal.talinterpreter, line 516, in no_tag
  Module zope.tal.talinterpreter, line 346, in interpret
  Module zope.tal.talinterpreter, line 586, in do_setLocal_tal
  Module zope.tales.tales, line 696, in evaluate
   - URL: index
   - Line 8, Column 2
   - Expression: <PathExpr standard:'context/@@documents_for_word'>
   - Names:
      {'container': <TextIndexNG3 at /Plone/portal_catalog//SearchableText>,
       'context': <TextIndexNG3 at /Plone/portal_catalog//SearchableText>,
       'default': ,
       'here': <TextIndexNG3 at /Plone/portal_catalog//SearchableText>,
       'loop': {},
       'nothing': None,
       'options': {'args': (<Products.Five.metaclass.SimpleViewClass from /home/plone/Plone-3.1/Python-2.4/lib/python2.4/site-packages/Products.TextIndexNG3-3.2.8-py2.4.egg/Products/TextIndexNG3/pt/documentsforword.pt object at 0x1e948dd0>,)},
       'repeat': <Products.PageTemplates.Expressions.SafeMapping object at 0x1eb44dd0>,
       'request': <HTTPRequest, URL=http://THFC:8080/Plone/portal_catalog/Indexes/SearchableText/documentsforword>,
       'root': <Application at >,
       'template': <ImplicitAcquirerWrapper object at 0x1e948c50>,
       'traverse_subpath': [],
       'user': <PropertiedUser 'admin'>,
       'view': <Products.Five.metaclass.SimpleViewClass from /home/plone/Plone-3.1/Python-2.4/lib/python2.4/site-packages/Products.TextIndexNG3-3.2.8-py2.4.egg/Products/TextIndexNG3/pt/documentsforword.pt object at 0x1e948dd0>,
       'views': <zope.app.pagetemplate.viewpagetemplatefile.ViewMapper object at 0x22050290>}
  Module zope.tales.expressions, line 217, in __call__
  Module Products.PageTemplates.Expressions, line 161, in _eval
  Module Products.PageTemplates.Expressions, line 123, in render
  Module Products.TextIndexNG3.browser, line 82, in documents_for_word
  Module textindexng.lexicon, line 106, in getWordId
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1: ordinal not in range(128)

ajung () Re: TextIndexNG3 - Query on special characters


--On 19. August 2008 01:48:02 -0700 kees04 <[hidden email]> wrote:

>
>
> Andreas Jung-5 wrote:
>>
>> Provide the full traceback please. Likely only a UI problem. The UI
>> functionality does not affect the backend functionality.
>>
>

Hm, I cannot reproduce this error, nor can I find a flaw in the code.
How can this be reproduced with a bare Plone 3 instance?

Andreas

kees04 () Re: TextIndexNG3 - Query on special characters

Andreas Jung-5 wrote:
Hm..I can not reproduce this error nor can I figure out a flaw in the code.
How can this be reproduced with a bare Plone 3 instance?

Andreas
 
I'm not sure how you'd reproduce this problem on your site.
I have added an external filesystem via reflector, which hosts all of our documentation. Do you think the problem may lie there?

My Plone site is hosted on a Linux server, and the external documentation is hosted on a Windows server which is mounted on Linux via NFS.

Thanks
ajung () Re: TextIndexNG3 - Query on special characters


--On 19. August 2008 02:38:05 -0700 kees04 <[hidden email]> wrote:

>
>
>
> Andreas Jung-5 wrote:
>>
>>
>> Hm..I can not reproduce this error nor can I figure out a flaw in the
>> code.
>> How can this be reproduced with a bare Plone 3 instance?
>>
>> Andreas
>>
>>
>
> I'm not sure how you'd reproduce this problem on your site.
> I have added in an external filesystem via reflector, which hosts all of
> our documentation, do you think the problem may lie here?
Hm, sorry, no idea. I need something in my hands in order to investigate
further.

Andreas

Dieter Maurer () Re: TextIndexNG3 - Query on special characters
In reply to this post by kees04
kees04 wrote at 2008-8-19 01:48 -0700:
> ...
>  Module Products.TextIndexNG3.browser, line 82, in documents_for_word
>  Module textindexng.lexicon, line 106, in getWordId
>UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1:
>ordinal not in range(128)

I read this as: the lexicon is mixing unicode and "str" together.

I would try to reproduce this problem in an interactive Python interpreter
("bin/zopectl debug" under *nix), then use "pdb.pm()" to analyse:

   the parameter passed to "getWordId" (it is likely a "str")
   and the lexicon content (likely to be "unicode").
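
A minimal sketch of that kind of post-mortem inspection, written for current Python 3 rather than the Python 2.4 seen in the traceback. "get_word_id" and its toy lexicon are hypothetical stand-ins, not TextIndexNG3 code, and the explicit decode("ascii") only imitates the implicit conversion Python 2 performs when a "str" meets a "unicode". Instead of an interactive "pdb.pm()" session, it walks the traceback to the innermost frame and reports the types of the local variables there, which is the information one would look at in the debugger.

```python
import sys

# Hypothetical stand-in for a lexicon that maps unicode words to word ids.
LEXICON = {"b\u016btiskaj\u0101m": 42}


def get_word_id(word):
    # Imitate Python 2's implicit coercion: a byte string that meets
    # unicode text is decoded with the 'ascii' codec.
    if isinstance(word, bytes):
        word = word.decode("ascii")
    return LEXICON.get(word)


def innermost_locals(tb):
    """Return the local variables of the frame where the error was raised."""
    while tb.tb_next is not None:
        tb = tb.tb_next
    return dict(tb.tb_frame.f_locals)


def diagnose(word):
    """Call get_word_id; on failure, report the types in the failing frame."""
    try:
        get_word_id(word)
    except UnicodeDecodeError:
        tb = sys.exc_info()[2]
        return {name: type(value).__name__
                for name, value in innermost_locals(tb).items()}
    return {}


# Passing UTF-8 bytes instead of text triggers the error.
report = diagnose("b\u016btiskaj\u0101m".encode("utf-8"))
print(report)
```

If the report shows a byte string where text was expected, the next question is which caller produced it.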



--
Dieter

ajung () Re: TextIndexNG3 - Query on special characters


--On 23. August 2008 13:27:06 +0200 Dieter Maurer <[hidden email]>
wrote:

> kees04 wrote at 2008-8-19 01:48 -0700:
>> ...
>>  Module Products.TextIndexNG3.browser, line 82, in documents_for_word
>>  Module textindexng.lexicon, line 106, in getWordId
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1:
>> ordinal not in range(128)
>
> I read this as: the lexicon is mixing unicode and "str" together.

Never!

The lexicon of TXNG has a dedicated check for unicode strings.

-aj

kees04 () Re: TextIndexNG3 - Query on special characters
On a slight side note, I have installed Flash Player 2.1 and uploaded a few FLV video files.

Is it possible to have these indexed and searchable? Or is it not possible due to them being video files?
ajung () Re: TextIndexNG3 - Query on special characters


--On 26. August 2008 00:54:54 -0700 kees04 <[hidden email]> wrote:

>
> On a slight side note, I have installed Flash Player 2.1 and uploaded a
> few FLV video files.
>
> Is it possible to have these indexed and searchable? Or is it not possible
> due to them being video files?

Video files and TextIndexNG? Makes no sense to me. For File content, TXNG
will only index the textual metadata.

Andreas

kees04 () Re: TextIndexNG3 - Query on special characters

Andreas Jung-5 wrote:
Video files and TextIndexNG? Makes no sense to me. For File content, TXNG
will only index the textual metadata.

Andreas
 
I know you cannot index the video file itself, as it's a text index. However, I would like to be able to index the name and description I have given to the file. Is that possible?

To elaborate further, I want to upload 'How To' videos for our users and have them searchable in our Plone site. I have tried adding them to a category, but this has not worked. Any advice on this?
ajung () Re: TextIndexNG3 - Query on special characters


--On 26. August 2008 01:06:31 -0700 kees04 <[hidden email]> wrote:

>
>
>
> Andreas Jung-5 wrote:
>>
>>
>> Video files and TextIndexNG? Makes no sense to me. For File content,
>> TXNG  will only index the textual metadata.
>>
>> Andreas
>>
>>
>
> I know you cannot index the video file itself, as it's a text index.
> However I would like to be able to index the text name and description I
> have given to the file, is that possible?
>
Please read my reply once again :-)

-aj

kees04 () Re: TextIndexNG3 - Query on special characters

Andreas Jung-5 wrote:
Please read my reply once again :-)

-aj
OK, thanks for your help. Is there a package you know of that would help me with my requirements?

I ask because this is possible on the plone.org site: if you search for 'Plone form gen', you are pointed in the direction of a movie file.
ajung () Re: TextIndexNG3 - Query on special characters


--On 26. August 2008 01:23:04 -0700 kees04 <[hidden email]> wrote:

>
>
>
> Andreas Jung-5 wrote:
>>
>> Please read my reply once again :-)
>>
>> -aj
>>
>>
>>
>
> Ok thanks for your help, is there a package out there that you know would
> help me with my requirements?
You just have to read the TXNG Readme. It explains how to integrate TXNG
with other content-types.

-aj

kees04 () Re: TextIndexNG3 - Query on special characters
Andreas Jung-5 wrote:
You just have to read the TXNG Readme. It explains you how to integrate TXNG
with other content-types.

-aj
 
I have just read the Readme.txt and found this section

 
How to make your custom content-types searchable
================================================

Most current Zope index implementations are built on the fact that an
index with id XX tries to lookup the indexable content either from an objects
XX attribute or by calling the method XX() of the object. Although TextIndexNG
V3 still supports this behaviour, the recommended way to make custom types
indexable through TXNG3 is through providing dedicated methods that return
indexable content. The API of these methods is defined in
src/textindexng/interfaces/indexable.py. Custom types must either implement the
IIndexableContent API directly or provide the interface through an adapter
registered through ZCML. The IndexContentCollector class should be used to
return indexable content either as unicode string or as binary stream (to be
transformed through external converters). Some example how to use the
indexing API can be found in src/textindexng/tests/mock.py (see classes
Mock, MockPDF and StupidMockAdapter)
 
Is this the section you are referring to? If so, I don't quite understand what I need to do. Would you be able to help?
ajung () Re: TextIndexNG3 - Query on special characters


--On 26. August 2008 01:48:18 -0700 kees04 <[hidden email]> wrote:

>
>
> Andreas Jung-5 wrote:
>>
>>
>> You just have to read the TXNG Readme. It explains you how to integrate
>> TXNG
>> with other content-types.
>>
>> -aj
>>
>
> I have just read the Readme.txt and found this section
>
>
>
>> How to make your custom content-types searchable
>> ================================================
>>
>> Most current Zope index implementations are built on the fact that an
>> index with id XX tries to lookup the indexable content either from an
>> objects
>> XX attribute or by calling the method XX() of the object. Although
>> TextIndexNG
>> V3 still supports this behaviour, the recommended way to make custom
>> types indexable through TXNG3 is through providing dedicated methods
>> that return indexable content. The API of these methods is defined in
>> src/textindexng/interfaces/indexable.py. Custom types must either
>> implement the
>> IIndexableContent API directly or provide the interface through an
>> adapter registered through ZCML. The IndexContentCollector class should
>> be used to return indexable content either as unicode string or as
>> binary stream (to be
>> transformed through external converters). Some example how to use the
>> indexing API can be found in src/textindexng/tests/mock.py (see classes
>> Mock, MockPDF and StupidMockAdapter)
>>
>
> Is this the section you are refering to? If so I don't quite understand
> what I need to do? Would you be able to help.
This section refers to basic Zope 3 technology like adapters & components.
Sorry, but I won't explain Zope 3 technology here. You have to refer to the
related documentation like Philipp von Weitershausen's Zope 3 book,
or you have to check the related unittests of the TXNG 3 source code.
You need to know the basic Zope 3 concepts in order to proceed. Sorry, you
have to learn them.

-aj
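
For readers who have not met adapters before, the shape of what the README section describes can be sketched in plain Python. Everything below is a hypothetical stand-in: "ContentCollector", "VideoFile", "VideoIndexableAdapter" and the "indexable_content" method are invented for illustration and are not the TextIndexNG3 or Zope 3 API. The real interface is the one defined in src/textindexng/interfaces/indexable.py, and a real adapter would be registered through ZCML rather than instantiated by hand.

```python
class ContentCollector:
    """Hypothetical stand-in for the collector the README mentions."""

    def __init__(self):
        self.texts = {}

    def add_text(self, field, text):
        # Accept only real text (not bytes), which keeps byte strings
        # and unicode text from being mixed inside the index.
        if not isinstance(text, str):
            raise TypeError("indexable text must be str, got %s"
                            % type(text).__name__)
        self.texts.setdefault(field, []).append(text)


class VideoFile:
    """Hypothetical content type: binary video data plus textual metadata."""

    def __init__(self, title, description, data):
        self.title = title
        self.description = description
        self.data = data


class VideoIndexableAdapter:
    """Wraps a VideoFile and hands its indexable text to the index."""

    def __init__(self, context):
        self.context = context

    def indexable_content(self, fields):
        collector = ContentCollector()
        if "SearchableText" in fields:
            # Index the metadata only; the video bytes are left out.
            collector.add_text("SearchableText", self.context.title)
            collector.add_text("SearchableText", self.context.description)
        return collector


video = VideoFile("How to add a page", "Short screencast for new editors",
                  b"\x00\x01 not real FLV data")
collected = VideoIndexableAdapter(video).indexable_content(["SearchableText"])
print(collected.texts)
```

The point of the pattern is that the content class itself stays untouched: the knowledge of which attributes are worth indexing lives in the adapter.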


Dieter Maurer () Re: TextIndexNG3 - Query on special characters
In reply to this post by ajung
Andreas Jung wrote at 2008-8-24 17:30 +0200:

>
>
>--On 23. August 2008 13:27:06 +0200 Dieter Maurer <[hidden email]>
>wrote:
>
>> kees04 wrote at 2008-8-19 01:48 -0700:
>>> ...
>>>  Module Products.TextIndexNG3.browser, line 82, in documents_for_word
>>>  Module textindexng.lexicon, line 106, in getWordId
>>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1:
>>> ordinal not in range(128)
>>
>> I read this as: the lexicon is mixing unicode and "str" together.
>
>Never!
>
>The lexicon of TXNG has a dedicated check for unicode strings.

The traceback tells us without any doubt:

  In line 106, "getWordId" fails to decode a "str" to "unicode"
  using the "ascii" encoding.

This means:

  * there is some "str" and some "unicode" mixed together in
    "lexicon.Lexicon.getWordId".

  * the "str" can come from the caller ("documents_for_word")
    or from the lexicon itself.
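
The byte named in the error message fits this reading. A short check in Python 3 syntax, assuming the words are held as UTF-8 (the index's configured default encoding): of the three words listed in the original post, only the third has 0xc5 as its first non-ASCII byte, and it sits at position 1. That the third word was the one clicked is an inference from the byte value, since the thread does not say.

```python
# The three words from the original post, written with escapes so the
# example does not depend on the encoding of this file.
words = [
    "b\u00fasquedas",         # u with acute accent
    "b\u00fcrointerne",       # u with diaeresis
    "b\u016btiskaj\u0101m",   # u and a with macron
]


def first_non_ascii(word, encoding="utf-8"):
    """Return (position, byte value) of the first non-ASCII byte, or None."""
    data = word.encode(encoding)
    for position, byte in enumerate(data):
        if byte >= 0x80:
            return position, byte
    return None


for word in words:
    position, byte = first_non_ascii(word)
    print(position, hex(byte))

# Decoding the UTF-8 bytes with the 'ascii' codec, which is what Python 2
# does implicitly when a "str" is combined with a "unicode", fails.
try:
    words[2].encode("utf-8").decode("ascii")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```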



--
Dieter

amoreno () Re: TextIndexNG3 - Query on special characters
In reply to this post by kees04
Call this function on the value before it is indexed:

def to_unicode_or_bust(obj, encoding='utf-8'):
    if isinstance(obj, basestring):
        if not isinstance(obj, unicode):
            obj = unicode(obj, encoding)
    return obj

It worked for me when indexing the results of a function using a TextIndexNG3 index (without this function I get the same error as you).

I got it from here. Interesting reading. :)
http://farmdev.com/talks/unicode/
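
The function above is Python 2 code ("basestring" and "unicode" do not exist in Python 3). A rough Python 3 equivalent, as a sketch only: "to_text" is a made-up name, and it assumes the incoming byte strings really are in the given encoding (UTF-8 by default, matching the index's default encoding shown earlier in the thread).

```python
def to_text(obj, encoding="utf-8"):
    """Decode byte strings to text; return everything else unchanged."""
    if isinstance(obj, (bytes, bytearray)):
        return obj.decode(encoding)
    return obj


# UTF-8 bytes are decoded; text and non-string values pass through.
raw = b"b\xc5\xabtiskaj\xc4\x81m"
text = to_text(raw)
print(type(text).__name__, len(text))
```

Decoding once at the boundary, before the value reaches the index, means the index only ever sees text and never has to guess an encoding.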
