Should setText(), title and description be utf-8 or unicode?

Mikko Ohtamaa: Should setText(), title and description be utf-8 or unicode?
Hi,

Over the years I have seen two different behaviors in Plone:

* String and text fields store content as unicode

* String and text fields store content as utf-8 str (8-bit string)

I assume the first one is the correct behavior and it should be encouraged in the future?

Cheers,
Mikko
ajung: Re: Should setText(), title and description be utf-8 or unicode?
On 10.08.09 22:18, Mikko Ohtamaa wrote:
> Hi,
>
> Over the years I have seen two different behaviors in Plone:
>
> * String and text fields store content as unicode
>
> * String and text fields store content as utf-8 str (8-bit string)

I have *never* seen different behaviour. The internal storage format
is unicode. Data accessed or modified through the accessor/mutator
methods has to be encoded using the configured site encoding.
If you access the field directly, your code is broken (except when
you are using FieldProperty).
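The contract described above (unicode inside, site-encoded text through the accessor and mutator) can be modelled in a few lines. This is a minimal Python 3 sketch, in which str plays the role of unicode and bytes the role of the encoded 8-bit string. SiteEncodedTextField and its set/get/raw methods are hypothetical illustrations, not the Archetypes field API.

```python
# Hypothetical sketch, NOT the Archetypes API: models "store unicode,
# speak the site encoding at the accessor/mutator" in Python 3 terms
# (str = unicode, bytes = encoded 8-bit string).

class SiteEncodedTextField:
    def __init__(self, site_encoding="utf-8"):
        # assumption: the configured site encoding is passed in explicitly
        self.site_encoding = site_encoding
        self._value = ""  # internal storage is always unicode (str)

    def set(self, value):
        """Mutator: accepts site-encoded bytes (or str) and stores str."""
        if isinstance(value, bytes):
            value = value.decode(self.site_encoding)
        if not isinstance(value, str):
            raise TypeError("expected bytes or str, got %s" % type(value).__name__)
        self._value = value

    def get(self):
        """Accessor: returns the stored text encoded in the site encoding."""
        return self._value.encode(self.site_encoding)

    def raw(self):
        """Direct access to the storage, which the thread warns against."""
        return self._value


field = SiteEncodedTextField()
field.set("Tübingen".encode("utf-8"))
print(type(field.raw()).__name__)  # str
print(field.get())                 # b'T\xc3\xbcbingen'
```

Code that reads raw() gets unicode, and code that calls get() gets bytes. Mixing the two without knowing which one you have is exactly the confusion the original question describes.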

-aj

_______________________________________________
Product-Developers mailing list
[hidden email]
http://lists.plone.org/mailman/listinfo/product-developers
Mikko Ohtamaa: Re: Should setText(), title and description be utf-8 or unicode?

ajung wrote:
> I have *never* seen different behaviour. The internal storage format
> is unicode. Data accessed or modified through the accessor/mutator
> methods has to be encoded using the configured site encoding.
> If you access the field directly, your code is broken (except when
> you are using FieldProperty).

Ok, thanks. I thought so. I have a very old Plone 2.0 site in my hands, and it seems that at least some of its fields are stored internally as utf-8. This is probably caused by third-party code and by the lack of documentation and input validation regarding the issue, so I expect others might run into similar problems.
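With legacy data like this, a useful first step is finding which stored values are encoded byte strings rather than unicode. The following is a minimal Python 3 sketch. It assumes the stored field values are already available as a plain dict and that the legacy bytes are UTF-8. find_encoded_values and normalize_to_unicode are hypothetical helpers, not Plone APIs, and reading from or writing to the real storage is out of scope.

```python
# Hypothetical audit helpers for legacy content in which some values were
# stored as encoded bytes instead of unicode. Assumes the legacy bytes are UTF-8.

def find_encoded_values(stored):
    """Return the sorted names of fields whose stored value is bytes, not str."""
    return sorted(name for name, value in stored.items() if isinstance(value, bytes))


def normalize_to_unicode(stored, legacy_encoding="utf-8"):
    """Return a copy of `stored` with every bytes value decoded to str."""
    fixed = {}
    for name, value in stored.items():
        if isinstance(value, bytes):
            # strict decoding: a wrong guess about the encoding raises
            # UnicodeDecodeError instead of silently producing mojibake
            value = value.decode(legacy_encoding)
        fixed[name] = value
    return fixed


legacy = {
    "title": "Plone 2.0 page",
    "description": "Café".encode("utf-8"),  # e.g. written as bytes by third-party code
    "text": "Hello",
}
print(find_encoded_values(legacy))                        # ['description']
print(find_encoded_values(normalize_to_unicode(legacy)))  # []
```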


-Mikko
ajung: Re: Should setText(), title and description be utf-8 or unicode?
On 10.08.09 22:34, Mikko Ohtamaa wrote:
> ajung wrote:
>> I have *never* seen different behaviour. The internal storage format
>> is unicode. Data accessed or modified through the accessor/mutator
>> methods has to be encoded using the configured site encoding.
>> If you access the field directly, your code is broken (except when
>> you are using FieldProperty).
>
> Ok, thanks. I thought so. I have a very old Plone 2.0 site in my hands

Well, 2.0 and older versions had lots of misconceptions.

Andreas

--
ZOPYX Ltd. & Co KG          \  ZOPYX & Friends
Charlottenstr. 37/1          \  The experts for your Python, Zope and
D-72070 Tübingen              \  Plone projects
www.zopyx.com, [hidden email]  \  www.zopyx.de/friends, [hidden email]
------------------------------------------------------------------------
E-Publishing, Python, Zope & Plone development, Consulting

Martijn Pieters: Re: Should setText(), title and description be utf-8 or unicode?
2009/8/10 Andreas Jung <[hidden email]>:
> I have *never* seen different behaviour. The internal storage format
> is unicode. Data accessed or modified through the accessor/mutator
> methods has to be encoded using the configured site encoding.
> If you access the field directly, your code is broken (except when
> you are using FieldProperty).

Archetypes is very much wrong in doing this; encoding and decoding should
happen at the I/O boundaries. The number of times I have to
special-case Archetypes because it returns UTF-8 is ridiculous. The
reason Archetypes does this is purely historical; technically there is
no reason anymore (other than backwards compatibility) to not return
unicode.
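The following is a minimal Python 3 sketch of the approach Martijn describes: decode when data comes in, work only with unicode internally, and encode when data goes out. decode_input, encode_output and shout are hypothetical names, not Plone or Archetypes functions. The encoding is passed explicitly because accessor output is in the configured site encoding. The sketch assumes ISO-8859-1 for that encoding, purely to show that it is a setting and not a constant.

```python
# Hypothetical sketch of "decode on input, encode on output": the core logic
# only ever handles str, and bytes exist only at the edges.

def decode_input(raw, encoding):
    """Input boundary: turn bytes from outside into str (str passes through)."""
    return raw.decode(encoding) if isinstance(raw, bytes) else raw


def encode_output(text, encoding):
    """Output boundary: turn str into bytes for an external consumer."""
    if not isinstance(text, str):
        raise TypeError("internal code should only hold str")
    return text.encode(encoding)


def shout(text):
    """Core logic: only sees str, so it needs no bytes/str special-casing."""
    return text.upper() + "!"


site_encoding = "iso-8859-1"                # assumption: a site not configured for UTF-8
incoming = "café".encode(site_encoding)      # site-encoded bytes, as an accessor returns
result = encode_output(shout(decode_input(incoming, site_encoding)), "utf-8")
print(result)
```

The special-casing Martijn complains about lives in decode_input, the one place that checks whether a value is bytes. Everything between the two boundaries can assume unicode.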

--
Martijn Pieters

Wichert Akkerman: Re: Should setText(), title and description be utf-8 or unicode?
On 8/11/09 10:43 , Martijn Pieters wrote:
> Archetypes is very much wrong in doing this; encoding and decoding should
> happen at the I/O boundaries. The number of times I have to
> special-case Archetypes because it returns UTF-8 is ridiculous.

It returns values in the site encoding, which might not be UTF-8.

> The reason Archetypes does this is purely historical; technically there
> is no reason anymore (other than backwards compatibility) to not return
> unicode.

Aside from the expectations of what is probably a surprisingly large amount of code.
Everything that passes data to external processes, such as portal
transforms and various newsletter products, is built around a lot of
assumptions that might no longer hold if we change this.

Having said that, I would love to see this change in a major Plone release.

Wichert.

David Glick: Re: Should setText(), title and description be utf-8 or unicode?
On Aug 11, 2009, at 5:40 AM, Wichert Akkerman wrote:

> On 8/11/09 10:43 , Martijn Pieters wrote:
>> Archetypes is very much wrong in doing this; encoding and decoding
>> should happen at the I/O boundaries. The number of times I have to
>> special-case Archetypes because it returns UTF-8 is ridiculous.
>
> It returns values in the site encoding, which might not be UTF-8.
>
>> The reason Archetypes does this is purely historical; technically
>> there is no reason anymore (other than backwards compatibility) to
>> not return unicode.
>
> Aside from the expectations of what is probably a surprisingly large
> amount of code. Everything that passes data to external processes,
> such as portal transforms and various newsletter products, is built
> around a lot of assumptions that might no longer hold if we change
> this.
>
> Having said that, I would love to see this change in a major Plone
> release.


Hanno already made this change on AT trunk, I believe. We decided not
to include it in Plone 4 because of product compatibility concerns.


David Glick
Web Developer
ONE/Northwest

New tools and strategies for engaging people in protecting the environment

http://www.onenw.org
[hidden email]
work: (206) 286-1235 x32
mobile: (206) 679-3833

Subscribe to ONEList, our email newsletter!
Practical advice for effective online engagement
http://www.onenw.org/full_signup