Site encoding

Martin Aspeli

Site encoding

Hi,

This is probably a dumb question, but - in what situations does anyone
actually need a default site encoding other than utf-8? That is, if we
removed the ability to configure the site encoding and just expected any
encoded string in Plone to be utf-8, what would we lose?

I think if we did this, we'd gain a fair bit of simplicity. Right now,
whenever you deal with a string, you need to look up the site encoding
(which requires an acquisition context) before you can encode/decode it.
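A minimal sketch of the simplification Martin describes, written in Python 3, where `bytes` plays the role of Python 2's encoded `str` and `str` the role of `unicode`. The helper name `to_text` is hypothetical. The point is only that once UTF-8 is assumed, decoding a value needs no site object and no acquisition context.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text (str), decoding bytes with a fixed encoding.

    No site lookup happens here: the encoding is assumed, not configured.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    # Anything else (numbers, None, ...) is converted to its text form.
    return str(value)


print(to_text(b"caf\xc3\xa9"))  # café
print(to_text("already text"))
```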

Just a thought. :)

Martin

--
Author of `Professional Plone Development`, a book for developers who
want to work with Plone. See http://martinaspeli.net/plone-book


------------------------------------------------------------------------------
Come build with us! The BlackBerry® Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9-12, 2009. Register now!
http://p.sf.net/sfu/devconf
_______________________________________________
Plone-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/plone-developers
Hanno Schlichting-4

Re: Site encoding

On Sun, Sep 27, 2009 at 11:35 AM, Martin Aspeli
<[hidden email]> wrote:
> This is probably a dumb question, but - in what situations does anyone
> actually need a default site encoding other than utf-8? That is, if we
> removed the ability to configure the site encoding and just expected any
> encoded string in Plone to be utf-8, what would we lose?
>
> I think if we did this, we'd gain a fair bit of simplicity. Right now,
> whenever you deal with a string, you need to look up the site encoding
> (which requires an acquisition context) before you can encode/decode it.

We gave up on the idea of a configurable site encoding long ago.
Since Plone 3.0 we have assumed it is always utf-8, and we ignore it in
quite a number of places. The main reason was that we ended up with
lots of places where we had to deal with data but had no Acquisition
context or database access.

One of the main places for this was the internals of the TAL engine,
which since Zope 2.10 accepts only Unicode data as output from any of
the TAL constructs. We patched this to allow utf-8 encoded strings as
well, but there was no way to make this work with a configurable site
encoding.
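The constraint Hanno describes can be illustrated with a small Python 3 sketch. The `render` function is hypothetical and is not the actual Zope TAL patch: it receives output fragments with no access to a site object, so the only encoding it can apply to byte strings is a fixed one. UTF-8 fragments work, while a fragment stored in some other encoding (Latin-1 here, standing in for a non-UTF-8 "site encoding") fails.

```python
def render(fragments):
    """Join TAL-style output fragments into one text string.

    Byte strings are assumed to be UTF-8, because there is no context
    from which a per-site encoding could be looked up.
    """
    parts = []
    for fragment in fragments:
        if isinstance(fragment, bytes):
            fragment = fragment.decode("utf-8")
        parts.append(fragment)
    return "".join(parts)


# Mixed text and UTF-8 bytes: works.
print(render(["<p>", "Gr\u00fc\u00dfe".encode("utf-8"), "</p>"]))

# Latin-1 bytes are not valid UTF-8, so decoding fails.
try:
    render(["<p>", "Gr\u00fc\u00dfe".encode("latin-1"), "</p>"])
except UnicodeDecodeError as exc:
    print("cannot decode:", exc.reason)
```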

Hanno

Martin Aspeli

Re: Site encoding

Hanno Schlichting wrote:

> On Sun, Sep 27, 2009 at 11:35 AM, Martin Aspeli
> <[hidden email]> wrote:
>> This is probably a dumb question, but - in what situations does anyone
>> actually need a default site encoding other than utf-8? That is, if we
>> removed the ability to configure the site encoding and just expected any
>> encoded string in Plone to be utf-8, what would we lose?
>>
>> I think if we did this, we'd gain a fair bit of simplicity. Right now,
>> whenever you deal with a string, you need to look up the site encoding
>> (which requires an acquisition context) before you can encode/decode it.
>
> We have long given up on the idea of the configurable site encoding.
> Since Plone 3.0 we assume it is always utf-8 and ignore it in quite a
> number of places. The main reason for that was, that we ended up in
> lots of places where we had to deal with data, but had no Acquisition
> context / database access.

Ah, excellent. :)

> One of the main places for this was in the internals of the TAL
> engine, which since Zope 2.10 only accepts Unicode data to be output
> by any of the TAL constructs. We patched this to allow for utf-8
> encoded strings as well, but there was no way to make this work with a
> configurable site encoding.

So we can just assume the world is utf-8 always? Maybe we should
deprecate the site encoding property entirely?

Martin

--
Author of `Professional Plone Development`, a book for developers who
want to work with Plone. See http://martinaspeli.net/plone-book


Hanno Schlichting-4

Re: Site encoding

On Sun, Sep 27, 2009 at 11:57 AM, Martin Aspeli
<[hidden email]> wrote:
> So we can just assume the world is utf-8 always? Maybe we should
> deprecate the site encoding property entirely?

We can assume the world is either Unicode or utf-8.

Properly deprecating the property and any APIs for it would be a good
idea indeed. Feel free to do that ;)

Hanno

Maurits van Rees-3

Re: Site encoding

Hanno Schlichting, on 2009-09-27:
> On Sun, Sep 27, 2009 at 11:57 AM, Martin Aspeli
><[hidden email]> wrote:
>> So we can just assume the world is utf-8 always? Maybe we should
>> deprecate the site encoding property entirely?
>
> We can assume the world is either Unicode or utf-8.
>
> Properly deprecating the property and any API's for it would be a good
> idea indeed. Feel free to do that ;)

Is something similar perhaps true for the email_charset property?  Hm,
I thought that was iso-8859-1 by default, but apparently it is utf-8
as well.  I know Poi uses this property to properly encode the email
headers.

At least it does not make much sense to have an email charset that
differs from the site encoding: a utf-8 site encoding allows you to set
e.g. a Chinese name as the from-name of the site, which won't quite
work when you use iso-8859-1 as email charset...
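Maurits's point can be checked with Python 3's standard `email` package (a sketch, not Poi's or Plone's code). A Chinese display name cannot be represented in ISO-8859-1 at all, while with UTF-8 it can be put into a From header as an RFC 2047 encoded word. The name and address below are made-up examples.

```python
from email.header import decode_header, make_header
from email.utils import formataddr

name = "\u5f20\u4e09"  # a made-up Chinese display name ("Zhang San")
address = "site@example.com"  # placeholder address

# ISO-8859-1 has no code points for Chinese characters, so this fails.
try:
    name.encode("iso-8859-1")
except UnicodeEncodeError:
    print("not representable in iso-8859-1")

# With UTF-8 the name is emitted as an RFC 2047 encoded word.
from_header = formataddr((name, address), charset="utf-8")
print(from_header)

# Decoding the encoded word gives the original name back.
encoded_name = from_header.rsplit(" <", 1)[0]
print(str(make_header(decode_header(encoded_name))) == name)  # True
```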

--
Maurits van Rees | http://maurits.vanrees.org/
            Work | http://zestsoftware.nl/
"This is your day, don't let them take it away." [Barlow Girl]


Hanno Schlichting-4

Re: Site encoding

On Mon, Sep 28, 2009 at 11:59 AM, Maurits van Rees
<[hidden email]> wrote:
> Is something similar perhaps true for the email_charset property?  Hm,
> I thought that was iso-8859-1 by default, but apparently it is utf-8
> as well.  I know Poi uses this property to properly encode the email
> headers.
>
> At least it does not make much sense to have an email charset that
> differs from the site encoding: a utf-8 site encoding allows you to set
> e.g. a Chinese name as the from-name of the site, which won't quite
> work when you use iso-8859-1 as email charset...

Mail charset doesn't make much, if any, sense anymore either. The mail
transfer encoding (base64, quoted-printable, ...) might still make
sense. But even the latter is taken care of by the latest MailHost work
from Alec for 4.0. I think it will make a good-enough default choice
for you based on the kind of mail you are trying to send.
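This is not the MailHost API, but Python 3's standard `email` package makes a similar automatic choice, which may help picture what Hanno describes. With a policy that requires 7-bit output (the `seven_bit` policy is an assumption made here so the effect is visible), `EmailMessage.set_content` picks `7bit` for pure ASCII text and falls back to `quoted-printable` or `base64` for other text, encoding the body as UTF-8.

```python
import email.policy
from email.message import EmailMessage

# Require 7-bit-clean output, as many SMTP paths historically did.
seven_bit = email.policy.default.clone(cte_type="7bit")


def transfer_encoding(body):
    """Return the Content-Transfer-Encoding chosen for a text body."""
    msg = EmailMessage(policy=seven_bit)
    msg.set_content(body)  # charset defaults to utf-8
    return str(msg["Content-Transfer-Encoding"])


print(transfer_encoding("Plain ASCII text.\n"))
print(transfer_encoding("\u65e5\u672c\u8a9e\u306e\u672c\u6587\n"))  # Japanese text
```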

So +1 to deprecating the mail charset as well.

Hanno

Gilles Lenfant

Re: Site encoding

In reply to this post by Martin Aspeli

On 27 Sep 2009, at 11:35, Martin Aspeli wrote:

> Hi,
>
> This is probably a dumb question, but - in what situations does anyone
> actually need a default site encoding other than utf-8? That is, if we
> removed the ability to configure the site encoding and just expected any
> encoded string in Plone to be utf-8, what would we lose?
>
> I think if we did this, we'd gain a fair bit of simplicity. Right now,
> whenever you deal with a string, you need to look up the site encoding
> (which requires an acquisition context) before you can encode/decode it.
>
> Just a thought. :)

Hi,

Another thought: such things should (IMHO) be in a zope.conf section
(the ZConfig way):

%import Products.CMFPlone
<plone>
charset utf-8
...
</plone>

This way, such configuration data is evaluated only at instance
startup. In addition, accessing it is much faster than reading from
the ZODB. I did something like this to speed up iw.fss.

My 2 cents
--
Gilles

Hanno Schlichting-4

Re: Site encoding

On Mon, Sep 28, 2009 at 2:17 PM, Gilles Lenfant
<[hidden email]> wrote:

> Another thought: such things should be (IMHO) in a zope.conf section
> (the ZConfig way)
>
> %import Products.CMFPlone
> <plone>
> charset utf-8
> ...
> </plone>
>
> This way, such configuration data are evaluated only at instance
> startup. In addition, accessing such data is pretty faster than
> reading the ZODB. I did such a thing to speed up iw.fss.

Unfortunately we still very widely support the "multiple instances per
database" approach, which makes zope.conf-wide configuration somewhat
useless for us most of the time. Luckily plone.registry will be
included in Plone 4.0, which gives a good abstraction for configuration
data.

On the matter of a site encoding, the whole idea is deprecated anyway,
as we'll have to move to storing all text as Unicode in the mid-term.

For applications storing data to the filesystem, I'd just keep it
simple and say: utf-8 is the new ascii - configurable encoding schemes
are too much hassle and you can just use utf-8 as a new default.
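For Hanno's filesystem advice, a minimal Python 3 sketch: pass an explicit `encoding="utf-8"` when writing and reading text files instead of relying on a configurable or platform-dependent default. The file lives in a throwaway temporary directory; `note.txt` is just an illustrative name.

```python
import os
import tempfile

text = "\u65e5\u672c\u8a9e and Gr\u00fc\u00dfe"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")
    # Write and read with an explicit encoding, not the platform default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "rb") as f:
        raw = f.read()  # the bytes actually stored on disk
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

print(raw == text.encode("utf-8"), round_trip == text)  # True True
```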

Hanno

Takeshi Yamamoto

Re: Site encoding

We want to keep the encoding-related properties. Unifying on one
encoding (UTF-8) is premature.

Actually, the current Plone 3.3 already forces us to use UTF-8, since
"utf-8" is hard-coded in some places.
We have to convert from UTF-8 to Shift-JIS to support Keitai
(Japanese web-capable mobile phones).
We wish Plone could support Shift-JIS natively, too.

Best for us would be if Plone allowed various encodings for stored
data and templates.
Next best would be if Plone allowed various encodings for output data,
with a real-time encoding conversion mechanism.

The English-speaking world may not have a problem, since UTF-8 is
backward compatible with ASCII.
But UTF-8 is not compatible with any of the earlier encodings
(e.g. Shift-JIS, euc_jp, ISO-2022-JP),
so we have to convert them one by one with a mapping table.
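Python's standard codecs already carry the mapping tables Takeshi mentions, so converting Unicode text to Shift-JIS, EUC-JP or ISO-2022-JP is a single `encode` call. A short Python 3 sketch showing that the ASCII part is byte-for-byte identical in UTF-8 and Shift-JIS while the Japanese part is not:

```python
text = "Plone \u65e5\u672c\u8a9e"  # "Plone" followed by "Japanese language"

for codec in ("utf-8", "shift_jis", "euc_jp", "iso2022_jp"):
    data = text.encode(codec)
    # Each codec yields different bytes, but all decode to the same text.
    assert data.decode(codec) == text
    print(codec, data)

# The ASCII part is the same in UTF-8 and Shift-JIS ...
assert "Plone".encode("utf-8") == "Plone".encode("shift_jis")
# ... but the Japanese part is not, so it must be converted, not reinterpreted.
assert "\u65e5\u672c\u8a9e".encode("utf-8") != "\u65e5\u672c\u8a9e".encode("shift_jis")
```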

Keitai phones still use Shift-JIS for their web browsers and e-mail.
Even though the iPhone and Android are gaining popularity, Keitai
phones are still dominant.
That means the most popular encoding for web browsing (on PCs and
Keitai) is Shift-JIS, which was designed for DOS.

Another annoying issue is e-mail encoding.
Since an e-mail data stream needs to be 7-bit safe, the official
Japanese e-mail text encoding is ISO-2022-JP,
which switches between character sets with escape sequences. (Taiwan
uses Big5 for e-mail.)
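A small Python 3 check of the 7-bit property described above: ISO-2022-JP switches character sets with escape sequences (`ESC $ B` into JIS X 0208, `ESC ( B` back to ASCII), so every byte stays below 0x80, unlike Shift-JIS.

```python
text = "\u65e5\u672c\u8a9e"  # "Japanese language"

jis = text.encode("iso2022_jp")
sjis = text.encode("shift_jis")

print(jis)  # starts with b'\x1b$B' and ends with b'\x1b(B'
print(all(byte < 0x80 for byte in jis))   # True: safe for 7-bit mail
print(all(byte < 0x80 for byte in sjis))  # False: Shift-JIS uses 8-bit bytes
```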

So we want to have the following two properties:
   - site default encoding (this may be set to Shift-JIS for a
dedicated Keitai application)
   - e-mail encoding (even though all content is in UTF-8, we may
need to set this to ISO-2022-JP)

Takeshi Yamamoto

On Sep 28, 2009, at 10:38 PM, Hanno Schlichting wrote:

> On Mon, Sep 28, 2009 at 2:17 PM, Gilles Lenfant
> <[hidden email]> wrote:
>> Another thought: such things should be (IMHO) in a zope.conf section
>> (the ZConfig way)
>>
>> %import Products.CMFPlone
>> <plone>
>> charset utf-8
>> ...
>> </plone>
>>
>> This way, such configuration data are evaluated only at instance
>> startup. In addition, accessing such data is pretty faster than
>> reading the ZODB. I did such a thing to speed up iw.fss.
>
> Unfortunately we still very wildly support the "multiple instances per
> database" approach, which makes zope.conf wide configuration somewhat
> useless for us most of the time. Luckily plone.registry will be
> included in Plone 4.0 which gives a good abstraction for configuration
> data.
>
> On the matter of a site-encoding, the whole idea is deprecated
> anyways, as we'll have to move to storing all text as Unicode anyways
> in the mid-term.
>
> For applications storing data to the filesystem, I'd just keep it
> simple and say: utf-8 is the new ascii - configurable encoding schemes
> are too much hassle and you can just use utf-8 as a new default.
>
> Hanno
>
Alexander Limi

Re: Site encoding

The problem is, Japan is pretty much the only country that got as far as standardizing on a non-standard charset early enough for it to make a difference.

To me, it sounds like the best way to fix this issue is to create some middleware or a proxy that you can put in front of Plone to convert its utf-8 output into whatever encoding you need, Shift-JIS or otherwise. This would be similar to the mobile proxy work Mikko has been doing; maybe it could be a feature of that package. I'd also be surprised if someone hadn't already written something like this.

I fear that supporting Shift-JIS just for the Japanese market, and just for mobile phones, is going to add a lot of overhead and complexity for Plone as a project. It's better to have this support live in a smaller piece of software that can be updated and maintained by those who need it.

Of course, I might be ignorant about how Shift-JIS and the Japanese market work, so take my advice with a pinch of salt, and feel free to tell me why I'm wrong. :)

--
Alexander Limi · http://limi.net


On Mon, Sep 28, 2009 at 10:41 AM, Takeshi Yamamoto <[hidden email]> wrote:

Hanno Schlichting-4

Re: Site encoding

In reply to this post by Takeshi Yamamoto
On Mon, Sep 28, 2009 at 7:41 PM, Takeshi Yamamoto <[hidden email]> wrote:
> So, we want to have following two properties.
>  - site default encoding (this may set to Shift-JIS for dedicated Keitai
> application)
>  - e-mail encoding (even though all contents are in UFT-8, we may need to
> set this to ISO-2022-JP)

You are conflating two issues here. So far, the existing site_encoding
property was used to determine the encoding in which text data was
stored in the database and what Archetypes field accessors return. It
doesn't make sense to store text as binary encoded strings in a
database that is able to hold Unicode. This is what we are talking
about and what we need to attack aggressively, as it creates lots of
pitfalls for UnicodeDecodeErrors, degrades performance and makes it
impossible to even think about switching to Python 3.

What you are talking about is the encoding used to send data back to
the browser. This is governed by the HTTP_ACCEPT_CHARSET request
header, the ZPublisher default encoding, the IUserPreferredCharsets
adapter and global_cache_settings.pt. Unfortunately the last of these
uses the current site_encoding as well, which we should change, as it
is something very different from the internal storage format.

I'm pretty sure you can use all the hooks mentioned above to configure
whatever encoding your browser responses should have, without the need
for an external proxy.
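A minimal Python 3 sketch of the separation described above: text stays Unicode internally and is encoded only when the response is written, using a charset chosen from the client's `Accept-Charset` header. `pick_charset` and `render_response` are hypothetical, simplified helpers written for illustration; they are not `IUserPreferredCharsets` or any other Zope API.

```python
def pick_charset(accept_charset, supported, default="utf-8"):
    """Return the supported charset with the highest q-value.

    Deliberately simplified Accept-Charset parsing: an entry without
    a q parameter counts as q=1.0, unparsable q-values count as 0,
    and wildcards such as "*" are not handled.
    """
    best, best_q = default, 0.0
    for item in accept_charset.split(","):
        name, _, params = item.partition(";")
        name = name.strip().lower()
        params = params.strip()
        q = 1.0
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if name in supported and q > best_q:
            best, best_q = name, q
    return best


def render_response(text, accept_charset):
    """Encode stored Unicode text only when writing the response."""
    charset = pick_charset(accept_charset, {"utf-8", "shift_jis"})
    # Characters the chosen charset lacks become numeric character references.
    body = text.encode(charset, errors="xmlcharrefreplace")
    return "text/html; charset=" + charset, body


content_type, body = render_response("\u65e5\u672c\u8a9e", "Shift_JIS, utf-8;q=0.5")
print(content_type)  # text/html; charset=shift_jis
```

Keeping the storage format fixed and choosing the output charset per request means a Keitai-oriented site could answer in Shift-JIS without changing how text is stored.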

For mail encoding I know too little about the various formats to judge
what the best approach is. But I'm sure this should be a setting on
the MailHost object and not some Plone-site setting.

Hanno

Alexander Limi

Re: Site encoding

On Mon, Sep 28, 2009 at 12:56 PM, Hanno Schlichting <[hidden email]> wrote:
> You confuse two issues here. The existing site_encoding property so
> far was used to determine in which encoding text data was stored in
> the database and what Archetypes field accessor return. It doesn't
> make sense to store text as binary encoded strings in a database that
> is able to hold Unicode. This is what we are talking about and which
> we need to aggressively attack, as it creates lots of pitfalls for
> UnicodeDecodeErrors, causes performance degradation and makes it
> impossible to even think about switching to Python 3.

Out of curiosity, is this the same issue that causes problems like these?
http://thread.gmane.org/gmane.comp.web.zope.plone.user/101646

-- 
Alexander Limi · http://limi.net


Wichert Akkerman

Re: Site encoding

In reply to this post by Alexander Limi
On 2009-9-28 21:13, Alexander Limi wrote:
> The problem is, Japan is pretty much the only country that got as far as
> to standardizing on a non-standard charset early enough for it to make a
> difference.

There might be another issue: Unicode has several flaws (Han
unification being the best known) which may prevent it from being
acceptable, especially in Asian environments. For that reason it may
still be important to support other encodings.

Wichert.


--
Wichert Akkerman <[hidden email]>   It is simple to make things.
http://www.wiggy.net/                  It is hard to make things simple.

Hanno Schlichting-4

Re: Site encoding

In reply to this post by Alexander Limi
On Tue, Sep 29, 2009 at 2:44 AM, Alexander Limi <[hidden email]> wrote:
> Out of curiosity, is this the same issue as causes problems like these?
> http://thread.gmane.org/gmane.comp.web.zope.plone.user/101646

Kind of. This one is really specific to the generation of the image tag, though.

There are about a dozen different ways to get the full img-tag
structure generated for an image or image field. Some of those
generate the whole tag, including the alt and title attributes, as an
encoded string. When that is combined with the rest of the TAL output,
which is all Unicode, it fails.

The solution is usually to call the proper API instead. A common
mistake is to get the value of an image field and then call it
directly. That uses the OFS.Image code, which returns the wrong
thing. Instead you need to call the tag() method of the field.
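The principle behind this advice, sketched in Python 3: build markup entirely as text so it can be combined with other Unicode output, instead of producing a pre-encoded byte string. `image_tag` is a hypothetical stand-in for illustration, not the Archetypes image field's `tag()` method. In Python 3 mixing text and bytes raises `TypeError`; in Python 2 the implicit ASCII decode raised the UnicodeDecodeError discussed in this thread.

```python
from html import escape


def image_tag(url, alt, title):
    """Return an <img> tag as text (str), escaping attribute values."""
    return '<img src="{}" alt="{}" title="{}" />'.format(
        escape(url, quote=True), escape(alt, quote=True), escape(title, quote=True)
    )


# Text combines cleanly with the surrounding (Unicode) page output.
page = "<div>" + image_tag("logo.png", "Caf\u00e9 logo", "Caf\u00e9") + "</div>"
print(page)

# A pre-encoded byte string does not combine with text.
try:
    "<div>" + image_tag("logo.png", "Caf\u00e9", "Caf\u00e9").encode("utf-8")
except TypeError:
    print("bytes and text cannot be concatenated")
```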

Hanno

Takeshi Yamamoto

Re: Site encoding

In reply to this post by Hanno Schlichting-4
Thanks for the information and the various pieces of advice.

I understand that you need to remove all reliance on acquisition,
including the existing site_encoding property.
We may consider some other way to set a default encoding later for
some cases, because some people believe the response encoding should
be determined by the site owner, not the client side.

E-mail is really complicated. The encoding depends on the language.
According to the RFCs, the Subject header cannot hold a UTF-8 string.
It allows only 7-bit characters, so we have to use =? and ?= brackets
(RFC 2047 encoded words).

For example:
Subject: =?ISO-2022-JP?B?W3BsaXA5MzA5OjkxXSBSZTogGyRCJVMlayVJJSIlJhsoQg==?=
 =?ISO-2022-JP?B?GyRCJUgkT0Q+JGokXiQ3JD8kLCEiPzckPyRKTGRCaiQsISMbKEI=?=
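The example subject can be decoded with Python 3's standard `email.header` module, which shows the `=?charset?encoding?text?=` encoded-word syntax from RFC 2047 at work: `ISO-2022-JP` is the charset, `B` means Base64, and the header is folded onto a continuation line that begins with whitespace. A sketch, not part of Plone or MailHost:

```python
from email.header import decode_header, make_header

subject = (
    "=?ISO-2022-JP?B?W3BsaXA5MzA5OjkxXSBSZTogGyRCJVMlayVJJSIlJhsoQg==?=\n"
    " =?ISO-2022-JP?B?GyRCJUgkT0Q+JGokXiQ3JD8kLCEiPzckPyRKTGRCaiQsISMbKEI=?="
)

# decode_header returns (bytes, charset) pairs; make_header turns them into text.
decoded = str(make_header(decode_header(subject)))
print(decoded)  # [plip9309:91] Re: ビルドアウトは直りましたが、新たな問題が。
```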

Or do recent RFCs allow UTF-8 in the Subject and body text?

We may come back to this issue some time later, after MailHost has
been converted.

In any case, we support avoiding the use of acquisition.

Thank you.
Takeshi Yamamoto

On Sep 29, 2009, at 4:56 AM, Hanno Schlichting wrote:

> On Mon, Sep 28, 2009 at 7:41 PM, Takeshi Yamamoto <[hidden email]> wrote:
>> So, we want to have following two properties.
>>  - site default encoding (this may set to Shift-JIS for dedicated Keitai
>> application)
>>  - e-mail encoding (even though all contents are in UFT-8, we may need to
>> set this to ISO-2022-JP)
>
> You confuse two issues here. The existing site_encoding property so
> far was used to determine in which encoding text data was stored in
> the database and what Archetypes field accessor return. It doesn't
> make sense to store text as binary encoded strings in a database that
> is able to hold Unicode. This is what we are talking about and which
> we need to aggressively attack, as it creates lots of pitfalls for
> UnicodeDecodeErrors, causes performance degradation and makes it
> impossible to even think about switching to Python 3.
>
> What you are talking about is the encoding used to sent data back to
> the browser. This is governed by the HTTP_ACCEPT_ENCODING header, the
> zpublisher default encoding, the IUserPreferredCharsets adapter and
> the global_cache_settings.pt. Unfortunately the last of these uses the
> current site_encoding as well, which we should change, as it is
> something very different from the internal storage format.
>
> I'm pretty sure you can use all the above-mentioned hooks to configure
> whatever encoding your browser responses should have without the need
> for an external proxy.
>
> For mail encoding I know too little about the various formats to judge
> what the best approach is. But I'm sure this should be a setting on
> the MailHost object and not some Plone-site setting.
>
> Hanno
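To illustrate the response-side negotiation Hanno describes, here is a minimal sketch of choosing a response charset from the browser's Accept-Charset header using q-values. It is not how Zope's IUserPreferredCharsets adapter is implemented; the function names, the `supported` tuple and the UTF-8 fallback are assumptions, and it ignores older rules such as the implicit ISO-8859-1 preference in RFC 2616.

```python
def parse_accept_charset(header):
    """Parse an Accept-Charset value into (charset, q) pairs."""
    prefs = []
    for item in header.split(","):
        name, _, params = item.strip().partition(";")
        name = name.strip().lower()
        if not name:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((name, q))
    return prefs


def choose_response_charset(header, supported=("utf-8",), default="utf-8"):
    """Return the supported charset the client prefers most (hypothetical)."""
    if not header:
        return default  # no preference expressed: use the site default
    explicit = {}
    wildcard_q = 0.0
    for name, q in parse_accept_charset(header):
        if name == "*":
            wildcard_q = q
        else:
            explicit[name] = q
    best, best_q = default, 0.0
    for charset in supported:
        q = explicit.get(charset.lower(), wildcard_q)
        if q > best_q:
            best, best_q = charset, q
    return best
```

For example, a Keitai browser sending `Shift_JIS, utf-8;q=0.7` to a site that supports both would get Shift_JIS. When nothing supported is acceptable, the sketch still falls back to the default rather than failing, which HTTP permits.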


Takeshi Yamamoto

Re: Site encoding

In reply to this post by Wichert Akkerman
Thank you for understanding.

Actually, the consolidation of CJK characters (Han unification) caused some
issues, and that is one of the reasons UTF-8 has not become dominant.

Could you imagine if the German letter Eszett (ß) and the Greek letter
beta (β) had the *same code* on a "GK" code page? Assume the GK code page
contains all the characters of German and Greek, but these two letters
share the same code.
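Takeshi's point about CJK consolidation refers to Unicode's Han unification: a character that Japanese and Chinese typography draw differently gets a single code point, so the text alone does not say which glyph form is intended. A small Python illustration, using two commonly cited examples; the glyph difference itself is only visible in rendered fonts, not in code.

```python
# Each sample is one Unicode code point, yet it exists both in a Japanese
# legacy charset (Shift_JIS) and a Simplified Chinese one (GB2312).
# After decoding to Unicode, nothing records which language, and therefore
# which glyph style, the text came from.
samples = ["直", "骨"]

for ch in samples:
    sjis = ch.encode("shift_jis")
    gb = ch.encode("gb2312")
    # Both legacy byte sequences decode back to the very same code point.
    assert sjis.decode("shift_jis") == gb.decode("gb2312") == ch
    print(f"U+{ord(ch):04X}: shift_jis={sjis.hex()} gb2312={gb.hex()}")
```

This is why the language has to be carried as metadata (for example an HTML `lang` attribute) when a site serves Japanese text as UTF-8.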

Anyway, this is not a Plone-specific matter. It happens in every CMS
and system, and we know how to deal with it.

UTF-8 is becoming dominant, I think, even if only slowly. So other
encodings still need to be supported.

Takeshi

On Sep 29, 2009, at 3:21 PM, Wichert Akkerman wrote:

> On 2009-9-28 21:13, Alexander Limi wrote:
>> The problem is, Japan is pretty much the only country that got as far
>> as standardizing on a non-standard charset early enough for it to make
>> a difference.
>
> There might be another issue: Unicode has several flaws which may
> prevent it from being acceptable, especially in Asian environments. For
> that reason it may still be important to support other encodings.
>
> Wichert.
>
>
> --
> Wichert Akkerman <[hidden email]>   It is simple to make things.
> http://www.wiggy.net/                  It is hard to make things simple.
>


Alec Mitchell

Re: Site encoding

In reply to this post by Takeshi Yamamoto
On Fri, Oct 2, 2009 at 6:35 AM, Takeshi Yamamoto <[hidden email]> wrote:

> Thanks for the information and the various pieces of advice.
>
> I understand that you need to remove the whole acquisition mechanism,
> including the existing site_encoding property.
> We may consider some other way to set a default encoding later for some
> cases, because some people believe the response encoding should be
> determined by the site owner, not the client side.
>
> E-mail is really complicated. The encoding depends on the language.
> According to the RFC, the Subject header cannot hold a raw UTF-8 string.
> It only allows 7-bit characters, so we have to use =? and ?= brackets.
>
> For example:
> Subject: =?ISO-2022-JP?B?W3BsaXA5MzA5OjkxXSBSZTogGyRCJVMlayVJJSIlJhsoQg==?=
>  =?ISO-2022-JP?B?GyRCJUgkT0Q+JGokXiQ3JD8kLCEiPzckPyRKTGRCaiQsISMbKEI=?=
>
> Or does a more recent RFC allow UTF-8 in the Subject and body text?
>
> We may come back to this issue later, after MailHost has been
> converted.

UTF-8 can be used in email headers with either the Q encoding (which is
similar to quoted-printable) or the B encoding, which is base64, as in
your example above (see RFC 2047). The updated MailHost in Zope 2.12
tries to ensure that any generated headers are properly encoded in
this fashion.
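A minimal sketch of the two RFC 2047 header encodings, using Python 3's standard `email.header` and `email.charset` modules rather than Zope's MailHost: the same UTF-8 subject is encoded once with B (base64) and once with Q (quoted-printable-like). Both results are plain 7-bit ASCII that a mail client decodes back to the original text. The sample subject is made up for illustration.

```python
from email import charset as email_charset
from email.header import Header, decode_header, make_header

subject = "Re: Site encoding サイトのエンコーディング"

# B encoding: base64 inside =?utf-8?b?...?= encoded words.
b_charset = email_charset.Charset("utf-8")
b_charset.header_encoding = email_charset.BASE64
b_encoded = Header(subject, b_charset).encode()

# Q encoding: quoted-printable-like, keeps ASCII letters readable.
q_charset = email_charset.Charset("utf-8")
q_charset.header_encoding = email_charset.QP
q_encoded = Header(subject, q_charset).encode()

# Both forms are 7-bit ASCII and decode back to the original text.
for encoded in (b_encoded, q_encoded):
    assert encoded.isascii()
    assert str(make_header(decode_header(encoded))) == subject
```

Long subjects are split into several encoded words on folded lines; Q is more readable for mostly-ASCII text, while B is usually shorter for Japanese.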

Alec
