Important note regarding MySQL provider changes in trunk (character sets).

1 message Options
Embed this post
Permalink
Brent Robinson

Important note regarding MySQL provider changes in trunk (character sets).

Reply Threaded More More options
Print post
Permalink
Some javascript/style in this post has been disabled (why?)

Hi,

 

Some significant changes were made today, in the open-source trunk repository, for the MySQL provider. The main change was to set the connection character set to UTF8. Previously, the character set was Latin1 (the default), but the MySQL provider was passing UTF8-encoded strings to MySQL. This caused a few problems when names and strings contained non-ASCII7 characters:

 

            - After creating a datastore with non-ASCII7 name, it was not possible to open it.

            - String data inserted into tables, with double byte character sets, could not be read properly

            - Any string data inserted by FDO, into MySQL tables, was double UTF8 encoded, making it difficult for 3rd party applications to read it.

 

The changes have the following impacts on the MySQL Provider:

 

    - minimum MySQL version now 5.0.22

- when you create a new datastore, through FDO, the character set and collation are the MySQL server defaults. Previously, these were set to

  “latin1” and “latin1_bin” respectively.

    - the current open-source MySQL Provider and previous versions are no longer compatible for non-ASCII7 users.

 

MySQL Version:

 

The changes rely on some collation fixes made to MySQL somewhere between 5.0.15 and 5.0.22. Therefore, it is recommend that you upgrade to MySQL 5.0.22 before using the latest open-source MySQL Provider.

 

Datastore character set:

 

    Previously, all datastores created by FDO had default character set Latin1 and default collation Latin1_bin. With this change, the character set and collation default to that of the MySQL server. To find out what these settings currently are, connect directly to MySQL and execute the following:

 

        show variables like 'character_set_server';

        show variables like 'collation_server';

 

    You are now responsible for using a server character set and collation that matches your data. For English and Western European customers, the Latin1 character set is fine. Other customers will need to use the UTF8 character set, or a language specific one such as CP932 (Japanese). When in doubt, UTF8 is the best one to use since it is a Unicode format.

 

    The server collation determines whether your data is treated as case-sensitive or case-insensitive. This includes column contents, not just table and column names. Note that the default is usually a case-insensitive one.

   

  Server character set and collation can be set by stopping the MySQL service, modifying my.ini, and restarting the service. In my.ini, look for any:

 

       default-character-set=

 

  settings, and change these to the desired character set. The collation can be set by adding a default-collation setting to the [mysqld] section, e.g:

 

        default-collation=utf8_bin

 

    to use case-insensitive UTF8 collation.

 

    Please refer to the MySQL documentation at www.mysql.com for more information on character sets and collations.

 

Provider and Datastore Compatibility:

 

Previous versions of the MySQL provider inserted strings, via Feature Commands, that were UTF8-encoded once too many. This change fixes this problem. However, this means that pre-existing MySQL datastores can be properly accessed by both current and previous MySQL providers only if all of the following is true:

 

        - none of the character data in the datastore contains any non-ASCII7 characters

        - the default character set for the datastore is latin1

        - the default collation for the datastore is latin1_bin

        - the server character set is latin1

        - the server collation is latin1_bin.

 

    For all other cases, pre-existing datastores must be upgraded before they are accessed by the current MySQL Provider. This can be done by doing the following:

 

        - copy the datastore to SDF, using previous MySQL Provider.

        - create a new MySQL datastore, using current MySQL Provider

        - copy the above SDF data to the new datastore, using current MySQL Provider.

 

Note that the above is not an in-place conversion. You must create a new datastore for use by the current MySQL Provider.

 

Note also that datastores created by the current MySQL Provider must not be accessed by previous provider versions.

 

Just a bit more background on the above info. ASCII7 characters ( 0x00-0x7f) have the same values for their representations in various character sets. Therefore the extra UTF8 encoding, done by previous MySQL Providers, on strings to insert, had no effect. For this reason, there are no compatibility problems for datastores with only latin1_bin tables, storing only ASCII7 characters. However, for other characters, their values are different for different character sets so the extra UTF8 encoding does change their values, thus leading to the above compatibility problems.

 

 

Brent Robinson

 

  

 


_______________________________________________
fdo-announce mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/fdo-announce