Xumeng Chen
Hi All,

Recently I have been developing with the SHP Provider to export SHP files. While testing my code, I found that feature attributes containing Chinese characters sometimes can't be written to the file correctly when my system locale is English (US). The export fails because the provider can't store the correct attribute name in the DBF file.

After looking into the SHP provider code, I found that when the provider writes attributes into the DBF file, the wide strings are converted to multibyte strings using the current locale's codepage. For example, on my machine the Chinese strings were converted with codepage 1252, which is not correct.

After finding this, I tried modifying the source code to hardcode the codepage to UTF-8. The characters are then written and read back correctly, and everything seems fine, but there is a limitation: the maximum length of an attribute name in DBF is 11 bytes, so if we convert all strings to UTF-8, only 5 accented letters fit (each takes two bytes in UTF-8) for users in FR/DE locales...

Does anyone have suggestions for my situation, such as a workaround, or feedback on my fix? Any suggestion is highly appreciated.

Thanks,
Jimmy

_______________________________________________
fdo-internals mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/fdo-internals
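The byte arithmetic behind this limitation can be sketched as follows. This is a minimal Python illustration, not FDO code: it assumes the 11-byte field-name slot cited above (in practice many writers keep the last byte as a NUL terminator, which leaves 10) and uses Python's codecs as a stand-in for the provider's wide-to-multibyte conversion. The helper names are hypothetical.

```python
# Minimal sketch (hypothetical helpers, not FDO code): how many characters of a
# field name fit in the DBF name slot once the name is converted to bytes.
DBF_NAME_BYTES = 11  # size of the name slot in a DBF field descriptor


def encoded_len(name: str, encoding: str) -> int:
    """Bytes that `name` occupies after conversion to `encoding`."""
    return len(name.encode(encoding))


def truncate_to_bytes(name: str, encoding: str, limit: int = DBF_NAME_BYTES) -> str:
    """Longest prefix of `name` whose encoded form fits in `limit` bytes.

    Works character by character so a multibyte sequence is never cut in half
    (valid for stateless encodings such as UTF-8 and cp1252).
    """
    kept, used = [], 0
    for ch in name:
        size = encoded_len(ch, encoding)
        if used + size > limit:
            break
        kept.append(ch)
        used += size
    return "".join(kept)


if __name__ == "__main__":
    accented = "éééééééééé"  # 10 accented Latin letters
    chinese = "道路名称类型"  # 6 Chinese characters
    print(encoded_len(accented, "cp1252"), encoded_len(accented, "utf-8"))  # 10 20
    print(len(truncate_to_bytes(accented, "utf-8")))  # 5
    print(truncate_to_bytes(chinese, "utf-8"))  # 道路名 (3 bytes each)
```

The same 11 bytes hold 11 cp1252 letters but only 5 two-byte or 3 three-byte UTF-8 characters, which is the trade-off described in the post.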

Dan Stoica

Does your SHP dataset include a .cpg file? If not, you should create one. If it does, it should contain the Chinese codepage; otherwise the multibyte conversion will default to the machine's locale.

As for the 11-character limit, there is nothing you can do, since it comes from the DBF specification.
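Dan's advice amounts to a lookup order: use the codepage named in the .cpg sidecar if there is one, otherwise fall back to the machine's locale. Below is a minimal Python sketch of that order. The function name, and the assumption that bare Windows codepage numbers such as `936` or `1252` map to Python's `cp…` codecs, are illustrative only and not the SHP provider's actual parsing rules.

```python
import codecs
import locale
from pathlib import Path


def dbf_encoding(shp_path: str) -> str:
    """Pick the codec for a shapefile's DBF strings (hypothetical helper).

    1. If a sibling .cpg file exists and names a known codepage, use it.
    2. Otherwise fall back to the machine's locale encoding.
    """
    cpg = Path(shp_path).with_suffix(".cpg")
    if cpg.is_file():
        raw = cpg.read_text(encoding="ascii", errors="ignore").strip()
        if raw:
            if raw == "65001":
                name = "utf-8"  # Windows codepage number for UTF-8
            elif raw.isdigit():
                name = "cp" + raw  # e.g. "936" -> "cp936", "1252" -> "cp1252"
            else:
                name = raw  # labels such as "UTF-8"
            try:
                return codecs.lookup(name).name  # canonical codec name
            except LookupError:
                pass  # unknown label: behave as if there were no .cpg
    return locale.getpreferredencoding(False)
```

With a `roads.cpg` containing `936`, `dbf_encoding("roads.shp")` returns Python's `gbk` codec name; with no .cpg it returns whatever the locale reports, which reproduces the 1252 default on an English (US) Windows machine.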

Xumeng Chen

Yes. When I created a new SHP dataset through FDO in the EN locale, a default .CPG file was generated with codepage 1252 in it. When I generate a dataset that contains Chinese characters in its attribute names, I have to use the UTF-8 (65001) codepage, not the system locale's codepage (1252), which may convert the names incorrectly.

But how can I make the SHP provider use the UTF-8 codepage? By changing the source code? It seems the SHP provider has a globalization (Unicode) defect: it can't write multibyte characters into the DBF file when the system locale doesn't support the language (i.e. doesn't have the codepage).

Currently, when users create a new SHP dataset, the provider uses the system locale's codepage (obtained through setlocale(LC_ALL, "")). Most of the time this is correct, but in special situations, such as writing Chinese characters under an English locale, the user gets only "????"...

Thanks,
Jimmy
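The "????" symptom can be reproduced outside FDO. The Python sketch below rests on assumptions: it parses the codeset suffix from a locale name of the kind the C runtime's `setlocale(LC_ALL, "")` returns (for example `English_United States.1252` on Windows), then converts text with that codepage, substituting `?` for characters the codepage cannot represent, mirroring the output Jimmy describes. Both helper names are hypothetical.

```python
from typing import Optional


def codepage_from_locale_name(name: str) -> Optional[str]:
    """Return the codeset part of a locale name, e.g. '1252' or 'UTF-8'.

    Handles 'English_United States.1252' (Windows) and 'en_US.UTF-8@euro' (POSIX).
    """
    if "." not in name:
        return None  # e.g. "C" or "POSIX": no explicit codeset
    codeset = name.split(".", 1)[1].split("@", 1)[0]
    return codeset or None


def to_multibyte(text: str, codepage: str) -> bytes:
    """Convert text with the given codepage, replacing unmappable chars with '?'."""
    codec = "cp" + codepage if codepage.isdigit() else codepage
    return text.encode(codec, errors="replace")


if __name__ == "__main__":
    cp = codepage_from_locale_name("English_United States.1252")
    print(cp)  # 1252
    print(to_multibyte("中文名称", cp))  # b'????' (the names are lost)
    print(to_multibyte("中文名称", "936").decode("gbk"))  # 中文名称 (round-trips)
```

The data loss happens at conversion time, so nothing written afterwards (including a corrected .cpg) can recover the original names.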

Xumeng Chen

I found a relevant note on the ESRI site:
http://support.esri.com/index.cfm?fa=knowledgebase.techArticles.articleShow&d=21106

"Shapefile can now be stored in UTF-8. However, Shapefile encoded in UTF-8 is only recognized in ArcGIS Desktop."

So can we do this?

Jimmy

-----Original Message-----
From: Traian Stanev
Sent: Wednesday, May 06, 2009 9:56 AM
To: Dan Stoica
Cc: Kenny Jian; John Jiang; Hunter Chen
Subject: RE: [fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.

Hey guys,

The SHP format is not really supposed to be Unicode-capable (it's from the 1980s). The CPG file is essentially a hack that ESRI added after the fact for ArcPad. Even if you change the source code to be able to write strings in UTF-8 format (effectively breaking the SHP standard), the resulting SHP file will not work in ESRI applications...

Traian

________________________________________
From: [hidden email] [[hidden email]] On Behalf Of Xumeng Chen
Sent: Tuesday, May 05, 2009 8:34 PM
To: [hidden email]; Dan Stoica
Cc: Kenny Jian; John Jiang; Hunter Chen
Subject: RE: [fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.

[...]

From: Dan Stoica
Sent: Tuesday, May 05, 2009 10:16 PM
To: FDO Internals Mail List
Cc: Kenny Jian; John Jiang; Hunter Chen
Subject: RE: [fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.

[...]

From: [hidden email] [mailto:[hidden email]] On Behalf Of Xumeng Chen
Sent: Tuesday, May 05, 2009 5:03 AM
To: [hidden email]
Cc: Kenny Jian; John Jiang; Hunter Chen
Subject: [fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.

[...]

Traian Stanev

Hmmm, I guess you are right. So in your specific case, you want to create an SHP file using FDO with a user-overridden value for the codepage. I don't know the details of how the codepage handling is implemented in the SHP provider (someone else will probably speak to that), but one possible solution would be to allow an optional connection property that specifies the override codepage to use when creating new SHP files. I don't know if that's the best way, though.

Traian

________________________________________
From: Xumeng Chen
Sent: Tuesday, May 05, 2009 10:09 PM
To: Traian Stanev
Cc: Kenny Jian; John Jiang; Hunter Chen; [hidden email]
Subject: RE: [fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.

[...]

Xumeng Chen

It is a good idea to add an extra codepage property to the connection string.

Thanks,
Jimmy

-----Original Message-----
From: Traian Stanev
Sent: Wednesday, May 06, 2009 11:35 AM
To: Xumeng Chen
Cc: Kenny Jian; John Jiang; Hunter Chen; [hidden email]
Subject: RE: [fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.

[...]

Dan Stoica

> But how can I use the UTF-8 codepage in SHP provider?

Did you try putting "UTF-8" in the .cpg?

Dan.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Xumeng Chen
Sent: Wednesday, May 06, 2009 5:51 AM
To: Traian Stanev
Cc: Kenny Jian; John Jiang; [hidden email]; Hunter Chen
Subject: RE: [fdo-internals] SHP Provider: Writing Chinese Characters into DBF Fails When the System locale is EN.

[...]