Re: UTF-8 smart pinyin traditional input?

Ronald Stroethoff


>Is there an input module in development for scim that handles traditional
>Chinese input using pinyin (also using UTF-8)?
>
>If so, could I get some information on where to get it and perhaps who to
>contact?
>If not, I am interested in developing such a module.  Could someone help me
>get started by pointing me to some good resources?
>
>
>Thanks.
>
>Eric
I think it will work more or less the same way as simplified Chinese
input using pinyin.
Therefore a good starting point could be:
http://scim.cvs.sourceforge.net/scim/scim-pinyin/

This is the home of scim-pinyin (simplified).

I think a bigger problem is where to find a decent table of pinyin and
Chinese characters with an appropriate license (well, I do not want to type this table myself).

Ronald Stroethoff

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Scim-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/scim-devel
keith321

Ronald Stroethoff wrote:
I think that a bigger problem is where to find a decent table with pinyin and
Chinese with the appropriate license (well, I do not want to type this table).

Ronald Stroethoff
Are the tables used for traditional and simplified Chinese input via Smart Pinyin different? And would it be possible to modify the current table and contribute it back? I would like to contribute, but I don't want to have to start from scratch.

Keith
David Oftedal

On Thu, 21 Aug 2008 13:14:37 -0700 (PDT)
pwnedd <[hidden email]> wrote:

>
>
> Ronald Stroethoff wrote:
> >
> > I think that a bigger problem is where to find a decent table with
> > pinyin and Chinese with the appropriate license (well, I do not want
> > to type this table).
> >
> > Ronald Stroethoff
> >
>
> Are the tables used for traditional and simplified Chinese input via Smart
> Pinyin different? And would it be possible to modify the current
> table and contribute it back? I would like to contribute, but I
> don't want to have to start from scratch.
>
> Keith
>
The biggest problem would probably be that a number of simplified
Chinese characters map to any of several traditional Chinese
characters, so that any word containing even one of those characters
would have to be typed in manually. This is hardly true of any
characters when converting from simplified to traditional Chinese, so
it's too bad the data aren't in traditional Chinese originally.
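To make the one-to-many problem concrete, here is a minimal Python sketch. The table is a tiny hand-picked sample, and the names `S2T_CANDIDATES`, `traditional_candidates` and `is_ambiguous` are hypothetical; none of this is taken from scim-pinyin's own data or code.

```python
from itertools import product

# Illustrative sample only: simplified characters and the traditional
# characters each one may stand for. A real table would be far larger.
S2T_CANDIDATES = {
    "头": ["頭"],
    "发": ["發", "髮"],        # "to send/emit" vs. "hair"
    "后": ["後", "后"],        # "after" vs. "empress"
    "干": ["幹", "乾", "干"],
    "面": ["面", "麵"],        # "face/side" vs. "noodles"
}

def traditional_candidates(simplified: str) -> list[str]:
    """Return every traditional spelling the character table allows."""
    options = [S2T_CANDIDATES.get(ch, [ch]) for ch in simplified]
    return ["".join(combo) for combo in product(*options)]

def is_ambiguous(simplified: str) -> bool:
    """True if at least one character has more than one traditional form."""
    return any(len(S2T_CANDIDATES.get(ch, [ch])) > 1 for ch in simplified)

if __name__ == "__main__":
    # Only one of the generated spellings is the real word, so a human
    # (or a phrase dictionary) still has to pick it.
    print(traditional_candidates("头发"))
```

Any phrase for which `is_ambiguous` returns true cannot be converted by a character table alone and needs a phrase-level decision.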

Another problem is that some words, especially foreign names but also
Chinese words, differ between the mainland and the areas where
traditional Chinese is or has been used. An input method geared towards
mainland users might not contain the words used in other variants of
Mandarin Chinese, so those would have to be added from another source.
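As a rough illustration of adding words from another source, the sketch below merges a region-specific word list into a candidate table keyed by pinyin. The function name `merge_lexicons` and the two sample dictionaries are assumptions made for this example and do not reflect how scim-pinyin stores its phrases.

```python
def merge_lexicons(
    converted: dict[str, list[str]],
    regional: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Append region-specific words to candidate lists keyed by pinyin.

    Existing candidates keep their order; duplicates are skipped.
    """
    merged = {pinyin: list(words) for pinyin, words in converted.items()}
    for pinyin, words in regional.items():
        bucket = merged.setdefault(pinyin, [])
        for word in words:
            if word not in bucket:
                bucket.append(word)
    return merged

if __name__ == "__main__":
    # Mainland vocabulary after script conversion (sample data).
    converted = {"ruan jian": ["軟件"], "shu biao": ["鼠標"]}
    # Words used in Taiwan for the same things (sample data).
    regional = {"ruan ti": ["軟體"], "hua shu": ["滑鼠"]}
    print(merge_lexicons(converted, regional))
```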

So in other words, it's doable, but not trivial...?

- David Oftedal

David Oftedal

> This is hardly true of any characters when converting from simplified
> to traditional Chinese
Ahem... from traditional to simplified.

Choe Hwanjin

Do you just want to input Traditional Chinese with the Smart Pinyin engine?
SCIM's filter feature may be a solution for you.

Run scim-setup.
Select IMEngine - Global Setup and select Smart Pinyin.
Click the "Select Filters" button.
Then select the TC<->SC filter.
With this filter, the lookup candidates are shown in TC and the text you input is TC.


On Fri, Aug 22, 2008 at 7:15 AM, David Oftedal <[hidden email]> wrote:

>> This is hardly true of any characters when converting from simplified
>> to traditional Chinese
> Ahem... from traditional to simplified.
keith321

Choe:

Thanks for the suggestions. I have not tried out the filters before. I turned them on and
restarted scim, but they don't seem to be working. Is there anything else I need to do to activate and use them?

David:

Thanks for the feedback. I looked into the mapping problem a long time ago, to help a dictionary website I really liked (adsotrans.com) get better support for traditional Chinese. At the time there were not many solutions for handling it. Since then, though, I've noticed that some people have at least started to solve the problem (e.g. Wikipedia: http://meta.wikimedia.org/wiki/Automatic_conversion_between_simplified_and_traditional_Chinese).

Do you know how the tables are currently handled? Are the simplified tables simply converted in some lossy manner to traditional ones? Also, do you know if it's possible for user-corrected data to be used to build better tables? That is, if I use the traditional style smart pinyin input a lot, its character prediction will start to get better. Could this data then be used (from one, or possibly many users) to improve the tables?
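One way the idea of reusing user corrections could look is sketched below in Python. This is purely hypothetical: the class `SelectionLog` and its methods are invented for illustration and say nothing about how Smart Pinyin actually stores or applies its learned frequencies.

```python
from collections import Counter, defaultdict

class SelectionLog:
    """Counts which candidate a user picked for each pinyin string."""

    def __init__(self) -> None:
        self._counts: dict[str, Counter] = defaultdict(Counter)

    def record(self, pinyin: str, chosen: str) -> None:
        """Note that the user committed `chosen` after typing `pinyin`."""
        self._counts[pinyin][chosen] += 1

    def merge(self, other: "SelectionLog") -> None:
        """Fold another user's log into this one."""
        for pinyin, counts in other._counts.items():
            self._counts[pinyin].update(counts)

    def rank(self, pinyin: str, candidates: list[str]) -> list[str]:
        """Order candidates by how often they were chosen.

        sorted() is stable, so unseen candidates keep their table order.
        """
        counts = self._counts.get(pinyin, Counter())
        return sorted(candidates, key=lambda cand: -counts[cand])

if __name__ == "__main__":
    log = SelectionLog()
    log.record("tou fa", "頭髮")
    log.record("tou fa", "頭髮")
    print(log.rank("tou fa", ["頭發", "頭髮"]))
```

Logs merged from many users could then be used to reorder, or flag for review, the entries of a converted table.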


Thanks,
Keith

Choe Hwanjin

On Fri, Aug 22, 2008 at 10:07 PM, pwnedd <[hidden email]> wrote:
>
> Choe:
>
> Thanks for the suggestions. I have not tried out the filters before. I
> turned them on and
> restarted scim, but they don't seem to be working. Is there anything else I
> need to do to activate and use them?
>

I don't know much about the filter feature.
I tested the filter a long time ago, and it worked for me.
I guess it may have some bugs.
Try it in another locale, such as zh_CN.UTF-8.

David Oftedal

In reply to this post by keith321
On Fri, 22 Aug 2008 06:07:05 -0700 (PDT)
pwnedd <[hidden email]> wrote:


> I've noticed since then though that some people have at least
> started to solve the problem (e.g. Wikipedia
> http://meta.wikimedia.org/wiki/Automatic_conversion_between_simplified_and_traditional_Chinese).
>
> Do you know how the tables are currently handled? Are the simplified
> tables simply converted in some lossy manner to traditional ones?
> Also, do you know if it's possible for user-corrected data to be used
> to build better tables? That is, if I use the traditional style smart
> pinyin input a lot, its character prediction will start to get
> better. Could this data then be used (from one, or possibly many
> users) to improve the tables?

I'm not sure how the internals of the Smart Pinyin method work, but I
believe Wikimedia does a best-effort conversion using a conversion
table, and each case where the conversion turns out wrong is either
corrected manually through some special mark-up on the page or added
to the table.
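The table-plus-corrections approach can be sketched roughly as follows. This is not Wikimedia's implementation; `CHAR_TABLE`, `PHRASE_OVERRIDES` and `convert` are hypothetical names with sample data, showing only the general technique of trying phrase-level corrections first and falling back to a per-character table.

```python
# Sample data only. The character table holds one default mapping per
# character; the override table holds phrases where that default is wrong.
CHAR_TABLE = {"头": "頭", "发": "發", "现": "現"}
PHRASE_OVERRIDES = {"头发": "頭髮", "理发": "理髮"}

def convert(text: str) -> str:
    """Greedy longest-match conversion from simplified to traditional.

    At each position the longest matching override wins; otherwise the
    single character is converted with the default table.
    """
    max_len = max(map(len, PHRASE_OVERRIDES), default=1)
    out = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in PHRASE_OVERRIDES:
                out.append(PHRASE_OVERRIDES[chunk])
                i += size
                break
        else:
            # No override matched here: fall back to the character table.
            ch = text[i]
            out.append(CHAR_TABLE.get(ch, ch))
            i += 1
    return "".join(out)

if __name__ == "__main__":
    print(convert("发现头发"))
```

Each newly discovered wrong conversion becomes one more entry in the override table, which is how such a table improves over time.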

I'd be interested in knowing how SCIM's filter differentiates between
these merged characters, though...

-David

yuanzhoulv

In reply to this post by Ronald Stroethoff
OK everyone, I just made a workaround for this. It's not intended to be part of SCIM (I can't make head or tail of how CVS works or what's in a Makefile), but I attempted a massive conversion of the whole SCIM phrase library, based on the CEDICT project and some code that tries to assemble logical phrases and make translations. Again, this is just a workaround, not a replacement for proper support in the development tree.

http://dheera.net/projects/scimfanti.php
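For anyone wanting to try something similar, here is a minimal Python sketch of reading CEDICT-style entries into a simplified-to-traditional phrase lookup. It is not the code behind the project linked above; the names `parse_cedict` and `SAMPLE` are hypothetical, and the sketch assumes the usual CC-CEDICT line layout of `Traditional Simplified [pin1 yin1] /gloss/`.

```python
import re

# One entry per line: traditional, simplified, [pinyin], /gloss/gloss/
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")

def parse_cedict(lines):
    """Map (simplified, pinyin) to the list of traditional spellings."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        match = CEDICT_LINE.match(line)
        if match is None:
            continue  # skip anything that does not fit the layout
        trad, simp, pinyin, _glosses = match.groups()
        bucket = mapping.setdefault((simp, pinyin.lower()), [])
        if trad not in bucket:
            bucket.append(trad)
    return mapping

# Sample lines written for this example, in CEDICT-style layout.
SAMPLE = [
    "# comment line",
    "頭髮 头发 [tou2 fa5] /hair (on the head)/",
    "發現 发现 [fa1 xian4] /to find/to discover/",
]

if __name__ == "__main__":
    for key, trads in parse_cedict(SAMPLE).items():
        print(key, trads)
```

Keying on both the simplified form and the pinyin keeps homographs with different readings apart, which matters for characters that map to several traditional forms.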