Converting UC_STRING to/from C pointer to UTF8 octet sequences

Paul Cohen-4

Converting UC_STRING to/from C pointer to UTF8 octet sequences

Hi all,

I need some advice/feedback on how to convert/create a UC_STRING from
a C pointer to a UTF8 byte sequence (and vice versa). I can't find any
routines or examples in the Gobo unicode library for doing it. The
need for this arises when writing Eiffel shared libraries (with
C-style APIs).

This is my current approach:

    uc_string_from_utf8_pointer (p: POINTER): UC_STRING is
            -- New UC_STRING object from UTF8 octet sequence pointed to by `p'.
        require
            p_not_default_pointer: p /= default_pointer
        local
            utf8: STRING_8
        do
            create utf8.make_from_c (p)
            create Result.make_from_utf8 (utf8)
        ensure
            result_not_void: Result /= Void
        end

It is unfortunate that I have to create two "string objects" to
perform the "conversion".

Returning a pointer to a UTF8 octet sequence is trickier. An initial
approach could be:

    utf8_pointer_from_uc_string (s: UC_STRING): POINTER is
            -- A pointer to the UTF8 octet sequence of `s'.
        require
            s_not_void: s /= Void
        local
            utf8: STRING_8
            a: ANY
        do
            utf8 := s.to_utf8
            a := utf8.to_c
            Result := $a
        ensure
            result_not_default_pointer: Result /= default_pointer
        end

This is not a good solution: 'utf8' and its memory area are referenced
only by locals, so the garbage collector may reclaim or move them
before the client accesses the octet sequence. Many C APIs are written
so that the caller must supply a memory buffer that the API
implementation uses for returning the result, leaving the caller
responsible for memory management. So a better approach might be
something like:

    fill_utf8_buffer_from_uc_string (s: UC_STRING; buffer: STRING_8) is
            -- Fill the buffer with the UTF8 octet sequence of `s'.
        require
            s_not_void: s /= Void
            buffer_not_void: buffer /= Void
            buffer_large_enough: s.to_utf8.count <= buffer.count
        local
            utf8: STRING_8
            a: ANY
            p_buffer, p_utf8: POINTER
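        do
            utf8 := s.to_utf8
                -- Address of the destination area (the caller's buffer).
            a := buffer.to_c
            p_buffer := $a
                -- Address of the source area (the UTF8 octets of `s').
            a := utf8.to_c
            p_utf8 := $a
            p_buffer.memory_copy (p_utf8, utf8.count)
        end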

Yet another approach would be for the Eiffel class that implements the
API to maintain a buffer of memory where the returned result is put.
However, one then runs into concurrency problems with access to the
shared data.

Any feedback on this issue would be welcome!

/Paul

--
Paul Cohen
www.seibostudios.se
mobile: +46 730 787 035
e-mail: [hidden email]

_______________________________________________
gobo-eiffel-develop mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gobo-eiffel-develop
Berend de Boer

Re: Converting UC_STRING to/from C pointer to UTF8 octet sequences

>>>>> "Paul" == Paul Cohen <[hidden email]> writes:

    Paul> Hi all, I need some advice/feedback on how to convert/create
    Paul> a UC_STRING from a C pointer to a UTF8 byte sequence (and
    Paul> vice versa). I can't find any routines or examples in the
    Paul> Gobo unicode library for doing it. The need for this arises
    Paul> when writing Eiffel shared libraries (with C-style APIs).

    Paul> It is unfortunate that I have to create two "string objects"
    Paul> to perform the "conversion".

You can iterate over the C pointer and do what make_from_utf8 does if
this is a concern.
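
As a rough illustration of what iterating over the pointer involves, here is a minimal C sketch of such a decoding loop. It shows only the general shape of UTF-8 decoding and is not the actual code of make_from_utf8. The names utf8_decode_one and utf8_count_code_points are hypothetical, and overlong forms and surrogates are deliberately not rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the code point that starts at p.
   Stores it in *out and returns the number of octets consumed,
   or 0 if the lead or a continuation octet is malformed.
   Assumption: overlong forms and surrogates are NOT rejected. */
size_t utf8_decode_one(const unsigned char *p, uint32_t *out)
{
    size_t len, i;
    uint32_t cp;

    if (p[0] < 0x80) {                    /* 0xxxxxxx: one octet */
        *out = p[0];
        return 1;
    } else if ((p[0] & 0xE0) == 0xC0) {   /* 110xxxxx: two octets */
        len = 2; cp = p[0] & 0x1F;
    } else if ((p[0] & 0xF0) == 0xE0) {   /* 1110xxxx: three octets */
        len = 3; cp = p[0] & 0x0F;
    } else if ((p[0] & 0xF8) == 0xF0) {   /* 11110xxx: four octets */
        len = 4; cp = p[0] & 0x07;
    } else {
        return 0;                         /* stray continuation or invalid lead */
    }

    for (i = 1; i < len; i++) {
        /* A terminating NUL fails this test too, so a truncated
           sequence never makes us read past the end of the string. */
        if ((p[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (uint32_t)(p[i] & 0x3F);
    }
    *out = cp;
    return len;
}

/* Hypothetical helper: walk a NUL-terminated UTF-8 sequence and count
   its code points. Returns -1 on malformed input. */
long utf8_count_code_points(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    long count = 0;

    while (*p != 0) {
        uint32_t cp;
        size_t n = utf8_decode_one(p, &cp);
        if (n == 0)
            return -1;
        /* An Eiffel version would append `cp` to the new string here. */
        p += n;
        count++;
    }
    return count;
}
```

An Eiffel routine following the same loop could append each code point directly to the UC_STRING, avoiding the intermediate STRING_8.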


    Paul> Returning a pointer to a UTF8 octet sequence is trickier.

It is. I've found out that the only safe way to do this in ISE is to
allocate memory with C (malloc), copy the string to there and give a
pointer to that area to C.

This was tested on a large production system with millions of such
calls, and any attempt to give pointers to an Eiffel object would fail
with weird errors after a while. There appeared to be no safe way to
retrieve a pointer to an ISE Eiffel object, whatever tricks were
applied.
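
A minimal sketch of the C side of that approach, assuming the Eiffel code can hand the octets and their count to a small C helper. The name utf8_dup_to_c_heap is hypothetical, and so is the convention that the client releases the result with free().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `len` octets into memory obtained from
   malloc and NUL-terminate the copy. The copy lives outside the Eiffel
   garbage collector's heap, so it is neither moved nor reclaimed.
   Returns NULL if the allocation fails.
   Assumed convention: the caller releases the result with free(). */
char *utf8_dup_to_c_heap(const char *octets, size_t len)
{
    char *copy = malloc(len + 1);

    if (copy == NULL)
        return NULL;
    memcpy(copy, octets, len);
    copy[len] = '\0';
    return copy;
}
```

The copy has to happen while the Eiffel string is still reachable, that is, inside the call that produced it.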

--
Cheers,

Berend de Boer

Paul Cohen-4

Re: Converting UC_STRING to/from C pointer to UTF8 octet sequences

Hi,

Thanks Berend for your reply. For some reason I managed to miss your
reply and didn't see it until now.

On Tue, May 19, 2009 at 8:03 AM, Berend de Boer <[hidden email]> wrote:

>>>>>> "Paul" == Paul Cohen <[hidden email]> writes:
>    Paul> Hi all, I need some advice/feedback on how to convert/create
>    Paul> a UC_STRING from a C pointer to a UTF8 byte sequence (and
>    Paul> vice versa). I can't find any routines or examples in the
>    Paul> Gobo unicode library for doing it. The need for this arises
>    Paul> when writing Eiffel shared libraries (with C-style APIs).
>
>    Paul> It is unfortunate that I have to create two "string objects"
>    Paul> to perform the "conversion".
>
> You can iterate over the C pointer and do what make_from_utf8 does if
> this is a concern.

Nice. Thanks.

>    Paul> Returning a pointer to a UTF8 octet sequence is trickier.
>
> It is. I've found out that the only safe way to do this in ISE is to
> allocate memory with C (malloc), copy the string to there and give a
> pointer to that area to C.

Ok. That's what I was thinking.

> This was tested on a large production system with millions of such
> calls, and any attempt to give pointers to an Eiffel object would fail
> with weird errors after a while. There appeared to be no safe way to
> retrieve a pointer to an ISE Eiffel object, whatever tricks were
> applied.

Hmm. That's what I feared.

Berend, do you have any comment/opinion on the API design where the
caller supplies allocated memory of some size and where the supplier
(Eiffel side) can:

  1. Fill the allocated memory with the result, if the memory is large enough.

  2. Return an error code indicating that the memory provided is too
small to return the result.
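
The two cases above can be sketched in C roughly as follows. This is only an illustration under assumptions: the function name, the error codes and the `needed` out-parameter are hypothetical, and `src`/`src_len` stand in for the octets the Eiffel side would produce (for example with to_utf8).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for the sketch. */
enum { UTF8_OK = 0, UTF8_BUFFER_TOO_SMALL = 1 };

/* Hypothetical API function: copy `src_len` octets plus a terminating
   NUL into the caller-supplied `buffer` of `buffer_size` octets.
   If `needed` is not NULL it receives the size required (including the
   NUL), so a caller can retry with a larger buffer. */
int fill_utf8_buffer(const char *src, size_t src_len,
                     char *buffer, size_t buffer_size,
                     size_t *needed)
{
    if (needed != NULL)
        *needed = src_len + 1;

    /* Case 2: the memory provided is too small; nothing is written. */
    if (buffer == NULL || buffer_size < src_len + 1)
        return UTF8_BUFFER_TOO_SMALL;

    /* Case 1: the memory is large enough; fill it with the result. */
    memcpy(buffer, src, src_len);
    buffer[src_len] = '\0';
    return UTF8_OK;
}
```

Because the function keeps no state of its own, each client thread can pass its own buffer without any locking on the supplier side.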

Personally I think this is a rather good design principle for C APIs
and makes it easier to use threads on the client side. However it
makes it a little bit trickier to wrap the C API from Python, Ruby or
other languages.

It would be nice if one could mark given runtime objects or a class as
"uncollectable" by the garbage collector, so that they are not touched
(collected or moved) even if there are no Eiffel references to them.
Maybe something for the Gobo Eiffel compiler? Even nicer would be if
one could create such uncollectable objects on a once per thread
basis. It would make it much easier to write thread-safe C-style APIs
in Eiffel!

/Paul

--
Paul Cohen
www.seibostudios.se
mobile: +46 730 787 035
e-mail: [hidden email]
