XML/XSLT processing times

Colin Paul Adams

XML/XSLT processing times

I am just about ready to give up on the XSLT library. One prime driver
is the W3C's attitude to the XML "standard" (what's the point of
implementing a standard when it will arbitrarily and retrospectively
be pulled from underneath you?). The other is my inability to solve
the performance problem.

I have done some timings on an identity transformation of a 10MB XML
file. This is likely to be the worst case scenario for using
ST_STRING, as the cost of copying all the strings from UC_UTF8_STRING
to ST_STRING ought to be significant compared with the rest of the
processing (which is the least possible for any XSLT transformation
that does not involve filtering out some of the data). Even so, this
cost ought to be relatively low, and I would have hoped for an overall
improvement.

Instead, the runtime goes up from 26 seconds to 35 seconds.

I did some timings of other methods of copying an XML file to try to
put this into perspective:

1. Identity transformation using Saxon and xsltproc - 1.5 and 1
second respectively. Pretty damning.
2. Linux cp command - 25 milliseconds.
3. Gobo Eiffel XML parser using the example/xml/tree/formatter program
(modified to operate in unicode string mode, as there were unicode
character references in the XML file, and commenting out the DTD in
the file, as the program will not process an external DTD) - 15
seconds.
4. Gobo Eiffel XML parser using the example/xml/event/print program
(same modifications as above). Output redirected to a file. - 26 seconds.
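
For comparison with items 3 and 4, here is a minimal sketch of an event-based copy written with Python's standard library rather than Gobo, so none of the names are Gobo's. The SAX parser reports start-tag, character and end-tag events, and `xml.sax.saxutils.XMLGenerator` writes each one straight back out. `sax_copy` is a hypothetical helper; it illustrates the shape of the approach, not the Gobo print program itself.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


def sax_copy(xml_bytes: bytes) -> str:
    """Copy an XML document by replaying each SAX event into a serializer.

    Hypothetical helper for illustration: the parser calls startElement,
    characters, endElement, ... on the handler, and XMLGenerator writes
    each event back out as markup.
    """
    out = io.StringIO()
    handler = XMLGenerator(out, encoding="utf-8")
    xml.sax.parseString(xml_bytes, handler)
    return out.getvalue()


if __name__ == "__main__":
    # Character references such as &#233; arrive as ordinary characters
    # and are written back out as literal Unicode text.
    print(sax_copy(b'<doc lang="fr"><p>caf&#233; &amp; cr&#232;me</p></doc>'))
```

Because nothing is kept between events, memory use stays flat regardless of file size; the cost is per event, which is where any per-character overhead would show up.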

The last one was particularly intriguing, as I noticed half the time
was kernel time. I redirected it to /dev/null instead and it came down
to 10 seconds. I'm not sure what is going on at all here. Any thoughts?

Anyway, there seems little hope of getting the XSLT library to perform
on this basis. It would seem the entire XML parser/event
infrastructure would need re-writing. I have no appetite for the task
given the current W3C climate.

ST_STRING might be useful anyway. Shall I post the classes here for review?
--
Colin Adams
Preston Lancashire

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
gobo-eiffel-develop mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gobo-eiffel-develop
Berend de Boer

Re: XML/XSLT processing times

>>>>> "Colin" == Colin Paul Adams <[hidden email]> writes:

    Colin> I did some timings of other methods of copying an XML file to
    Colin> try to put this into perspective:

You could try the expat one. This was one of my reasons for using
expat: it was still faster, despite the prohibitive event-chaining
overhead. It's an interesting architecture, but performance is ****.

Did you look at memory consumption as well? That's also a biggie with
Eiffel. If you keep memory use down, as the C versions do, performance
is much better.
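
One rough way to act on the memory question, sketched in Python rather than Eiffel: `tracemalloc` reports peak allocation while building a whole tree (as the tree formatter in item 3 does) versus streaming events through a handler that keeps nothing (as in item 4). The document generator and function names are hypothetical, and Python's figures say nothing directly about Gobo's.

```python
import tracemalloc
import xml.dom.minidom
import xml.sax


def make_doc(n: int) -> bytes:
    """Hypothetical test document: n small <item> elements."""
    items = "".join(f'<item id="{i}">value {i}</item>' for i in range(n))
    return f"<root>{items}</root>".encode("utf-8")


def peak_bytes(fn, data: bytes) -> int:
    """Run fn(data) and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        fn(data)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def build_tree(data: bytes) -> None:
    # Tree approach: the whole document is materialized as DOM nodes.
    xml.dom.minidom.parseString(data)


def stream_events(data: bytes) -> None:
    # Event approach: a no-op handler, so nothing accumulates per element.
    xml.sax.parseString(data, xml.sax.ContentHandler())


if __name__ == "__main__":
    doc = make_doc(20_000)
    print("tree peak:  ", peak_bytes(build_tree, doc))
    print("events peak:", peak_bytes(stream_events, doc))
```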


    Colin> Anyway, there seems little hope of getting the XSLT library
    Colin> to perform on this basis. It would seem the entire XML
    Colin> parser/event infrastructure would need re-writing. I have no
    Colin> appetite for the task given the current W3C climate.

If you need performance, absolutely. As we can now use agents, we
can probably clean this up a bit, and have the event infrastructure
sit on top of them.
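
Agents are Eiffel's closure objects, so the suggestion resembles how some C-based parsers are driven: the caller plugs plain callbacks straight into the parser instead of routing every event through a chain of filter objects. As an illustration only, Python's `xml.parsers.expat` binding works this way; `count_events` and its counters are hypothetical.

```python
from xml.parsers import expat


def count_events(data: bytes) -> dict:
    """Count elements, text characters and nesting depth via closures on expat."""
    counts = {"elements": 0, "chars": 0, "max_depth": 0}
    depth = 0

    def on_start(name, attrs):
        nonlocal depth
        depth += 1
        counts["elements"] += 1
        counts["max_depth"] = max(counts["max_depth"], depth)

    def on_end(name):
        nonlocal depth
        depth -= 1

    def on_text(text):
        counts["chars"] += len(text)

    parser = expat.ParserCreate()
    # Each handler is just a callable; there is no intermediate filter chain.
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(data, True)
    return counts


if __name__ == "__main__":
    print(count_events(b"<a><b>xy</b><b>z</b></a>"))
```

Each event costs one call into the consumer, however many consumers are conceptually involved; a filter chain pays one call per stage per event.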

BTW, I've always found the ISE profiler an indispensable tool in solving
performance problems, but as you say, it'll be a lot of work.

--
Cheers,

Berend de Boer

Franck Arnaud

Re: XML/XSLT processing times

In reply to this post by Colin Paul Adams

> 2. Linux cp command - 25 milliseconds.

This is possibly unfair (e.g. if it is done entirely in kernel space
and returns before the copy has really finished).

> 3. Gobo Eiffel XML parser using the example/xml/tree/formatter program
> (modified to operate in unicode string mode, as there were unicode
> character references in the XML file, and commenting out the DTD in
> the file, as the program will not process an external DTD) - 15
> seconds.

It would be interesting to check with a similar example with no
Unicode, to see whether we're being hit by the Unicode processing
(which wouldn't surprise me).
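
The suggested comparison can be set up by generating two documents that differ only in whether the text uses character references. A sketch in Python, assuming a simple repeated-element structure; the generator, the choice of `&#233;` and the timing helper are all hypothetical and not taken from Colin's file. The timer only measures Python's own SAX parser, so it illustrates the experiment rather than predicting Gobo's behaviour.

```python
import time
import xml.sax


def make_pair(n: int) -> tuple:
    """Return (ascii_doc, unicode_doc) with identical element structure.

    The second document has a character reference (&#233;, e-acute) in
    every text node where the first has a plain ASCII 'e'.
    """
    ascii_body = "".join(f"<p>cafe {i}</p>" for i in range(n))
    unicode_body = "".join(f"<p>caf&#233; {i}</p>" for i in range(n))
    return (f"<doc>{ascii_body}</doc>".encode("ascii"),
            f"<doc>{unicode_body}</doc>".encode("ascii"))


def parse_seconds(data: bytes, repeats: int = 3) -> float:
    """Best-of-N wall-clock time to parse data with a no-op SAX handler."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        xml.sax.parseString(data, xml.sax.ContentHandler())
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    plain, refs = make_pair(100_000)
    print("ascii:  ", parse_seconds(plain))
    print("unicode:", parse_seconds(refs))
```

Taking the best of several runs reduces noise from other processes; the same pair of files could then be fed to the Gobo example programs.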

> 4. Gobo Eiffel XML parser using the example/xml/event/print program
> (same modifications as above). Output redirected to a file. - 26 seconds.
>
> The last one was particularly intriguing, as I noticed half the time
> was kernel time. I redirected it to /dev/null instead and it came down
> to 10 seconds. I'm not sure what is going on at all here. Any thoughts?

Hm, interesting. My guess is that we may be feeding the output one
character at a time (you could check with strace), and the receiving
side doesn't like being fed that way. It isn't the system call
overhead as such, since writing to /dev/null has the same overhead. If
that's the case, it's a bit surprising (I'd have expected libc and/or
the kernel to do buffering).
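
The one-character-at-a-time hypothesis is unconfirmed, but the difference it describes is easy to model. In this Python sketch, `CountingSink` is a hypothetical stand-in for a file descriptor that counts how many `write()` calls reach it, and the same byte-by-byte output is sent either directly or through `io.BufferedWriter`.

```python
import io


class CountingSink(io.RawIOBase):
    """Stand-in for a file descriptor: counts write() calls that reach it."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += b
        return len(b)


def emit_bytewise(text: bytes, buffered: bool, buffer_size: int = 8192) -> CountingSink:
    """Write text one byte at a time, optionally through a BufferedWriter."""
    sink = CountingSink()
    out = io.BufferedWriter(sink, buffer_size=buffer_size) if buffered else sink
    for i in range(len(text)):
        out.write(text[i:i + 1])
    if buffered:
        out.detach()  # flushes, then releases the sink without closing it
    return sink


if __name__ == "__main__":
    text = b"x" * 10_000
    print("unbuffered calls:", emit_bytewise(text, buffered=False).calls)
    print("buffered calls:  ", emit_bytewise(text, buffered=True).calls)
```

On the real program, `strace -c -e trace=write` gives a summary count of write system calls, which would show whether the output really leaves the process a byte at a time.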

> ST_STRING might be useful anyway. Shall I post the classes here for
> review?

I'd say yes.
