List of pages to fix for HTML parsing errors

Laurence Rowe: List of pages to fix for HTML parsing errors
Hi,

Since we now use an xdv theme on plone.org, pages containing some kinds of
invalid HTML fail to render because the libxml2 HTMLParser cannot handle
them. Our custom error page logs each view of a failed page to Google
Analytics, and I've updated the error page so that it now shows the parse
error.

Here's a spreadsheet listing all the affected pages. If you own any of
them, please fix them up. Once a page is fixed, please remove its line so
that we don't duplicate effort!

http://spreadsheets.google.com/ccc?key=rXvGZblMmdeYwuhp_AOzBYw

Note: to follow a link, click into the link's cell, then click the button
that appears to the left.

Laurence

(436 pages showing an error at time of writing)

------------------------------------------------------------------------------
Crystal Reports - New Free Runtime and 30 Day Trial
Check out the new simplified licensing option that enables
unlimited royalty-free distribution of the report engine
for externally facing server and web deployment.
http://p.sf.net/sfu/businessobjects
_______________________________________________
Plone-docs mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/plone-docs
Laurence Rowe: Re: List of pages to fix for HTML parsing errors
And for those thinking that surely this should be fixed in code: most of
the errors come from stx documents. zope.structuredtext may be found
here: http://svn.zope.org/repos/main/zope.structuredtext/trunk/

- html.py should be made to produce valid (X)HTML. This requires:

  i) extending the paragraph_nestable concept to observe all HTML
nesting rules - see http://www.cs.tut.fi/~jkorpela/html/nesting.html

  ii) HTML quoting in the _text method.

reST rendering should also be fixed and probably requires a similar approach.
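
The two html.py changes listed above could be prototyped roughly as follows. This is only a sketch using the Python standard library, not the zope.structuredtext code: the names quote_text, NestingChecker, find_nesting_problems and the BLOCK_ELEMENTS set are hypothetical, and the set covers only a few of the nesting rules described on the page linked above.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, incomplete set of block-level elements that must not
# appear inside a <p> element.
BLOCK_ELEMENTS = {"p", "div", "pre", "ul", "ol", "table", "blockquote",
                  "h1", "h2", "h3"}


def quote_text(text):
    """Quote the characters that are special in HTML text content."""
    return escape(text, quote=False)


class NestingChecker(HTMLParser):
    """Record block-level elements opened while a <p> is still open."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.problems = []   # (tag, line, column) for each violation

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_ELEMENTS and "p" in self.open_tags:
            self.problems.append((tag,) + self.getpos())
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def find_nesting_problems(markup):
    """Return the nesting violations found in a fragment of markup."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems
```

For example, quote_text("a < b & c") returns "a &lt; b &amp; c", and find_nesting_problems("<p>intro <pre>code</pre></p>") reports the <pre> element.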

Laurence


Israel Saeta Pérez: Re: List of pages to fix for HTML parsing errors
I've fixed a bunch of them related to documentation and struck them out in the spreadsheet. As you've said, most of them come from stx documents, where spurious <p> elements are inserted, even inside literal blocks. There are also some problems with URL encoding (with composite query strings) and entity encoding (&amp; &lt; &gt;).
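
For anyone fixing pages by hand, the query-string problem usually comes down to bare ampersands in the markup. Below is a minimal sketch of how they might be found and escaped; the names BARE_AMPERSAND and escape_bare_ampersands are hypothetical, and the regular expression is a simplification that would also touch ampersands inside scripts or CDATA sections.

```python
import re

# An ampersand that does not start a named or numeric character reference.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(markup):
    """Replace bare ampersands with &amp;, leaving existing references alone."""
    return BARE_AMPERSAND.sub("&amp;", markup)
```

With this, a link written as href="/search?a=1&b=2" becomes href="/search?a=1&amp;b=2", while an existing &amp; or &lt; is left unchanged.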

By the way, do we really need parsing so strict that page rendering blows up whenever the XHTML is not perfect? Can't the libxml2 parser be 'patched' to accept valid-enough XHTML?

-- israel


Laurence Rowe: Re: List of pages to fix for HTML parsing errors
2009/5/17 Israel Saeta Pérez <[hidden email]>:
> I've fixed a bunch of them related to documentation and struck them out in
> the spreadsheet. As you've said, most of them come from stx documents, where
> spurious <p> elements are inserted, even inside literal blocks. There are
> also some problems with URL encoding (with composite query strings) and
> entity encoding (&amp; &lt; &gt;).
>
> By the way, do we really need parsing so strict that page rendering blows
> up whenever the XHTML is not perfect? Can't the libxml2 parser be 'patched'
> to accept valid-enough XHTML?

The HTMLParser does try to be tolerant, but in SAX mode it seems to
break down when the errors are in an open tag near a chunk boundary
(Zope serves data in 1024-byte chunks). If you have the C and SAX foo
to fix this, that would be great!
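
To illustrate what "an open tag near a chunk boundary" means, here is a small sketch that feeds a document to a push parser in 1024-byte pieces. It uses Python's standard html.parser as a stand-in, not libxml2, so it does not reproduce the failure; it only shows the kind of chunked-versus-whole comparison that could be turned into a test case. TagCollector, parse_in_chunks and parse_whole are hypothetical names.

```python
from html.parser import HTMLParser

CHUNK_SIZE = 1024  # chunk size mentioned in this thread


class TagCollector(HTMLParser):
    """Collect every start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def parse_in_chunks(markup, size=CHUNK_SIZE):
    """Feed the markup to the parser one fixed-size chunk at a time."""
    parser = TagCollector()
    for start in range(0, len(markup), size):
        parser.feed(markup[start:start + size])
    parser.close()
    return parser.tags


def parse_whole(markup):
    """Feed the markup to the parser in a single call."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags


# Pad the document so that the <a ...> tag straddles the first boundary.
markup = "<p>" + "x" * 1015 + '<a href="/page">link</a></p>'
same = parse_in_chunks(markup) == parse_whole(markup)
```

A parser that copes with chunked input should give the same result either way, so a page on which the two results differ points at a boundary problem.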

Laurence
