Any tool to check against missing semicolons in entity and character references?
| Jukka K. Korpela 2007-04-01, 6:21 pm |
| According to classic HTML (nominally, SGML-based) rules, a semicolon
(reference close) is optional in entity and character references, when the
reference is not immediately followed by a name character. Browsers haven't
had problems with this.
However, IE 7 is absurdly picky: it refuses to recognize
a) an entity reference referring to a character outside ISO Latin 1
b) a hexadecimal character reference
whenever it does not contain the trailing semicolon. Thus,
&rarr
and
&#x2192
are displayed literally (whereas &eacute and &#8594 are OK).
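
As a rough illustration of cases a) and b) above, the following Python sketch guesses whether a single unterminated reference falls into one of them. It only encodes the behaviour as described in this post, not a tested model of IE 7. The function name `ie7_shows_literally` is hypothetical, and the entity table is Python's `html.entities.name2codepoint`, which covers the HTML 4 entity names.

```python
import re
from html.entities import name2codepoint  # HTML 4 entity names -> code points

# One entity or character reference; the closing semicolon is captured separately.
REFERENCE = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*)(;?)")


def ie7_shows_literally(reference):
    """Guess whether the reference matches case a) or b) described above."""
    m = REFERENCE.fullmatch(reference)
    if m is None or m.group(2) == ";":
        return False  # not a reference at all, or properly terminated
    body = m.group(1)
    if body[0] == "#":
        # Case b): hexadecimal character reference without a semicolon.
        return body[1] in "xX"
    # Case a): known entity whose character lies outside ISO Latin 1.
    # Unknown entity names are treated as "no" here, which is an assumption.
    codepoint = name2codepoint.get(body)
    return codepoint is not None and codepoint > 0xFF


for sample in ("&rarr", "&#x2192", "&eacute", "&#8594", "&rarr;"):
    print(sample, ie7_shows_literally(sample))
```
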
Is there any practically useful checker that issues a warning about such
constructs, or about any entity or character reference not terminated by a
semicolon? As far as I can see, no.
Henri Sivonen's checker http://hsivonen.iki.fi/validator/ is closest to what
I mean, but not very close: it detects any reference without semicolon but
it only reports the first problem and then terminates, which isn't very
practical.
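
A checker that keeps going after the first problem could look roughly like the Python sketch below. It is a plain regular-expression scan, not an HTML parser, so it will also flag ampersands in URLs, scripts and comments. The name `find_unterminated` and the sample text are made up for illustration.

```python
import re

# One entity or character reference; the closing semicolon is captured separately.
REFERENCE = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*)(;?)")


def find_unterminated(text):
    """Yield (line, column, reference) for every reference lacking a semicolon."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for m in REFERENCE.finditer(line):
            if m.group(2) == "":
                # Columns are 1-based and counted in characters, not bytes.
                yield line_no, m.start() + 1, m.group(0)


sample = "a &rarr b &eacute; c\nd &#x2013 e &#8594; f &copy\n"
for line_no, column, ref in find_unterminated(sample):
    print(f"{line_no}:{column}: {ref} is not terminated by a semicolon")
```

Because the semicolon is an optional group rather than a negative lookahead, the pattern cannot backtrack into a shorter name and misreport a properly terminated reference.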
Of course, switching to XHTML and using a validator would solve this
problem - and create many others.
--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/
| |
| Chris Morris 2007-04-01, 6:21 pm |
| "Jukka K. Korpela" <jkorpela@cs.tut.fi> writes:
> Is there any practically useful checker that issues a warning about
> such constructs, or about any entity or character reference not
> terminated by a semicolon? As far as I can see, no.
Simple enough to write one, though.
http://compsoc.dur.ac.uk/~cim/EntityChecker.cgi
Currently it checks by file upload only, with a 50 KB limit. It
wouldn't be difficult to add a "check by URL" feature, or to convert
it to a command-line tool, if either would be more useful.
It gave the expected results on a couple of test documents that I fed
it, though it currently doesn't support input encodings other than
UTF-8 (non-multibyte encodings aren't likely to do anything worse than
a wrong column number, though).
--
Chris
| |
| Jukka K. Korpela 2007-04-01, 6:21 pm |
| Scripsit Chris Morris:
> "Jukka K. Korpela" <jkorpela@cs.tut.fi> writes:
>
> Simple enough to write one, though.
I also realized, after posting my message, that the check is fairly easy.
The problem, I guess, is how to get it included in popular checkers. And
actually it's easier to _fix_ the problematic constructs than just to
check for them: it can be done basically with the following Perl one-liner:
while(<> ) { s/(\&[#]?[0-9a-zA-Z]+);?/$1;/g; print; }
(This modifies some malformed constructs as well, but the above should cover
all character references and all entity references defined in HTML 4.01.)
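
For comparison, here is the same substitution as a Python sketch; the function name `add_semicolons` is hypothetical. The second call shows the kind of malformed construct that gets modified as well: a bare ampersand in a URL query string is treated as the start of a reference.

```python
import re

# Same pattern as the one-liner: "&", optional "#", alphanumerics, optional ";".
PATTERN = re.compile(r"(&#?[0-9a-zA-Z]+);?")


def add_semicolons(text):
    """Terminate every reference-like construct with exactly one semicolon."""
    return PATTERN.sub(r"\1;", text)


print(add_semicolons("x &rarr y &#x2192; z"))
print(add_semicolons('<a href="page?a=1&b=2">'))  # the "&b" gets a semicolon too
```
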
> http://compsoc.dur.ac.uk/~cim/EntityChecker.cgi
It doesn't seem to check for hex character references like &#x2013 being
terminated by a semicolon.
--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/