| Petar Popara 2005-08-17, 7:39 pm |
|
I have some XML inside an XML data island:
<xml id="testXML"><?xml version="1.0" encoding="UTF-8"?>...</xml>
When I read it in JScript:
var XML = testXML.DOMDocument.xml;
it loses the encoding information:
<?xml version="1.0"?>...
Why? How can I preserve the encoding info?
| |
| Martin Honnen 2005-08-17, 7:39 pm |
|
Petar Popara wrote:
> I have some XML inside an XML data island:
>
> <xml id="testXML"><?xml version="1.0" encoding="UTF-8"?>...</xml>
>
> When I take it into JScript:
>
> var XML = testXML.DOMDocument.xml;
>
> it loses the encoding information:
>
> <?xml version="1.0"?>...
>
> Why?
The document.xml property serializes the document to a string, and
script strings are usually UTF-16 encoded. So while your original
document might be UTF-8 encoded, it is debatable which encoding a string
representation should declare. I think earlier versions of MSXML did
output the encoding, but further processing of that string then yielded
an error. Thus, as far as I know, current MSXML versions drop the
encoding declaration whenever you use document.xml.
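To illustrate why a declaration of "UTF-8" would not describe the string itself, here is a minimal sketch in plain JavaScript syntax that compares the number of UTF-16 code units in a string with the number of bytes its UTF-8 encoding would take. The helper name utf8ByteLength and the sample markup are made up for this illustration and are not part of MSXML.

```javascript
// Hypothetical helper: count the bytes a string would occupy as UTF-8.
function utf8ByteLength(text) {
  // encodeURIComponent writes one %XX escape per UTF-8 byte for every
  // character it escapes; all other characters are single-byte ASCII.
  var escaped = encodeURIComponent(text);
  var escapes = escaped.match(/%[0-9A-F]{2}/g);
  var escapeCount = escapes ? escapes.length : 0;
  // Each escape is 3 characters long but stands for 1 byte.
  return escaped.length - 2 * escapeCount;
}

// Sample markup containing one non-ASCII character (e with acute accent).
var sample = '<a>\u00E9</a>';
var codeUnits = sample.length;           // UTF-16 code units in the string
var utf8Bytes = utf8ByteLength(sample);  // bytes if saved as UTF-8
```

The two counts differ as soon as the markup contains a non-ASCII character, which is why the in-memory string and a UTF-8 file are not the same thing.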
> How can I preserve encoding info?
If you do not serialize to a string (which is what xmlDocument.xml does)
but use the save method instead to serialize to a file or stream, then
the encoding is preserved.
--
Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
| |
| Petar Popara 2005-08-17, 7:39 pm |
|
> Using the document.xml property serializes the document to a string
> which is usually UTF-16 encoded. Thus if you use document.xml then any
> encoding information is dropped with current MSXML versions I think.
I came to a similar conclusion.
> If you do not serialize to a string (e.g. use xmlDocument.xml) but use the
> save method instead to serialize to a file or stream then the encoding is
> preserved.
Can you show me how to serialise into stream? Can I serialise to stream and
then into string (plain text)?
I just want to get XML as plain text with encoding information preserved. :(
| |
| Martin Honnen 2005-08-17, 7:39 pm |
|
Petar Popara wrote:
> Can you show me how to serialise into stream? Can I serialise to stream and
> then into string (plain text)?
You can serialize to an ADODB.Stream with the save method, for instance.
Here is an example in JScript, run as a Windows Script Host script:
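var xmlDocument = new ActiveXObject('Msxml2.DOMDocument.3.0');
xmlDocument.async = false; // load synchronously
var loaded = xmlDocument.load('test2005081701.xml');
if (!loaded) {
  // report the parse error instead of saving an empty document
  WScript.Echo('Load failed: ' + xmlDocument.parseError.reason);
  WScript.Quit(1);
}
WScript.Echo("xmlDocument.xml: " + xmlDocument.xml);
var adodbStream = new ActiveXObject('Adodb.Stream');
adodbStream.Open();
adodbStream.Type = 1; // binary
xmlDocument.save(adodbStream); // writes the encoded bytes into the stream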
But then the stream contains a sequence of bytes, and to decode that as
text you need to know the encoding. If you know that the encoding is
UTF-8, for instance, then (continuing from the code above) you can do
adodbStream.Position = 0; // rewind; Type can only be changed at position 0
adodbStream.Type = 2; // text
adodbStream.Charset = 'UTF-8'; // encoding used to decode the bytes
var streamAsText = adodbStream.ReadText();
WScript.Echo('Reading stream as UTF-8 yields: ' + streamAsText);
But of course, if the original file is not UTF-8 encoded, then setting
that Charset on the stream and calling ReadText() would give you wrong
characters or even decoding errors.
So you would need to read the first few bytes to look for the XML
declaration, parse that for the encoding, and use the encoding you find
as the Charset setting for the stream, so that the ReadText method
decodes the byte stream properly. In my understanding that is what XML
parsers do.
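The parsing step described above could be sketched roughly as follows, in plain JavaScript syntax that should also be acceptable to JScript. The function name sniffXmlEncoding and its fallback parameter are hypothetical, not part of MSXML or ADO. The sketch assumes the first bytes have already been decoded with an ASCII-compatible single-byte charset, so it does not cover UTF-16 documents, which would need a byte order mark check instead.

```javascript
// Hypothetical helper: pull the encoding name out of the XML declaration
// at the start of a document. `head` is the beginning of the document,
// decoded with an ASCII-compatible single-byte charset (assumption).
function sniffXmlEncoding(head, fallback) {
  // Allow a leading byte order mark, either as U+FEFF or as the three
  // UTF-8 BOM bytes seen through a single-byte charset.
  var pattern = /^(?:\uFEFF|\u00EF\u00BB\u00BF)?<\?xml\s[^>]*?\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1/;
  var match = pattern.exec(head);
  // XML without an encoding declaration (and without a UTF-16 BOM)
  // is UTF-8 by default, so use that unless the caller says otherwise.
  return match ? match[2] : (fallback || 'UTF-8');
}

var declared = sniffXmlEncoding('<?xml version="1.0" encoding="windows-1250"?><root/>');
var defaulted = sniffXmlEncoding('<?xml version="1.0"?><root/>');
```

In the script above, one way to get such a head string might be to set the stream's Charset to a single-byte charset such as 'iso-8859-1', call ReadText with a small character count, pass the result to the helper, and then rewind with Position = 0 and set Charset to the returned name before the full ReadText(). I have not tested that combination, so treat it as an assumption.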
--
Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/