







XPath selection inconsistencies
J

2006-08-31, 10:57 pm

Has anyone seen something like this before?

I am using MSXML 4 with VB6. I have XML similar in structure to:

<xml>
<a>
...
<b>
...
<c>
...
<d>1234</d>
</c>
</b>
</a>
<a>
...
</a>
....
</xml>

I then attempt to select a node with an XPath query of the form (I'll
call it 'form 1'):
"a/b[c/d='1234']"

When the XML has been freshly loaded into the DOMDocument40 (from disk),
this query executes fine. However, when certain processes run first that
remove nodes and replace them with exactly the same structure but
slightly different values (i.e., "...<d>5678</d>..."), the query
(form 1, with updated values) returns without finding a node.

A reasonable workaround has been found, by simply changing the XPath
query to what I will call 'form 2':
"a/b[c/d[text()='5678']]"

I am aware of the semantic differences between the two forms, or at
least I presume I know all of them. The text() node test selects only
the text node(s) that are immediate children of the current context
(here, the 'd' element), while form 1 compares against the string-value
of 'd', which is the concatenation of all text nodes beneath it,
including those of its descendant elements.
These differences should not matter for the XML in question, because
the 'd' element never has anything other than a single text node child
(no child elements).
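
One way to check the single-text-node assumption on the live DOM is to list the children of each 'd' element. The following is a minimal VB6 sketch, assuming a project reference to MSXML 4; DumpDChildren is a hypothetical helper name, not part of the application or of MSXML.

```vb
' Diagnostic sketch (hypothetical helper, not from the original post):
' lists the child nodes of every <d> element so that the
' "single text node" assumption can be checked on the live DOM.
Private Sub DumpDChildren(ByVal doc As MSXML2.DOMDocument40)
    Dim dNodes As MSXML2.IXMLDOMNodeList
    Dim dNode As MSXML2.IXMLDOMNode
    Dim child As MSXML2.IXMLDOMNode

    ' //d selects every <d> element regardless of depth.
    Set dNodes = doc.selectNodes("//d")

    For Each dNode In dNodes
        Debug.Print "d has " & dNode.childNodes.length & " child node(s)"
        For Each child In dNode.childNodes
            ' nodeTypeString is "text", "cdatasection", "element", ...
            Debug.Print "  " & child.nodeTypeString & ": [" & child.xml & "]"
        Next child
    Next dNode
End Sub
```

If a 'd' element reports more than one child, or a child that is not a plain text node, the two forms can legitimately give different answers. If every 'd' reports exactly one text node, the difference must come from somewhere else.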

So I am looking for the reason these two XPath forms behave
differently. Because this behavior has been found in one situation in
a large enterprise application, I need to know whether the large number
of other form 1 XPath queries are at risk of failing.

Unfortunately, I have been unable to reproduce the behavior on a
smaller scale (the a/b/c/d example, for instance), and there is an
overwhelming amount of code (the 'certain processes' mentioned above)
that could be creating the MSXML state that causes form 1 to fail.

So I am looking for brainstorms. Perhaps this will ring a bell with
someone with more experience with MSXML than I have.

Has anyone seen anything similar before?

Thanks,

J

Anthony Jones

2006-09-01, 6:35 am

Can you post some VB6 code for a repro?



J

2006-09-01, 6:27 pm

>
> Can you post some VB6 code for a repro?
>


Unfortunately, I can't, really. The code that is experiencing the
error is essentially something like:

Private Function Foo()
    Dim doc As DOMDocument40
    Dim nl As IXMLDOMNodeList

    Set doc = New DOMDocument40
    doc.loadXML xmlText 'xmlText: a String holding the XML text above

    'AddAndRemoveNodes stands in for the real node-replacement code
    Set doc = AddAndRemoveNodes(doc)

    'The symptom appears here:
    'This returns no nodes
    Set nl = doc.selectNodes("a/b[c/d='5678']")

    'This will return exactly what we expect
    Set nl = doc.selectNodes("a/b[c/d[text()='5678']]")
End Function


The problem, of course, is in the AddAndRemoveNodes function, which
is really just a pseudocode abstraction for more code than I can
reasonably reproduce for you here.

What I can tell you is that after the AddAndRemoveNodes routine, if I
save the DOM to disk and reload it into the doc, both XPath selections
work identically ('correctly'). It is only when I use the DOM in the
state it is in after that routine that I have trouble.
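
The disk round trip described above can be approximated in memory by serializing the document and parsing the result into a new instance. This is a minimal sketch, assuming that reparsing the serialized text has the same effect as saving and reloading; ReparseInMemory is a hypothetical helper name.

```vb
' Hypothetical helper: rebuilds the DOM from its own serialization,
' so any state left behind by earlier node manipulation is discarded.
Private Function ReparseInMemory(ByVal doc As MSXML2.DOMDocument40) _
        As MSXML2.DOMDocument40
    Dim fresh As MSXML2.DOMDocument40

    Set fresh = New MSXML2.DOMDocument40
    fresh.async = False

    ' doc.xml is the serialized document; loadXML returns False on a parse error.
    If Not fresh.loadXML(doc.xml) Then
        Err.Raise vbObjectError + 1, "ReparseInMemory", fresh.parseError.reason
    End If

    Set ReparseInMemory = fresh
End Function
```

This only hides the symptom. It does not explain why the two query forms disagree on the manipulated DOM, and it costs a full reparse each time it is called.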

A coworker has suggested that the implementation of the DOM may be
using some internal indexing to speed up XPath selection, and that this
index becomes corrupted. Under this theory, perhaps a predicate of the
form "[c/d='...']" uses that internal (presumably corrupted) indexing
and produces an incorrect result (in this case, no result at all).
Perhaps the text() node test, for whatever reason, bypasses that
indexing, performs the expected recursive search, and produces the
correct result.

This all sounds reasonable to me. But it is all conjecture. I have
been unable to find any documentation of this sort of behavior, and
was hoping someone else had...

Thanks for any help,

J

Joe Fawcett

2006-09-02, 6:27 am

I have seen something similar happen when the new nodes are added from a
separate document without using cloneNode or importNode.
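
A minimal sketch of the cloneNode approach mentioned above, assuming the replacement subtree comes from a second DOMDocument40. CopySubtree, srcDoc, dstDoc and the two XPath strings are hypothetical names chosen for illustration. importNode is not shown because it may not be available on the MSXML 4 document interface.

```vb
' Sketch only: copies a <b> subtree from one document into another
' by deep-cloning it first instead of moving the original node.
Private Sub CopySubtree(ByVal srcDoc As MSXML2.DOMDocument40, _
                        ByVal dstDoc As MSXML2.DOMDocument40)
    Dim srcNode As MSXML2.IXMLDOMNode
    Dim dstParent As MSXML2.IXMLDOMNode
    Dim clonedNode As MSXML2.IXMLDOMNode

    ' Illustrative queries: first <b> in the source, first <a> in the target.
    Set srcNode = srcDoc.selectSingleNode("//b")
    Set dstParent = dstDoc.selectSingleNode("//a")
    If srcNode Is Nothing Or dstParent Is Nothing Then Exit Sub

    ' cloneNode(True) copies the node together with all of its descendants.
    Set clonedNode = srcNode.cloneNode(True)
    dstParent.appendChild clonedNode
End Sub
```

Searching the node-replacement code for appendChild, insertBefore and replaceChild calls whose argument was selected from a different document is one way to find places where this might apply.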

--

Joe Fawcett - XML MVP


http://joe.fawcett.name



J

2006-09-05, 6:58 pm


Hmm. Thanks. That is a good suggestion. I will have to check and see
if that is something that is happening anywhere along the way.
