Bug #18134 | XML: ExtractValue() should distinguish between absent and empty elements | ||
---|---|---|---|
Submitted: | 10 Mar 2006 13:19 | Modified: | 19 May 2006 7:46 |
Reporter: | Roland Bouman | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Documentation | Severity: | S4 (Feature request) |
Version: | 5.1.7 beta | OS: | Windows (winxp pro) |
Assigned to: | Jon Stephens | CPU Architecture: | Any |
[10 Mar 2006 13:19]
Roland Bouman
[13 Mar 2006 22:52]
Roland Bouman
changed the title
[18 May 2006 1:08]
Jon Stephens
(Following based on recent email discussion with Bar, Reggie, PeterG, et al.) According to http://www.w3.org/TR/xpath#function-string - "A node-set is converted to a string by returning the string-value of the node in the node-set that is first in document order. If the node-set is empty, an empty string is returned." It doesn't say anything about showing *why* the node-set was empty. You'll have to use count() or some other means (e.g. REGEXP, SUBSTRING) to determine the reason. I've changed the category to Docs and assigned to myself. TODO: Update the ExtractValue() function description to make it clear that, since this is what the XPath spec says, this is what we do as well. (And show suggested solution to problem.)
[18 May 2006 7:10]
Roland Bouman
John, the link you are providing points to the description of the XPath string() function. Naturally, if my example would read something like: ExtractValue('<tag>text</tag>','string(*[2]/text())') the result should be an empty string. However, my sample code does not use the string function. This is what we have: XML: <tag>text</tag> XPath: *[2]/text() the result should be an empty nodeset according to the spec (http://www.w3.org/TR/xpath#node-tests), and IE and Mozilla seem to agree on this (at least in the context of xslt - files attached, open the xml in the browser). Now, I know that in MySQL the ExtractValue() function always returns the result as a (MySQL) character type - there just isn't a datatype that could pass as a true nodeset. So it's clear that some conversion must take place. However, I do NOT see why that final conversion to a character type should be performed according to the XPath string() function. There is no standard that compels MySQL to do so, because there is no standard that compels you to do the final conversion to a character type in the first place. From that perspective, I think it is a mistake to choose the semantics of the XPath string() function to implement the final conversion, as it makes it much less convenient to distinguish between XPaths that evaluate to an empty node set and XPaths that genuinly evaluate to the empty string. If you still think that this is only a documentation issue, then please explicitly describe that the final conversion of nodesets to strings has the semantics of the XPath string() function, so that it is as if each XPath passed to ExtractValue has a string({actual argument XPath expression}) implicitly wrapped around it. Hope I'm making my point clear
[18 May 2006 7:11]
Roland Bouman
XML file containing the expression
Attachment: test.xml (text/plain), 87 bytes.
[18 May 2006 7:12]
Roland Bouman
xslt accompanying the xm. Open the xml in Internet explorer or Mozilla or firefox
Attachment: test.xslt (text/plain), 597 bytes.
[18 May 2006 10:06]
Jon Stephens
Hi Roland, NULL is not appropriate here, since in SQL it means "unknown", and this is not the same as saying that we know there's no match. We may not know *why* there's no match, but the fact that there isn't one is not in doubt. ExtractValue() can be thought of as working by appending a call to text() to the XPath expression passed to it as an argument. Thus, ExtractValue(@xml, '/a/b') produces the same result as ExtractValue(@xml, '/a/b/text()'). If you test /a/b/text() against either <a><b/></a> or <a><c>5</c></a> the result is the same - an empty nodeset - and in both cases we represent this as an empty string, meaning "none".
[18 May 2006 14:04]
Roland Bouman
John, thanks for taking the time to elaborate. Your argument concerning SQL null semantics is understandable, but I think not entirely correct. I've only got a draft of the 2003 standard, but that sais: ..the null value is neither equal to any other value nor not equal to any other value — it is unknown whether or not it is equal to any given value.. (ISO/IEC 9075-1:2003 (E), p 15) and also: "A value of the Boolean data type is either true or false. The truth value of unknown is sometimes represented by the null value." (ISO/IEC 9075-1:2003 (E), p 16) So only in a boolean context can it mean "unknown" in a literal sense. I don't think the boolean context applies here, so it must mean "..unknown whether or not it is equal to any given value..." Now, I do agree that to know that there was no match means a lot more than that it is unknown wheter it is equal to a given value: we know for sure that there is _no value at all_ . There is of course also another interpretation of SQL null that claims that NULL actually means "missing value" or "not applicable". To me it seems that this interpretation could quite appropriately represent the empty nodeset given that ExtractValue() tries to extract the *value*. Anyway, I feel that both interpretations of NULL are valid, and that means that if MySQL sticks with the first interpretation, it is not appropriate to return the NULL (no matter how convenient it may seem to at least me). As for your explanation of the text-conversion behaviour of ExtractValue, I do not want to offend you, but I think that it is incorrect. Consider this: XML: <tag><b/></tag> XPath: count(//b) Suppose we stick to your interpretation, ExtractValue would actually execute this: count(//b)/text() Well, both IE and FireFox implementations regard this as invalid. Although I cannot find anything in the XPath spec that sais they should regard it as invalid, we can both see that this XPath expression could _never_ return the number of b elements. That's because count() evaluates to a number. /text() finds text *nodes*, and since count does not return a nodeset, it should find none here. On the other hand, if we assume that ExtractValue() actually does something like this: string(count(//b)) The result will check out. (Of course, neither the string() nor the text() explanation adequately describe the exact behaviour of ExtractValue in case the xpath expression returns a nodeset containing multiple nodes. ExtractValue returns a space separated list of all text values which is at least not compliant with the XPath 1.0 spec according to which only the first value should be returned)
[19 May 2006 6:10]
Jon Stephens
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant product(s). Additional info: 1. "NULL is not equal (or unequal) to any known value" -> stands for UNKNOWN. 2. If we don't know whether a Boolean value is true or false, then we may use NULL to indicate that this is the case. This does *not* mean that NULL is restricted to use in a Boolean context. As for how ExtractValue() handles multiple matches, that's a separate issue and not really relevant to this bug. However, there are processors that handle multiple matches in this manner, so such behaviour is not without precedent - and this already indicated in the XML Functions section of the 5.1 Manual). --- Following a lengthy discussion with PeterG, Bar, and Reggie, I've updated the ExtractValue() function description to indicate that the fact that we do not distinguish between "no matching element" and "matching element contains no child text nodes" is intentional. I've also provided an example (verified in 5.1.11-bk) showing how to discover which is the case.
[19 May 2006 7:47]
Roland Bouman
John, thanks for taking the time. The only reason for me to mention the multiple value case is to show that it is not really possible to describe how ExtractValue converts from nodeset to MySQL character type - at least not using XPath terminology.