Bug #25701 | XML: ExtractValue() using wrong context to calculate XPath position() | ||
---|---|---|---|
Submitted: | 18 Jan 2007 18:59 | Modified: | 4 Oct 2007 17:13 |
Reporter: | Travis Haagen | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: XML functions | Severity: | S3 (Non-critical) |
Version: | 5.1.15-BK | OS: | Linux (Linux, Win32) |
Assigned to: | Alexander Barkov | CPU Architecture: | Any |
Tags: | ExtractValue, position, XML, XPath |
[18 Jan 2007 18:59]
Travis Haagen
[23 Jan 2007 13:33]
Valeriy Kravchuk
Thank you for a bug report. Verified just as described with 5.1.15-BK on Linux: openxs@suse:~/dbs/5.1> bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.1.15-beta Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> select extractvalue('<a><b>one</b><b>two</b></a><a><b>three</b></a>', '//b[1]') as result; +-----------+ | result | +-----------+ | one three | +-----------+ 1 row in set (0.00 sec) mysql> select extractvalue('<a><b>one</b><b>two</b></a><a><b>three</b></a>', '//b[position()=1]') as result; +-----------+ | result | +-----------+ | one three | +-----------+ 1 row in set (0.00 sec) mysql> select extractvalue('<a><b>one</b><b>two</b></a><a><b>three</b></a>', '//b[2]') as result; +--------+ | result | +--------+ | two | +--------+ 1 row in set (0.00 sec) mysql> select extractvalue('<a><b>one</b><b>two</b></a><a><b>three</b></a>', '//b[3]') as result; +--------+ | result | +--------+ | | +--------+ 1 row in set (0.00 sec) mysql> select extractvalue('<a><b>one</b><b>two</b></a><a><b>three</b></a>', '//descendant-or-self::b[3]') as result; +--------+ | result | +--------+ | three | +--------+ 1 row in set (0.00 sec)
[4 Sep 2007 17:10]
Alexander Barkov
Another problem is that XPath accepts a not-well formed XML. A well-formed XML must have exactly one root element, while the value in the example have two root elements "a" and actually consists of a concatenation of two XML values: "<a><b>one</b><b>two</b></a>" and "<a><b>three</b></a>". However, the same problem shows up with a well-formed value as well: mysql> select extractvalue('<root><a><b>one</b><b>two</b></a><a><b>three</b></a></root>','//b[1]') as result; +-----------+ | result | +-----------+ | one three | +-----------+ 1 row in set (0.00 sec) mysql> select extractvalue('<root><a><b>one</b><b>two</b></a><a><b>three</b></a></root>','/descendant-or-self::b[1]') as result; +--------+ | result | +--------+ | one | +--------+ 1 row in set (0.00 sec)
[3 Oct 2007 11:05]
Alexander Barkov
It seems this is not really a bug. Take a look here: http://www.w3.org/TR/xpath > // is short for /descendant-or-self::node()/. For example, //para is short for > /descendant-or-self::node()/child::para and so will select any para element in > the document (even a para element that is a document element will be selected > by //para since the document element node is a child of the root node); > div//para is short for div/descendant-or-self::node()/child::para and so will > select all para descendants of div children. > NOTE: The location path //para[1] does not mean the same as the location path > /descendant::para[1]. The latter selects the first descendant para element; > the former selects all descendant para elements that are the first para > children of their parents. So, //b[1] returns all descendant "b" elements that are the first "b" children of their parents. They are "one" and "three". Isn't it correct?
[3 Oct 2007 18:35]
Travis Haagen
You may be correct that this is expected behavior, but the XPath specification is counterintuitive when it states, "// is short for /descendant-or-self::node()/". One would assume that if they mean the same thing that the position predicate would work the same. Thank you for thoroughly investigating this issue. I did find an example of how you can achieve the result of //b[1] using 'descendant-or-self' or 'descendant'. Still, in my opinion, this should be explained in more detail in the XPath specification. mysql> SELECT ExtractValue( -> "<a><b>one</b><b>two</b></a><a><b>three</b></a>", -> "//b[1]") AS RESULT; +-----------+ | RESULT | +-----------+ | one three | +-----------+ 1 row in set (0.00 sec) mysql> SELECT ExtractValue( -> "<a><b>one</b><b>two</b></a><a><b>three</b></a>", -> "/descendant-or-self::*/b[1]") AS RESULT; +-----------+ | RESULT | +-----------+ | one three | +-----------+ 1 row in set (0.00 sec) mysql> SELECT ExtractValue( -> "<a><b>one</b><b>two</b></a><a><b>three</b></a>", -> "/descendant::*/b[1]") AS RESULT; +-----------+ | RESULT | +-----------+ | one three | +-----------+ 1 row in set (0.00 sec)
[4 Oct 2007 6:34]
Alexander Barkov
I tried a similar script with Oracle 10g: drop table t1; create table t1 (x xmltype); insert into t1 values (xmltype.createxml('<root><a><b>one</b><b>two</b></a><a><b >three</b></a></root>')); select Extract(x, '//b[1]') from t1; select Extract(x, '//b[2]') from t1; select Extract(x, '/descendant-or-self::*/b[1]') from t1; select Extract(x, '/descendant-or-self::*/b[2]') from t1; select Extract(x, '/descendant-or-self::b[1]') from t1; select Extract(x, '/descendant-or-self::b[2]') from t1; select Extract(x, '/descendant-or-self::b[3]') from t1; And the answer was: SQL> EXTRACT(X,'//B[1]') -------------------------------------------------------------------------------- <b>one</b><b>three</b> SQL> EXTRACT(X,'//B[2]') -------------------------------------------------------------------------------- <b>two</b> SQL> EXTRACT(X,'/DESCENDANT-OR-SELF::*/B[1]') -------------------------------------------------------------------------------- <b>one</b><b>three</b> SQL> EXTRACT(X,'/DESCENDANT-OR-SELF::*/B[2]') -------------------------------------------------------------------------------- <b>two</b> SQL> EXTRACT(X,'/DESCENDANT-OR-SELF::B[1]') -------------------------------------------------------------------------------- <b>one</b> SQL> EXTRACT(X,'/DESCENDANT-OR-SELF::B[2]') -------------------------------------------------------------------------------- <b>two</b> SQL> EXTRACT(X,'/DESCENDANT-OR-SELF::B[3]') -------------------------------------------------------------------------------- <b>three</b> I.e. both '//b[1]' and '/descendant-or-self::*/b[1]' use positions from the original contexts. While '/descendant-or-self::b[1]' re-enumerates the nodes consequently, starting from 1. MySQL works the same: mysql> select ExtractValue('<root><a><b>one</b><b>two</b></a><a><b>three</b></a></root>','//b[1]') as x; +-----------+ | x | +-----------+ | one three | +-----------+ 1 row in set (0.01 sec) mysql> select ExtractValue('<root><a><b>one</b><b>two</b></a><a><b>three</b></a></root>','/descendant-or-self::*/b[1]') as x; +-----------+ | x | +-----------+ | one three | +-----------+ 1 row in set (0.00 sec) mysql> select ExtractValue('<root><a><b>one</b><b>two</b></a><a><b>three</b></a></root>','/descendant-or-self::b[3]') as x; +-------+ | x | +-------+ | three | +-------+ 1 row in set (0.00 sec) So '//b[1]' is in fact equal to '/descendant-or-self::*/b[1]'. But '/descendant-or-self::b[1]' is something different.
[4 Oct 2007 17:13]
Travis Haagen
This appears to not be a bug and I am closing it.