Bug #38227 EXTRACTVALUE doesn't work with DTD declarations
Submitted: 18 Jul 2008 16:03 Modified: 20 Jan 2009 22:05
Reporter: Harald Groven
Status: Closed
Category:MySQL Server: XML functions Severity:S3 (Non-critical)
Version:5.1.15+, 5.1.26 OS:Any
Assigned to: Alexander Barkov CPU Architecture:Any
Tags: xpath extractvalue xml dom

[18 Jul 2008 16:03] Harald Groven
Description:
If I try to extract data from a record containing a document type declaration (DOCTYPE), the declaration seems to be treated as regular XML markup, which leads to an erroneous result.

How to repeat:

CREATE TABLE IF NOT EXISTS `xpathtest` (
  `pagedesc` varchar(200) NOT NULL,
  `htmlpage` longtext NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 
;

INSERT INTO `xpathtest` (`pagedesc`, `htmlpage`) VALUES
('html without document decl', '<html> 
 <head>
  <title> Title - document without document declaration</title>
 </head> 
  <body> Hi, Im a webpage without document a declaration </body> 
</html>'),

('html with document decl', '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html> 
 <head>
  <title> Title - document with document declaration</title>
 </head> 
  <body> Hi, Im a webpage with document a declaration </body> 
</html>'
) ;

Then run the following XPath query:

SELECT 
`pagedesc`, 
EXTRACTVALUE(`htmlpage`, 'html/head/title'), 
EXTRACTVALUE(`htmlpage`, 'html/body')
FROM xpathtest;

Output: 

html without document decl 	Title - document without document declaration 	Hi, Im a webpage without document a declaration

html with document decl 	NULL 	NULL

Suggested fix:
Make the EXTRACTVALUE function aware of DTD declarations, if this is not by design.
[19 Jul 2008 11:58] Valeriy Kravchuk
Thank you for the problem report. Verified as described with 5.1.26.
[4 Dec 2008 11:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60585

2727 Alexander Barkov	2008-12-04
      Bug#38227 EXTRACTVALUE doesn't work with DTD declarations
      Problem:
      The XML syntax parser allowed quoted strings to be used as attribute names
      and tried to push them onto the parser state stack in place of identifiers.
      The parser then failed if a quoted string contained slash characters.
      Fix:
      - Disallow quoted strings in regular tags.
      - Allow quoted strings in a DOCTYPE declaration, but
      don't push them onto the parser state stack (just skip them).
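
A toy sketch of the failure mode described in the commit message above, written in Python and not taken from the server source. It assumes that the parser state stack is kept as a single slash-joined path string, so that popping an entry cuts the path at its last slash; that layout is an assumption made for illustration. The names PathStack, process_doctype and TOKENS are hypothetical.

```python
class PathStack:
    """Toy parser state stack kept as one slash-joined string (assumed layout)."""

    def __init__(self):
        self.path = ""

    def enter(self, name):
        # Push a name by appending "/name" to the path.
        self.path += "/" + name

    def leave(self):
        # Pop by cutting the path at its last slash; return the popped part.
        head, _, tail = self.path.rpartition("/")
        self.path = head
        return tail


def process_doctype(tokens, skip_quoted):
    """Return True if the stack is balanced after a DOCTYPE declaration."""
    stack = PathStack()
    stack.enter("!DOCTYPE")
    for kind, text in tokens:
        if kind == "string" and skip_quoted:
            continue  # patched behaviour: quoted strings are skipped, not pushed
        stack.enter(text)
        if stack.leave() != text:
            return False  # a slash inside the name corrupted the stack
    return stack.leave() == "!DOCTYPE" and stack.path == ""


# Tokens of the DOCTYPE declaration from the bug report.
TOKENS = [
    ("ident", "HTML"),
    ("ident", "PUBLIC"),
    ("string", "-//W3C//DTD HTML 4.01 Transitional//EN"),
    ("string", "http://www.w3.org/TR/html4/loose.dtd"),
]

print(process_doctype(TOKENS, skip_quoted=False))  # False
print(process_doctype(TOKENS, skip_quoted=True))   # True
```

In this model a quoted string without slashes would push and pop cleanly, which is consistent with the commit message saying the failure only appeared when the quoted string contained slash characters.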
[10 Dec 2008 9:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/61160

2725 Alexander Barkov	2008-12-10
      Bug#38227 EXTRACTVALUE doesn't work with DTD declarations
      Problem:
       The XML syntax parser allowed quoted strings to be used as attribute names
       and tried to push them onto the parser state stack in place of identifiers.
       The parser then failed if a quoted string contained slash characters.
      Fix:
       - Disallow quoted strings in regular tags.
       - Allow quoted strings in a DOCTYPE declaration, but
       don't push them onto the parser state stack (just skip them).
[10 Dec 2008 10:21] Alexander Barkov
pushed into 5.1.31-bugteam
pushed into 6.0.9-bugteam
[15 Jan 2009 6:37] Bugs System
Pushed into 5.1.31 (revid:joro@sun.com-20090115053147-tx1oapthnzgvs1ro) (version source revid:azundris@mysql.com-20081230114838-cn52tu180wcrvh0h) (merge vers: 5.1.31) (pib:6)
[15 Jan 2009 15:34] Jon Stephens
Document in the 5.1.31 changelog as follows:

        The ExtractValue() function did not work correctly with XML
        documents containing a DOCTYPE declaration.

Set status back to PQ while waiting for push to 6.0-main.
[15 Jan 2009 18:29] Jon Stephens
Status should have been set to NDI.
[19 Jan 2009 11:33] Bugs System
Pushed into 5.1.31-ndb-6.2.17 (revid:tomas.ulin@sun.com-20090119095303-uwwvxiibtr38djii) (version source revid:tomas.ulin@sun.com-20090115073240-1wanl85vlvw2she1) (merge vers: 5.1.31-ndb-6.2.17) (pib:6)
[19 Jan 2009 13:10] Bugs System
Pushed into 5.1.31-ndb-6.3.21 (revid:tomas.ulin@sun.com-20090119104956-guxz190n2kh31fxl) (version source revid:tomas.ulin@sun.com-20090119104956-guxz190n2kh31fxl) (merge vers: 5.1.31-ndb-6.3.21) (pib:6)
[19 Jan 2009 14:05] Jon Stephens
Set status back to NDI pending merge to 6.0 tree.
[19 Jan 2009 16:15] Bugs System
Pushed into 5.1.31-ndb-6.4.1 (revid:tomas.ulin@sun.com-20090119144033-4aylstx5czzz88i5) (version source revid:tomas.ulin@sun.com-20090119144033-4aylstx5czzz88i5) (merge vers: 5.1.31-ndb-6.4.1) (pib:6)
[19 Jan 2009 17:08] Jon Stephens
Set back to NDI pending merge to 6.0.
[20 Jan 2009 19:01] Bugs System
Pushed into 6.0.10-alpha (revid:joro@sun.com-20090119171328-2hemf2ndc1dxl0et) (version source revid:azundris@mysql.com-20081230114916-c290n83z25wkt6e4) (merge vers: 6.0.9-alpha) (pib:6)
[20 Jan 2009 22:05] Jon Stephens
Fix also documented in the 6.0.10 changelog; closed.
[16 Sep 2010 16:16] y d
testWithDoctype.xml -- Sample XML file from the S1000D organization

Attachment: testWithDoctype.xml (text/xml), 7.72 KiB.

[16 Sep 2010 16:17] y d
I re-tested this bug on 6.0.10 and 6.0.11 using the XML sample above, which contains a DOCTYPE, and the query still fails. If I remove the DOCTYPE, it works.
select extractvalue(contentXML,'//techName') from dm;
I get the result NULL for the above query, so I think this bug still exists.
I would appreciate it if you could email me your verification result. My email is dydyhb@yeah.net
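
The workaround noted in this report (the queries succeed once the DOCTYPE is removed) could be applied on the client side before the XML is stored or passed to ExtractValue(). A minimal Python sketch follows; strip_doctype and DOCTYPE_RE are hypothetical names, and the regular expression is an assumption that covers simple declarations with an optional [...] internal subset. It is not a full DTD parser, and a '>' inside a quoted literal would defeat it.

```python
import re

# Matches one DOCTYPE declaration, with an optional [...] internal subset.
DOCTYPE_RE = re.compile(
    r"<!DOCTYPE\s+[^>\[]*(?:\[.*?\]\s*)?>",
    re.IGNORECASE | re.DOTALL,
)


def strip_doctype(xml):
    """Return xml with its first DOCTYPE declaration removed, if any."""
    return DOCTYPE_RE.sub("", xml, count=1)


# Example input modelled on the document from the bug report.
page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    '"http://www.w3.org/TR/html4/loose.dtd">\n'
    "<html><head><title>Title</title></head></html>"
)
print(strip_doctype(page).lstrip())
```

Documents without a DOCTYPE pass through unchanged, so the function can be applied to every row before insertion.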