Bug #113778 Encoding problem in xpath comparision
Submitted: 26 Jan 15:40 Modified: 28 Aug 16:56
Reporter: Stefano Valsecchi Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: XML functions Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[26 Jan 15:40] Stefano Valsecchi
Description:
Look like there is some encoding problem with the xpath in the EXTRACTVALUE function.

When the xpath contains a condition that involve a comparison with accented characters, the condition don't match also if the xml contains a matching path.

Please note that the problem has been introduced in version 8.
In MySql 5.7 the bug is not present and EXTRACTVALUE works correctly. 

How to repeat:
The two query below show the problem. The only difference between the two queries are the accent on the 'e' chars.
The first query behave correctly, while the second actually reproduce the problem.

My expectation is that both query return the same result ("AA")

mysql> SELECT EXTRACTVALUE('<A><B>ee</B><C>AA</C></A>', '/A[B = "ee"]/C');
+-------------------------------------------------------------+
| EXTRACTVALUE('<A><B>ee</B><C>AA</C></A>', '/A[B = "ee"]/C') |
+-------------------------------------------------------------+
| AA                                                          |
+-------------------------------------------------------------+
1 row in set (0.05 sec)

mysql> SELECT EXTRACTVALUE('<A><B>èè</B><C>AA</C></A>', '/A[B = "èè"]/C');
+-----------------------------------------------------------------+
| EXTRACTVALUE('<A><B>èè</B><C>AA</C></A>', '/A[B = "èè"]/C')     |
+-----------------------------------------------------------------+
|                                                                 |
+-----------------------------------------------------------------+
1 row in set (0.05 sec)
[26 Jan 16:02] Stefano Valsecchi
Bug verified also on x86 architecture (db.m6i.large instance on AWS RDS)
[29 Jan 11:05] MySQL Verification Team
Hi Mr. Valsecchi,

Thank you for your bug report.

We have managed to repeat the problem with 8.0.36:

mysql> SELECT EXTRACTVALUE('<A><B>ee</B><C>AA</C></A>', '/A[B = "ee"]/C');
+-------------------------------------------------------------+
| EXTRACTVALUE('<A><B>ee</B><C>AA</C></A>', '/A[B = "ee"]/C') |
+-------------------------------------------------------------+
| AA                                                          |
+-------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT EXTRACTVALUE('<A><B>èè</B><C>AA</C></A>', '/A[B = "èè"]/C');
+-----------------------------------------------------------------+
| EXTRACTVALUE('<A><B>èè</B><C>AA</C></A>', '/A[B = "èè"]/C')     |
+-----------------------------------------------------------------+
|                                                                 |
+-----------------------------------------------------------------+
1 row in set (0.00 sec)

This is now a verified bug report.
[28 Aug 16:56] Jon Stephens
Documented fix as follows in the MySQL 8.4.3 and 9.1.0 changelogs:

    ExtractValue() did not handle arguments containing accented
    characters correctly.

Closed.
[16 Sep 9:36] MySQL Verification Team
Thank you, Jon.