Bug #36488 regexp returns false matches, concatenating with previous rows
Submitted: 3 May 18:24 Modified: 22 May 14:39
Reporter: Scott Noyes
Status: Closed
Category:Server Severity:S2 (Serious)
Version:5.0.38+ OS:Any
Assigned to: Gleb Shchepa Target Version:5.0+
Triage: D2 (Serious) / R2 (Low) / E2 (Low)

[3 May 18:24] Scott Noyes
Description:
A REGEXP match can return incorrect rows when the previous row matched the expression and
uses CONCAT() with an empty string.

The rows selected include the "Random Random" row, which does not satisfy the regular
expression, as shown in the `regexpMatch` column of the select list.

Bug appears in 5.0.38 and later (including 5.1 and 6.0), but not in 5.0.20a.

How to repeat:
DROP TABLE IF EXISTS `network_crm_contacts`;
CREATE TABLE `network_crm_contacts` (
  `first_name` varchar(100) NOT NULL default '',
  `last_name` varchar(100) NOT NULL default ''
);

INSERT INTO `network_crm_contacts` VALUES 
  ('',''),
  ('','First'),
  ('Random','Random');

SELECT 
  first_name, 
  last_name, 
  CONCAT(first_name,' ',last_name),
  CONCAT(first_name,' ',last_name) REGEXP 'First.*' AS regexpMatch
FROM network_crm_contacts 
WHERE CONCAT(first_name,' ',last_name) REGEXP 'First.*';

Output:

+------------+-----------+----------------------------------+-------------+
| first_name | last_name | CONCAT(first_name,' ',last_name) | regexpMatch |
+------------+-----------+----------------------------------+-------------+
|            | First     |  First                           |           1 |
| Random     | Random    | Random Random                    |           0 |
+------------+-----------+----------------------------------+-------------+

Suggested fix:
Possible workaround is to use LIKE instead of regular expression:

WHERE CONCAT(first_name, ' ', last_name) LIKE '%First%'
[3 May 18:39] Valeriy Kravchuk
Thank you for a bug report. Verified just as described with 5.0.60:

C:\Program Files\MySQL\MySQL Server 5.0\bin>mysql -uroot -proot test -P3308
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.0.60-enterprise-gpl-nt MySQL Enterprise Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> CREATE TABLE `network_crm_contacts` (
    ->   `first_name` varchar(100) NOT NULL default '',
    ->   `last_name` varchar(100) NOT NULL default ''
    -> );
Query OK, 0 rows affected (0.28 sec)

mysql> INSERT INTO `network_crm_contacts` VALUES
    ->   ('',''),
    ->   ('','First'),
    ->   ('Random','Random');
Query OK, 3 rows affected (0.08 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT
    ->   first_name,
    ->   last_name,
    ->   CONCAT(first_name,' ',last_name),
    ->   CONCAT(first_name,' ',last_name) REGEXP 'First.*' AS regexpMatch
    -> FROM network_crm_contacts
    -> WHERE CONCAT(first_name,' ',last_name) REGEXP 'First.*';
+------------+-----------+----------------------------------+-------------+
| first_name | last_name | CONCAT(first_name,' ',last_name) | regexpMatch |
+------------+-----------+----------------------------------+-------------+
|            | First     |  First                           |           1 |
| Random     | Random    | Random Random                    |           0 |
+------------+-----------+----------------------------------+-------------+
2 rows in set (0.06 sec)

mysql> SELECT
    ->   first_name,
    ->   last_name,
    ->   CONCAT(first_name,' ',last_name),
    ->   CONCAT(first_name,' ',last_name) REGEXP 'First.*' AS regexpMatch
    -> FROM network_crm_contacts
    -> WHERE CONCAT(first_name,' ',last_name) LIKE '%First%';
+------------+-----------+----------------------------------+-------------+
| first_name | last_name | CONCAT(first_name,' ',last_name) | regexpMatch |
+------------+-----------+----------------------------------+-------------+
|            | First     |  First                           |           1 |
+------------+-----------+----------------------------------+-------------+
1 row in set (0.00 sec)
[12 May 10:48] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/46615

ChangeSet@1.2623, 2008-05-12 13:46:46+05:00, gshchepa@host.loc +3 -0
  Fixed bug #36488: regexp returns false matches, concatenating
                    with previous rows.
  
  The WHERE clause containing expression:
    CONCAT(empty_field1, empty_field2, ..., 'literal constant', ...)
      REGEXP 'regular expression'
  may return wrong matches.
  
  Optimization of the CONCAT function has been fixed.
[13 May 15:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/46653

ChangeSet@1.2623, 2008-05-13 18:38:52+05:00, gshchepa@host.loc +3 -0
  Fixed bug #36488: regexp returns false matches, concatenating
                    with previous rows.
  
  The WHERE clause containing expression:
    CONCAT(empty_field1, empty_field2, ..., 'literal constant', ...)
      REGEXP 'regular expression'
  may return wrong matches.
  
  Optimization of the CONCAT function has been fixed.
[13 May 17:29] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/46660

ChangeSet@1.2623, 2008-05-13 20:27:46+05:00, gshchepa@host.loc +3 -0
  Fixed bug #36488: regexp returns false matches, concatenating
                    with previous rows.
  
  The WHERE clause containing expression:
    CONCAT(empty_field1, empty_field2, ..., 'literal constant', ...)
      REGEXP 'regular expression'
  may return wrong matches.
  
  Optimization of the CONCAT function has been fixed.
[19 May 10:24] Bugs System
Pushed into 5.0.64
[19 May 10:25] Bugs System
Pushed into 5.1.25-rc
[20 May 2:35] Paul DuBois
Noted in 5.0.64, 5.1.25 changelogs.

A REGEXP match could return incorrect rows when the previous row
matched the expression and used CONCAT() with an empty string.

Setting report to Need Doc Info pending push into 6.0.x.
[22 May 11:49] Bugs System
Pushed into 6.0.6-alpha
[22 May 14:39] Paul DuBois
Noted in 6.0.6 changelog.
[28 Jul 15:51] Bugs System
Pushed into 5.0.62  (revid:sp1r-gshchepa/uchum@host.loc-20080516080012-23021) (pib:2)
(Retry automatic marking, to ensure nothing is missed. cm01)
[28 Jul 18:56] Bugs System
Pushed into 5.1.25-rc  (revid:sp1r-pcrews@pcrews-mac-local.local-20080516105248-04313)
(version source revid:sp1r-pcrews@pcrews-mac-local.local-20080516105248-04313) (pib:3)