MySQL Bugs: #27299: Multi-table UPDATE does not work properly (IMHO)

Bug #27299	Multi-table UPDATE does not work properly (IMHO)
Submitted:	20 Mar 2007 18:20	Modified:	5 Dec 2007 18:55
Reporter:	James Goatcher	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: DML	Severity:	S3 (Non-critical)
Version:	5.0.40-BK, 5.0.19	OS:	Linux (Linux 2.6.9-34.ELsmp)
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:

The multi-table UPDATE does not work the same way as a correlated subquery UPDATE.  In my opinion, it should.  Please see "How to repeat" below.

How to repeat:

drop table if exists test_1;
drop table if exists test_2;

CREATE TABLE test_1 (
col_pk integer NOT NULL,
col_test integer
);

alter table test_1 add PRIMARY KEY (col_pk);

CREATE TABLE test_2 (
col_pk_join integer NOT NULL,
col_test_new integer
);

insert into test_1 (col_pk, col_test) values ( 1, null );
insert into test_1 (col_pk, col_test) values ( 2, null );
commit;

insert into test_2 (col_pk_join, col_test_new) values ( 1, 23 );
insert into test_2 (col_pk_join, col_test_new) values ( 1, 34 );
insert into test_2 (col_pk_join, col_test_new) values ( 2, 45 );
commit;

select * from test_1;
select * from test_2;

# This update should NOT work, but it does.
UPDATE test_1 t,
test_2 tmp
set t.col_test = tmp.col_test_new
where t.col_pk = tmp.col_pk_join;
commit;

select * from test_1;
======================================================

The output of the select and update statements is:

+--------+----------+
| col_pk | col_test |
+--------+----------+
|      1 | NULL     |
|      2 | NULL     |
+--------+----------+

2 rows in set

+-------------+--------------+
| col_pk_join | col_test_new |
+-------------+--------------+
|           1 |           23 |
|           1 |           34 |
|           2 |           45 |
+-------------+--------------+

3 rows in set

Query OK, 2 rows affected
Rows matched: 2 Changed: 2 Warnings: 0

Query OK, 0 rows affected

+--------+----------+
| col_pk | col_test |
+--------+----------+
|      1 |       23 |
|      2 |       45 |
+--------+----------+

2 rows in set

Note that the update did NOT produce any errors or warnings. It should have. Why? Because a join on value 1 produces two values from table test_2. What MySQL does in this case is use the first value and ignore the second value. This is bad in my opinion because it is, in essence, putting incorrect data into table test_1.

Replace the update statement above with:

UPDATE test_1 t1
set t1.col_test = (
select col_test_new
from test_2 t2
where t1.col_pk = t2.col_pk_join
)
;

This will produce the appropriate error for the given data:
"ERROR 1242 : Subquery returns more than 1 row"
and will not perform any update at all, which is good (it protects table test_1 from getting bad data).

Now if you have different data........if you comment out one of the "1" values inserted into table test_2 and use the correlated subquery update instead of the multi-table update, table test_1 will get updated with exactly what you expect.

Suggested fix:

Make the multi-table UPDATE perform the same kind of check that the correlated subquery does.  If the join produces more than one row (value) for a given destination (a row-column junction), then the update should fail.

Thank you for a problem report. Please, read the manual (http://dev.mysql.com/doc/refman/5.0/en/update.html):

"For the multiple-table syntax, UPDATE updates rows in each table named in table_references that satisfy the conditions."

So, unlike UPDATE with correlated subquery, multi-table UPDATE updates several tables. By definition. Hence no error message.

I understand what the manual says, and I am fine with the fact that multiple tables can get updated with the multi-table syntax, but I disagree with the result.  The point is that the updated result is *wrong*.  Just because the join condition is satisfied does not mean that the destination table has correctly updated data.  In fact it doesn't.  My script shows this.

I think you should reopen this report and fix the software.  At least provide a warning.

Thank you.

I confirm the behaviour you described:

openxs@suse:~/dbs/5.0> bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.0.40 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> drop table if exists test_1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> drop table if exists test_2;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> CREATE TABLE test_1 (
    -> col_pk integer NOT NULL,
    -> col_test integer
    -> );

Query OK, 0 rows affected (0.02 sec)

mysql> alter table test_1 add PRIMARY KEY (col_pk);

Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> CREATE TABLE test_2 (
    -> col_pk_join integer NOT NULL,
    -> col_test_new integer
    -> );

Query OK, 0 rows affected (0.01 sec)

mysql> insert into test_1 (col_pk, col_test) values ( 1, null );
iQuery OK, 1 row affected (0.00 sec)

mysql> insert into test_1 (col_pk, col_test) values ( 2, null );
Query OK, 1 row affected (0.00 sec)

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test_2 (col_pk_join, col_test_new) values ( 1, 23 );
Query OK, 1 row affected (0.00 sec)

mysql> insert into test_2 (col_pk_join, col_test_new) values ( 1, 34 );
Query OK, 1 row affected (0.00 sec)

mysql> insert into test_2 (col_pk_join, col_test_new) values ( 2, 45 );
Query OK, 1 row affected (0.00 sec)

mysql> UPDATE test_1 t,
    -> test_2 tmp
    -> set t.col_test = tmp.col_test_new
    -> where t.col_pk = tmp.col_pk_join;
Query OK, 2 rows affected (0.01 sec)
Rows matched: 2  Changed: 2  Warnings: 0

mysql> select * from test_1;
+--------+----------+
| col_pk | col_test |
+--------+----------+
|      1 |       23 |
|      2 |       45 |
+--------+----------+
2 rows in set (0.00 sec)

And I agree that your request to give error message in cases like this is reasonable. But I am almost sure that current behaviour is intended (value is taken from the first row found that satisfies join condition, and further search is just stopped).