MySQL Bugs: #7590: Evaluation order in multi table update

Bug #7590	Evaluation order in multi table update
Submitted:	30 Dec 2004 14:51	Modified:	28 Nov 2007 16:02
Reporter:	Kai Ruhnau	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: DML	Severity:	S4 (Feature request)
Version:	4.0.22	OS:	Linux (Linux)
Assigned to:		CPU Architecture:	Any
Tags:	bfsm_2007_12_06

Description:
Given the following query

UPDATE table1, table2
SET table2.value=table2.value-IF(table1.condition=7,1,0),
    table1.condition=8
WHERE table2.ID_table1=table1.ID

the IF-statement has different values, if I change the join-order of table1 and table2. In fact if I write "UPDATE table1, table2" the IF-condition always evaluates to false, since table1.condition is set to 8 before table1.condition=7 is evaluated.

How to repeat:
Using the following sample-data and the above query, the values in table2 do not change, although table1.condition is 7 in one case.

CREATE TABLE table1 (
  ID int(10) unsigned NOT NULL default '0',
  condition int(10) unsigned NOT NULL default '0'
) TYPE=InnoDB;

INSERT INTO table1 VALUES (1,7);
INSERT INTO table1 VALUES (2,8);

CREATE TABLE table2 (
  ID int(10) unsigned NOT NULL default '0',
  ID_table1 int(10) unsigned NOT NULL default '0',
  value int(10) unsigned NOT NULL default '0'
) TYPE=InnoDB;

INSERT INTO table2 VALUES (1,1,5);
INSERT INTO table2 VALUES (2,2,2);

Suggested fix:
The IF-condition should be evaluated before any data is changed.

Discussion:

It does not matter if I write

SELECT IF(table1.configion=7,1,0)
FROM table1,table2
WHERE table1.ID=table2.ID_table1

or

SELECT IF(table1.configion=7,1,0)
FROM table2,table1
WHERE table1.ID=table2.ID_table1

The only difference lies in the order the rows are returned.
But within an UPDATE the value of IF silently changes as described above.

As long as this is the case any application which uses multi table updates must be aware of some intermediate state between "data before the update" and "data after the update". But this intermediate state can only be seen if every table update is handled by a seperate query. Thus rendering multi table update useless.

It seems that MySQL doesn't respect row boundaries in its SELECT processing either.

SET @testvalue=5;
Select @testvalue+1 as bump, @testvalue:=19;

results in 6 and 19 but:

SET @testvalue=5;
Select @testvalue:=19, @testvalue+1 as bump ;

results in 19 and 20. 

I think the second calculation should see the value of @testvalue as it was when the statement execution entered that "row" not the value that is the result of the statement acting on that "row". This same behavior is visible whether updating the single virtual "row" of my example or when evaluating multiple rows in a table-based statement.

Why is this considered "correct"? Why should changing the order of columns affect the output of a SQL statement? I honestly believe that a database engine should produce identical results regardless of the table order or the column order WITHIN the statement being executed (outer joins being an exception to that case). I understand that the changes to one row (or the results of processing a row) should be visible during the processing of any subsequent row but not WITHIN the processing boundary of any individual row. 

I always thought that the engine took a "snapshot" of the row and variables being evaluated at the beginning of each row, then committed the changes as it moved to the next row. That way, the values remained consistent for the entire row evaluation. I guess I was mistaken.

verified on 4.0.22 and 4.1.8

Hi!

Here I will address only multi-table updates.

What you see is actually expected behaviour, documented in the manual.

Multi-table updates will in many, many cases, change the order of evaluation , so that it conforms with 1 to 1 with the relationship of the tables in the join.

This is documented in our manual.

Also, multi-table updates do not change any tables on the fly, except in some rare cases. Instead, they first store all offsets and fetched values in a temporary tables and then updates are done. This is due to the limitation imposed in locking and some other issues. 

Let me know if this resolves this bug.

Documenting improper behavior does not make it correct. It just puts us (the developers) on guard that things do not behave according to the standards. In fact things do not even behave as you explain it.

You say:
>>Instead, they first store all offsets and fetched values in a
temporary tables and then updates are done. This is due to the limitation
imposed in locking and some other issues.<<

If that were true then the test cases would have produced identical results. How can the IF() statement be true when it's the first statement evaluated but false if it's the second statement (as in the test UPDATE)? The only logical explanation is that the value of "table1.condition" has changed DURING THE PROCESSING OF THE STATEMENT. If it were being read from a temporary input table (and not the temporary results table) it wouldn't have changed.

This is not as you explained it. If you were right, the value of "table1.condition" would be the same for the calculation of each and every term in the SET clause. That is what at least Kai and I expect to happen during an UPDATE.

More information on my position can be found here: http://lists.mysql.com/internals/20104

In that thread I provide quotes from the SQL-92, SQL-99, and SQL-2003 specifications that seem to support my position. Which specification is used by the MySQL developers as their reference?

Usually SQL:2003, but mutli-table update is not in the standard.

Still, I think the reported behaviour looks like a bug.

I agree. The way the BNF of <update statement: searched> (SQL:2003, section 14.11) is written, it seems to apply to only single-table updates. 

However there are so many comments and definitions applied to <update statement: searched> throughout the rest of the document (I can make a list of them if you want it) that refer to either subtables (hierarchical table definitions), updateable views (which could be based on mulitple tables), and updatable unions that it seems logical to substitute the term "<table reference>" (section 7.6) for the term "<target table>" used in the BNF.

This simple change (extension?) contains, as a subset, the strict (single-table) definition of <update statement: searched> while creating a workable standard for multitable updates.

In my modest opinion, only the select list bug needs to be addressed.

Sinisia:  I guess which statement gets fixed depends on how much code is shared between the SELECT processing routine and the UPATE processing routines. Fixing one may fix the other, but I doubt it.

I am willing to withdraw my request to also "fix" the SELECT statement as I haven't been able to cross-check it with the standards. However, I still believe the UPDATE behavior does deviate from the standards and should be changed to bring it into compliance.

As explained before, multi-table updates and deletes are 100 % non-standard, so there is no standard to comply with.

Also, join part is done entirely in SELECT code, which relies heavily on freely changing the order of tables. That is something that will not be changed.

Updateable VIEWs __are__ in the standard. VIEWs can be composed of multiple tables (also standardized). Therefore, it is a logical that if you want to define a multitable update, you can infer that the update "target" (the JOINed tables) looks exactly like a VIEW composed of multiple tables. That is why I suggested using the term "<table reference>" as a subsitute for "<target table>" in the BNF of "<update statement: searched>".

For example, if we define a view:

CREATE VIEW vwTest1 AS 
SELECT *
FROM table1 
INNER JOIN table2 
    ON table1.id = table2.table1_id;

and Update a row of that view with:
Update vwTest1
SET flagme = '*' 
WHERE subname like 'fred%'

That those actions should be equivalent to:

UPDATE table1 
INNER JOIN table2 
     ON table1.id = table2.table1_id
SET flagme = '*'
WHERE subname like 'fred%';

Even if the spec doesn't explicitly include the case for multitable updates (because of a faulty BNF?) the specification _has_ defined and explained the appropriate actions for a multitable update in the context of updateable VIEWs. I believe you can agree that the multitable update "extension" to SQL-2003 should be at least modeled on the specifications for updateable VIEWs.

Back to topic:
The behavior Kai and I are trying to get fixed is not related to the actions of the JOIN processor but rather how the UPDATE statement code is not caching the results of each row's transformation. 

The specifications clearly state that the "final" values of a row are calculated from the "initial" values as transformed by the list of SET assignments. What the current engine is doing is modifying the set of "initial" values as it progresses through the list of SET transformations in the column order of the target tables. What it should be doing is storing a set of "final" values for that row until all of the columns have been calculated. Only after all calculations are complete should the changes be written back to the tables.

That is why:
UPDATE table1, table2
SET table2.value=table2.value-IF(table1.condition=7,1,0),
    table1.condition=8
WHERE table2.ID_table1=table1.ID

Currently produces a different result than 
UPDATE table2, table1
SET table2.value=table2.value-IF(table1.condition=7,1,0),
    table1.condition=8
WHERE table2.ID_table1=table1.ID
(all I did was list the source tables in a different order)

Those two statements should produce identical results because the value of table1.condition should not "change" until after all of the SET assignements are complete. Inside the IF() statement, table1.condition is used as an "initial" value. The result of "SET table1.condition=8" creates a different "final" value for table1.condition.  The UPDATE processing code is not keeping those values separate until after all of the SET assignments finish.

Shawn, we do agree with you. Notice - the state of the bugreport is "Verified" :)

This new semantics for multi-tables updates compliant with the standard will be implemented after completion of WL #927 "Simultaneous assignments in UPDATE ... SET". 
Formally two problems are weakly relate to each other, but apparently the current semantics of multi-table updates follows the logic of multi-column updates for one table, which not standard compliant either.
Standard compliant semantics is to be introduced in version 5.1.