Bug #7590 | Evaluation order in multi table update | ||
---|---|---|---|
Submitted: | 30 Dec 2004 14:51 | Modified: | 28 Nov 2007 16:02 |
Reporter: | Kai Ruhnau | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: DML | Severity: | S4 (Feature request) |
Version: | 4.0.22 | OS: | Linux (Linux) |
Assigned to: | CPU Architecture: | Any | |
Tags: | bfsm_2007_12_06 |
[30 Dec 2004 14:51]
Kai Ruhnau
[30 Dec 2004 15:01]
Kai Ruhnau
Discussion: It does not matter if I write SELECT IF(table1.configion=7,1,0) FROM table1,table2 WHERE table1.ID=table2.ID_table1 or SELECT IF(table1.configion=7,1,0) FROM table2,table1 WHERE table1.ID=table2.ID_table1 The only difference lies in the order the rows are returned. But within an UPDATE the value of IF silently changes as described above. As long as this is the case any application which uses multi table updates must be aware of some intermediate state between "data before the update" and "data after the update". But this intermediate state can only be seen if every table update is handled by a seperate query. Thus rendering multi table update useless.
[30 Dec 2004 17:44]
Shawn Green
It seems that MySQL doesn't respect row boundaries in its SELECT processing either. SET @testvalue=5; Select @testvalue+1 as bump, @testvalue:=19; results in 6 and 19 but: SET @testvalue=5; Select @testvalue:=19, @testvalue+1 as bump ; results in 19 and 20. I think the second calculation should see the value of @testvalue as it was when the statement execution entered that "row" not the value that is the result of the statement acting on that "row". This same behavior is visible whether updating the single virtual "row" of my example or when evaluating multiple rows in a table-based statement. Why is this considered "correct"? Why should changing the order of columns affect the output of a SQL statement? I honestly believe that a database engine should produce identical results regardless of the table order or the column order WITHIN the statement being executed (outer joins being an exception to that case). I understand that the changes to one row (or the results of processing a row) should be visible during the processing of any subsequent row but not WITHIN the processing boundary of any individual row. I always thought that the engine took a "snapshot" of the row and variables being evaluated at the beginning of each row, then committed the changes as it moved to the next row. That way, the values remained consistent for the entire row evaluation. I guess I was mistaken.
[30 Dec 2004 17:45]
Hartmut Holzgraefe
verified on 4.0.22 and 4.1.8
[3 Jan 2005 18:16]
MySQL Verification Team
Hi! Here I will address only multi-table updates. What you see is actually expected behaviour, documented in the manual. Multi-table updates will in many, many cases, change the order of evaluation , so that it conforms with 1 to 1 with the relationship of the tables in the join. This is documented in our manual. Also, multi-table updates do not change any tables on the fly, except in some rare cases. Instead, they first store all offsets and fetched values in a temporary tables and then updates are done. This is due to the limitation imposed in locking and some other issues. Let me know if this resolves this bug.
[5 Jan 2005 20:05]
Shawn Green
Documenting improper behavior does not make it correct. It just puts us (the developers) on guard that things do not behave according to the standards. In fact things do not even behave as you explain it. You say: >>Instead, they first store all offsets and fetched values in a temporary tables and then updates are done. This is due to the limitation imposed in locking and some other issues.<< If that were true then the test cases would have produced identical results. How can the IF() statement be true when it's the first statement evaluated but false if it's the second statement (as in the test UPDATE)? The only logical explanation is that the value of "table1.condition" has changed DURING THE PROCESSING OF THE STATEMENT. If it were being read from a temporary input table (and not the temporary results table) it wouldn't have changed. This is not as you explained it. If you were right, the value of "table1.condition" would be the same for the calculation of each and every term in the SET clause. That is what at least Kai and I expect to happen during an UPDATE. More information on my position can be found here: http://lists.mysql.com/internals/20104 In that thread I provide quotes from the SQL-92, SQL-99, and SQL-2003 specifications that seem to support my position. Which specification is used by the MySQL developers as their reference?
[6 Jan 2005 11:30]
Sergei Golubchik
Usually SQL:2003, but mutli-table update is not in the standard. Still, I think the reported behaviour looks like a bug.
[7 Jan 2005 14:38]
Shawn Green
I agree. The way the BNF of <update statement: searched> (SQL:2003, section 14.11) is written, it seems to apply to only single-table updates. However there are so many comments and definitions applied to <update statement: searched> throughout the rest of the document (I can make a list of them if you want it) that refer to either subtables (hierarchical table definitions), updateable views (which could be based on mulitple tables), and updatable unions that it seems logical to substitute the term "<table reference>" (section 7.6) for the term "<target table>" used in the BNF. This simple change (extension?) contains, as a subset, the strict (single-table) definition of <update statement: searched> while creating a workable standard for multitable updates.
[10 Jan 2005 16:27]
MySQL Verification Team
In my modest opinion, only the select list bug needs to be addressed.
[19 Jan 2005 18:32]
Shawn Green
Sinisia: I guess which statement gets fixed depends on how much code is shared between the SELECT processing routine and the UPATE processing routines. Fixing one may fix the other, but I doubt it. I am willing to withdraw my request to also "fix" the SELECT statement as I haven't been able to cross-check it with the standards. However, I still believe the UPDATE behavior does deviate from the standards and should be changed to bring it into compliance.
[19 Jan 2005 18:50]
MySQL Verification Team
As explained before, multi-table updates and deletes are 100 % non-standard, so there is no standard to comply with. Also, join part is done entirely in SELECT code, which relies heavily on freely changing the order of tables. That is something that will not be changed.
[25 Jan 2005 15:02]
Shawn Green
Updateable VIEWs __are__ in the standard. VIEWs can be composed of multiple tables (also standardized). Therefore, it is a logical that if you want to define a multitable update, you can infer that the update "target" (the JOINed tables) looks exactly like a VIEW composed of multiple tables. That is why I suggested using the term "<table reference>" as a subsitute for "<target table>" in the BNF of "<update statement: searched>". For example, if we define a view: CREATE VIEW vwTest1 AS SELECT * FROM table1 INNER JOIN table2 ON table1.id = table2.table1_id; and Update a row of that view with: Update vwTest1 SET flagme = '*' WHERE subname like 'fred%' That those actions should be equivalent to: UPDATE table1 INNER JOIN table2 ON table1.id = table2.table1_id SET flagme = '*' WHERE subname like 'fred%'; Even if the spec doesn't explicitly include the case for multitable updates (because of a faulty BNF?) the specification _has_ defined and explained the appropriate actions for a multitable update in the context of updateable VIEWs. I believe you can agree that the multitable update "extension" to SQL-2003 should be at least modeled on the specifications for updateable VIEWs. Back to topic: The behavior Kai and I are trying to get fixed is not related to the actions of the JOIN processor but rather how the UPDATE statement code is not caching the results of each row's transformation. The specifications clearly state that the "final" values of a row are calculated from the "initial" values as transformed by the list of SET assignments. What the current engine is doing is modifying the set of "initial" values as it progresses through the list of SET transformations in the column order of the target tables. What it should be doing is storing a set of "final" values for that row until all of the columns have been calculated. Only after all calculations are complete should the changes be written back to the tables. That is why: UPDATE table1, table2 SET table2.value=table2.value-IF(table1.condition=7,1,0), table1.condition=8 WHERE table2.ID_table1=table1.ID Currently produces a different result than UPDATE table2, table1 SET table2.value=table2.value-IF(table1.condition=7,1,0), table1.condition=8 WHERE table2.ID_table1=table1.ID (all I did was list the source tables in a different order) Those two statements should produce identical results because the value of table1.condition should not "change" until after all of the SET assignements are complete. Inside the IF() statement, table1.condition is used as an "initial" value. The result of "SET table1.condition=8" creates a different "final" value for table1.condition. The UPDATE processing code is not keeping those values separate until after all of the SET assignments finish.
[25 Jan 2005 18:02]
Sergei Golubchik
Shawn, we do agree with you. Notice - the state of the bugreport is "Verified" :)
[29 Jan 2005 6:25]
Igor Babaev
This new semantics for multi-tables updates compliant with the standard will be implemented after completion of WL #927 "Simultaneous assignments in UPDATE ... SET". Formally two problems are weakly relate to each other, but apparently the current semantics of multi-table updates follows the logic of multi-column updates for one table, which not standard compliant either. Standard compliant semantics is to be introduced in version 5.1.