Bug #26827 table->read_set is set incorrectly, causing update of a different column
Submitted: 3 Mar 2007 20:37 Modified: 7 Jul 2007 18:50
Reporter: Peter Gulutzan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Partitions Severity:S2 (Serious)
Version:5.1, 5.2 OS:Any (SUSE 10.0 / 64-bit, suse9.3 x86)
Assigned to: Ingo Strüwing CPU Architecture:Any
Tags: falcon

[3 Mar 2007 20:37] Peter Gulutzan
Description:
I create a partitioned Falcon table with two columns.
I insert one row, with two non-NULL values.
I update the row to change the first column.
The second column is now NULL.

This happens only if table engine is falcon, and
table has at least two partitions, and the first
column is indexed, and the update statement has a
WHERE clause.

How to repeat:
mysql> create table tb (s1 int, s2 int) engine=falcon partition by list(s1) (partition p1 values in (1), partition p2 values in (2));
Query OK, 0 rows affected (0.06 sec)

mysql> insert into tb values (1,1);
Query OK, 1 row affected (0.01 sec)

mysql> create index ib on tb (s1);
Query OK, 1 row affected (0.05 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> update tb set s1 = 2 where s1 = 1;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from tb;
+------+------+
| s1   | s2   |
+------+------+
|    2 | NULL |
+------+------+
1 row in set (0.00 sec)
[4 Mar 2007 11:44] MySQL Verification Team
verified on todays bk source. see uploaded file for session output.
[4 Mar 2007 11:45] MySQL Verification Team
5.2.4 output showing myisam vs falcon

Attachment: output.txt (text/plain), 1.38 KiB.

[4 Mar 2007 13:36] Hakan Küçükyılmaz
Added test case falcon_bug_26827.test to mysql-5.1-falcon tree.

Regards,

Hakan
[5 Jun 2007 20:14] Kevin Lewis
This bug is similar to Bug#28158 in that the table->read_set is set incorrectly.  Falcon is being called with this call stack;

mysql_update <sql_update.cc, line 482> -> 
rr_from_tempfile <records.cc, line 391> -> 
ha_partition::rnd_pos <ha_partition.cpp, line 3226> ->
StorageInterface::rnd_pos <ha_falcon.cpp, line 528> ->
StorageInterface::decodeRecord <ha_falcon.cpp, line 2189> ->
   field->set_null();  // This is where the NULL comes from.

Bug#28158 has to do with reporting data for a duplicate conflict and is related to using a unique index.  This is also related to using an index, but it is more serious since it actually changes the value of unreferenced fields in an update.

If the update references an unindexed field, the update works correctly, but if the update filters on only indexed fields, unreferenced fields are set to null.
This can be seen in the following examples;

mysql> create table tb (s1 varchar(10), s2 varchar(10), s3 varchar(10)) engine=falcon PARTITION BY KEY(s1) PARTITIONS 2;
Query OK, 0 rows affected (0.09 sec)

mysql> INSERT INTO tb VALUES ('AAA', 'AAA', 'AAA');
Query OK, 1 row affected (0.02 sec)

mysql> CREATE INDEX ib ON tb (s1);
Query OK, 1 row affected (0.11 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM tb;
+------+------+------+
| s1   | s2   | s3   |
+------+------+------+
| AAA  | AAA  | AAA  |
+------+------+------+
1 row in set (0.00 sec)

mysql> UPDATE tb SET s1 = 'BBB' WHERE s2 = 'AAA';
Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> SELECT * FROM tb;
+------+------+------+
| s1   | s2   | s3   |
+------+------+------+
| BBB  | AAA  | AAA  |
+------+------+------+
1 row in set (0.00 sec)

mysql> DROP TABLE tb;
Query OK, 0 rows affected (0.03 sec)

mysql> create table tb (s1 varchar(10), s2 varchar(10), s3 varchar(10)) engine=falcon PARTITION BY KEY(s1) PARTITIONS 2;
Query OK, 0 rows affected (0.05 sec)

mysql> INSERT INTO tb VALUES ('AAA', 'AAA', 'AAA');
Query OK, 1 row affected (0.01 sec)

mysql> CREATE INDEX i1 ON tb (s1);
Query OK, 1 row affected (0.08 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> CREATE INDEX i2 ON tb (s2);
Query OK, 1 row affected (0.06 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> CREATE INDEX i3 ON tb (s3);
Query OK, 1 row affected (0.08 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM tb;
+------+------+------+
| s1   | s2   | s3   |
+------+------+------+
| AAA  | AAA  | AAA  |
+------+------+------+
1 row in set (0.01 sec)

mysql> UPDATE tb SET s1 = 'BBB' WHERE s2 = 'AAA';
Query OK, 1 row affected (2.64 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> SELECT * FROM tb;
+------+------+------+
| s1   | s2   | s3   |
+------+------+------+
| BBB  | AAA  | NULL |
+------+------+------+
1 row in set (0.67 sec)

mysql> DROP TABLE tb;
Query OK, 0 rows affected (0.01 sec)

mysql> create table tb (s1 varchar(10), s2 varchar(10), s3 varchar(10)) engine=falcon PARTITION BY KEY(s1) PARTITIONS 2;
Query OK, 0 rows affected (0.11 sec)

mysql> INSERT INTO tb VALUES ('AAA', 'AAA', 'AAA');
Query OK, 1 row affected (0.01 sec)

mysql> CREATE INDEX i1 ON tb (s1);
Query OK, 1 row affected (0.08 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> CREATE INDEX i2 ON tb (s2);
Query OK, 1 row affected (0.06 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> CREATE INDEX i3 ON tb (s3);
Query OK, 1 row affected (0.08 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM tb;
+------+------+------+
| s1   | s2   | s3   |
+------+------+------+
| AAA  | AAA  | AAA  |
+------+------+------+
1 row in set (0.02 sec)

mysql> UPDATE tb SET s1 = 'BBB' WHERE s3 = 'AAA';
Query OK, 1 row affected (3.20 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> SELECT * FROM tb;
+------+------+------+
| s1   | s2   | s3   |
+------+------+------+
| BBB  | NULL | AAA  |
+------+------+------+
1 row in set (0.61 sec)

This is a problem in the server in which the updated record is actually being inserted into a different partition.  When this happens, the table->read_set should be set to all fields so that the full record can be inserted.

For now, I am assigning it to Sergei since he is familiar with Bug#28158
[16 Jun 2007 19:55] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28934

ChangeSet@1.2563, 2007-06-16 21:55:04+02:00, istruewing@chilla.local +9 -0
  Bug#26827 - table->read_set is set incorrectly,
                causing update of a different column
  Bug#28158 - table->read_set is set incorrectly,
                causing wrong error message in Falcon
  Bug#28165 - Falcon: hang after select for update
  Bug#28971 - ALTER TABLE followed by UPDATE for a CSV table
              make server crash
  
  Some engines require a complete record for update_row().
  Engines with this requirement detected so far are Falcon
  and CSV.
  
  The server did not provide complete records in all cases.
  Most engines don't need them, and constructing a complete
  record would be unnecessary effort.
  
  I fixed the problem by adding a new table flag
  HA_REQUIRES_ALL_COLUMNS_FOR_UPDATE. This is set by the affected
  handlers. mysql_update() calls mark_columns_needed_for_update(),
  where the needed columns are marked. Here we notice the flag and
  mark all columns for these engines. Consequently complete records
  are read for them.
  
  After fixing this, the test for Bug#22852 (Falcon: Packets out
  of order after handled error) started to fail. Since we have all
  bits set in read_set, write_set is now a subset of read_set. In
  this case mysql_update() compares old and new records and skips
  handler::update_row() when they are the same. This must not be
  done for transactional engines. Another transaction could have
  done the same update right before. The skipped update_row() hide
  the changes from the transaction. It continues with the unchanged
  rows. I added a respective condition.
[21 Jun 2007 14:04] Ingo Strüwing
Calvin wondered if Falcon does indeed need a complete record for updating of just a few columns. So I re-investigated the problem. Now I believe it is a partitioning problem. When the problem occurs, the stack looks like so:

#1  StorageInterface::write_row at ha_falcon.cpp:890
#2  ha_partition::update_row at ha_partition.cc:2770
#3  handler::ha_update_row at handler.cc:3689
#4  mysql_update

In #2 it is detected that the update moves the record from one partition to another. Hence update_row() is converted into write_row(). The record that is sent to write_row() is the new record as created by mysql_update(). It contains valid values only for columns in read_set and write_set.

The Falcon handler expects a complete row with write_row(). Especially in this situation write_row() is into a partition that has not been read from before. Where should the handler take the missing columns from?

IMHO the partition handler must care for a complete record when converting update_row() into write_row().
[27 Jun 2007 18:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/29770

ChangeSet@1.2567, 2007-06-27 20:26:52+02:00, istruewing@chilla.local +4 -0
  Bug#26827 - table->read_set is set incorrectly,
              causing update of a different column
  
  For efficiency some storage engines do not read a complete record
  for update, but only the columns required for selecting the rows.
  
  When updating a row of a partitioned table, modifying a column
  that is part of the partition or subpartition expression, then
  the row may need to move from one [sub]partition to another one.
  This is done by inserting the new row into the target
  [sub]partition and deleting the old row from the originating one.
  For the insert we need a complete record.
  
  If an above mentioned engine was used for a partitioned table, we
  did not have a complete record in update_row(). The implicitly
  executed write_row() got an incomplete record.
  
  This is solved by instructing the engine to read a complete record
  if one of the columns of the partition or subpartiton is to be
  updated.
[4 Jul 2007 16:19] Ingo Strüwing
http://lists.mysql.com/commits/30292

ChangeSet@1.2583, 2007-07-04 14:44:03+02:00, istruewing@chilla.local +3 -0
  Bug#26827 - table->read_set is set incorrectly,
              causing update of a different column
  
  Add-on patch to the previous patch.
  
  Fixed a valgrind error: Optimized bitmap buffer allocation:
  thd->alloc() does not require free.
  
  Optimized bitmap handling. Used the new bitmap for adding
  partition fields to read_set. Do so only if needed (not for SELECT).

Queued to mysql-5.1-falcon.
[4 Jul 2007 19:55] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/30328

ChangeSet@1.2527, 2007-07-04 21:55:26+02:00, istruewing@chilla.local +4 -0
  Bug#26827 - table->read_set is set incorrectly,
            causing update of a different column
  
  For efficiency some storage engines do not read a complete record
  for update, but only the columns required for selecting the rows.
  
  When updating a row of a partitioned table, modifying a column
  that is part of the partition or subpartition expression, then
  the row may need to move from one [sub]partition to another one.
  This is done by inserting the new row into the target
  [sub]partition and deleting the old row from the originating one.
  For the insert we need a complete record.
  
  If an above mentioned engine was used for a partitioned table, we
  did not have a complete record in update_row(). The implicitly
  executed write_row() got an incomplete record.
  
  This is solved by instructing the engine to read a complete record
  if one of the columns of the partition or subpartiton is to be
  updated.
  
  No testcase. This can be reproduced with Falcon only. The engines
  contained in standard 5.1 do always return complete records on
  update.
[4 Jul 2007 19:59] Ingo Strüwing
Queued to 5.1-engines.
[5 Jul 2007 9:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/30355

ChangeSet@1.2528, 2007-07-05 11:31:03+02:00, istruewing@chilla.local +2 -0
  Bug#26827 - table->read_set is set incorrectly,
              causing update of a different column
  
  Post-pushbuild fix.
  
  bitmap_set_bit() is an inline function in DEBUG builds and
  a macro in non-DEBUG builds. The latter evaluates its 'bit'
  argument twice. So one must not use increment/decrement operators
  on this argument.
  
  Moved increment of pointer out of bitmap_set_bit() call.
[5 Jul 2007 9:43] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/30356

ChangeSet@1.2585, 2007-07-05 11:42:42+02:00, istruewing@chilla.local +2 -0
  Bug#26827 - table->read_set is set incorrectly,
            causing update of a different column
  
  Post-pushbuild fix.
  
  bitmap_set_bit() is an inline function in DEBUG builds and
  a macro in non-DEBUG builds. The latter evaluates its 'bit'
  argument twice. So one must not use increment/decrement operators
  on this argument.
  
  Moved increment of pointer out of bitmap_set_bit() call.
[6 Jul 2007 6:59] Hakan Küçükyılmaz
falcon_bug_26827.test is passing now on Falcon.
[7 Jul 2007 16:34] Bugs System
Pushed into 5.1.21-beta
[7 Jul 2007 18:50] Paul DuBois
Noted in 5.1.21 changelog.

Updates to rows in a partitioned table could update the wrong column.