Bug #31724 UPDATE not properly using indexes
Submitted: 20 Oct 2007 9:54 Modified: 14 Nov 2007 11:37
Reporter: Ricardo Amorim
Status: Analyzing
Category:Server: MyISAM Severity:S3 (Non-critical)
Version:mysql-5.1.22-rc-linux-x86_64-icc-glibc23 OS:Linux (Debian 2.6.18-5-amd64)
Assigned to: Sveta Smirnova Target Version:

[20 Oct 2007 9:54] Ricardo Amorim
Description:
I want to update `dados`.t1_b with data from tel.b. I'm using UPDATE query with INNER
JOIN on both PRIMARY KEYS.

Server info: 
Dual Xeon 5050 (dual core)
8GB RAM
6x SCSI 10k RPM 73GB RAID 10

Table structures:

mysql> describe tel;
+-------+------------------------------+------+-----+-------------+-------+
| Field | Type                         | Null | Key | Default     | Extra |
+-------+------------------------------+------+-----+-------------+-------+
| a     | bigint(11) unsigned zerofill | NO   | PRI | 00000000000 |       |
| b     | mediumint(6)                 | YES  |     | NULL        |       |
+-------+------------------------------+------+-----+-------------+-------+

mysql> describe dados;
+------------------+------------------------------+------+-----+-------------+-------+
| Field | Type                         | Null | Key | Default     | Extra |
+------------------+------------------------------+------+-----+-------------+-------+
| a     | bigint(11) unsigned zerofill | NO   | PRI | 00000000000 |       |
| atrib | smallint(5) unsigned         | YES  |     | 0           |       |
| nome  | varchar(100)                 | YES  |     | NULL        |       |
| sit   | varchar(30)                  | YES  |     | NULL        |       |
| nm    | varchar(100)                 | YES  |     | NULL        |       |
| dt    | date                         | YES  |     | NULL        |       |
| s     | char(1)                      | YES  |     | NULL        |       |
| uf    | char(2)                      | YES  |     | NULL        |       |
| t1_b  | mediumint(6)                 | YES  |     | 0           |       |
| pf    | mediumint(6)                 | YES  |     | 0           |       |
+------------------+------------------------------+------+-----+-------------+-------+

1. I run the UPDATE query, check "vmstat 1" and then cancel the query. You can notice
it's doing a lot of disk reads and a small amount of writes. I expect a high CPU usage
(for the JOIN) and mostly disk writes (UPDATE).

mysql> update dados inner join tel on dados.a = tel.a set dados.t1_b = tel.b; 
Query aborted by Ctrl+C
ERROR 1317 (70100): Query execution was interrupted

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0   1936 296828   7524 5045340    0    0   860     0  470  440  4  2 75 18
 0  1   1936 295928   7532 5046432    0    0   860   768  471  455  3  3 75 19
 0  1   1936 294840   7532 5047564    0    0   868     0  466  441  3  3 75 19
 0  1   1936 293752   7532 5048664    0    0   848     0  461  434  5  2 75 18

2. I run a simple test query to make sure that it is properly making the JOIN using
cached keys with no reads from disk.

mysql> create table test as select tel.b from dados inner join tel on dados.a = tel.a;
Query OK, 8094268 rows affected (1 min 47.44 sec)
Records: 8094268  Duplicates: 0  Warnings: 0

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0   1936 250796   7828 5087956    0    0     0     0  253   14 20  4 75  0
 1  0   1936 248564   7828 5089936    0    0     0     0  253   20 20  5 75  0
 1  0   1936 244472   7836 5094032    0    0     0 12332  287   26 18  6 75  0
 1  0   1936 242364   7844 5096004    0    0     0    36  255   25 20  5 75  0

3. I run the exact same query as in step "1". Now I get the desired results - no read
from disk, only writes and high CPU usage.

mysql> update dados inner join tel on dados.a = tel.a set dados.t1_b = tel.b; 

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0   1936  50564   7320 5285200    0    0     0  7400  286   33 14 10 75  1
 1  0   1936  49572   7320 5286308    0    0     0     0  253   18 16  9 75  0
 1  0   1936  48580   7320 5287384    0    0     0     0  253   16 15 10 75  0
 1  0   1936  47464   7320 5288264    0    0     0     0  253   20 15 10 75  0

How to repeat:
Check above.
[20 Oct 2007 11:04] Ricardo Amorim
A better explanation of the issue:

According to "6.2.18. Speed of UPDATE Statements" in the documentation, UPDATE and SELECT
statements are similar when it comes to query optimization.

That said, I expect that UPDATE query "1" and CREATE TABLE (SELECT) query "2" perform the
JOIN operation in a similar way. This does not happen as you can see from "vmstat" output.
Query "1" does a lot of disk reads while query "2" is mostly CPU bound with ocasional
writes to disk.

I think that the SELECT query "2" does some kind of caching that is later used by UPDATE
query "3". Caching is good, but I expect the behaviour of UPDATE query "1" to be the same
as UPDATE query "3". The UPDATE query takes 20+ minutes the first time it is run (without
the SELECT caching).

I have tried to manually LOAD INDEX INTO CACHE the PRIMARY keys from both tables and then
perform the UPDATE but there was _NO_ change from behaviour of query "1".

I'd be happy to provide any further information or perform any other test.
[20 Oct 2007 11:51] Sveta Smirnova
Thank you for the report.

Please provvide output of explain select tel.b from dados inner join tel on dados.a =
tel.a; and SHOW STATUS for both tables.
[20 Oct 2007 17:23] Ricardo Amorim
mysql> explain select tel.b from dados inner join tel on dados.a = tel.a;
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref            
      | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+----------+-------------+
|  1 | SIMPLE      | dados | index  | PRIMARY       | PRIMARY | 8       | NULL           
      | 10214373 | Using index |
|  1 | SIMPLE      | tel   | eq_ref | PRIMARY       | PRIMARY | 8       | celeste.dados.a
|        1 |             |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+----------+-------------+
2 rows in set (0.00 sec)

mysql> show table status like 'dados';
+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| Name  | Engine | Version | Row_format | Rows     | Avg_row_length | Data_length |
Max_data_length | Index_length | Data_free | Auto_increment | Create_time         |
Update_time         | Check_time | Collation         | Checksum | Create_options |
Comment |
+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| dados | MyISAM |      10 | Dynamic    | 10214373 |             52 |   538811000 |
281474976710655 |    171595776 |         0 |           NULL | 2007-10-19 18:41:02 |
2007-10-19 18:54:17 | NULL       | latin1_swedish_ci |     NULL |                |       
 |
+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+

mysql> show table status like 'tel';
+------+--------+---------+------------+----------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows     | Avg_row_length | Data_length |
Max_data_length  | Index_length | Data_free | Auto_increment | Create_time         |
Update_time         | Check_time | Collation         | Checksum | Create_options |
Comment |
+------+--------+---------+------------+----------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| tel  | MyISAM |      10 | Fixed      | 19919306 |             12 |   239031672 |
3377699720527871 |    287178752 |         0 |           NULL | 2007-10-19 07:37:47 |
2007-10-19 07:39:54 | NULL       | latin1_swedish_ci |     NULL |                |       
 |
+------+--------+---------+------------+----------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+