Bug #31724 UPDATE not properly using indexes
Submitted: 20 Oct 2007 7:54 Modified: 30 Jul 2010 10:57
Reporter: Ricardo Amorim Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: MyISAM storage engine Severity:S3 (Non-critical)
Version:mysql-5.1.22-rc-linux-x86_64-icc-glibc23 OS:Linux (Debian 2.6.18-5-amd64)
Assigned to: Assigned Account CPU Architecture:Any

[20 Oct 2007 7:54] Ricardo Amorim
Description:
I want to update `dados`.t1_b with data from tel.b. I'm using UPDATE query with INNER JOIN on both PRIMARY KEYS.

Server info: 
Dual Xeon 5050 (dual core)
8GB RAM
6x SCSI 10k RPM 73GB RAID 10

Table structures:

mysql> describe tel;
+-------+------------------------------+------+-----+-------------+-------+
| Field | Type                         | Null | Key | Default     | Extra |
+-------+------------------------------+------+-----+-------------+-------+
| a     | bigint(11) unsigned zerofill | NO   | PRI | 00000000000 |       |
| b     | mediumint(6)                 | YES  |     | NULL        |       |
+-------+------------------------------+------+-----+-------------+-------+

mysql> describe dados;
+------------------+------------------------------+------+-----+-------------+-------+
| Field | Type                         | Null | Key | Default     | Extra |
+------------------+------------------------------+------+-----+-------------+-------+
| a     | bigint(11) unsigned zerofill | NO   | PRI | 00000000000 |       |
| atrib | smallint(5) unsigned         | YES  |     | 0           |       |
| nome  | varchar(100)                 | YES  |     | NULL        |       |
| sit   | varchar(30)                  | YES  |     | NULL        |       |
| nm    | varchar(100)                 | YES  |     | NULL        |       |
| dt    | date                         | YES  |     | NULL        |       |
| s     | char(1)                      | YES  |     | NULL        |       |
| uf    | char(2)                      | YES  |     | NULL        |       |
| t1_b  | mediumint(6)                 | YES  |     | 0           |       |
| pf    | mediumint(6)                 | YES  |     | 0           |       |
+------------------+------------------------------+------+-----+-------------+-------+

1. I run the UPDATE query, check "vmstat 1" and then cancel the query. You can notice it's doing a lot of disk reads and a small amount of writes. I expect a high CPU usage (for the JOIN) and mostly disk writes (UPDATE).

mysql> update dados inner join tel on dados.a = tel.a set dados.t1_b = tel.b; 
Query aborted by Ctrl+C
ERROR 1317 (70100): Query execution was interrupted

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0   1936 296828   7524 5045340    0    0   860     0  470  440  4  2 75 18
 0  1   1936 295928   7532 5046432    0    0   860   768  471  455  3  3 75 19
 0  1   1936 294840   7532 5047564    0    0   868     0  466  441  3  3 75 19
 0  1   1936 293752   7532 5048664    0    0   848     0  461  434  5  2 75 18

2. I run a simple test query to make sure that it is properly making the JOIN using cached keys with no reads from disk.

mysql> create table test as select tel.b from dados inner join tel on dados.a = tel.a;
Query OK, 8094268 rows affected (1 min 47.44 sec)
Records: 8094268  Duplicates: 0  Warnings: 0

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0   1936 250796   7828 5087956    0    0     0     0  253   14 20  4 75  0
 1  0   1936 248564   7828 5089936    0    0     0     0  253   20 20  5 75  0
 1  0   1936 244472   7836 5094032    0    0     0 12332  287   26 18  6 75  0
 1  0   1936 242364   7844 5096004    0    0     0    36  255   25 20  5 75  0

3. I run the exact same query as in step "1". Now I get the desired results - no read from disk, only writes and high CPU usage.

mysql> update dados inner join tel on dados.a = tel.a set dados.t1_b = tel.b; 

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0   1936  50564   7320 5285200    0    0     0  7400  286   33 14 10 75  1
 1  0   1936  49572   7320 5286308    0    0     0     0  253   18 16  9 75  0
 1  0   1936  48580   7320 5287384    0    0     0     0  253   16 15 10 75  0
 1  0   1936  47464   7320 5288264    0    0     0     0  253   20 15 10 75  0

How to repeat:
Check above.
[20 Oct 2007 9:04] Ricardo Amorim
A better explanation of the issue:

According to "6.2.18. Speed of UPDATE Statements" in the documentation, UPDATE and SELECT statements are similar when it comes to query optimization.

That said, I expect that UPDATE query "1" and CREATE TABLE (SELECT) query "2" perform the JOIN operation in a similar way. This does not happen as you can see from "vmstat" output. Query "1" does a lot of disk reads while query "2" is mostly CPU bound with ocasional writes to disk.

I think that the SELECT query "2" does some kind of caching that is later used by UPDATE query "3". Caching is good, but I expect the behaviour of UPDATE query "1" to be the same as UPDATE query "3". The UPDATE query takes 20+ minutes the first time it is run (without the SELECT caching).

I have tried to manually LOAD INDEX INTO CACHE the PRIMARY keys from both tables and then perform the UPDATE but there was _NO_ change from behaviour of query "1".

I'd be happy to provide any further information or perform any other test.
[20 Oct 2007 9:51] Sveta Smirnova
Thank you for the report.

Please provvide output of explain select tel.b from dados inner join tel on dados.a = tel.a; and SHOW STATUS for both tables.
[20 Oct 2007 15:23] Ricardo Amorim
mysql> explain select tel.b from dados inner join tel on dados.a = tel.a;
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                   | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+----------+-------------+
|  1 | SIMPLE      | dados | index  | PRIMARY       | PRIMARY | 8       | NULL                  | 10214373 | Using index |
|  1 | SIMPLE      | tel   | eq_ref | PRIMARY       | PRIMARY | 8       | celeste.dados.a |        1 |             |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+----------+-------------+
2 rows in set (0.00 sec)

mysql> show table status like 'dados';
+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| Name  | Engine | Version | Row_format | Rows     | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation         | Checksum | Create_options | Comment |
+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| dados | MyISAM |      10 | Dynamic    | 10214373 |             52 |   538811000 | 281474976710655 |    171595776 |         0 |           NULL | 2007-10-19 18:41:02 | 2007-10-19 18:54:17 | NULL       | latin1_swedish_ci |     NULL |                |         |
+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+

mysql> show table status like 'tel';
+------+--------+---------+------------+----------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows     | Avg_row_length | Data_length | Max_data_length  | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation         | Checksum | Create_options | Comment |
+------+--------+---------+------------+----------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
| tel  | MyISAM |      10 | Fixed      | 19919306 |             12 |   239031672 | 3377699720527871 |    287178752 |         0 |           NULL | 2007-10-19 07:37:47 | 2007-10-19 07:39:54 | NULL       | latin1_swedish_ci |     NULL |                |         |
+------+--------+---------+------------+----------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+
[30 Jun 2010 10:57] Sveta Smirnova
Thank you for the feedback.

I can not repeat described behavior with test data.

Please try current version 5.1.49 and if problem still exists send us output of `show status like 'ha%';` before and after every query.
[30 Jul 2010 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".