MySQL Bugs: #94610: Server stalls because ALTER TABLE on partitioned table holds dict mutex

Bug #94610	Server stalls because ALTER TABLE on partitioned table holds dict mutex
Submitted:	8 Mar 2019 19:09	Modified:	5 Aug 2020 0:00
Reporter:	Justin Swanhart	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: DDL	Severity:	S5 (Performance)
Version:	5.6.41	OS:	CentOS (7.5)
Assigned to:		CPU Architecture:	x86
Tags:	ALTER TABLE, DDL, partition, stall

Description:
DDL is slow on partitioned tables: https://bugs.mysql.com/bug.php?id=83435

That in itself isn't a particular problem, but the dictionary mutex is held for long periods during the ALTER operation on partitioned tables, and this blocks concurrent threads accessing other tables.

How to repeat:
CREATE TABLE `test`.`__ptosc_shop_shipping_label_images` (
  `shop_id` bigint(20) unsigned NOT NULL,
  `shipping_label_id` bigint(20) unsigned NOT NULL,
  `shipping_label_image_id` bigint(20) unsigned NOT NULL,
  `create_date` int(10) unsigned NOT NULL,
  `update_date` int(10) unsigned NOT NULL,
  `image` mediumblob NOT NULL,
  `image_expire_date` int(10) unsigned NOT NULL,
  `type` varchar(100) NOT NULL DEFAULT '',
  PRIMARY KEY (`shop_id`,`shipping_label_id`,`shipping_label_image_id`,`create_date`),
  KEY `create_date` (`create_date`),
  KEY `update_date` (`update_date`),
  KEY `image_expire_date` (`image_expire_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8
/*!50100 PARTITION BY RANGE (create_date)
(PARTITION P1507867200 VALUES LESS THAN (1507867200) ENGINE = InnoDB,
 PARTITION P1509076800 VALUES LESS THAN (1509076800) ENGINE = InnoDB,
 PARTITION P1510290000 VALUES LESS THAN (1510290000) ENGINE = InnoDB,
 PARTITION P1511499600 VALUES LESS THAN (1511499600) ENGINE = InnoDB,
 PARTITION P1512709200 VALUES LESS THAN (1512709200) ENGINE = InnoDB,
 PARTITION P1513918800 VALUES LESS THAN (1513918800) ENGINE = InnoDB,
 PARTITION P1515128400 VALUES LESS THAN (1515128400) ENGINE = InnoDB,
 PARTITION P1516338000 VALUES LESS THAN (1516338000) ENGINE = InnoDB,
 PARTITION P1517547600 VALUES LESS THAN (1517547600) ENGINE = InnoDB,
 PARTITION P1518757200 VALUES LESS THAN (1518757200) ENGINE = InnoDB,
 PARTITION P1519966800 VALUES LESS THAN (1519966800) ENGINE = InnoDB,
 PARTITION P1521172800 VALUES LESS THAN (1521172800) ENGINE = InnoDB,
 PARTITION P1522382400 VALUES LESS THAN (1522382400) ENGINE = InnoDB,
 PARTITION P1523592000 VALUES LESS THAN (1523592000) ENGINE = InnoDB,
 PARTITION P1524801600 VALUES LESS THAN (1524801600) ENGINE = InnoDB,
 PARTITION P1526011200 VALUES LESS THAN (1526011200) ENGINE = InnoDB,
 PARTITION P1527220800 VALUES LESS THAN (1527220800) ENGINE = InnoDB,
 PARTITION P1528430400 VALUES LESS THAN (1528430400) ENGINE = InnoDB,
 PARTITION P1529640000 VALUES LESS THAN (1529640000) ENGINE = InnoDB,
 PARTITION P1530849600 VALUES LESS THAN (1530849600) ENGINE = InnoDB,
 PARTITION P1532059200 VALUES LESS THAN (1532059200) ENGINE = InnoDB,
 PARTITION P1533268800 VALUES LESS THAN (1533268800) ENGINE = InnoDB,
 PARTITION P1534478400 VALUES LESS THAN (1534478400) ENGINE = InnoDB,
 PARTITION P1535688000 VALUES LESS THAN (1535688000) ENGINE = InnoDB,
 PARTITION P1536897600 VALUES LESS THAN (1536897600) ENGINE = InnoDB,
 PARTITION P1538107200 VALUES LESS THAN (1538107200) ENGINE = InnoDB,
 PARTITION P1539316800 VALUES LESS THAN (1539316800) ENGINE = InnoDB,
 PARTITION P1540526400 VALUES LESS THAN (1540526400) ENGINE = InnoDB,
 PARTITION P1541739600 VALUES LESS THAN (1541739600) ENGINE = InnoDB,
 PARTITION P1542949200 VALUES LESS THAN (1542949200) ENGINE = InnoDB,
 PARTITION P1544158800 VALUES LESS THAN (1544158800) ENGINE = InnoDB,
 PARTITION P1545368400 VALUES LESS THAN (1545368400) ENGINE = InnoDB,
 PARTITION P1546578000 VALUES LESS THAN (1546578000) ENGINE = InnoDB,
 PARTITION P1547787600 VALUES LESS THAN (1547787600) ENGINE = InnoDB,
 PARTITION P1548997200 VALUES LESS THAN (1548997200) ENGINE = InnoDB,
 PARTITION P1550206800 VALUES LESS THAN (1550206800) ENGINE = InnoDB,
 PARTITION P1551416400 VALUES LESS THAN (1551416400) ENGINE = InnoDB,
 PARTITION P1552622400 VALUES LESS THAN (1552622400) ENGINE = InnoDB,
 PARTITION P1553832000 VALUES LESS THAN (1553832000) ENGINE = InnoDB,
 PARTITION P1555041600 VALUES LESS THAN (1555041600) ENGINE = InnoDB,
 PARTITION P1556251200 VALUES LESS THAN (1556251200) ENGINE = InnoDB) */
;

run a concurrent workload on the server selecting from any other table.

in another connection run:
ALTER TABLE `test`.`__ptosc_shop_shipping_label_images`  add column `sort` smallint unsigned not null default 1;

then examine SHOW ENGINE INNODB STATUS and you will see things like:
--Thread 140479044425472 has waited at dict0dict.cc line 562 for 26.000 seconds the semaphore:
Mutex at 0x7fdede01ea68 '&dict_sys->mutex', lock var 1

-- while running the alter run the following to see the stalled queries.  
-- Example output from a stall on one of my shards is here as an example.
select state, count(*) from information_schema.processlist group by state order by count(*) desc;
+------------------------------------------------------------------+----------+
| STATE                                                            | count(*) |
+------------------------------------------------------------------+----------+
| Opening tables                                                   |     5984 |
| closing tables                                                   |     1075 |
|                                                                  |      209 |
| Sending data                                                     |      123 |
| Creating sort index                                              |      101 |
| Waiting for an event from Coordinator                            |        8 |
| Master has sent all binlog to slave; waiting for binlog to be up |        3 |
| rename result table                                              |        1 |
| Slave has read all relay log; waiting for the slave I/O thread t |        1 |
| Waiting for master to send event                                 |        1 |
| executing                                                        |        1 |
+------------------------------------------------------------------+----------+

During the stall:
Main thread process no. 3185, id 140434206619392, state: enforcing dict cache limit
Number of rows inserted 390436659, updated 870464553, deleted 8315026, read 1796535553852
12.49 inserts/s, 10.99 updates/s, 1.00 deletes/s, 385152.42 reads/s

table_definition_cache is set to 100000 and there are <20000 opened_definitions

Hi Justin,

The issue with DDL on many partitions is verified, waiting on possible fix. Not sure what is additional to that in this report? MySQL pre 8.x we don't have data dictionary, ddl's are not transactional, it does lock everything, that's a design choice... I'm missing something in your report obviously?!

all best
Bogdan

IMHO, the difference is that this bug is about ALTER blocking everything and Bug#83435 is about ALTER being slow.  In this sense, those two bugs are different, but they might have the same resolution.

I also have similar behavior, but with TRUNCATE PARTITION and with 5.7, I think it might be related.

Bogdan,

>  ddl's are not transactional, it does lock everything,

This report not about transactional DDLs, but about the fact that DDLs on one table lock absolutely unrelated operations (DML on *other* tables).

Hey Sveta,

>  but about the fact that DDLs on one table lock absolutely unrelated operations (DML on *other* tables)

Yes, and it was always doing so (till 8.x)? Not only "dml in other tables", even dml in other database/schema on the same server. Searching the documentation for confirmation if it's "documented" or not but was always like that. Only few online ddl operations tend not to do it and only on 8.x with data dictionary in database and transactional ddl's this behaves differently. I can agree that we can have a documentation issue that this behavior is not properly documented, but I'd say it's very well known (I know I'm having issues with it privately for 20+ years)?! 

So yes, not a transactional issue, it's a locking issue, but is not a bug, especially not an S1 in my opinion. That's if I understand the report properly. 

all best
Bogdan

@Jean-François Gagné

Yes, truncate in this case does drop/create so it's a DDL and not DML (trying to be faster :D )

all best
Bogdan

DDL on one table is not supposed to stall the server.  DROP TABLE used to stall the server and a lot of work was put into it to keep that from happening.  Now DROP/ALTER on partitioned servers are holding the dictionary mutex for long periods and stalling the server.  This is a ALTER on an EMPTY table.  It blocks operations in DML threads that create temporary tables or that need to open a new table in the dictionary.  DDL on one table is not supposed to lock up other tables for the duration of the DDL.  That makes MySQL into sqlite as far as concurrency goes.

Note that a DROP or ALTER on a NON-PARTITIONED table of any size works just fine and doesn't stall the server.  Only operations on PARTITIONED tables stall the server.

Justin, I think you are missing my point, DROP on NON-PARTITIONED table iirc (I might be wrong we did change that at one point but I think this is still the case) takes the same mutex, it's just "done quickly" so it's not a problem, so a problem here is that:

 - it is slow operation ( Bug #83435 )
 - it is blocking operation (IMO Not a Bug)

Both are problems. First one is 100% a bug, as we all agreed, verified, waiting on dev team to figure out how to solve. Second one on the other hand is 100% intentional. We can argue if it's documented properly or not but this is something we have since forever. We can discuss rationale behind that decision and way to fix it, but since this is non blocking operation in 8.x (or should be non blocking in 8.x) I have my doubts of that ever being redesigned into 5.x.

So, yes, I can set the bug to "verified" as, yes, this is easily verified behavior, that exist since at least 3.0 (and I'm pretty sure it was there since day 1) but all that will happen is some documentation visibility as I doubt there a chance a patch that changes this can be allowed in 5.x. hence I'm not doing it and I'm setting it to "not a bug".

On the other hand, the "speed" is something that can probbly be fixed so one should follow Bug #83435 (easily verified, I hope soon to be fixed :) ).

all best
Bogdan

p.s. I'm not 100% final on this as I still want to do a code review here to check this out but ..

This is supposed to be an ONLINE DDL operation.  This DDL is not ONLINE.  It blocks not only the table being modify, but the whole server for the duration of the DDL, which is tens of second for an EMPTY TABLE.  This is a serious bug.  Marking is as not a bug is totally wrong.

To resolve this we had to switch to MyISAM temporary tables.  If you would have bothered to investigate a little bit you might have made that suggestion.  The response to this bug is one of the worst I've ever seen from MySQL.

This is still the issue in 8.0

How to Repeat.

1. Modify source code:

$ git diff
diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc
index 22d29b77f47..6e838bdae90 100644
--- a/storage/innobase/row/row0mysql.cc
+++ b/storage/innobase/row/row0mysql.cc
@@ -2613,6 +2613,7 @@ void row_mysql_lock_data_dictionary_func(trx_t *trx, /*!< in/out: transaction */
   trx->dict_operation_lock_mode = RW_X_LATCH;
 
   mutex_enter(&dict_sys->mutex);
+  DBUG_EXECUTE_IF("bug94610", {my_sleep(60000000); fprintf(stderr, "bug94610\n");});
 }
 
 /** Unlocks the data dictionary exclusive lock. */

2. Run the attached test.

Expected result: INSERT statement is immediate. Actual result: it takes 55 seconds:

set debug="d,bug94610";
alter table t2 truncate partition p3;
select now();
now()
2019-04-23 02:20:41
insert into t1 values(3);
select now();
now()
2019-04-23 02:21:36
set debug='';
drop table t1, t2;

test case for MTR

Attachment: bug94610.test (application/octet-stream, text), 1.52 KiB.

Hi Sveta,

Thanks for the test case but as I already wrote it's easily reproducible, problem is that I don't believe it's a bug as is and that Bug #83435 is what really needs to be fixed but since Bug #83435 is still not there I'll verify this one too (as behavior is verifiable for sure) so hopefully we have the fix for both sooner, but I'm dropping it to S5 as this is in no way S1 problem.

all best
Bogdan

Thanks for verifying this Bogdan.

But I think it is more than S5.  In my case, I experience stalls in 5.7 of more than 60 seconds on TRUNCATE PARTITION of very large partitions (1.7TB).  IMHO, it deserves a S1, but is you want to lower the priority, I could settle on S2.  But because I am down for more than 1 minute while doing this TRUNCATE, it is more than an S3 for me.

This should not be an S5 bug...