Bug #80652 binary group commit hangs when binlog_group_commit_sync_delay is set to 1~9
Submitted: 8 Mar 2016 5:36 Modified: 15 Dec 2016 12:32
Reporter: Yang Lyn Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.7.5~5.7.11, 5.7.11 OS:Any
Assigned to: CPU Architecture:Any
Tags: binlog_group_commit_sync_delay, Group Commit, hang

[8 Mar 2016 5:36] Yang Lyn
Description:
When binlog_group_commit_sync_delay is set to 1~9:
1) if binlog_group_commit_sync_no_delay_count > 1, and the number of transaction commits is less than binlog_group_commit_sync_no_delay_count, these commits will hang forever if no more commits come in;
2) if binlog_group_commit_sync_no_delay_count = 0, all transaction commits will hang forever;

How to repeat:
CREATE TABLE t(
id INT PRIMARY KEY,
v INT NOT NULL) ENGINE INNODB;

INSERT INTO t VALUES(1, 100), (2, 100), (3, 100), (4, 100);

##################### begin of case 1 ######################
SET GLOBAL binlog_group_commit_sync_delay = 8;
SET GLOBAL binlog_group_commit_sync_no_delay_count = 2;

# session 1
BEGIN;
UPDATE t set v = v + 100 WHERE id = 1;
COMMIT; # hangs here forever if no more commits come in

# session 2: execute after session 1 hanging for a while
BEGIN;
UPDATE t set v = v + 100 WHERE id = 2;
COMMIT; # both session 2 and session 1 will commit
###################### end of case 1 #######################

##################### begin of case 2 ######################
SET GLOBAL binlog_group_commit_sync_delay = 8;
SET GLOBAL binlog_group_commit_sync_no_delay_count = 0;

# session 1
BEGIN;
UPDATE t set v = v + 100 WHERE id = 3;
COMMIT; # hangs here forever

# session 2
BEGIN;
UPDATE t set v = v + 100 WHERE id = 4;
COMMIT; # hangs here forever
###################### end of case 2 #######################

Suggested fix:
This bug is caused by function Stage_manager::wait_count_or_timeout in sql/binlig.cc;

time_t Stage_manager::wait_count_or_timeout(ulong count, time_t usec, StageID stage)
{
  time_t to_wait=
    DBUG_EVALUATE_IF("bgc_set_infinite_delay", LONG_MAX, usec);
  /*
    For testing purposes while waiting for inifinity
    to arrive, we keep checking the queue size at regular,
    small intervals. Otherwise, waiting 0.1 * infinite
    is too long.
   */
  time_t delta=
    DBUG_EVALUATE_IF("bgc_set_infinite_delay", 100000,
                     static_cast<time_t>(to_wait * 0.1));

  while (to_wait > 0 && (count == 0 || static_cast<ulong>(m_queue[stage].get_size()) < count))
  {
#ifndef DBUG_OFF
    if (current_thd)
      DEBUG_SYNC(current_thd, "bgc_wait_count_or_timeout");
#endif
    my_sleep(delta);
    to_wait -= delta;
  }
  return to_wait;
}

when binlog_group_commit_sync_delay is set to 1~9, the "delta" will be evaluated to 0, and thus
the while loop won't terminate properly.

BUG fix suggestion:
when binlog_group_commit_sync_delay is set to 1~9, the "delta" should be set to 1 or binlog_group_commit_sync_delay
[8 Mar 2016 9:13] MySQL Verification Team
Hello Yang Lyn,

Thank you for the report and test case.
Verified as described with 5.7.11 build.

Thanks,
Umesh
[25 May 2016 16:59] MySQL Verification Team
I hit this today on 5.7.12.  The thread stacks during hang..
mysqld!Stage_manager::wait_count_or_timeout+0x68 [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\binlog.cc @ 1943]
mysqld!MYSQL_BIN_LOG::ordered_commit+0x277 [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\binlog.cc @ 8851]
mysqld!MYSQL_BIN_LOG::commit+0x4e2 [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\binlog.cc @ 8066]
mysqld!ha_commit_trans+0x1b5 [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\handler.cc @ 1739]
mysqld!trans_commit_stmt+0x3a [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\transaction.cc @ 459]
mysqld!mysql_execute_command+0x392a [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\sql_parse.cc @ 4958]
mysqld!mysql_parse+0x25a [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\sql_parse.cc @ 5525]
mysqld!dispatch_command+0x63d [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\sql_parse.cc @ 1432]
mysqld!do_command+0x1ea [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\sql_parse.cc @ 999]

---------

ntdll!ZwWaitForKeyedEvent+0xa
ntdll!RtlSleepConditionVariableCS+0xd4
kernel32!SleepConditionVariableCS+0x28
mysqld!Stage_manager::enroll_for+0x15e [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\binlog.cc @ 1894]
mysqld!MYSQL_BIN_LOG::change_stage+0x21 [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\binlog.cc @ 8381]
mysqld!MYSQL_BIN_LOG::ordered_commit+0x250 [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\binlog.cc @ 8839]
mysqld!MYSQL_BIN_LOG::commit+0x4e2 [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\binlog.cc @ 8066]
mysqld!ha_commit_trans+0x1b5 [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\handler.cc @ 1739]
mysqld!trans_commit_stmt+0x3a [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\transaction.cc @ 459]
mysqld!mysql_execute_command+0x392a [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\sql_parse.cc @ 4958]
mysqld!mysql_parse+0x25a [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\sql_parse.cc @ 5525]
mysqld!dispatch_command+0x63d [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\sql_parse.cc @ 1432]
mysqld!do_command+0x1ea [g:\ade\build\sb_0-18410765-1459191343.73\mysql-5.7.12\sql\sql_parse.cc @ 999]

mysql> show processlist;
+-----+------+------+------+-------------+------+---------------------------------------------------------------+----------------------------
| Id  | User | Host | db   | Command     | Time | State                                                         | Info
+-----+------+------+------+-------------+------+---------------------------------------------------------------+----------------------------
| 491 | root |      | test | Query       |  379 | query end                                                     | delete from  t  limit 1
| 500 | root |      | test | Query       |  379 | query end                                                     | delete from  s  limit 1
| 503 | root |      | test | Query       |  379 | query end                                                     | insert ignore into `test`.`
| 506 | ODBC |      | test | Query       |    0 | starting                                                      | show processlist
| 515 | root |      | NULL | Binlog Dump |   16 | Master has sent all binlog to slave; waiting for more updates | NULL
+-----+------+------+------+-------------+------+---------------------------------------------------------------+----------------------------
[15 Dec 2016 12:32] Erlend Dahl
[9 Dec 2016 6:30] David Moss

Thank you for your feedback, this has been fixed in upcoming versions and the
following was added to the 5.7.17 and 8.0.1 changelog:
When binlog_group_commit_sync_delay was set to a value between 1 and 9, if
binlog_group_commit_sync_no_delay_count was set to a value greater than 1,
and the number of transaction commits was less than
binlog_group_commit_sync_no_delay_count, these commits hung forever if no
more commits were received; and if binlog_group_commit_sync_no_delay_count
was set to 0, all transaction commits hung forever.