Bug #120196 n_mysql_tables_in_use uint32 underflow via MRR dsmrr_init() error path in handler.cc
Submitted: 31 Mar 23:50
Reporter: Vinicius Malvestio Grippa Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:9.6.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: innodb, mrr, underflow

[31 Mar 23:50] Vinicius Malvestio Grippa
Description:
ROOT CAUSE:

In sql/handler.cc, DsMrr_impl::dsmrr_init() error label:
https://github.com/mysql/mysql-server/blob/trunk/sql/handler.cc#L6894

  error:
    h2->ha_index_or_rnd_end();                    // line 6895
    h2->ha_external_lock(thd, F_UNLCK);           // line 6896 - always called
    h2->ha_close();                               // line 6897

When ha_external_lock(thd, lock_type) at line 6807 fails or is never reached:
https://github.com/mysql/mysql-server/blob/trunk/sql/handler.cc#L6807

execution jumps to "error:" which calls ha_external_lock(thd, F_UNLCK)
unconditionally. InnoDB then decrements trx->n_mysql_tables_in_use (a uint32_t)
in ha_innobase::external_lock() without a matching increment:

Decrement (no guard):
https://github.com/mysql/mysql-server/blob/trunk/storage/innobase/handler/ha_innodb.cc#L19...

Increment:
https://github.com/mysql/mysql-server/blob/trunk/storage/innobase/handler/ha_innodb.cc#L19...

Counter type (uint32_t):
https://github.com/mysql/mysql-server/blob/trunk/storage/innobase/include/trx0trx.h#L945

Compare with ha_innopart.cc which correctly guards the decrement:
https://github.com/mysql/mysql-server/blob/trunk/storage/innobase/handler/ha_innopart.cc#L...

CASCADING EFFECTS:
- "if (trx->n_mysql_tables_in_use == 0)" never triggers:
  https://github.com/mysql/mysql-server/blob/trunk/storage/innobase/handler/ha_innodb.cc#L19...
- mysql_n_tables_locked is never reset
- Auto-commit cleanup never fires for this trx
- SHOW ENGINE INNODB STATUS permanently shows huge "tables in use" value

PRODUCTION TRIGGER CONDITIONS (ha_external_lock fails on MRR clone):
- binlog_format=STATEMENT with READ COMMITTED/READ UNCOMMITTED isolation
- Discarded tablespace (ALTER TABLE DISCARD TABLESPACE during query)
- innodb_force_recovery / innodb-read-only mode
- Lock wait timeout / deadlock in LOCK TABLES context

Each failed BKA/MRR join decrements once. After 40 such failures on one
connection: counter = 2^32 - 40 = 4294967256 (the observed value).

How to repeat:
1) Apply this DBUG injection to storage/innobase/handler/ha_innodb.cc
   after line 19172 "trx->n_mysql_tables_in_use--;":
   https://github.com/mysql/mysql-server/blob/trunk/storage/innobase/handler/ha_innodb.cc#L19...

   This simulates the MRR error path where ha_external_lock(F_UNLCK)
   is called without a matching lock:
   https://github.com/mysql/mysql-server/blob/trunk/sql/handler.cc#L6896

   trx->n_mysql_tables_in_use--;                           // line 19172
+  DBUG_EXECUTE_IF("ib_extra_tables_in_use_decrement", {
+    for (int i = 0; i < 40; i++) {
+      trx->n_mysql_tables_in_use--;
+    }
+  });
   m_mysql_has_locked = false;                              // line 19173

2) Build with debug:

   mkdir build && cd build
   cmake .. -DWITH_DEBUG=1 -DDOWNLOAD_BOOST=1 -DWITH_BOOST=/tmp/boost
   make -j$(nproc)

3) Initialize and start:

   ./bin/mysqld --initialize-insecure --datadir=/tmp/mysql-bug-data --basedir=.
   ./bin/mysqld --datadir=/tmp/mysql-bug-data --basedir=. \
     --socket=/tmp/mysql-bug.sock --skip-grant-tables \
     --log-error=/tmp/mysql-bug.log &

4) Connect and run:

   ./bin/mysql -u root --socket=/tmp/mysql-bug.sock

   CREATE DATABASE IF NOT EXISTS test;
   USE test;
   CREATE TABLE t1 (a INT NOT NULL PRIMARY KEY, b INT NOT NULL) ENGINE=InnoDB;
   INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 30);
   BEGIN;
   INSERT INTO t1 VALUES (99, 990);
   SET SESSION debug = '+d,ib_extra_tables_in_use_decrement';
   SELECT * FROM t1;

5) Check the error log:

   grep "tables in use" /tmp/mysql-bug.log

   Expected:
     MySQL is freeing a thd though trx->n_mysql_tables_in_use is 4294967255
     mysql tables in use 4294967255, locked 0

   In release builds (production), assertions are compiled out so the server
   does NOT crash - it keeps running with the corrupted counter visible
   in SHOW ENGINE INNODB STATUS, matching the reported observation.

Suggested fix:
Two changes needed:

1) sql/handler.cc line 6896 - only unlock if lock was actually acquired:
   https://github.com/mysql/mysql-server/blob/trunk/sql/handler.cc#L6894

--- a/sql/handler.cc
+++ b/sql/handler.cc
 error:                                                          // line 6894
   h2->ha_index_or_rnd_end();                                   // line 6895
-  h2->ha_external_lock(thd, F_UNLCK);                          // line 6896
+  if (h2->get_lock_type() != F_UNLCK) h2->ha_external_lock(thd, F_UNLCK);
   h2->ha_close();                                              // line 6897

2) storage/innobase/handler/ha_innodb.cc line 19172 - guard the
   decrement (defense in depth, matching ha_innopart.cc line 4061):
   https://github.com/mysql/mysql-server/blob/trunk/storage/innobase/handler/ha_innodb.cc#L19...
   https://github.com/mysql/mysql-server/blob/trunk/storage/innobase/handler/ha_innopart.cc#L...

--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
-  trx->n_mysql_tables_in_use--;                                // line 19172
+  ut_ad(trx->n_mysql_tables_in_use > 0);
+  if (trx->n_mysql_tables_in_use > 0) {
+    trx->n_mysql_tables_in_use--;
+  }
[3 Apr 6:23] Mayank Prasad
Thank you for the bug. Please submit the contribution in contributions tab and also with OCA signed