Bug #75199 MySQL crashed because of flush_all
Submitted: 13 Dec 2014 7:15 Modified: 6 Aug 2015 17:08
Reporter: zhai weixiang (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Memcached Severity:S3 (Non-critical)
Version:5.7.5, 5.7.7 OS:Any
Assigned to: CPU Architecture:Any
Tags: memcached

[13 Dec 2014 7:15] zhai weixiang
Description:
I just installed a memcache plugin with default setting.

The operation bellow crashed the server:

Session 1:
root@test 03:07:16>select * from demo_test;
+-------+------+------+------+------+
| c1    | c2   | c3   | c4   | c5   |
+-------+------+------+------+------+
| test1 | t1   |   10 |    1 |    0 |
+-------+------+------+------+------+
1 row in set (0.00 sec)

root@test 03:07:23>delete from demo_test;
Query OK, 1 row affected (0.00 sec)

Connect telnet:
$telnet 127.0.0.1  13407
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
set test1 10 0 2
t1
STORED
flush_all
Connection closed by foreign host.

And the backtrace:

(gdb) bt
#0  0x000000372c6328a5 in raise () from /lib64/libc.so.6
#1  0x000000372c634085 in abort () from /lib64/libc.so.6
#2  0x000000000060f271 in ut_dbg_assertion_failed (expr=0x0, file=<value optimized out>, line=3313) at /u01/project/mysql-lab/mysql-5.7.5-m15/storage/innobase/ut/ut0dbg.cc:68
#3  0x0000000000bb6144 in lock_table_create (table=0x2b21081e2540, type_mode=3, trx=0x2b0c1317ea18) at /u01/project/mysql-lab/mysql-5.7.5-m15/storage/innobase/lock/lock0lock.cc:3313
#4  0x0000000000bbd47c in lock_table (flags=<value optimized out>, table=0x2b21081e2540, mode=LOCK_X, thr=0x2b2198006100) at /u01/project/mysql-lab/mysql-5.7.5-m15/storage/innobase/lock/lock0lock.cc:3698
#5  0x0000000000ca9f68 in ib_trx_lock_table_with_retry (trx=0x2b0c1317ea18, table=0x2b21081e2540, mode=LOCK_X) at /u01/project/mysql-lab/mysql-5.7.5-m15/storage/innobase/api/api0misc.cc:73
#6  0x00002b210420b969 in innodb_cb_cursor_lock (eng=<value optimized out>, ib_crsr=0x2b21980049a0, ib_lck_mode=<value optimized out>)
    at /u01/project/mysql-lab/mysql-5.7.5-m15/plugin/innodb_memcached/innodb_memcache/src/innodb_api.c:2000
#7  0x00002b210420d8ed in innodb_api_begin (engine=0x2b21081c1640, dbname=<value optimized out>, name=0x2b21081e2130 "demo_test", conn_data=0x2b2198003dc0, ib_trx=0x2b0c1317ea18, crsr=0x2b2198003dd8, 
    idx_crsr=0x2b2198003de0, lock_mode=IB_LOCK_TABLE_X) at /u01/project/mysql-lab/mysql-5.7.5-m15/plugin/innodb_memcached/innodb_memcache/src/innodb_api.c:225
#8  0x00002b21042094dc in innodb_conn_init (engine=0x2b21081c1640, cookie=<value optimized out>, conn_option=1, lock_mode=IB_LOCK_TABLE_X, has_lock=1 '\001', new_meta_info=0x0)
    at /u01/project/mysql-lab/mysql-5.7.5-m15/plugin/innodb_memcached/innodb_memcache/src/innodb_engine.c:805
#9  0x00002b2104209f75 in innodb_flush (handle=0x2b21081c1640, cookie=0x2b21980008c0, when=<value optimized out>) at /u01/project/mysql-lab/mysql-5.7.5-m15/plugin/innodb_memcached/innodb_memcache/src/innodb_engine.c:2055
#10 0x00002b0c11eb7609 in process_command (c=0x2b21980008c0, command=<value optimized out>) at /u01/project/mysql-lab/mysql-5.7.5-m15/plugin/innodb_memcached/daemon_memcached/daemon/memcached.c:4627
#11 0x00002b0c11eb7d3e in try_read_command (c=0x2b21980008c0) at /u01/project/mysql-lab/mysql-5.7.5-m15/plugin/innodb_memcached/daemon_memcached/daemon/memcached.c:4858
#12 0x00002b0c11eb8821 in conn_parse_cmd (c=0x2b21980008c0) at /u01/project/mysql-lab/mysql-5.7.5-m15/plugin/innodb_memcached/daemon_memcached/daemon/memcached.c:5288
#13 0x00002b0c11eac094 in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x2b21980008c0) at /u01/project/mysql-lab/mysql-5.7.5-m15/plugin/innodb_memcached/daemon_memcached/daemon/memcached.c:5692
#14 0x00002b0c11ebc7aa in event_process_active (base=0x2b21081e0410, flags=<value optimized out>) at /u01/project/mysql-lab/mysql-5.7.5-m15/libevent/event.c:392
#15 event_base_loop (base=0x2b21081e0410, flags=<value optimized out>) at /u01/project/mysql-lab/mysql-5.7.5-m15/libevent/event.c:544
#16 0x00002b0c11eba5b2 in worker_libevent (arg=0x2b21081eadf0) at /u01/project/mysql-lab/mysql-5.7.5-m15/plugin/innodb_memcached/daemon_memcached/daemon/thread.c:306
#17 0x000000372ca07851 in start_thread () from /lib64/libpthread.so.0
#18 0x000000372c6e767d in clone () from /lib64/libc.so.6
(gdb) f 3
#3  0x0000000000bb6144 in lock_table_create (table=0x2b21081e2540, type_mode=3, trx=0x2b0c1317ea18) at /u01/project/mysql-lab/mysql-5.7.5-m15/storage/innobase/lock/lock0lock.cc:3313
3313            check_trx_state(trx);
(gdb) p trx->auto_commit
$5 = false
(gdb) p trx->will_lock
$6 = 0
(gdb) p trx->state    
$7 = TRX_STATE_NOT_STARTED

How to repeat:
described above.

Suggested fix:
I don't know
[22 Dec 2014 6:22] zhai weixiang
My configure file:

[mysqld_safe]
pid-file=/u01/my575/run/mysqld.pid
#malloc-lib=/u01/mysql/lib/libjemalloc.so

[mysql]
port=13406
prompt=\\u@\\d \\r:\\m:\\s>
default-character-set=utf8
no-auto-rehash
skip-insecure-warning

[client]
port=13406
socket=/u01/my575/run/mysql.sock

[mysqld]
#dir
basedir=/u01/my575
datadir=/u01/my575/data
tmpdir=/u01/my575/tmp
lc_messages_dir=/u01/my575/share
log-error=/u01/my575/log/alert.log
slow_query_log_file=/u01/my575/log/slow.log
general_log_file=/u01/my575/log/general.log
socket=/u01/my575/run/mysql.sock
core-file

innodb_doublewrite=1
innodb_stats_persistent=0
innodb_data_home_dir=/u01/my575/data
innodb_log_group_home_dir=/u01/my575/data
innodb_data_file_path = ibdata1:4G;ibdata2:16M:autoextend
innodb_buffer_pool_size=80G
innodb_buffer_pool_instances=8
innodb_log_files_in_group=4
innodb_log_file_size=1G
innodb_log_buffer_size=80M
innodb_flush_log_at_trx_commit=2
innodb_max_dirty_pages_pct_lwm=10
innodb_max_dirty_pages_pct=60
innodb_thread_concurrency=32
innodb_read_io_threads=8
innodb_write_io_threads=8
innodb_open_files=60000
innodb_file_format=Barracuda
innodb_file_per_table=1
innodb_flush_method=O_DIRECT
innodb_change_buffering=all
innodb_adaptive_flushing=ON
innodb_flush_neighbors=0
innodb_adaptive_hash_index=OFF
innodb_old_blocks_time=1000
innodb_stats_on_metadata=ON
innodb_read_ahead_threshold=0
innodb_use_native_aio=1
innodb_lock_wait_timeout=5
innodb_rollback_on_timeout=0
innodb_purge_threads=1
innodb_strict_mode=1
transaction-isolation=READ-COMMITTED

innodb_disable_sort_file_cache=ON
innodb_io_capacity_max=6000
innodb_io_capacity=2000

innodb_lru_scan_depth=8194

innodb_undo_tablespaces=20

metadata_locks_hash_instances=256
metadata_locks_cache_size=32768
innodb_sync_array_size=16
innodb_print_all_deadlocks

innodb_checksum_algorithm=CRC32

#myisam
myisam_sort_buffer_size=64M
concurrent_insert=2
delayed_insert_timeout=300

#binlog
log-bin=/u01/my575/log/mysql-bin
server_id=3017441834
binlog_cache_size=32K
max_binlog_cache_size=2G
max_binlog_size=500M
binlog-format=ROW
sync_binlog=1000
log-slave-updates=1
expire_logs_days=0

port=13406
skip-name-resolve
skip-ssl
max_connections=21000
max_user_connections=20200
max_connect_errors=65536
max_allowed_packet=128M
connect_timeout=8
net_read_timeout=30
net_write_timeout=60
back_log=1024

default-storage-engine=INNODB
character-set-server=utf8
lower_case_table_names=1
skip-external-locking
open_files_limit=65536
safe-user-create
local-infile=1
sql_mode="STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE"
performance_schema=ON
performance_schema_instrument='%=ON'

log_slow_admin_statements=1
long_query_time=1
slow_query_log=1
general_log=0
log_error_verbosity=3
query_cache_type=0
query_cache_limit=1M
query_cache_min_res_unit=1K

table_definition_cache=2000
table_open_cache_instances = 128
table_open_cache=16000

thread_stack=512K
thread_cache_size=256
read_rnd_buffer_size=128K
sort_buffer_size=256K
join_buffer_size=128K
read_buffer_size=128K

#replication
master-info-file=/u01/my575/log/master.info
relay-log=/u01/my575/log/relaylog
relay_log_info_file=/u01/my575/log/relay-log.info
relay-log-index=/u01/my575/log/mysqld-relay-bin.index
slave_load_tmpdir=/u01/my575/tmp
slave_type_conversions="ALL_NON_LOSSY"
slave_net_timeout=4
skip-slave-start
sync_master_info=1000
sync_relay_log_info=1000

rpl_stop_slave_timeout=300
slave_checkpoint_group=1024
slave_checkpoint_period=300
slave_parallel_workers=8
#slave_pr_mode=TABLE
slave_pending_jobs_size_max=1073741824
slave_rows_search_algorithms='TABLE_SCAN,INDEX_SCAN,HASH_SCAN'

slave_sql_verify_checksum=OFF
master_verify_checksum=OFF

gtid_mode=OFF
enforce_gtid_consistency=OFF
master_info_repository=TABLE
relay_log_info_repository=TABLE

#memcached plugin
daemon_memcached_option='-p13407'
daemon_memcached_enable_binlog=1
innodb_api_enable_binlog=1
innodb_api_enable_mdl=1
daemon_memcached_r_batch_size=5
daemon_memcached_w_batch_size=5
[22 Dec 2014 6:26] zhai weixiang
Please ignore skip-insecure-warning   :)
[20 Jan 2015 8:14] Umesh Shastry
Hello Zhai Weixiang ,

Thank you for the report.
I could not reproduce this issue at my end with provided conf file, and with test case on latest trunk build(5.7.6) .

//

// mysql cli session

mysql> select * from demo_test;
+-------+------+------+------+------+
| c1    | c2   | c3   | c4   | c5   |
+-------+------+------+------+------+
| test1 | t1   |   10 |    3 |    0 |
+-------+------+------+------+------+
1 row in set (0.00 sec)

mysql> delete from demo_test;
Query OK, 1 row affected (0.01 sec)

[root@cluster-repo ~]# telnet x.x.x.x 13407
Trying x.x.x.x...
Connected to x.x.x.x.
Escape character is '^]'.
set test1 10 0 2
t1
STORED
flush_all
OK
set test1 10 0 2
t1
STORED
flush_all
OK
flush_all
OK
flush_all
OK
flush_all
OK

Could you please tell us about your build whether it is official build from http://dev.mysql.com/downloads/mysql/ or you compiled yourself? if yes, then exact cmake build details to help us repeat this issue?

Thanks,
Umesh
[20 Jan 2015 8:16] Umesh Shastry
Also, you mentioned default settings but still could you please provide below query output? Just want to ensure whether data written through the memcached on disk (innodb_only, the default); to store the data in memory only, as in the traditional memcached (cache-only); or both (caching).

select * from innodb_memcache.cache_policies;
[20 Jan 2015 9:38] zhai weixiang
root@innodb_memcache 05:37:38>select * from cache_policies;
+--------------+-------------+-------------+---------------+--------------+
| policy_name  | get_policy  | set_policy  | delete_policy | flush_policy |
+--------------+-------------+-------------+---------------+--------------+
| cache_policy | innodb_only | innodb_only | innodb_only   | innodb_only  |
+--------------+-------------+-------------+---------------+--------------+
1 row in set (0.00 sec)

root@innodb_memcache 05:37:44>select * from config_options;
+---------------------+-------+
| name                | value |
+---------------------+-------+
| separator           | |     |
| table_map_delimiter | .     |
+---------------------+-------+
2 rows in set (0.00 sec)

root@innodb_memcache 05:37:49>select * from containers;
+------+-----------+-----------+-------------+---------------+-------+------------+--------------------+------------------------+
| name | db_schema | db_table  | key_columns | value_columns | flags | cas_column | expire_time_column | unique_idx_name_on_key |
+------+-----------+-----------+-------------+---------------+-------+------------+--------------------+------------------------+
| aaa  | test      | demo_test | c1          | c2            | c3    | c4         | c5                 | PRIMARY                |
+------+-----------+-----------+-------------+---------------+-------+------------+--------------------+------------------------+
1 row in set (0.00 sec)

cmake . \
    -DSYSCONFDIR="$dest_dir"                                 \
    -DCMAKE_INSTALL_PREFIX="$dest_dir"                       \
    -DMYSQL_DATADIR="$dest_dir/data"                         \
    -DENABLED_PROFILING=1                                    \
    -DWITH_EXTRA_CHARSETS=all                                \
    -DDEFAULT_CHARSET=utf8                                   \
    -DDEFAULT_COLLATION=utf8_general_ci                      \
    -DWITH_SSL=bundled                                       \
    -DWITH_ZLIB=bundled                                      \
    -DWITH_PARTITION_STORAGE_ENGINE=1                        \
    -DWITH_INNOBASE_STORAGE_ENGINE=1                         \
    -DWITH_ARCHIVE_STORAGE_ENGINE=1                          \
    -DWITH_BLACKHOLE_STORAGE_ENGINE=1                        \
    -DWITH_PERFSCHEMA_STORAGE_ENGINE=1                       \
    -DENABLED_LOCAL_INFILE=1                                 \
    -DWITH_EMBEDDED_SERVER=0                                 \
    -DINSTALL_LAYOUT=STANDALONE                              \
    -DCOMMUNITY_BUILD=1 -DWITH_INNODB_MEMCACHED=ON                                     \
    -DMYSQL_SERVER_SUFFIX="$server_suffix" -DWITH_BOOST=../
[20 Jan 2015 9:39] zhai weixiang
I can still repeat the crash.. Let me know if you need more information :)
[20 Jan 2015 12:44] Umesh Shastry
Thank you for the details, Zhai.
I tried it but with no luck(will share my steps just in case am missing anything).. so, will check internally and shall get back to you if more info required.
[20 Jan 2015 12:44] Umesh Shastry
test results

Attachment: 75199.txt (text/plain), 244.46 KiB.

[20 Jan 2015 15:24] zhai weixiang
Funny !  I can't repeat the crash if debug is enabled!!
[20 Jan 2015 18:52] Shane Bester
it appears you cannot run "flush_all" concurrently with any selects, else you end up with missing table and crash:

InnoDB: Error: trying to access tablespace 3891 page no. 3,
InnoDB: but the tablespace does not exist or is just being dropped.
InnoDB: Error: trying to access tablespace 3891 page no. 3,
InnoDB: but the tablespace does not exist or is just being dropped.
InnoDB: Error: Unable to read tablespace 3891 page no 3 into the buffer pool after 100 attempts

InnoDB: Assertion failure in thread 140736594200320 in file buf0buf.cc line 2646
[21 Jan 2015 1:55] zhai weixiang
Aha, Shane, It seems you found another bug !! :-)
[9 Apr 2015 14:07] Umesh Shastry
Thank you Shane, Zhai.
I could not repeat this with 5.7.7. Also, surely fixed after Bug #72435 and per change log:

 When calling the "memcached" "flush_all" command, "InnoDB" attempts to
initialize a connection and a transaction. If the transaction is in
"TRX_STATE_NOT_STARTED" state, "InnoDB" would fail to set
"CONN_DATA->CRSR_TRX" to NULL, resulting in a serious error.

If you can provide more information, feel free to add it to this bug and change the status back to 'Open'.

Thanks,
Umesh
[16 Apr 2015 8:09] zhai weixiang
Hi, Umesh, I guess I found the root cause.  It seems assert is not defined on my platform. I think this  is depending on the behavior of compiling order.

I removed the assert and eveything works well.

diff --git a/plugin/innodb_memcached/innodb_memcache/src/innodb_engine.c b/plugin/innodb_memcached/innodb_memcache/src/innodb_engine.c
index 46fce0c..30c2490 100644
--- a/plugin/innodb_memcached/innodb_memcache/src/innodb_engine.c
+++ b/plugin/innodb_memcached/innodb_memcache/src/innodb_engine.c
@@ -810,9 +810,9 @@ have_conn:
                } else {
                        /* Write cursor transaction exists.
                           Reuse this transaction.*/
-                       assert(ib_cb_trx_start(conn_data->crsr_trx,
+                       ib_cb_trx_start(conn_data->crsr_trx,
                                               engine->trx_level,
-                                              true, false, NULL));
+                                              true, false, NULL);
                }

                err = innodb_api_begin(
[16 Apr 2015 8:10] zhai weixiang
oh, oh ...only two option: Can't repeat and Closed ..... How can I reopen this bug ?
[16 Apr 2015 8:19] Umesh Shastry
Hello Zhai,

I think you can change status to "open" from the status select box.
If I understood correctly, this is no longer repeatable now after removing assert. Please feel free to add info which would help us trigger this issue in newer builds. or you agree to close the bug for now? Please let us know.

Thanks,
Umesh
[16 Apr 2015 10:16] zhai weixiang
I have run  the case  on two production machine  and both crashed. But since you can't repeat the crash ,  I believe this is platform specified bug.

The root cause is that assert is not defined.so the function ib_cb_trx_start inside assert is not called which should start the trx and set trx->state to active.

Quoted from the manual page : http://man7.org/linux/man-pages/man3/assert.3.html

If the macro NDEBUG was defined at the moment <assert.h> was last
       included, the macro assert() generates no code, and hence does
       nothing at all.  Otherwise, the macro assert() prints an error
       message to standard error and terminates the program by calling
       abort(3) if expression is false (i.e., compares equal to zero).

       The purpose of this macro is to help programmers find bugs in their
       programs.  The message "assertion failed in file foo.c, function
       do_bar(), line 1287" is of no help at all to a user.

Besides, generally we should call the function first and then assert the return value. :)

To help you repeat the case, a little more information:

$uname -a
Linux $HOST 2.6.32-220.23.2.ali878.el6.x86_64 #1 SMP Mon Jan 28 17:12:52 CST 2013 x86_64 x86_64 x86_64 GNU/Linux

Compile flagļ¼š

  COMMON_FLAGS="-O3 -g -fexceptions -fno-omit-frame-pointer -fno-strict-aliasing"
  CFLAGS="$COMMON_FLAGS"
  CXXFLAGS="$COMMON_FLAGS"
  CC=/opt/rh/devtoolset-2/root/usr/bin/gcc
  CXX=/opt/rh/devtoolset-2/root/usr/bin/g++
  export CC CFLAGS CXX CXXFLAGS

gcc/g++ version
opt/rh/devtoolset-2/root/usr/bin/gcc --version
gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)

$/opt/rh/devtoolset-2/root/usr/bin/g++ --version
g++ (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)
[11 Jun 2015 13:25] Umesh Shastry
Thank you Zhai.
Verified as described.

Thanks,
Umesh
[11 Jun 2015 13:36] Umesh Shastry
Confirmed 5.7.7 build is affected
[11 Jun 2015 13:37] Umesh Shastry
test results

Attachment: 75199.txt (text/plain), 291.29 KiB.

[15 Jul 2015 9:29] Aditya A
Hi,

The crash reproduced by umesh is a different bug ,caused by a bogus assertion which is fixed as part of Bug#75790. So the issue reported still not reproducible.
[17 Jul 2015 1:59] zhai weixiang
Hi, Aditya, I can always repeat the crash .... 
OK, Let's put aside the test case.  But according to the manual,  the behavior of assert function is not fixed,  I think calling a important function that may affect the code logic  inside assert() is not a correct code style.
[17 Jul 2015 4:25] Aditya A
I completely agree with you. Will make a patch to remove the function from assert
and assert on return value .
[6 Aug 2015 17:08] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 5.6.27, 5.7.9, 5.8.0 releases, and here's the changelog entry:

A memcached flush_all command raised an assertion. A function that starts
a transaction was called from within assertion code. 

Thank you for the bug report.