MySQL Bugs: #82997: Online DDL fails with

Bug #82997	Online DDL fails with
Submitted:	14 Sep 2016 17:18	Modified:	29 Jan 2018 12:11
Reporter:	monty solomon	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	5.7.13	OS:	CentOS
Assigned to:	MySQL Verification Team	CPU Architecture:	Any

Description:
mysql reported Error_code 1034

Error 'Incorrect key file for table 'users'; try to repair it' 

during an online DDL ALTER statement

ALTER TABLE users modify password varchar(255) CHARACTER SET ascii, ALGORITHM=INPLACE, LOCK=NONE

How to repeat:
Execute online DDL statement

ALTER TABLE users modify password varchar(255) CHARACTER SET ascii, ALGORITHM=INPLACE, LOCK=NONE

A subsequent attempt to run the ALTER succeeded on the master but failed on all of the slaves with different errors.

On slave 1 the error was "Duplicate entry" followed by "Incorrect key file for table" when the stopped slave was started

2016-09-14T16:16:44.282633Z 11418 [ERROR] Slave SQL for channel '': Error 'Duplicate entry '0--NULL' for key ' on query. Default database: ''. Query: 'ALTER TABLE users modify password varchar(255) CHARACTER SET ascii, ALGORITHM=INPLACE, LOCK=NONE', Error_code: 1062
2016-09-14T16:16:44.283557Z 11418 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'bin.003767' position 14241742

2016-09-14T16:42:28.855311Z 1535085 [ERROR] Slave SQL for channel '': Error 'Incorrect key file for table 'users'; try to repair it' on query. Default database: ''. Query: 'ALTER TABLE users modify password varchar(255) CHARACTER SET ascii, ALGORITHM=INPLACE, LOCK=NONE', Error_code: 1034
2016-09-14T16:42:28.855346Z 1535085 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'bin.003767' position 14241742

Repeated attempts to start the slave kept failing

After restarting mysql the ALTER statement completed.

The master and slaves were all writing errors to the logs after the ALTER statement completed

[ERROR] InnoDB: Record in index `` of table ``.`users` was not found on update: TUPLE (info_bits=0, 5 fields): 

They were also writing some notes about GIS -- for example,

2016-09-14T16:10:46.406696Z 4505445 [Note] InnoDB: GIS MBR INFO: 5.30386e+180 and 6.24019e-85, 3.54341e-317, 0
2016-09-14T16:11:13.512741Z 4505656 [Note] InnoDB: GIS MBR INFO: 8.91102e+252 and 1.38498e+219, 0, 0

I manually dropped the index and created it on each slave to see if there would be any issues doing that. That procedure completed successfully on each slave and the error messages stopped on the slaves.

When I tried the same process on the master it crashed.

/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0xec501c]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x79d141]
/lib64/libpthread.so.0(+0xf7e0)[0x7fd8a416a7e0]
/lib64/libc.so.6(memcpy+0x15b)[0x7fd8a23fb7fb]
/usr/sbin/mysqld(_Z26rec_convert_dtuple_to_tempPhPK12dict_index_tPK8dfield_tmPK8dtuple_t+0x29a)[0x106983a]
/usr/sbin/mysqld(_Z17row_log_online_opP12dict_index_tPK8dtuple_tm+0x2ad)[0x76900b]
/usr/sbin/mysqld[0x12830e1]
/usr/sbin/mysqld(_Z12row_undo_insP11undo_node_tP9que_thr_t+0x22b)[0x128418b]
/usr/sbin/mysqld(_Z13row_undo_stepP9que_thr_t+0x405)[0x10cf935]
/usr/sbin/mysqld(_Z15que_run_threadsP9que_thr_t+0x874)[0x1061064]
/usr/sbin/mysqld[0x112327b]
/usr/sbin/mysqld(_Z25trx_rollback_to_savepointP5trx_tP12trx_savept_t+0x22)[0x11234f2]
/usr/sbin/mysqld(_Z23row_mysql_handle_errorsP7dberr_tP5trx_tP9que_thr_tP12trx_savept_t+0x7d)[0x109901d]
/usr/sbin/mysqld[0x10997fb]
/usr/sbin/mysqld(_ZN11ha_innobase9write_rowEPh+0x1d1)[0xfb7b01]
/usr/sbin/mysqld(_ZN7handler12ha_write_rowEPh+0x10d)[0x80453d]
/usr/sbin/mysqld(_Z12write_recordP3THDP5TABLEP9COPY_INFOS4_+0x19b)[0xe10feb]
/usr/sbin/mysqld(_ZN14Sql_cmd_insert12mysql_insertEP3THDP10TABLE_LIST+0x97a)[0xe1228a]
/usr/sbin/mysqld(_ZN14Sql_cmd_insert7executeEP3THD+0xc2)[0xe12a32]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x14af)[0xca43ff]
/usr/sbin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x5d5)[0xcaa675]
/usr/sbin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0x92f)[0xcab02f]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x1b7)[0xcac9c7]
/usr/sbin/mysqld(handle_connection+0x2a8)[0xd6de28]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0xedc844]
/lib64/libpthread.so.0(+0x7aa1)[0x7fd8a4162aa1]
/lib64/libc.so.6(clone+0x6d)[0x7fd8a245a93d]

The table uses compression and utf8.

ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8

In order to have reliable functioning with key_block_size, you must set on your master and all of your slaves, innodb_strict_mode to ON.

Only after tables are created with this mode can we proceed on the problem.

We use

innodb_strict_mode = 1
sql_mode = STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_AUTO_VALUE_ON_ZERO,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE,NO_ZERO_IN_DATE,ONLY_FULL_GROUP_BY

Thank you very much for the info provided.

This seems to be a bug, possibly a dangerous one. Hence, we need  a minimal possible test case which would reproduce this bug. It is enough to reproduce it on master. Your config would also be welcome.

You can put it all into the file and upload it to this bug.

This bug was submitted in September and we have not seen a repeat of that error since then.

I don't know how I would create a possible test case which would reproduce this bug.

Can the stack trace help you isolate/determine the cause the failure?

Thanks.

Hi,

We have looked carefully at the stacktrace and, unfortunately, it is far from enough.

We must be able to repeat the bug, because we need to inspect every global, local and object variable in each of the stack levels.

You know, in the vast majority of cases, problem has originated in some place that is not at all in the stacktrace.

I'm not sure that I have any useful data for you now.

I kept the servers around for a while after submitting this bug in case you needed/wanted more info but the servers were terminated.

what we need at a very least is: SHOW CREATE TABLE
output from before you sent the alter statement.

When running the ALTER against the given table, I only get errors;

ERROR 1846 (0A000): ALGORITHM=INPLACE is not supported. Reason: Cannot change column type INPLACE. Try ALGORITHM=COPY.

So it is not clear what happened in your situation.  
If you see further similar problems in 5.7.17+ please report new bug.

Thanks!

We were using that statement to widen the varchar for the column.

Execute the following statement to change the schema so the INPLACE ALTER won't fail.

ALTER TABLE users MODIFY password varchar(40) CHARACTER SET ascii;

With some random DML running I managed to hit a problem on 5.7.13.
Next step for me is to test current version and make a suitable testcase.

Version: '5.7.13'  socket: ''  port: 3306  MySQL Community Server (GPL)
[ERROR] InnoDB: Record in index `ed` of table `test`.`users` was n
[Note] InnoDB: GIS MBR INFO: 1.20768e-153 and 4.76881e-038, 7.0436
InnoDB: Assertion failure in thread 2384 in file row0ins.cc line 282
InnoDB: Failing assertion: !cursor->index->is_committed()

mysqld.exe!my_sigabrt_handler()[my_thr_init.c:449]
mysqld.exe!raise()[winsig.c:587]
mysqld.exe!abort()[abort.c:82]
mysqld.exe!ut_dbg_assertion_failed()[ut0dbg.cc:67]
mysqld.exe!row_ins_sec_index_entry_by_modify()[row0ins.cc:284]
mysqld.exe!row_ins_sec_index_entry_low()[row0ins.cc:2989]
mysqld.exe!row_ins_sec_index_entry()[row0ins.cc:3248]
mysqld.exe!row_ins_index_entry_step()[row0ins.cc:3448]
mysqld.exe!row_ins()[row0ins.cc:3586]
mysqld.exe!row_ins_step()[row0ins.cc:3772]
mysqld.exe!row_insert_for_mysql_using_ins_graph()[row0mysql.cc:1734]
mysqld.exe!ha_innobase::write_row()[ha_innodb.cc:7486]
mysqld.exe!handler::ha_write_row()[handler.cc:7890]
mysqld.exe!write_record()[sql_insert.cc:1860]
mysqld.exe!Sql_cmd_insert::mysql_insert()[sql_insert.cc:781]
mysqld.exe!Sql_cmd_insert::execute()[sql_insert.cc:3092]
mysqld.exe!mysql_execute_command()[sql_parse.cc:3521]
mysqld.exe!mysql_parse()[sql_parse.cc:5531]
mysqld.exe!dispatch_command()[sql_parse.cc:1432]
mysqld.exe!do_command()[sql_parse.cc:999]
mysqld.exe!handle_connection()[connection_handler_per_thread.cc:301]
mysqld.exe!pfs_spawn_thread()[pfs.cc:2191]
mysqld.exe!win_thread_start()[my_thread.c:38]
mysqld.exe!_callthreadstartex()[threadex.c:376]

On current source my test shows:

Version: '5.7.18'  socket: ''  port: 3306  (Built on 2017/01/20)
[ERROR] InnoDB: Record in index `ed` of table `test`.`users`
[Note] InnoDB: GIS MBR INFO: 0 and 0, 0, 0

mysqld.exe!memcpy()[memcpy.asm:135]
mysqld.exe!mem_heap_dup()[mem0mem.cc:60]
mysqld.exe!innobase_get_computed_value()[ha_innodb.cc:20831]
mysqld.exe!row_upd_store_v_row()[row0upd.cc:1932]
mysqld.exe!row_upd_store_row()[row0upd.cc:2049]
mysqld.exe!row_upd_del_mark_clust_rec()[row0upd.cc:2745]
mysqld.exe!row_upd_clust_step()[row0upd.cc:2899]
mysqld.exe!row_upd()[row0upd.cc:3012]
mysqld.exe!row_upd_step()[row0upd.cc:3161]
mysqld.exe!row_update_for_mysql_using_upd_graph()[row0mysql.cc:2528]
mysqld.exe!ha_innobase::delete_row()[ha_innodb.cc:8310]
mysqld.exe!handler::ha_delete_row()[handler.cc:8050]
mysqld.exe!Sql_cmd_delete::mysql_delete()[sql_delete.cc:468]
mysqld.exe!Sql_cmd_delete::execute()[sql_delete.cc:1382]
mysqld.exe!mysql_execute_command()[sql_parse.cc:3608]
mysqld.exe!mysql_parse()[sql_parse.cc:5618]
mysqld.exe!dispatch_command()[sql_parse.cc:1463]
mysqld.exe!do_command()[sql_parse.cc:1000]
mysqld.exe!handle_connection()[connection_handler_per_thread.cc:300]
mysqld.exe!pfs_spawn_thread()[pfs.cc:2191]
mysqld.exe!win_thread_start()[my_thread.c:38]

The problem is the indexed generated column. Remove that, and things should be more stable.

tentatively verifying as duplicate of internal:

Bug 23573575 - GCOLS: SIMPLE DML, ASSERTION FAILURE: !CURSOR->INDEX->IS_COMMITTED()

Is it really a duplicate? https://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-19.html says:

"If a REPLACE statement tried to update a row in a table containing a virtual generated column of type BLOB, subsequent DML statements could behave incorrectly. (Bug #23573575)"

This refers to the bug number Shane mentioned, but description sounds different from what we see here in public comments (I fail to find any mention of generated column in any comment by the original bug reporter, for example).

Please, clarify.

Hm, now I cannot repeat the same thing on 5.7.21 anyway. So it is unsure if some other fix along the way helped.  If we see this occur again, will reopen the bug.