Bug #82997 Online DDL fails with
Submitted: 14 Sep 2016 17:18 Modified: 29 Jan 12:11
Reporter: monty solomon Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.7.13 OS:CentOS
Assigned to: Shane Bester CPU Architecture:Any

[14 Sep 2016 17:18] monty solomon
Description:
mysql reported Error_code 1034

Error 'Incorrect key file for table 'users'; try to repair it' 

during an online DDL ALTER statement

ALTER TABLE users modify password varchar(255) CHARACTER SET ascii, ALGORITHM=INPLACE, LOCK=NONE

How to repeat:
Execute online DDL statement

ALTER TABLE users modify password varchar(255) CHARACTER SET ascii, ALGORITHM=INPLACE, LOCK=NONE
[14 Sep 2016 17:22] monty solomon
A subsequent attempt to run the ALTER succeeded on the master but failed on all of the slaves with different errors.

On slave 1 the error was "Duplicate entry" followed by "Incorrect key file for table" when the stopped slave was started

2016-09-14T16:16:44.282633Z 11418 [ERROR] Slave SQL for channel '': Error 'Duplicate entry '0--NULL' for key ' on query. Default database: ''. Query: 'ALTER TABLE users modify password varchar(255) CHARACTER SET ascii, ALGORITHM=INPLACE, LOCK=NONE', Error_code: 1062
2016-09-14T16:16:44.283557Z 11418 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'bin.003767' position 14241742

2016-09-14T16:42:28.855311Z 1535085 [ERROR] Slave SQL for channel '': Error 'Incorrect key file for table 'users'; try to repair it' on query. Default database: ''. Query: 'ALTER TABLE users modify password varchar(255) CHARACTER SET ascii, ALGORITHM=INPLACE, LOCK=NONE', Error_code: 1034
2016-09-14T16:42:28.855346Z 1535085 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'bin.003767' position 14241742

Repeated attempts to start the slave kept failing

After restarting mysql the ALTER statement completed.
[14 Sep 2016 21:47] monty solomon
The master and slaves were all writing errors to the logs after the ALTER statement completed

[ERROR] InnoDB: Record in index `` of table ``.`users` was not found on update: TUPLE (info_bits=0, 5 fields): 

They were also writing some notes about GIS -- for example,

2016-09-14T16:10:46.406696Z 4505445 [Note] InnoDB: GIS MBR INFO: 5.30386e+180 and 6.24019e-85, 3.54341e-317, 0
2016-09-14T16:11:13.512741Z 4505656 [Note] InnoDB: GIS MBR INFO: 8.91102e+252 and 1.38498e+219, 0, 0

I manually dropped the index and created it on each slave to see if there would be any issues doing that. That procedure completed successfully on each slave and the error messages stopped on the slaves.

When I tried the same process on the master it crashed.

/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0xec501c]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x79d141]
/lib64/libpthread.so.0(+0xf7e0)[0x7fd8a416a7e0]
/lib64/libc.so.6(memcpy+0x15b)[0x7fd8a23fb7fb]
/usr/sbin/mysqld(_Z26rec_convert_dtuple_to_tempPhPK12dict_index_tPK8dfield_tmPK8dtuple_t+0x29a)[0x106983a]
/usr/sbin/mysqld(_Z17row_log_online_opP12dict_index_tPK8dtuple_tm+0x2ad)[0x76900b]
/usr/sbin/mysqld[0x12830e1]
/usr/sbin/mysqld(_Z12row_undo_insP11undo_node_tP9que_thr_t+0x22b)[0x128418b]
/usr/sbin/mysqld(_Z13row_undo_stepP9que_thr_t+0x405)[0x10cf935]
/usr/sbin/mysqld(_Z15que_run_threadsP9que_thr_t+0x874)[0x1061064]
/usr/sbin/mysqld[0x112327b]
/usr/sbin/mysqld(_Z25trx_rollback_to_savepointP5trx_tP12trx_savept_t+0x22)[0x11234f2]
/usr/sbin/mysqld(_Z23row_mysql_handle_errorsP7dberr_tP5trx_tP9que_thr_tP12trx_savept_t+0x7d)[0x109901d]
/usr/sbin/mysqld[0x10997fb]
/usr/sbin/mysqld(_ZN11ha_innobase9write_rowEPh+0x1d1)[0xfb7b01]
/usr/sbin/mysqld(_ZN7handler12ha_write_rowEPh+0x10d)[0x80453d]
/usr/sbin/mysqld(_Z12write_recordP3THDP5TABLEP9COPY_INFOS4_+0x19b)[0xe10feb]
/usr/sbin/mysqld(_ZN14Sql_cmd_insert12mysql_insertEP3THDP10TABLE_LIST+0x97a)[0xe1228a]
/usr/sbin/mysqld(_ZN14Sql_cmd_insert7executeEP3THD+0xc2)[0xe12a32]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x14af)[0xca43ff]
/usr/sbin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x5d5)[0xcaa675]
/usr/sbin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0x92f)[0xcab02f]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x1b7)[0xcac9c7]
/usr/sbin/mysqld(handle_connection+0x2a8)[0xd6de28]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0xedc844]
/lib64/libpthread.so.0(+0x7aa1)[0x7fd8a4162aa1]
/lib64/libc.so.6(clone+0x6d)[0x7fd8a245a93d]
[14 Sep 2016 21:50] monty solomon
The table uses compression and utf8.

ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8
[16 Jan 2017 15:25] Sinisa Milivojevic
In order to have reliable functioning with key_block_size, you must set on your master and all of your slaves, innodb_strict_mode to ON.

Only after tables are created with this mode can we proceed on the problem.
[16 Jan 2017 21:58] monty solomon
We use

innodb_strict_mode = 1
sql_mode = STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_AUTO_VALUE_ON_ZERO,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE,NO_ZERO_IN_DATE,ONLY_FULL_GROUP_BY
[17 Jan 2017 14:43] Sinisa Milivojevic
Thank you very much for the info provided.

This seems to be a bug, possibly a dangerous one. Hence, we need  a minimal possible test case which would reproduce this bug. It is enough to reproduce it on master. Your config would also be welcome.

You can put it all into the file and upload it to this bug.
[18 Jan 2017 8:06] monty solomon
This bug was submitted in September and we have not seen a repeat of that error since then.

I don't know how I would create a possible test case which would reproduce this bug.

Can the stack trace help you isolate/determine the cause the failure?

Thanks.
[18 Jan 2017 15:26] Sinisa Milivojevic
Hi,

We have looked carefully at the stacktrace and, unfortunately, it is far from enough.

We must be able to repeat the bug, because we need to inspect every global, local and object variable in each of the stack levels.

You know, in the vast majority of cases, problem has originated in some place that is not at all in the stacktrace.
[18 Jan 2017 21:47] monty solomon
I'm not sure that I have any useful data for you now.

I kept the servers around for a while after submitting this bug in case you needed/wanted more info but the servers were terminated.
[19 Jan 2017 5:20] Shane Bester
what we need at a very least is: SHOW CREATE TABLE
output from before you sent the alter statement.
[19 Jan 2017 18:14] Shane Bester
When running the ALTER against the given table, I only get errors;

ERROR 1846 (0A000): ALGORITHM=INPLACE is not supported. Reason: Cannot change column type INPLACE. Try ALGORITHM=COPY.

So it is not clear what happened in your situation.  
If you see further similar problems in 5.7.17+ please report new bug.

Thanks!
[19 Jan 2017 20:08] monty solomon
We were using that statement to widen the varchar for the column.

Execute the following statement to change the schema so the INPLACE ALTER won't fail.

ALTER TABLE users MODIFY password varchar(40) CHARACTER SET ascii;
[20 Jan 2017 4:52] Shane Bester
With some random DML running I managed to hit a problem on 5.7.13.
Next step for me is to test current version and make a suitable testcase.

Version: '5.7.13'  socket: ''  port: 3306  MySQL Community Server (GPL)
[ERROR] InnoDB: Record in index `ed` of table `test`.`users` was n
[Note] InnoDB: GIS MBR INFO: 1.20768e-153 and 4.76881e-038, 7.0436
InnoDB: Assertion failure in thread 2384 in file row0ins.cc line 282
InnoDB: Failing assertion: !cursor->index->is_committed()

mysqld.exe!my_sigabrt_handler()[my_thr_init.c:449]
mysqld.exe!raise()[winsig.c:587]
mysqld.exe!abort()[abort.c:82]
mysqld.exe!ut_dbg_assertion_failed()[ut0dbg.cc:67]
mysqld.exe!row_ins_sec_index_entry_by_modify()[row0ins.cc:284]
mysqld.exe!row_ins_sec_index_entry_low()[row0ins.cc:2989]
mysqld.exe!row_ins_sec_index_entry()[row0ins.cc:3248]
mysqld.exe!row_ins_index_entry_step()[row0ins.cc:3448]
mysqld.exe!row_ins()[row0ins.cc:3586]
mysqld.exe!row_ins_step()[row0ins.cc:3772]
mysqld.exe!row_insert_for_mysql_using_ins_graph()[row0mysql.cc:1734]
mysqld.exe!ha_innobase::write_row()[ha_innodb.cc:7486]
mysqld.exe!handler::ha_write_row()[handler.cc:7890]
mysqld.exe!write_record()[sql_insert.cc:1860]
mysqld.exe!Sql_cmd_insert::mysql_insert()[sql_insert.cc:781]
mysqld.exe!Sql_cmd_insert::execute()[sql_insert.cc:3092]
mysqld.exe!mysql_execute_command()[sql_parse.cc:3521]
mysqld.exe!mysql_parse()[sql_parse.cc:5531]
mysqld.exe!dispatch_command()[sql_parse.cc:1432]
mysqld.exe!do_command()[sql_parse.cc:999]
mysqld.exe!handle_connection()[connection_handler_per_thread.cc:301]
mysqld.exe!pfs_spawn_thread()[pfs.cc:2191]
mysqld.exe!win_thread_start()[my_thread.c:38]
mysqld.exe!_callthreadstartex()[threadex.c:376]
[20 Jan 2017 5:54] Shane Bester
On current source my test shows:

Version: '5.7.18'  socket: ''  port: 3306  (Built on 2017/01/20)
[ERROR] InnoDB: Record in index `ed` of table `test`.`users`
[Note] InnoDB: GIS MBR INFO: 0 and 0, 0, 0

mysqld.exe!memcpy()[memcpy.asm:135]
mysqld.exe!mem_heap_dup()[mem0mem.cc:60]
mysqld.exe!innobase_get_computed_value()[ha_innodb.cc:20831]
mysqld.exe!row_upd_store_v_row()[row0upd.cc:1932]
mysqld.exe!row_upd_store_row()[row0upd.cc:2049]
mysqld.exe!row_upd_del_mark_clust_rec()[row0upd.cc:2745]
mysqld.exe!row_upd_clust_step()[row0upd.cc:2899]
mysqld.exe!row_upd()[row0upd.cc:3012]
mysqld.exe!row_upd_step()[row0upd.cc:3161]
mysqld.exe!row_update_for_mysql_using_upd_graph()[row0mysql.cc:2528]
mysqld.exe!ha_innobase::delete_row()[ha_innodb.cc:8310]
mysqld.exe!handler::ha_delete_row()[handler.cc:8050]
mysqld.exe!Sql_cmd_delete::mysql_delete()[sql_delete.cc:468]
mysqld.exe!Sql_cmd_delete::execute()[sql_delete.cc:1382]
mysqld.exe!mysql_execute_command()[sql_parse.cc:3608]
mysqld.exe!mysql_parse()[sql_parse.cc:5618]
mysqld.exe!dispatch_command()[sql_parse.cc:1463]
mysqld.exe!do_command()[sql_parse.cc:1000]
mysqld.exe!handle_connection()[connection_handler_per_thread.cc:300]
mysqld.exe!pfs_spawn_thread()[pfs.cc:2191]
mysqld.exe!win_thread_start()[my_thread.c:38]

The problem is the indexed generated column. Remove that, and things should be more stable.
[20 Jan 2017 6:03] Shane Bester
tentatively verifying as duplicate of internal:

Bug 23573575 - GCOLS: SIMPLE DML, ASSERTION FAILURE: !CURSOR->INDEX->IS_COMMITTED()
[28 Jan 18:40] Valeriy Kravchuk
Is it really a duplicate? https://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-19.html says:

"If a REPLACE statement tried to update a row in a table containing a virtual generated column of type BLOB, subsequent DML statements could behave incorrectly. (Bug #23573575)"

This refers to the bug number Shane mentioned, but description sounds different from what we see here in public comments (I fail to find any mention of generated column in any comment by the original bug reporter, for example).

Please, clarify.
[29 Jan 12:11] Shane Bester
Hm, now I cannot repeat the same thing on 5.7.21 anyway. So it is unsure if some other fix along the way helped.  If we see this occur again, will reopen the bug.