MySQL Bugs: #87850: [ERROR]Space id in fsp header but in the page header

Bug #87850	[ERROR]Space id in fsp header but in the page header
Submitted:	23 Sep 2017 15:47	Modified:	14 Apr 2020 12:40
Reporter:	bo wangbo	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	5.6.34	OS:	CentOS (6.8)
Assigned to:		CPU Architecture:	Any
Tags:	Space id in fsp header but in the page header

Description:
2017-09-22 03:37:43 166408 [Note] InnoDB: Log scan progressed past the checkpoint lsn 4014227018962
2017-09-22 03:37:43 166408 [Note] InnoDB: Database was not shutdown normally!
2017-09-22 03:37:43 166408 [Note] InnoDB: Starting crash recovery.
2017-09-22 03:37:43 166408 [Note] InnoDB: Reading tablespace information from the .ibd files...
2017-09-22 03:37:43 166408 [ERROR] InnoDB: Space id in fsp header 1416128883,but in the page header 824195850
19:37:43 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

How to repeat:
Cannot reappear. Two times on different servers.

Hi,

This means one of the *.ibd files or the ibdata itself has gotten its header overwritten with a part of the mysql error log during a previous crash.   

Check if any of the *.ibd files in the datadir starts with the string "This"? 

Then upload that particular .ibd and .frm file here for inspection, as well as the complete mysql error log.

170922 04:08:37 mysqld_safe Starting mysqld daemon with databases from /opt/app/mysql5/var
2017-09-22 04:08:37 0 [Note] --secure-file-priv is set to NULL. Operations related to importing and exporting data are disabled
2017-09-22 04:08:37 0 [Note] /opt/app/mysql5/bin/mysqld (mysqld 5.6.34-log) starting as process 167464 ...
2017-09-22 04:08:37 167464 [Note] Plugin 'FEDERATED' is disabled.
2017-09-22 04:08:37 167464 [Note] InnoDB: Using atomics to ref count buffer pool pages
2017-09-22 04:08:37 167464 [Note] InnoDB: The InnoDB memory heap is disabled
2017-09-22 04:08:37 167464 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-09-22 04:08:37 167464 [Note] InnoDB: Memory barrier is not used
2017-09-22 04:08:37 167464 [Note] InnoDB: Compressed tables use zlib 1.2.3
2017-09-22 04:08:37 167464 [Note] InnoDB: Using CPU crc32 instructions
2017-09-22 04:08:37 167464 [Note] InnoDB: Initializing buffer pool, size = 4.0G
2017-09-22 04:08:37 167464 [Note] InnoDB: Completed initialization of buffer pool
2017-09-22 04:08:37 167464 [Note] InnoDB: Highest supported file format is Barracuda.
2017-09-22 04:08:37 167464 [Note] InnoDB: Log scan progressed past the checkpoint lsn 4014227018962
2017-09-22 04:08:37 167464 [Note] InnoDB: Database was not shutdown normally!
2017-09-22 04:08:37 167464 [Note] InnoDB: Starting crash recovery.
2017-09-22 04:08:37 167464 [Note] InnoDB: Reading tablespace information from the .ibd files...
2017-09-22 04:08:37 167464 [ERROR] InnoDB: Space id in fsp header 1416128883,but in the page header 824195850
20:08:37 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=536870912
read_buffer_size=16777216
max_used_connections=0
max_threads=1024
thread_count=0
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 84423448 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/opt/app/mysql5/bin/mysqld(my_print_stacktrace+0x35)[0x8eebe5]
/opt/app/mysql5/bin/mysqld(handle_fatal_signal+0x41b)[0x6556ab]
/lib64/libpthread.so.0[0x3ff300f7e0]
/opt/app/mysql5/bin/mysqld[0xb3366b]
/opt/app/mysql5/bin/mysqld[0x9a240d]
/opt/app/mysql5/bin/mysqld[0xa49e6b]
/opt/app/mysql5/bin/mysqld[0xa89e06]
/opt/app/mysql5/bin/mysqld[0xa8c0ab]
/opt/app/mysql5/bin/mysqld[0xa908c7]
/opt/app/mysql5/bin/mysqld[0xa95502]
/opt/app/mysql5/bin/mysqld[0x9844ec]
/opt/app/mysql5/bin/mysqld[0x989bab]
/opt/app/mysql5/bin/mysqld[0x9fe679]
/opt/app/mysql5/bin/mysqld[0x94acd7]
/opt/app/mysql5/bin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x48)[0x590798]
/opt/app/mysql5/bin/mysqld[0x6e8716]
/opt/app/mysql5/bin/mysqld(_Z11plugin_initPiPPci+0xb3e)[0x6ebf2e]
/opt/app/mysql5/bin/mysqld[0x583b85]
/opt/app/mysql5/bin/mysqld(_Z11mysqld_mainiPPc+0x4d8)[0x588c28]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3ff2c1ed1d]
/opt/app/mysql5/bin/mysqld[0x57a8c9]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
170922 04:08:37 mysqld_safe mysqld from pid file /opt/app/mysql5/var/mysql.pid ended

Hi,

What you report is a situation that we have seen before. It is either a consequence of the computer crash or  memory glitch or bad RAM module, bad controller or disk cache or wrong disk media. It manifests itself in a manner that one of the *.ibd files or the ibdata itself has gotten its header overwritten with a part of the mysql error log during a previous crash.   

Check if any of the *.ibd files in the datadir starts with the string "This"? 

Use hex editor for it. If you can not find a text at the start of some .ibd file, then see if any of the headers is wrong.

After that, upload that particular .ibd and .frm file here for inspection, as well as the complete mysql error log.

Hi.
Is actually a ibd file is corrupted, I do not understand why *.ibd file is corrupted.
We use the dell PowerEdge R430 server, the same problem occurs in two different servers.
Two servers at the same time the probability of hardware in question is not high.

Here is the process crashes, all error logs：

Version: '5.6.34-log'  socket: '/opt/app/mysql5/var/mysql.sock'  port: 3306  Source distribution
/opt/app/mysql5/bin/mysqld(my_print_stacktrace+0x35)[0x8eebe5]
/opt/app/mysql5/bin/mysqld(handle_fatal_signal+0x41b)[0x6556ab]
/lib64/libpthread.so.0[0x3ff300f7e0]
/opt/app/mysql5/bin/mysqld(my_hash_delete+0x3e5)[0x8d98b5]
/opt/app/mysql5/bin/mysqld[0x696cd5]
/opt/app/mysql5/bin/mysqld(_Z18query_cache_insertPKcmj+0x32f)[0x6971bf]
/opt/app/mysql5/bin/mysqld(net_write_packet+0x2a)[0x64992a]
                                                                                                                    695,1          8%
/opt/app/mysql5/bin/mysqld(_Z18query_cache_insertPKcmj+0x32f)[0x6971bf]
/opt/app/mysql5/bin/mysqld(net_write_packet+0x2a)[0x64992a]
/opt/app/mysql5/bin/mysqld(net_flush+0x23)[0x649df3]
/opt/app/mysql5/bin/mysqld(_ZN8Protocol8send_eofEjj+0xb3)[0x64e5d3]
/opt/app/mysql5/bin/mysqld(_ZN8Protocol13end_statementEv+0xad)[0x64d42d]
/opt/app/mysql5/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x21c)[0x6e128c]
/opt/app/mysql5/bin/mysqld(_Z24do_handle_one_connectionP3THD+0x10d)[0x6a953d]
/opt/app/mysql5/bin/mysqld(handle_one_connection+0x42)[0x6a9672]
/opt/app/mysql5/bin/mysqld(pfs_spawn_thread+0x12a)[0xb0f64a]
/lib64/libpthread.so.0[0x3ff3007aa1]
/lib64/libc.so.6(clone+0x6d)[0x3ff2ce8bcd]
170922 03:37:42 mysqld_safe Number of processes running now: 0
170922 03:37:42 mysqld_safe mysqld restarted
2017-09-22 03:37:42 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-09-22 03:37:42 0 [Note] --secure-file-priv is set to NULL. Operations related to importing and exporting data are disabled
2017-09-22 03:37:42 0 [Note] /opt/app/mysql5/bin/mysqld (mysqld 5.6.34-log) starting as process 166408 ...
2017-09-22 03:37:42 166408 [Note] Plugin 'FEDERATED' is disabled.
2017-09-22 03:37:42 7f5b77946720 InnoDB: Warning: Using innodb_additional_mem_pool_size is DEPRECATED. This option may be removed in future releases, together with the option innodb_use_sys_malloc and with the InnoDB's internal memory allocator.
2017-09-22 03:37:42 166408 [Note] InnoDB: Using atomics to ref count buffer pool pages
2017-09-22 03:37:42 166408 [Note] InnoDB: The InnoDB memory heap is disabled
2017-09-22 03:37:42 166408 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-09-22 03:37:42 166408 [Note] InnoDB: Memory barrier is not used
2017-09-22 03:37:42 166408 [Note] InnoDB: Compressed tables use zlib 1.2.3
2017-09-22 03:37:42 166408 [Note] InnoDB: Using CPU crc32 instructions
2017-09-22 03:37:42 166408 [Note] InnoDB: Initializing buffer pool, size = 4.0G
2017-09-22 03:37:43 166408 [Note] InnoDB: Completed initialization of buffer pool
2017-09-22 03:37:43 166408 [Note] InnoDB: Highest supported file format is Barracuda.
2017-09-22 03:37:43 166408 [Note] InnoDB: Log scan progressed past the checkpoint lsn 4014227018962
2017-09-22 03:37:43 166408 [Note] InnoDB: Database was not shutdown normally!
2017-09-22 03:37:43 166408 [Note] InnoDB: Starting crash recovery.
2017-09-22 03:37:43 166408 [Note] InnoDB: Reading tablespace information from the .ibd files...
2017-09-22 03:37:43 166408 [ERROR] InnoDB: Space id in fsp header 1416128883,but in the page header 824195850
19:37:43 UTC - mysqld got signal 11 ;
                                                                                                                    694,1          8%
2017-09-22 03:37:43 166408 [ERROR] InnoDB: Space id in fsp header 1416128883,but in the page header 824195850
19:37:43 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=536870912
read_buffer_size=16777216
max_used_connections=0
max_threads=1024
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 84423448 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/opt/app/mysql5/bin/mysqld(my_print_stacktrace+0x35)[0x8eebe5]
/opt/app/mysql5/bin/mysqld(handle_fatal_signal+0x41b)[0x6556ab]
/lib64/libpthread.so.0[0x3ff300f7e0]
/opt/app/mysql5/bin/mysqld[0xb3366b]
/opt/app/mysql5/bin/mysqld[0x9a240d]
/opt/app/mysql5/bin/mysqld[0xa49e6b]
/opt/app/mysql5/bin/mysqld[0xa89e06]
/opt/app/mysql5/bin/mysqld[0xa8c0ab]
/opt/app/mysql5/bin/mysqld[0xa908c7]
/opt/app/mysql5/bin/mysqld[0xa95502]
/opt/app/mysql5/bin/mysqld[0x9844ec]
                                                                                                                    725,1          8%
/opt/app/mysql5/bin/mysqld[0xa95502]
/opt/app/mysql5/bin/mysqld[0x9844ec]
/opt/app/mysql5/bin/mysqld[0x989bab]
/opt/app/mysql5/bin/mysqld[0x9fe679]
/opt/app/mysql5/bin/mysqld[0x94acd7]
/opt/app/mysql5/bin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x48)[0x590798]
/opt/app/mysql5/bin/mysqld[0x6e8716]
/opt/app/mysql5/bin/mysqld(_Z11plugin_initPiPPci+0xb3e)[0x6ebf2e]
/opt/app/mysql5/bin/mysqld[0x583b85]
/opt/app/mysql5/bin/mysqld(_Z11mysqld_mainiPPc+0x4d8)[0x588c28]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3ff2c1ed1d]
/opt/app/mysql5/bin/mysqld[0x57a8c9]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
170922 03:37:43 mysqld_safe mysqld from pid file /opt/app/mysql5/var/mysql.pid ended

Hi!

Two different servers might have two different problems. First stack trace from your last comment comes from a server with memory problems.

You are using query cache that is already deprecated. Hence, put these lines in your my.cnf and restart the server:

query_cache_size = 0
query_cache_type = 0
query_cache_limit = 1M

under [mysqld] options header.

Two servers can have similar problems if you use quotas or if any of your path settings points to the partition with insufficient space.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

So the initial crash report about wrong spaceid.  That is a result of an earlier unrelated crash. A side-effect/fallout.   The issue is that the actual crash handling and stack trace printing is not too safe especially when multiple threads invoke crash handler at once..  

So it sometimes happens that a crash report gets written to the wrong location (or an another file is opened in STDERR's fileno).  

That is why I asked you about "This" string in ibd. 

mysql> select unhex(conv(1416128883,10,16)) a;
+------+
| a    |
+------+
| This |
+------+
1 row in set (0.00 sec)

The first crash looks to be query cache related.

These are all MySQL and OS bugs.  No bad memory here.  I've seen this same scenario play out 10-20 times but nobody is willing to try fixing it.
If it will be fixed then we have to disallow printing of stack traces in the error log because it uses unsafe API's).

I really wonder if --gdb option of mysqld should be used in production instead (and customer/user has to analyze core files all crashes).

Hello,
We have a similar problem:
- HW - The database is hosted  on Virtual machine
 - OS - OS is RHEL 6.6
 - Database - MySQL 5.6.21 standalone
The attached file "CH-EU-MYSQL.err" contains details.

Thank you four your help.

Hi,

There are no files attached ...

Hello,
Hop you are able to get attached log file from January 11th.

We were able to restore the database but the incident has happened again on 20th Feb 2018. This time were not able to take backup after starting database in recovery mode. So there was data loss this time.

Question: How to avoid this InnoDB issue that crash the database ?

2018/02/20 messages:

180220 08:15:57 mysqld_safe Starting mysqld daemon with databases from /data/mysql/5.6.21/data
2018-02-20 08:15:57 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
 2018-02-20 08:15:57 22291 [Note] Plugin 'FEDERATED' is disabled.
 2018-02-20 08:15:57 7f02dc992720 InnoDB: Warning: Using innodb_additional_mem_pool_size is DEPRECATED. This option may be removed in future releases, together with the option innodb_use_sys_malloc and with the InnoDB's internal memory allocator.
 2018-02-20 08:15:57 22291 [Note] InnoDB: Using atomics to ref count buffer pool pages
 2018-02-20 08:15:57 22291 [Note] InnoDB: The InnoDB memory heap is disabled
 2018-02-20 08:15:57 22291 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
 2018-02-20 08:15:57 22291 [Note] InnoDB: Memory barrier is not used
 2018-02-20 08:15:57 22291 [Note] InnoDB: Compressed tables use zlib 1.2.3
 2018-02-20 08:15:57 22291 [Note] InnoDB: Using Linux native AIO
 2018-02-20 08:15:57 22291 [Note] InnoDB: Using CPU crc32 instructions
 2018-02-20 08:15:57 22291 [Note] InnoDB: Initializing buffer pool, size = 16.0M
 2018-02-20 08:15:57 22291 [Note] InnoDB: Completed initialization of buffer pool
 2018-02-20 08:15:57 22291 [Note] InnoDB: Highest supported file format is Barracuda.
 2018-02-20 08:15:57 22291 [Note] InnoDB: Log scan progressed past the checkpoint lsn 30725967101
 2018-02-20 08:15:57 22291 [Note] InnoDB: Database was not shutdown normally!
 2018-02-20 08:15:57 22291 [Note] InnoDB: Starting crash recovery.
 2018-02-20 08:15:57 22291 [Note] InnoDB: Reading tablespace information from the .ibd files...
 2018-02-20 08:15:58 22291 [ERROR] InnoDB: Space id in fsp header 1416128883,but in the page header 824195850
 2018-02-20 08:15:58 22291 [ERROR] InnoDB: Tablespace is not sensible; Table: itop_db/ticket  Space ID: 18446744073709551615  Filepath: ./itop_db/ticket.ibd

2018-02-20 08:15:58 7f02dc992720  InnoDB: Operating system error number 2 in a file operation.
 InnoDB: The error means the system cannot find the path specified.
 InnoDB: If you are installing InnoDB, remember that you must create
 InnoDB: directories yourself, InnoDB does not create them.
 InnoDB: Error: could not open single-table tablespace file ./itop_db/ticket.ibd
 InnoDB: We do not continue the crash recovery, because the table may become
 InnoDB: corrupt if we cannot apply the log records in the InnoDB log to it.
 InnoDB: To fix the problem and start mysqld:
 InnoDB: 1) If there is a permission problem in the file and mysqld cannot
 InnoDB: open the file, you should modify the permissions.
 InnoDB: 2) If the table is not needed, or you can restore it from a backup,
 InnoDB: then you can remove the .ibd file, and InnoDB will do a normal
 InnoDB: crash recovery and ignore that table.
 InnoDB: 3) If the file system or the disk is broken, and you cannot remove
 InnoDB: the .ibd file, you can set innodb_force_recovery > 0 in my.cnf
 InnoDB: and force InnoDB to continue crash recovery here.
 2018-02-20 08:15:58 22291 [Note] InnoDB: innodb_force_recovery was set to 1. Continuing crash recovery even though we cannot access the .ibd file of this table.
 2018-02-20 08:15:58 22291 [Note] InnoDB: Restoring possible half-written data pages
 2018-02-20 08:15:58 22291 [Note] InnoDB: from the doublewrite buffer...
 InnoDB: Doing recovery: scanned up to log sequence number 30726628610
 2018-02-20 08:15:58 22291 [Note] InnoDB: Starting an apply batch of log records to the database...
 InnoDB: Progress in percent: 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 2018-02-20 08:15:58 7f02d4567700 InnoDB: Error: page 8 log sequence number 30979441243
 InnoDB: is in the future! Current system log sequence number 30726628610.
 InnoDB: Your database may be corrupt or you may have copied the InnoDB
 InnoDB: tablespace but not the InnoDB log files. See
 InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
 InnoDB: for more information.
 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 2018-02-20 08:15:58 7f02d456770072  InnoDB: Error: page 12 log sequence number 30980474895
 InnoDB: is in the future! Current system log sequence number 30726628610.
 InnoDB: Your database may be corrupt or you may have copied the InnoDB
 InnoDB: tablespace but not the InnoDB log files. See
 InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
 InnoDB: for more information.
 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
 InnoDB: Apply batch completed
 InnoDB: Last MySQL binlog file position 0 3234, file name mysql-bin.000032
 2018-02-20 08:15:59 7f02dc992720 InnoDB: Error: page 303 log sequence number 30980543696
 InnoDB: is in the future! Current system log sequence number 30726628610.
 InnoDB: Your database may be corrupt or you may have copied the InnoDB
 InnoDB: tablespace but not the InnoDB log files. See
 InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
 InnoDB: for more information.
 2018-02-20 08:15:59 7f02dc992720 InnoDB: Error: page 389 log sequence number 30980543696
 InnoDB: is in the future! Current system log sequence number 30726628610.
 InnoDB: Your database may be corrupt or you may have copied the InnoDB
 InnoDB: tablespace but not the InnoDB log files. See
 InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
 InnoDB: for more information.
 2018-02-20 08:15:59 22291 [ERROR] InnoDB: Table itop_db/ticket in the InnoDB data dictionary has tablespace id 1044, but tablespace with that id or name does not exist. Have you deleted or moved .ibd files? This may also be a table created with CREATE TEMPORARY TABLE whose .ibd and .frm files MySQL automatically removed, but the table still exists in the InnoDB internal data dictionary.
 InnoDB: Please refer to
 InnoDB: http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting-datadict.html
 InnoDB: for how to resolve the issue.
 2018-02-20 08:15:59 7f02dc992720 InnoDB: Error: page 360 log sequence number 30980491456
 InnoDB: is in the future! Current system log sequence number 30726628610.
 InnoDB: Your database may be corrupt or you may have copied the InnoDB
 InnoDB: tablespace but not the InnoDB log files. See
 InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
 InnoDB: for more information.
 2018-02-20 08:15:59 22291 [Note] InnoDB: 128 rollback segment(s) are active.
 2018-02-20 08:15:59 22291 [Note] InnoDB: Waiting for purge to start
 2018-02-20 08:15:59 7f02b7fff700 InnoDB: Error: page 614 log sequence number 30834462685
 InnoDB: is in the future! Current system log sequence number 30726628620.
 InnoDB: Your database may be corrupt or you may have copied the InnoDB

Regards.

Hi!

Your log files do not help. Those do not provide info on the cause of crash. Also, there is no usable stack trace. 

So, first of all, please do reply on all of my questions that I have asked and follow all other recommendations that I have provided you.

Next, if that does not help, run a debug version of our binary, enable core dumping and then upload the binary that you are using and the core file produced.

Are you using our binaries , downloaded from our site ??? If not , please do ... We can not proceed further by analysing binaries from other sources.

the stack before "180111 01:12:35" is a known bug #18948649

this is caused by inability to write to tmpdir probably due to lack of disk space.   typical that many threads hit such a crash at same time which confuses signal handling on some level.

after that, one of the ibd files or ibdata has been wiped out by content meant to be written into stderr:  Space id in fsp header 1416128883

mysql> select unhex(conv(1416128883,10,16)) a;
+------+
| a    |
+------+
| This |
+------+
1 row in set (0.00 sec)

check their datadir for such occurrence (I not on vpn). one of the ibd files will contain this text.
as mentioned previously crash handling is not safe, and it malfunctions to such a degree that the fd for stderr is closed and then opened and assigned to an ibd file in the mayhem.
seen this many times, and i can only think of one solution that is to disable crash handling within mysqld by the --gdb option at the cost of having to store core files to determine post mortem crash cause.

Hi!

My colleague has done some more research.

We have noted that the stack before "180111 01:12:35" is a known bug #18948649.

It is caused by inability to write to tmpdir  due to lack of disk space. It is quite typical that many threads hit such a crash at same time which provides fuzzy result.

After that, one of the ibd files or ibdata has been wiped out by content meant to be written into stderr:  Space id in fsp header 1416128883

mysql> select unhex(conv(1416128883,10,16)) a;
+------+
| a    |
+------+
| This |
+------+
1 row in set (0.00 sec)

Hence, not a bug.

Hi,

First of all, do consider the input that I gave you in my last comment.

Recently experienced the same issue, more than likely due to memory probelm. Added innodb_force_recovery=1 to my.cnf file to start the database. Ran a mysqldump using: mysqldump --single-transaction --quick --flush-logs original_db > backup_db. This actually failed on the corrupted table that was the cause of all of the crashing. We removed the table -- it was a cache table anyway-- created the table and restarted mysql service. Database started up. Removed innodb_force_recovery=1 from my.cnf and restarted mysql service again. Everything has been working well, since. We stayed with the original database. Note: Dropping the table was not the only thing we had to do. You'll have to remove, or move the associated idb file to a different folder in order to recreate the table properly.

I ran in to something similar where there are no signs of hardware trouble (dmesg, iLO).

Context was that "MySQL thread id 33777732" was trying to import a mysqldump of database `foo` to `foo_new` while "MySQL thread id 33730946" was still using the old database (application was dropping a table that it was going to recreate, as per it's normal flow):

---TRANSACTION 5848947773, ACTIVE 949 sec inserting
mysql tables in use 1, locked 1
1 lock struct(s), heap size 360, 0 row lock(s), undo log entries 369
MySQL thread id 33777732, OS thread handle 0x7f20bb17a700, query id 2159842751 localhost root update
INSERT INTO `t2` VALUES <SNIP>
---TRANSACTION 5844664232, ACTIVE 7859 sec dropping table
MySQL thread id 33730946, OS thread handle 0x7f20bf0b8700, query id 2154567632 192.168.1.2 foouser checking permissions
DROP TABLE IF EXISTS `t1`
---TRANSACTION 5844664189, ACTIVE 7859 sec doing SYNC index
6 lock struct(s), heap size 1184, 0 row lock(s), undo log entries 2506

Far above in the mutex section:

2018-12-20 16:25:13 14299 [Warning] Too many connections
2018-12-20 16:25:14 14299 [Warning] Too many connections

=====================================
2018-12-20 16:25:14 7f20bfbff700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 20 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 4816473 srv_active, 0 srv_shutdown, 837910 srv_idle
srv_master_thread log flush and writes: 5654383
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 32253896
--Thread 139778554570496 has waited at row0ins.cc line 1809 for 949.00 seconds the semaphore:
S-lock on RW-latch at 0x135ddc0 '&dict_operation_lock'
a writer (thread id 139778548823808) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0ins.cc line 1809
Last time write locked in file /mnt/workspace/percona-server-5.6-redhat-binary-new/label_exp/min-centos-6-x64/test/rpmbuild/BUILD/percona-server-5.6.41-84.1/storage/innobase
/srv/srv0srv.cc line 2660
--Thread 139778620884736 has waited at row0mysql.cc line 4459 for 949.00 seconds the semaphore:
X-lock on RW-latch at 0x135ddc0 '&dict_operation_lock'
a writer (thread id 139778548823808) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0ins.cc line 1809
Last time write locked in file /mnt/workspace/percona-server-5.6-redhat-binary-new/label_exp/min-centos-6-x64/test/rpmbuild/BUILD/percona-server-5.6.41-84.1/storage/innobase
/srv/srv0srv.cc line 2660

<SNIP>

--Thread 139776073651968 has waited at dict0dict.cc line 1132 for 613.00 seconds the semaphore:
Mutex at 0x7f20ff81f868 '&dict_sys->mutex', lock var 1
waiters flag 1
--Thread 139776082974464 has waited at dict0dict.cc line 1132 for 613.00 seconds the semaphore:
Mutex at 0x7f20ff81f868 '&dict_sys->mutex', lock var 1
waiters flag 1
--Thread 139776073385728 has waited at row0undo.cc line 299 for 612.00 seconds the semaphore:
S-lock on RW-latch at 0x135ddc0 '&dict_operation_lock'
a writer (thread id 139778548823808) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0ins.cc line 1809
Last time write locked in file /mnt/workspace/percona-server-5.6-redhat-binary-new/label_exp/min-centos-6-x64/test/rpmbuild/BUILD/percona-server-5.6.41-84.1/storage/innobase/srv/srv0srv.cc line 2660

Here is the crash (note that innodb_buffer_pool_size was far too small, which may be relevant):

2018-12-20 16:25:14 14299 [Warning] Too many connections
2018-12-20 16:25:14 14299 [Warning] Too many connections
2018-12-20 16:25:15 14299 [Warning] Too many connections
2018-12-20 16:25:15 14299 [Warning] Too many connections
2018-12-20 16:25:15 14299 [Warning] Too many connections
2018-12-20 16:25:17 14299 [Warning] Too many connections
2018-12-20 16:25:17 14299 [Warning] Too many connections
2018-12-20 16:25:17 14299 [Warning] Too many connections
2018-12-20 16:25:17 14299 [Warning] Too many connections
2018-12-20 16:25:18 14299 [Warning] Too many connections
InnoDB: ###### Diagnostic info printed to the standard error stream
InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
2018-12-20 16:25:19 7f20bbbfd700  InnoDB: Assertion failure in thread 139778565592832 in file srv0srv.cc line 2192
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x8dd04c]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x65c301]
/lib64/libpthread.so.0(+0xf7e0)[0x7f21018fe7e0]
/lib64/libc.so.6(gsignal+0x35)[0x7f210059b495]
/lib64/libc.so.6(abort+0x175)[0x7f210059cc75]
/usr/sbin/mysqld[0xa4c826]
/lib64/libpthread.so.0(+0x7aa1)[0x7f21018f6aa1]
/lib64/libc.so.6(clone+0x6d)[0x7f2100651bdd]
2018-12-20 16:25:20 20416 [Note] Plugin 'FEDERATED' is disabled.
2018-12-20 16:25:20 20416 [Warning] The option innodb (skip-innodb) is deprecated and will be removed in a future release
2018-12-20 16:25:20 20416 [Note] InnoDB: innodb_empty_free_list_algorithm has been changed to legacy because of small buffer pool size. In order to use backoff, increase buffer pool at least up to 20MB.

2018-12-20 16:25:20 20416 [Note] InnoDB: Using atomics to ref count buffer pool pages
2018-12-20 16:25:20 20416 [Note] InnoDB: The InnoDB memory heap is disabled
2018-12-20 16:25:20 20416 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-12-20 16:25:20 20416 [Note] InnoDB: Memory barrier is not used
2018-12-20 16:25:20 20416 [Note] InnoDB: Compressed tables use zlib 1.2.3
2018-12-20 16:25:20 20416 [Note] InnoDB: Using Linux native AIO
2018-12-20 16:25:20 20416 [Note] InnoDB: Using CPU crc32 instructions
2018-12-20 16:25:20 20416 [Note] InnoDB: Initializing buffer pool, size = 128.0M

Afterward, in an unrelated database/table:

# strings another_table.ibd | head
16:25:19 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/
key_buffer_size=67108864

Was forced to mv another_table.ibd out of the datadir before starting, and then use PITR to a new database name

The last comment is totally unrelated to the reported problem. Also it only shows held semaphores, which is expected behaviour.

Anyway, we need a repeatable test case in order to verify this bug.

Sinisa,

If I should use a new bug report for crash output corrupting an ibd and this bug is about something else, let me know to file a new bug.

Using the strategy of mysqldump + HUP from MySQL Bug 51023 and a hint from Shane in this bug, I am able to corrupt an ibd file repeatably. Note that in my kernel, I was *not* able to reproduce the BSD bug mentioned in 51023.

### Set up environment

[root@mg-el6 ~]# dmesg -n 1
[root@mg-el6 ~]# rpm -qi MySQL-server-5.6.42-1.el6.x86_64 | head -5
Name        : MySQL-server                 Relocations: (not relocatable)
Version     : 5.6.42                            Vendor: Oracle and/or its affiliates
Release     : 1.el6                         Build Date: Mon 10 Sep 2018 10:49:10 AM UTC
Install Date: Sun 23 Dec 2018 01:14:42 AM UTC      Build Host: vilma38
Group       : Applications/Databases        Source RPM: MySQL-5.6.42-1.el6.src.rpm
[root@mg-el6 ~]# uname -r
2.6.32-696.30.1.el6.x86_64
[root@mg-el6 ~]# rm -rf /var/lib/mysql*
[root@mg-el6 ~]# > /var/log/mysqld.log
[root@mg-el6 ~]# mkdir /var/lib/mysql && chown mysql. /var/lib/mysql
[root@mg-el6 ~]# mkdir /var/lib/mysql-files && chown mysql. /var/lib/mysql-files # mysqld fails to start or error-log without this directory

[root@mg-el6 ~]# mysql_install_db --user=mysql --datadir=/var/lib/mysql &> /dev/null
[root@mg-el6 ~]# MYSQL_TEST_LOGIN_FILE="" /etc/init.d/mysql start &> /dev/null
[root@mg-el6 ~]# mysqladmin ping
mysqld is alive
[root@mg-el6 ~]# my_print_defaults mysqld
--datadir=/var/lib/mysql
--socket=/var/lib/mysql/mysql.sock
--user=mysql
--symbolic-links=0
--log-error=/var/log/mysqld.log

### Start creating data using php
### screen -S insert

[root@mg-el6 ~]# cat 51023.php
<?php

for ($i=1; $i < 801; $i ++) {
        echo "create database db_$i;\n";
        for ($j=1; $j < 10; $j ++) {
                echo "create table db_$i.t_$j(f1 int not null auto_increment primary key, f2 text) engine=innodb;\n";
                for ($k = 1; $k < 1000; $k ++) {
                        echo "insert into db_$i.t_$j (f2) values(md5($k));\n";
                }
        }
}

?>

[root@mg-el6 ~]# while true; do php 51023.php | mysql -f 2> /dev/null; done

### Start mysqldump continually
### screen -S mysqldump

[root@mg-el6 ~]# while mysqldump -A > /dev/null; do true; done

### Start sending HUP to mysqld every second
### screen -S hup

[root@mg-el6 ~]# while sleep 1; do kill -HUP `cat $(mysql -sse "select @@pid_file")`; done

### Leave this running for maybe twenty minutes, I got it once right away and once after maybe 30 attempts and letting the above run for longer

### Kill mysqld_safe angel process

[root@mg-el6 ~]# killall -9 mysqld_safe

### Intentionally crash the server a lot "when multiple threads invoke crash handler at once"

[root@mg-el6 ~]# while true; do for i in {1..300}; do sleep 1; done; [[ -e /var/lib/mysql/mg-el6.pid ]] && kill -11 `cat /var/lib/mysql/mg-el6.pid`; kill -11 `cat /var/lib/mysql/mg-el6.pid`; kill -11 `cat /var/lib/mysql/mg-el6.pid`; done

### Looking after the intentional crash:

2018-12-23 05:14:55 26248 [ERROR] InnoDB: Space id in fsp header 3925,but in the page header 824195850
2018-12-23 05:14:55 26248 [ERROR] InnoDB: checksum mismatch in tablespace ./db_226/t_6.ibd (table db_226/t_6)

[root@mg-el6 ~]# head -1 /var/lib/mysql/./db_226/t_6.ibd
05:14:46 UTC - mysqld got signal 11 ;

Let me know if you want me to upload some files, provide ssh credentials, etc (or create a new bug)

Hi,

I could not repeat the test case as provided by you. It is possible that I made a change in the script, since I do not use PHP.

Also, killing MySQL with many signals concurrently can not be treated like a bug. Simply, why would anyone attempt to destroy one's own data.

Simply, you can find many other more convenient methods of crashing .ibd files, like editing them, getting out of space in tmpdir, removing part of the file, bad restore and many others.

with all due respect, MySQL should survive 'kill -9' or 'kill -11' or any other signal without corrupting data (or losing it, if sync log flushing is on). 

anyone thinking otherwise should not work at a database company %)

Hello last commenter,

I agree 100 % with you. We have fixed so many bugs that caused the corruption of InnoDB after many various KILL signals on many different platforms.

All that  we are asking for is the exact set of steps that would lead to the corruption.

To put it in the words that you possibly could understand, we need a repeatable test case. Anybody who have worked in the database company should have known that.

With all due respect.

for reference, this is one possible cause.

MySQL binlog index file corrupted with *** glibc detected *** mysqld: double free or corruption (Doc ID 2123652.1)

"If the MALLOC_CHECK_ environmental variable is set to 1, when glibc detects memory corruption it will dump a stack trace to stderr. By default stderr is assigned file descriptor id 2, in MySQL the fd 2 is pointing to the binlog index file, so glibc overwrites a portion of the binlog index file.
"

I have some bash scripts to demonstrate the file corruption problem on centos6 with the default old version of glibc. Typical runtime is a minute or two. Try restarting if it takes much longer by launching start.sh again.

The scripts are intended to run on a "throw-away" server as they will wipe the datadir, etc. so assume they will damage your vm.

The scripts should reliably write "mysqld got signal 11" to an ibd, MYI, or MYD file after sending a single "/usr/bin/pkill -11 -x mysqld" under the right conditions. If you use innodb_flush_method=O_DIRECT, the corrupt file is more likely to be binlog.index or something like that.

By default, install.txt helps install MySQL 8.0. If you are testing on MySQL 5.6, consider setting innodb_force_recovery=3 in my.cnf, as 5.6 tends to create empty ibd files during crash, which is not what this bug is about.

When start.sh completes, you get a notice like this:

"""
We intentionally crash the server because it appears to be hung.
(waiting for user to simulate long semaphore wait with: '/usr/bin/pkill -11 -x mysqld')

screens have quit - suggested actions, issue:
/root/87850/fdinfo.sh
/usr/bin/pkill -11 -x mysqld
grep -lR '^[0-9].*UTC.*signal' /var/lib/mysql 2> /dev/null
"""

Additional thoughts about the bug and script usage:

### start.sh stops mysqld, kills all screens, deletes existing datadir, then attempts to reproduce the problem anew
### stop.sh closes mysqld and any screen session
### disabling flush.sh in start.sh still reproduces, but takes much longer

##############
### reproduces with ext3 and xfs
### reproduces with innodb_use_native_aio=0,1
### reproduces on these cent6 kernels:
# 2.6.32-754.27.1.el6.x86_64
# 3.8.13-98.7.1.el6uek.x86_64
# 4.1.12-124.37.1.el6uek.x86_64
# 4.4.215-1.el6.elrepo.x86_64

##############
### This seems to manifest with older versions of glibc (possibly due to "buggy select use")
### I wonder if this is similar to https://sourceware.org/bugzilla/show_bug.cgi?id=10352
### Maybe the problem is somewhere around my_safe_freopen in ./mysys/my_fopen.c
### Possibly the problem could be recreated on modern glibc with -D_FORTIFY_SOURCE=0

## reproduces with default rpm-installed glibc:
# 67f59e096f46fc59b830c2fed53244ba  glibc-2.12-1.212.el6_10.3.x86_64.rpm

## reproduces with upstream glibc-2.12.1:
# be0ea9e587f08c87604fe10a91f72afd  glibc-2.12.1.tar.bz2

## reproduces with upstream glibc-2.12.2:
# 903fcfa547df2f453476800e0838fe52  glibc-2.12.2.tar.bz2

## reproduces with upstream glibc-2.13:
# 38808215a7c40aa0bb47a5e6d3d12475  glibc-2.13.tar.bz2

### can't reproduce with upstream glibc-2.14:
# 1588cc22e796c296223744895ebc4cef  glibc-2.14.tar.bz2

##############
### if you are going to see the bug, you should quickly see the file descriptors of the error log > 2 - try /root/87850/fdinfo.sh
### the file corruption seems to manifest when stderr fd > 1024, which crash.sh responds to
### with glibc 2.14, error log seems to always have FD 1 and 2

##############
### patching glibc
### if you try a different mysql version in yum, before patching glibc, remember to issue:
# rpm -e --nodeps mysql-community-client mysql-community-common mysql-community-libs mysql-community-server
# rm /usr/sbin/mysqld* -f
# yum install -y mysql-community-client.x86_64 mysql-community-server.x86_64
# cp -a /usr/sbin/mysqld /usr/sbin/mysqld.dist

### modify line two of 87850/patchglibc.sh to choose one of these versions:
# glibc-2.12.1
# glibc-2.12.2
# glibc-2.13
# glibc-2.14

### 87850/stop.sh; 87850/patchglibc.sh; 87850/start.sh

Demo:

[root@87850 ~]# bash makescripts.sh
[root@87850 ~]# /root/87850/start.sh
[root@87850 ~]# /usr/bin/pkill -11 -x mysqld
[root@87850 ~]# grep -lR '^[0-9].*UTC.*signal' /var/lib/mysql 2> /dev/null
/var/lib/mysql/db_1/t_473.ibd

I was worried about formatting in this bug tracker, so the scripts are in my github: https://github.com/mikegriffin/bug_helpers/tree/master/mysql_87850

Hi Mr. wangbo,

Have you been able to repeat the behaviour with latest 5.7 or 8.0 ???

This feedback is important to us .....

Next, what we need is the following.

Since you are killing MySQL server, have you made sure that your system has been setup to full ACID compliance ???

Have you switched off caches on your disk controller and hard disk / SSD ???

Are you using O_DIRECT option ??

Have you set InnoDB to use double-write and full checksumming ???

Is your innodb_flush_log_at_trx_commit set to 1 ??? 

Have you turned off adaptive flushing ???

If you have not met a full ACID compliance, can you please repeat your test ???

The bug reproduction script works with the latest 5.6, 5.7, and 8.0, eg it detects whether to use mysql_install_db or mysqld --user=mysql --initialize-insecure

The corruption manifests on any of those versions, using the default my.cnf (ie default durability settings for a given release of MySQL)

Regarding hardware, I am just spinning up a cloud server running centos6, with "local" storage, and not modifying any OS settings besides timezone.

As mentioned, the file corruption can occur on files like binlog.index (a random file is corrupted) so InnoDB settings are not all that relevant. It is worth mention that with innodb_flush_method=O_DIRECT, I have not seen any .ibd file corruption - instead you seem to always corrupt some non-InnoDB file (an MYD file, binlog.index, etc)

Correction to my previous posts, you can reproduce this corruption of random file when stderr fd > 2.

I had previously said it needed to be > 1024 but > 2 reproduces. That said, it is faster to create this condition with more tables and file descriptors open - the script still works but 1024 isn't really a hint towards the problem:

Error logs:
               33
               68
mysqld pid:    22971
Uptime:        11 min 15 sec

# /usr/bin/pkill -11 -x mysqld
# grep -lR '^[0-9].*UTC.*signal' /var/lib/mysql 2> /dev/null
/var/lib/mysql/binlog.index

HI All,

Sorry, but based on the last comments, I do not think that this is a bug.

First of all, stderr must be 2, before our server is started.

Next, if binary logging files or some other non-InnoDB files are corrupted, that is usually due to bad hardware. You can partially alleviate that by regular flushing of those files. There are separate settings for that.

Next, you could be getting those corruption by a very well documented documentation, which says:

MySQL binlog index file corrupted with *** glibc detected *** mysqld:
double free or corruption (Doc ID 2123652.1)

"If the MALLOC_CHECK_ environmental variable is set to 1, when glibc
detects memory corruption it will dump a stack trace to stderr. By
default stderr is assigned file descriptor id 2, in MySQL the fd 2 is
pointing to the binlog index file, so glibc overwrites a portion of the
binlog index file.
"

So, the cause of the corruption is a known and well documented issue.