MySQL Bugs: #43832: Segfault with table corruption...not able to recreate table.

Bug #43832	Segfault with table corruption...not able to recreate table.
Submitted:	24 Mar 2009 13:59	Modified:	27 Aug 2014 4:33
Reporter:	Vivek V	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S4 (Feature request)
Version:	5.1.28-rc-community-log, 5.1 - bzr	OS:	Linux (CentOS release 5.2 (Final))
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	memory issue

Description:
One of the slaves in our replication setup seems to have crashed. The timing on the crash revolves around a table which is an intermediate table within the application and gets refreshed every 1 hour from the master. When the table got refreshed yesterday around 18:00 hrs. the error started occuring on the slave around 18:04 as below:
Trx read view will not see trx with id >= 0 1214304479, sees < 0 1214304479

On further delving into the logs, the error log seems to hint at a corruption or some major issues as seen below:
"
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x83acde]
/usr/sbin/mysqld(handle_segfault+0x327)[0x5b9147]
/lib64/libpthread.so.0[0x331a60de80]
/lib64/libc.so.6(gsignal+0x35)[0x3319e30155]
/lib64/libc.so.6(abort+0x110)[0x3319e31bf0]
/usr/sbin/mysqld[0x7b4ca1]
/lib64/libpthread.so.0[0x331a606307]
/lib64/libc.so.6(clone+0x6d)[0x3319ed1ded]
"

Also the stack which tried to dummp the memory stack seems to be emty as below:
"
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = (nil) thread_stack 0x6400000
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
"

We have tried to recreate the table within the slave but get an error of 121.

How to repeat:
Create a table on the master which refreshes down to all the slaves every hour. Around the hourly refresh when a session is accessing the data from the table the slave DB seems to fail.

Suggested fix:
Requesting one.

Forgot to mention that we are using InooDB as the application engine.

Thank you for the report.

But what does mean "a table on the master which refreshes down to all the slaves every hour." Is it table content in which changes fully every hour?

> We have tried to recreate the table within the slave but get an error of 121.

Error 121 means "Remote I/O error". How did you try to recreate the table?

Please also provide full error log file from the slave.

Q:.   But what does mean "a table on the master which refreshes down to all the slaves every hour." Is it table content in which changes fully every hour?

A:.   Hello Sveta, basically we have a table which we do the following in the master MySQL server every hour:
truncate table check_roomcount;
insert into check_roomcount
select * from ...;

This truncate and insert gets replicated to the slave every hour and the error seems to around the time this happened on the slave.

Q:.     We have tried to recreate the table within the slave but get an error of 121.
Error 121 means "Remote I/O error". How did you try to recreate the table?
Please also provide full error log file from the slave.

A:.     I will attaching the error log herewith. As regards the table re-creation I have tried to re-create the table on this slave alone where I got the error 121. Please find the error file attached herewith.

Requesting to let me know the ftp process as the file is greater than 500 Kb and I am not able to upload it here. If I click the ftp site it give me a no answer from server error.

Thank you for the feedback.

Our public FTP directory is not listable. Probably you just don't see the directory. Please try again.

It works for me:

$ftp ftp://ftp.mysql.com/pub/mysql/upload/
Connected to ftp.mysql.com.
220 Welcome to MySQL AB's FTP service
331 Please specify the password.
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
200 Switching to Binary mode.
250 Directory successfully changed.
250 Directory successfully changed.
250 Directory successfully changed.
ftp> put
(local-file) ^C
ftp> quit
221 Goodbye.

Hello Sveta,
   Thank You very much for a quick turnaround. I get the following errors:

C:\Temp>ping  ftp://ftp.mysql.com
Ping request could not find host ftp://ftp.mysql.com. Please check the name and
try again.

C:\Temp>
C:\Temp>
C:\Temp>ftp ftp://ftp.mysql.com/pub/mysql/upload/
Unknown host ftp://ftp.mysql.com/pub/mysql/upload/.
ftp>

If you could give me the IP of the ftp.mysql.com I could give it a try again.

Thanks and Regards,
Vivek.

Thank you for the feedback.

Error log shows mysqld was intentionally crashed because long semaphore wait in time when truncate was running. As truncate is really sequence of drop and create statements InnoDB tablespace was corrupted.

To restore your tablespace you can:

1. stop slave
2. copy frm file from master to slave
3. drop table
4. create table
5. change slave to restore latest data if needed
6. start slave

See also bug #42643 for details why truncate is not safe for replication.

Hello Sveta,
   I am still getting the errors even when trying to drop and recreate the table after copying the frm file.

Eagerly awaiting your feedback.

Thanks and Regards,
Vivek.

Seems you have option --innodb-file-per-table, so you need to follow same instructions again. See also http://dev.mysql.com/doc/refman/5.1/en/innodb-troubleshooting.html and http://dev.mysql.com/doc/refman/5.1/en/innodb-troubleshooting-datadict.html

Verified using latest 5.1 sources

How to repeat.

In tab #1:

$cat bug43832.test
--source include/have_innodb.inc

create table t1 (f1 int) engine=innodb;

insert into t1 values(1);

select * from t1;

truncate table t1;

select * from t1;

$cat bug43832-master.opt
--innodb-file-per-table

$./mtr --manual-gdb bug43832
Logging: ./mtr  --manual-gdb bug43832
...

In another tab (tab #2):

$gdb -cd /Users/apple/bzr/mysql-5.1/mysql-test -x /Users/apple/bzr/mysql-5.1/mysql-test/var/tmp/gdbinit.mysqld.1 /Users/apple/bzr/mysql-5.1/sql/mysqld
GNU gdb 6.3.50-20050815 (Apple version gdb-696) (Sat Oct 20 18:16:54 GMT 2007)
Copyright 2004 Free Software Foundation, Inc.
...
5763      DBUG_ENTER("mysql_parse");
(gdb) b que_fork_start_command
Breakpoint 2 at 0x3598b7: file que/que0que.c, line 332.
(gdb) c
Continuing.
[Switching to process 25317 thread 0x2703]

Breakpoint 2, que_fork_start_command (fork=0x4c7b060) at que/que0que.c:332
332             que_thr_t*      suspended_thr = NULL;

At this time in tab #3:

$ps -A | grep mysqld
24701  p2  S+     0:01.38 /usr/libexec/gdb/gdb-i386-apple-darwin  -cd /Users/apple/bzr/mysql-5.1/mysql-test -x /Users/apple/bzr/mysql-5.1/mysql-test/var/tmp/gdbinit.mysqld.1
25317  p2  SX     0:00.38 /Users/apple/bzr/mysql-5.1/sql/mysqld --defaults-group-suffix=.1 --defaults-file=/Users/apple/bzr/mysql-5.1/mysql-test/var/my.cnf --log-output=table,file --gdb --loose-skip-log-
 1693  p3  S+     0:00.00 grep mysqld

$kill -6 25317

In tab #2:

(gdb) c
Continuing.
Current language:  auto; currently c

Program received signal SIGABRT, Aborted.
[Switching to process 25317 local thread 0xf03]
0x9001a1cc in select ()
(gdb) c
Continuing.

Program terminated with signal SIGABRT, Aborted.
The program no longer exists.
(gdb) q

In tab #1:

main.bug43832                            [ fail ]
...

$../sql/mysqld --no-defaults --basedir=~/Applications/mysql-5.1 --datadir=./var/log/main.bug43832/mysqld.1/data --log-error --port=33051 --socket=/tmp/mysql5111.sock &
[1] 6563

apple@apple ~/bzr/mysql-5.1/mysql-test
$090326 13:37:11 [Warning] Setting lower_case_table_names=2 because file system for /Users/apple/bzr/mysql-5.1/mysql-test/var/log/main.bug43832/mysqld.1/data/ is case insensitive

apple@apple ~/bzr/mysql-5.1/mysql-test
$mysql  --socket=/tmp/mysql5111.sock test -e "select * from t1"
ERROR 1146 (42S02) at line 1: Table 'test.t1' doesn't exist

apple@apple ~/bzr/mysql-5.1/mysql-test
$cat  var/log/main.bug43832/mysqld.1/data/apple.err 
InnoDB: Log scan progressed past the checkpoint lsn 0 39548
090326 13:37:12  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 0 46409
090326 13:37:12  InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
InnoDB: Apply batch completed
090326 13:37:13  InnoDB: Started; log sequence number 0 46409
090326 13:37:13 [Note] Event Scheduler: Loaded 0 events
090326 13:37:13 [Note] ../sql/mysqld: ready for connections.
Version: '5.1.34-debug'  socket: '/tmp/mysql5111.sock'  port: 33051  Source distribution
090326 13:37:16 [ERROR] Cannot find or open table test/t1 from
the internal data dictionary of InnoDB though the .frm file for the
table exists. Maybe you have deleted and recreated InnoDB data
files but have forgotten to delete the corresponding .frm files
of InnoDB tables, or you have moved .frm files to another database?
or, the table contains indexes that this version of the engine
doesn't support.
See http://dev.mysql.com/doc/refman/5.1/en/innodb-troubleshooting.html
how you can resolve the problem.

Suggested fix: handle this situation when send signal 6 for intentional crash if InnoDB wait semaphore for long time, probably in case of other intentional crashes.

Hello Sveta,
   Dropping and recreating the table seems to have worked now. We are now in the process os catching up the replication. Will update the bug in case I face any further issues. on this.

Thank You very much for the all you help. I will monitor the bug to trace whether you get a resolution to why we were not able to drop the table in the first place.

Thanks and Kind Regards,
Vivek.

this looks like a duplicate of bug#42643

Vivek,

thank you fro the feedback. Latest problem really looks like bug#42643. But problem with table corruption still exists, so set to "Verified" again as described in comment "[26 Mar 11:49] Sveta Smirnova"

Sveta, please let me know if I'm right in the interpretation of you note.
If I'm not, could you provide your own interpretation in plain English, not gdb?

1. you set a breakpoint, the test starts to execute CREATE TABLE (or is it TRUNCATE?), hits
the breakpoint.
2. you kill mysqld
3. on restart InnoDB complains about a table for which there's a .frm file but no entry in InnoDB's data dictionary. InnoDB does start normally after that error message, it doesn't abort, am I right here?

If so it's hardly a bug, a feature request at most, and in fact a MySQL design problem of lack of transactional DDL. No matter if it's CREATE or TRUNCATE we end up with a partially created table, it's a well known issue. Could you verify the workaround:  remove the .frm file and recreate the table.

Michael,

thank you for the feedback.

> 1. you set a breakpoint, the test starts to execute CREATE TABLE (or is it TRUNCATE?), hits the breakpoint.

Correct, this is create

> 2. you kill mysqld

Yes, with sygnal 6

> 3. on restart InnoDB complains about a table for which there's a .frm file but no entry in InnoDB's data dictionary. InnoDB does start normally after that error message, it doesn't abort, am I right here?

Correct. And drop/recreate works fine.

> If so it's hardly a bug, a feature request at most, and in fact a MySQL design problem of lack of transactional DDL.

I agree, but anyway this is good to have.

now that we agreed it's more of a feature request let's assign to Calvin to coordinate

This was fixed in 5.7.2, cf. 

WL#6501: InnoDB: Make Internal InnoDB TRUNCATE TABLE statement to be atomic for single tablespace
http://dev.mysql.com/worklog/task/?id=6501