| Bug #73392 | Server crashes (segfault) at shutdown or slave start - NO MERGE TABLES | | |
|---|---|---|---|
| Submitted: | 25 Jul 2014 13:13 | Modified: | 5 Sep 2014 19:12 |
| Reporter: | Mike Diehn | Email Updates: | |
| Status: | No Feedback | Impact on me: | |
| Category: | MySQL Server | Severity: | S2 (Serious) |
| Version: | 5.6.19 | OS: | Linux (CentOS 6.5 with mysql-community repos) |
| Assigned to: | | CPU Architecture: | Any |
[25 Jul 2014 13:15]
Mike Diehn
my.cnf for mysqld 5.6.19 that crashed
Attachment: my.cnf.2014072403 (application/octet-stream, text), 987 bytes.
[25 Jul 2014 13:16]
Mike Diehn
log file with crashes and stack traces
Attachment: mysqld.log.borked.4 (application/octet-stream, text), 42.60 KiB.
[25 Jul 2014 13:16]
Mike Diehn
Fixed typo in synopsis
[25 Jul 2014 19:00]
Sveta Smirnova
Thank you for the report. Does your backup include the mysql database? If yes, have you run mysql_upgrade?
[25 Jul 2014 21:06]
Mike Diehn
I've done this four times now, I think. The first time, I actually shut down the master/slave (the one in the middle of the chain) and used rsync to copy its data directory to the new machine, then ran mysql_upgrade and started replication. Then it started crashing. Thinking I'd done something stupid, I tried again with the same result. Next I made a dump with --all-databases and restored it into a pristine data directory. That dump did include the mysql database, and yes, I ran mysql_upgrade on it. Same crashes. Then I tried restoring a dump of just the 'cdna' database into yet another pristine 5.6.19 installation. After the restoration, replication started and caught up, and then the shutdown command crashed the server. Since then, the START SLAVE command crashes the server. That was the most recent attempt and it's where I'm stuck at the moment.
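For reference, a rough sketch of the dump-and-restore sequence described above; the exact mysqldump options and file paths are not in the report, so everything here is only illustrative:

# On the 5.1.73 master/slave: take a full dump including the mysql schema.
# (--single-transaction and --master-data=2 are illustrative, not from the report.)
mysqldump --all-databases --single-transaction --master-data=2 > /tmp/full-dump.sql

# On the fresh 5.6.19 slave: load the dump, upgrade the system tables, restart.
mysql < /tmp/full-dump.sql
mysql_upgrade
service mysqld restart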
[25 Jul 2014 21:20]
Mike Diehn
Hey - I just tried this with MariaDB 10 and had the same result. As I recall, MariaDB 10 is based on MySQL 5.5 and has backports of some 5.6 features. Just an FYI.
[29 Jul 2014 16:28]
Sveta Smirnova
Thank you for the feedback. Please try to identify whether a single table is crashing the server: try a single-table backup. It is interesting that in the error log you provided, the crash most likely happened when the server tried to set a variable. Please also send us the error log file from the server where you restored only the cdna database: I want to check whether the backtrace is the same.
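A minimal sketch of the single-table isolation being suggested, restoring one table at a time into a scratch schema (the cdna_test name and /tmp paths are hypothetical):

# Dump and restore the cdna tables one by one, checking after each
# whether START SLAVE or a shutdown still crashes the server.
mysql -e "CREATE DATABASE IF NOT EXISTS cdna_test"
for t in $(mysql -N -e "SHOW TABLES FROM cdna"); do
    mysqldump cdna "$t" > "/tmp/cdna.$t.sql"
    mysql cdna_test < "/tmp/cdna.$t.sql"
    echo "restored $t -- now try START SLAVE and a shutdown, then check the error log"
done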
[30 Jul 2014 17:28]
Mike Diehn
The log I attached *is* for the cdna-only restoration. I have 73 tables in this database. Were you thinking I'd try each one alone?

This is the most frequent and consistent stack trace I get. It happens when I attempt "START SLAVE" on the 5.6.19 slave server. You can see the call to handle_slave_io in there, so I think it's got something to do with that.

/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8d6885]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x663234]
/lib64/libpthread.so.0(+0xf710)[0x7f8b7b5d4710]
/lib64/libc.so.6(+0x13385f)[0x7f8b7a38085f]
/usr/sbin/mysqld(my_strdup+0x1e)[0x8d21ce]
/usr/sbin/mysqld(mysql_options+0x513)[0x77daa3]
/usr/sbin/mysqld[0x8a40e2]
/usr/sbin/mysqld(handle_slave_io+0x812)[0x8ae242]
/usr/sbin/mysqld(pfs_spawn_thread+0x12a)[0xafc8ea]
/lib64/libpthread.so.0(+0x79d1)[0x7f8b7b5cc9d1]
/lib64/libc.so.6(clone+0x6d)[0x7f8b7a335b5d]
[1 Aug 2014 19:15]
Sveta Smirnova
Thank you for the feedback. Well, I have debugged this way even with more than 73 tables. In any case, it is still strange that your dump affects the START SLAVE command and that the backtrace complains about options rather than table corruption or some other effect on your database. Please send us the output of SHOW TABLES FROM cdna, and try commenting out the option "relay_log_info_repository = TABLE" in your configuration file (or change it to relay_log_info_repository = FILE) and let us know whether that solves the problem.
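A sketch of the configuration change being requested, shown as shell steps for illustration (FILE is the server default in 5.6):

# In /etc/my.cnf, under [mysqld], either comment the option out:
#   # relay_log_info_repository = TABLE
# ...or set it back to the default explicitly:
#   relay_log_info_repository = FILE
# Then restart the slave and confirm the running value:
service mysqld restart
mysql -e "SHOW VARIABLES LIKE 'relay_log_info_repository'"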
[4 Aug 2014 11:00]
Mike Diehn
Hi Sveta, I did that - commented out the "crash safe slave" options - and it made no difference either. Here's the output of SHOW TABLES:

mysql(root@localhost:6) cdna> show tables;
+-------------------------------------+
| Tables_in_cdna |
+-------------------------------------+
| AnnotationRuleValue |
| AnnotationSet |
| AnnotationTableName |
| Architecture |
| Attribute |
| AutoGUIConfig |
| AutoGuiMode |
| BackwardMode |
| Boolean |
| Compiler |
| Component |
| Config |
| DATABASECHANGELOG |
| DATABASECHANGELOGLOCK |
| DependentMethod |
| DeveloperSpecialties |
| Employee |
| ExpectedTestArchitectureNp |
| ExpectedTestManagementGroup |
| ExpectedTestTypeMode |
| Group |
| GuiMode |
| IterationMaximumMemoryMetrics |
| JiraUser |
| Machine |
| Message |
| Module |
| ModuleError |
| ModuleResult |
| Office |
| Owner |
| PerformanceData |
| PreConfigModule |
| PreConfigSuite |
| Product |
| ProductVersion |
| ProductVersionToTest |
| Project |
| ResolveEmployeeSpecialty |
| Server |
| Setting |
| SolverIterationData |
| SolverMetrics |
| SolverTimeStepData |
| Specialty |
| Suite |
| SuiteAnnotationRule |
| TCF_MISSING |
| Test |
| TestCase |
| TestCaseFile |
| TestFramework |
| TestManagementGroup |
| TestMode |
| TestStatus |
| TestSuiteResult |
| TestSuiteResultFile |
| TestType |
| UnitMode |
| ViewConfig |
| ViewModuleResult |
| ViewPreConfigModule |
| ViewPreConfigSuite |
| ViewProductVersionSuiteAnnotation |
| ViewProductVersionSuiteAttribute |
| ViewProductVersionSuiteBackwardMode |
| ViewProductVersionSuiteExcludeOSes |
| ViewProductVersionSuiteGroup |
| ViewProductVersionSuiteOwner |
| ViewProductVersionSuiteSandbox |
| ViewProductVersionSuiteServer |
| ViewProductVersionSuiteTestType |
| ViewProductVersionSuiteUnitMode |
| ViewTestCase |
| ViewTestSuiteResult |
| WhoSpecializesIn |
| membership_grouppermissions |
| membership_groups |
| membership_userrecords |
| membership_users |
| tempDuplicateModuleErrors |
+-------------------------------------+
81 rows in set (0.00 sec)
[5 Aug 2014 12:11]
Sveta Smirnova
Thank you for the feedback. I did a few tests and could not repeat the described behavior. I assume your master server is 5.1? Please send us its option file.
[5 Aug 2014 13:21]
Mike Diehn
Hi Sveta, the master is:

[root@tdx-db02 ~]# /usr/libexec/mysqld --verbose --help | more
/usr/libexec/mysqld Ver 5.1.73-log for redhat-linux-gnu on x86_64 (Source distribution)

I'll attach the my.cnf in a moment.
[5 Aug 2014 13:23]
Mike Diehn
my.cnf for master mysqld at version 5.1.73
Attachment: my.cnf.tdx-db02.txt (text/plain), 4.12 KiB.
[5 Aug 2014 19:12]
Sveta Smirnova
Thank you for the feedback. I still cannot repeat the described behavior. Please check your machine's memory for errors, and if it is fine, enable core files and send us the core file from the next crash together with the mysqld binary.
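A rough sketch of one way to get a core file from mysqld on CentOS 6; the exact paths and the manual mysqld_safe start are illustrative only:

# In /etc/my.cnf, under [mysqld], ask the server to dump core on a crash:
#   core-file
# Start the server from a shell with the core size limit raised so the
# limit is inherited by mysqld:
ulimit -c unlimited
mysqld_safe --user=mysql &
# After the next crash, look for a core file (usually in the data directory)
# and attach it together with the /usr/sbin/mysqld binary.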
[6 Sep 2014 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".

Description:
I've dumped a 100 GB database from a 5.1.73 server and imported it into a 5.6.19 server. I'll attach the my.cnf from the new server and a mysqld.log. I'm using InnoDB in file-per-table mode.

The problem is that when I try to stop mysqld with "service mysqld stop" or "mysqladmin shutdown", it segfaults and then mysqld_safe restarts it. I can get it down by kill -KILLing the mysqld_safe process and then the mysqld process. The biggest problem, though, is that from then on it also segfaults when I try to "START SLAVE".

Bug reports I checked:
69574: I do NOT have any merge tables anywhere in the database.
71546: does not appear to be like the tmp table bug.
70194: my stack traces are nothing like these and the behavior is different.

Other info: This is a slave in a replication chain:
first master is 5.1.69, for redhat-linux-gnu (x86_64)
master/slave is 5.1.73, for redhat-linux-gnu (x86_64)
last slave is 5.6.19, for Linux (x86_64)

When I first restore the mysqldump into the new server, I can start and stop the server and I can even start replication and it will catch up. At some point, and I think it's any time after I've restored the database, I try to stop the server and it crashes. From then on there's no starting the replication anymore. As soon as I move the data directory "out of the way" and let mysql create a fresh one, we go back to normal operations. I tried upgrading to MySQL 5.7 and the problem got worse, if anything. Or at least the log files were noisier. I'm about to try version 5.5.

Here are the log entries and stack trace from a crash at START SLAVE:

2014-07-24 18:33:04 3877 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.19' socket: '(null)' port: 3306 MySQL Community Server (GPL)
22:33:34 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=1
max_threads=500
thread_count=2
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 207043 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f89c0000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...

2014-07-24 18:33:34 3877 [Warning] Slave SQL: If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
2014-07-24 18:33:34 3877 [Note] Slave SQL thread initialized, starting replication in log 'test-db02-mysqld-bin.001503' at position 42834278, relay log './tdx-db03-relay-bin.000057' position: 4
stack_bottom = 7f89f7e1ed80 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8d6885]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x663234]
/lib64/libpthread.so.0(+0xf710)[0x7f8b7b5d4710]
/lib64/libc.so.6(+0x13385f)[0x7f8b7a38085f]
/usr/sbin/mysqld(my_strdup+0x1e)[0x8d21ce]
/usr/sbin/mysqld(mysql_options+0x513)[0x77daa3]
/usr/sbin/mysqld[0x8a40e2]
/usr/sbin/mysqld(handle_slave_io+0x812)[0x8ae242]
/usr/sbin/mysqld(pfs_spawn_thread+0x12a)[0xafc8ea]
/lib64/libpthread.so.0(+0x79d1)[0x7f8b7b5cc9d1]
/lib64/libc.so.6(clone+0x6d)[0x7f8b7a335b5d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 2
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
140724 18:33:34 mysqld_safe Number of processes running now: 0
140724 18:33:34 mysqld_safe mysqld restarted

From then on, any attempt to stop mysqld ends with:

01:22:17 UTC - mysqld got signal 11 ;
[boilerplate for segfaults] terribly wrong...

and nothing else.

How to repeat:
Build a fresh CentOS 6.5 Minimal VM.
Add the mysql-community repo.
Install mysql-community-server 5.6.19 and friends.
service mysqld start
Create a database named 'cdna', then "mysql cdna < cdna-mysqldump-opt.out".
Once it's done, which takes about five hours, configure replication:
CHANGE MASTER TO MASTER_HOST='tdx-db02', MASTER_LOG_FILE='tdx-db02-mysqld-bin.001499', MASTER_LOG_POS=4, MASTER_USER='username', MASTER_PASSWORD='password'; (written out more fully in the sketch after this report)
START SLAVE; SHOW SLAVE STATUS;
Observe replication running and that after a few hours it's all caught up.
service mysqld stop
ps faux | grep mysqld
Observe mysqld_safe and mysqld both running; mysqld_safe still has a start time from many hours ago, but mysqld's start time is only moments old.
Observe that /var/log/mysqld has the "something went terribly wrong" crash message as the last entry before the mysqld_safe messages about starting another mysqld process.
Try START SLAVE; note the lost connection and check the logs.
Discover that it has crashed again, this time leaving a small stack trace.
Repeat ad nauseam.

Suggested fix:
I have no clue.
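For reference, a sketch of the replication setup and shutdown steps from "How to repeat" above, with placeholder credentials (repl_user/repl_password are not the real values):

# On the new 5.6.19 slave (user name and password are placeholders):
mysql -e "CHANGE MASTER TO
            MASTER_HOST='tdx-db02',
            MASTER_LOG_FILE='tdx-db02-mysqld-bin.001499',
            MASTER_LOG_POS=4,
            MASTER_USER='repl_user',
            MASTER_PASSWORD='repl_password';
          START SLAVE;"
mysql -e "SHOW SLAVE STATUS\G"
# Once the slave has caught up, this is the step that triggers the first crash:
service mysqld stop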