Bug #26081 | AMD platform: Failing assertion: block->buf_fix_count > 0 | ||
---|---|---|---|
Submitted: | 5 Feb 2007 12:34 | Modified: | 30 Dec 2008 8:27 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S2 (Serious) |
Version: | 5.0.38, 5.0.36, 5.0.34, 5.0.32, 5.0.30, 5.0.40, 5.0.45, 5.0.46, 5.0.50a, | OS: | Linux (AMD: linux 64 bit, solaris, linux 32 bit on 64-bit hardware) |
Assigned to: | Sunny Bains | CPU Architecture: | Any |
Tags: | crash, innodb, read committed |
[5 Feb 2007 12:34]
Shane Bester
[13 Feb 2007 17:25]
Heikki Tuuri
buf0buf.ic in 5.0: /************************************************************************ Decrements the bufferfix count of a buffer control block and releases a latch, if specified. */ UNIV_INLINE void buf_page_release( /*=============*/ buf_block_t* block, /* in: buffer block */ ulint rw_latch, /* in: RW_S_LATCH, RW_X_LATCH, RW_NO_LATCH */ mtr_t* mtr) /* in: mtr */ { ulint buf_fix_count; ut_ad(block); ut_a(block->state == BUF_BLOCK_FILE_PAGE); ut_a(block->buf_fix_count > 0); if (rw_latch == RW_X_LATCH && mtr->modifications) { mutex_enter(&buf_pool->mutex); buf_flush_note_modification(block, mtr); mutex_exit(&buf_pool->mutex); } mutex_enter(&block->mutex); #ifdef UNIV_SYNC_DEBUG rw_lock_s_unlock(&(block->debug_latch)); #endif buf_fix_count = block->buf_fix_count; block->buf_fix_count = buf_fix_count - 1; mutex_exit(&block->mutex); if (rw_latch == RW_S_LATCH) { rw_lock_s_unlock(&(block->lock)); } else if (rw_latch == RW_X_LATCH) { rw_lock_x_unlock(&(block->lock)); } } I do not recall a similar bug report prior to this. This might be memory corruption, even bad hardware. Setting this bug report to 'Can't repeat' until we can repeat this.
[27 Feb 2007 18:13]
MySQL Verification Team
Heikki, the common factor between the two independent occurrences of this crash I've seen is this: 64-bit READ-COMMITTED isolation level. Got any clues?
[5 Mar 2007 16:02]
Heikki Tuuri
Shane, maybe the #15815 and #22868 bug fixes have broken the buffer fix count keeping for a page. Are both users using >= 5.0.30? Regards, Heikki
[5 Mar 2007 16:06]
Heikki Tuuri
Sunny, please add diagnostic code to buf0buf.ic in 5.0, so that when this the next time fails, we know what is the thread doing. Printing the trx, if any, may help. Regards, Heikki
[5 Mar 2007 16:07]
MySQL Verification Team
1 user had 5.0.32-enterprise-gpl-log and the other had 5.0.30-ent...
[22 Mar 2007 17:42]
Dan Burford
Hi -- we have also seen this same crash: 070322 6:34:37InnoDB: Assertion failure in thread 1258297696 in file ../include/buf0buf.ic line 626 InnoDB: Failing assertion: block->buf_fix_count > 0 InnoDB: We intentionally generate a memory trap. InnoDB: Thread 1201588576 stopped in file ../include/sync0sync.ic line 111 InnoDB: Thread 1230608736 stopped in file ../include/sync0sync.ic line 111 We are using enterprise-gpl-5.0.32-linux-x86_64-glibc23, and like your other two cases, we are also using read-committed transaction isolation.
[22 Mar 2007 23:30]
Dan Burford
error log from offermatica's mysql crash (starts 070322)
Attachment: mysqlcrash.err (application/octet-stream, text), 10.71 KiB.
[22 Mar 2007 23:31]
Dan Burford
I've uploaded a chunk of our .err file; let me know if I can give you any more information.
[4 May 2007 23:17]
MySQL Verification Team
Heikki, we have another reported 64 bit + read-committed crash for this bug. I'll try come up with a stress test soon.
[7 May 2007 12:56]
Heikki Tuuri
Sunny, please study the READ COMMITTED case. Can you add some assertions to the code to catch this bug? Regards, Heikki
[23 Jul 2007 22:27]
Narayan Newton
I don't have much new info, but I hit this recently and thought I'd add my error log to the list. Linux 64-bit, mysql 5.0.40
[23 Jul 2007 22:30]
Narayan Newton
Error log, let me know if I can do anything else...
Attachment: mysqld.err (application/octet-stream, text), 2.61 KiB.
[9 Aug 2007 14:35]
Pete Harlan
Me too, 64-bit read-committed, error log attached. 5.0.45, built from source. Debian etch.
Attachment: db0.err (application/octet-stream, text), 1.71 KiB.
[9 Aug 2007 14:35]
Pete Harlan
Config file in case that helps.
Attachment: db0.conf (application/octet-stream, text), 1.00 KiB.
[16 Aug 2007 10:35]
MySQL Verification Team
all reporters: please try get a corefile on next crash. has anybody repeated this crash in a controlled environment?
[24 Aug 2007 8:17]
MySQL Verification Team
A lot of debugging information has been added to help us diagnose this crash. See this patch: http://lists.mysql.com/commits/30607?f=plain
[20 Sep 2007 5:03]
Jeremiah Gowdy
I just experienced what may be the same thing under 64bit Solaris System is an 8 core Opteron Sun Fire V40z with 32GB memory running Solaris 10 x64 (AMD64). Databases are exclusively InnoDB. SunOS sql-gamma.freedomvoice.com 5.10 Generic_118855-36 i86pc i386 i86pc 5.0.38-enterprise-gpl-log MySQL Enterprise Server (GPL) 070918 22:20:01InnoDB: Assertion failure in thread 26 in file ./../include/buf0buf.ic line 626 InnoDB: Failing assertion: block->buf_fix_count > 0 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.0/en/forcing-recovery.html InnoDB: about forcing recovery. InnoDB: Thread 89 stopped in file ./../include/sync0sync.ic line 111 InnoDB: Thread 204 stopped in file ./../include/sync0sync.ic line 111 InnoDB: Thread 484 stopped in file btr0pcur.c line 236 070918 22:20:01 - mysqld got signal 11; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. InnoDB: Thread 367 stopped in file ha_innodb.cc line 2025 We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=134217728 read_buffer_size=1044480 max_used_connections=555 max_connections=1200 threads_connected=139 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 2583862 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. InnoDB: Thread 125 stopped in file ./../include/sync0sync.ic line 111
[30 Oct 2007 8:35]
Nikolay Sivko
Mysql-5.0.45 builded from source (linux 64 bit): 071029 19:39:44InnoDB: Assertion failure in thread 1189607744 in file ./../include/buf0buf.ic line 626 InnoDB: Failing assertion: block->buf_fix_count > 0 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.0/en/forcing-recovery.html InnoDB: about forcing recovery. InnoDB: Thread 1260161344 stopped in file row0sel.c line 2930 InnoDB: Thread 1166444864 stopped in file btr0sea.c line 336 InnoDB: Thread 1163516224 stopped in file ./../include/buf0buf.ic line 558 InnoDB: Thread 1184016704 stopped in file ./../include/sync0sync.ic line 111 InnoDB: Thread 1282791744 stopped in file row0row.c line 480 071029 19:39:44 - mysqld got signal 11; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=268435456 read_buffer_size=1044480 max_used_connections=598 max_connections=1000 threads_connected=236 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 5378136 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd=0x2ab048859c70 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... Cannot determine thread, fp=0x46e7f150, backtrace may not be correct. Stack range sanity check OK, backtrace follows: (nil) Stack trace seems successful - bottom reached Please read http://dev.mysql.com/doc/mysql/en/using-stack-trace.html and follow instructions on how to resolve the stack trace. Resolved stack trace is much more helpful in diagnosing the problem, so please do resolve it Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x2ab05ca6b020 is invalid pointer thd->thread_id=41424 The manual page at http://www.mysql.com/doc/en/Crashing.html contains information that should help you find out what is causing the crash. InnoDB: Thread 1212238144 stopped in file ha_innodb.cc line 977 Number of processes running now: 0 071029 19:39:52 mysqld restarted
[30 Oct 2007 14:32]
Heikki Tuuri
Nikolay, what is your hardware? Regards, Heikki
[31 Oct 2007 7:15]
Nikolay Sivko
4xDual-Core Opteron 885/32Gb RAM Linux kernel 2.6.17-1.2157_FC5 x86_64 thread_concurrency = 16 transaction-isolation = READ-COMMITTED innodb_flush_log_at_trx_commit = 2 innodb_flush_method = O_DIRECT innodb_buffer_pool_size = 20G innodb_additional_mem_pool_size = 64M
[18 Jan 2008 6:58]
MySQL Verification Team
update: I've been running 160 threads against 5.0.40 all inserting/updating/deleting/replacing/selecting on our 64-bit box with 7G buffer pool, read-committed for past 72 hours. No crashes yet. Will update again, after another few days.
[22 Jan 2008 5:47]
Mark Callaghan
I get this on my servers (4 and 8 core AMD). It occurs about 1 time every 1000 machine days. I don't have core files. I use 5.0.37. I am not sure what cursor isolation level apps choose, but the default is repeatable read.
[25 Jan 2008 15:03]
Heikki Tuuri
My current hypothesis is that this is a problem in AMD's cache coherence protocol. Has anyone hit this on a non-AMD platform? Regards, Heikki
[29 Jan 2008 15:26]
MySQL Verification Team
Update: I ran some stress tests in 160 threads for over 7 days and couldn't repeat a crash on 64-bit Intel CPUs. Will have to try AMD next. I don't think we've ever seen >=5.0.48 crashing due to this bug. Is there a chance it's been fixed unknowingly with some other fixes in the mean time?
[4 Mar 2008 17:04]
Heikki Tuuri
Shane, I do not think it could have been fixed in 5.0.48. The bug was a total mystery from the start. --Heikki
[27 Mar 2008 20:28]
Mark Callaghan
There is a bug for some AMD systems. This bug is _very_ unlikely, which matches the frequency with which we have this problem and we are still determining whether it causes our crashes. Look at src/base/atomicops-internals-x86.{h,cc} from http://code.google.com/p/google-perftools/
[28 Mar 2008 12:28]
Heikki Tuuri
Mark, thank you. Maybe time to contact AMD? --Heikki
[28 Mar 2008 17:03]
Mark Callaghan
I think they know about it. I was told it was in the errata.
[28 Mar 2008 17:10]
Heikki Tuuri
Thank you Mark! Let us keep this bug report still open for a year or two, so that we get feedback from users. I guess AMD has fixed the bug in their latest processors, and these bug reports will slowly taper out. Regards, Heikki
[1 Apr 2008 15:26]
Robert Krzykawski
Well, we have hit this bug a couple of times on our 5.0.32 servers, but it seems that no server never than 5.0.48 is affected. Is there anyone out there who has hit this bug on a newer version of 5.0? This is the log readout: Version: '5.0.32-enterprise-gpl-log' socket: '/tmp/mysql.sock' port: 3306 MySQL Enterprise Server (GPL) 080401 8:40:20InnoDB: Assertion failure in thread 1189874016 in file ../include/buf0buf.ic line 626 InnoDB: Failing assertion: block->buf_fix_count > 0 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.0/en/forcing-recovery.html InnoDB: about forcing recovery. mysqld got signal 11; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. InnoDB: Thread 1225017696 stopped in file row0sel.c line 2088 We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=52428800 read_buffer_size=2093056 max_used_connections=382 max_connections=1200 threads_connected=97 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 4961590 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd=0x2b4814df50 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... Cannot determine thread, fp=0x46ec0118, backtrace may not be correct. Bogus stack limit or frame pointer, fp=0x46ec0118, stack_bottom=0x46ec0000, thread_stack=262144, aborting backtrace. InnoDB: Thread 1165113696 stopped in file row0sel.c line 2088 Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x2b4560e660 is invalid pointer thd->thread_id=1423637 The manual page at http://www.mysql.com/doc/en/Crashing.html contains information that should help you find out what is causing the crash. This instance is running on a 8 core AMD server.
[2 Apr 2008 9:20]
Robert Krzykawski
Anyone who knows more about how involved AMD is in this issue? Are they aware of this issue? If they already have done something about it, is there a date from where processors have been fixed? Maybe a processor version in where it has been fixed?
[23 Apr 2008 0:04]
Ben Handy
I have pulled this comment from the files that Mark referenced earlier: // Opteron Rev E has a bug in which on very rare occasions a locked // instruction doesn't act as a read-acquire barrier if followed by a // non-locked read-modify-write instruction. Rev F has this bug in // pre-release versions, but not in versions released to customers, // so we test only for Rev E, which is family 15, model 32..63 inclusive. It would be interesting to know if anybody has encountered this crash on a processor that falls outside this family/model range.
[4 Aug 2008 13:59]
Arkadiusz Miskiewicz
Saw this on: mysql 5.1.26 2 x dual core system vendor_id : AuthenticAMD cpu family : 15 model : 33 model name : Dual Core AMD Opteron(tm) Processor 270 stepping : 2 cpu MHz : 2009.261
[4 Aug 2008 14:08]
Mark Callaghan
We use a few AMD models, but the error only occurs on this one: model name : Dual Core AMD Opteron(tm) Processor 270
[11 Aug 2008 16:57]
Arkadiusz Miskiewicz
/proc/cpuinfo data is important (cpu family, model, model name and stepping). Knowing that this happens on OTHER models than limited by family == 15 && 32 <= model && model <= 63 rule would be important.
[18 Sep 2008 17:35]
Siva C
db crash error log
Attachment: db-cst.log (, text), 12.13 KiB.
[18 Sep 2008 17:45]
Siva C
Encountered similar issue with AMD-64 bit 4 core processor but transaction set to innodb default. This seems to be open for a long time. Is there any way to shield this other than changing the processors? Hope the logs help http://bugs.mysql.com/file.php?id=10257&text=1 processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 33 model name : AMD Opteron(tm) Processor 280 stepping : 2 cpu MHz : 2004.590 cache size : 1024 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 2 fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni bogomips : 3607.67 TLB size : 1088 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp
[18 Sep 2008 18:26]
Arkadiusz Miskiewicz
The bug is known to linux kernel developers but for now no developer was interested in fixing the problem. The generic workaround are also described there. OpenSolaris has in kernel workarounds for the problem. http://bugzilla.kernel.org/show_bug.cgi?id=11305
[1 Dec 2008 19:42]
Mikhail Izioumtchenko
This should probably be set to 'not a bug' since the problem looks to be well understood: - can happen with any MySQL version, observed in 5.0 and 5.1 - the cause is a bug in AMD processor family 15, models 32-63 inclusive - the workaround can be done only at OS kernel level, not in MySQL or InnoDB so outside of scope of MySQL bugdb
[8 Apr 2009 6:39]
Arkadiusz Miskiewicz
Patch for Linux kernel (waiting for testers) at http://bugzilla.kernel.org/show_bug.cgi?id=11305
[27 Jan 2010 13:57]
Marko Mäkelä
It is still too early to say for sure, but Bug #35077 may be related to this.
[28 Apr 2011 9:55]
MySQL Verification Team
http://support.amd.com/us/Processor_TechDocs/25759.pdf See Errata #147
[18 Oct 2011 20:56]
Arnaud Adant
So far, no mainline kernel fixes the problem : 2.6.39 does not. see : http://timetobleed.com/mysql-doesnt-always-suck-this-time-its-amd/ There is a unofficial patch there : http://marc.info/?l=linux-kernel&m=124043356704100