Bug #4570: Replication hangs on Fedora Core 1 on Opteron
Submitted: 15 Jul 2004 22:11 Modified: 16 Jul 2004 14:38
Reporter: Greg Whalin
Status: Not a Bug
Category: MySQL Server: Replication    Severity: S1 (Critical)
Version: 4.0.20    OS: Linux (Fedora Core 1)
Assigned to: Guilhem Bichot    CPU Architecture: Any

[15 Jul 2004 22:11] Greg Whalin
We are rolling out MySQL as a replication slave on a new dual-Opteron machine (4 GB of memory) running Fedora Core 2. Replication starts without problems, and SHOW SLAVE STATUS looks correct; I can see the relay log filling up. However, after a short time replication locks up and nothing more gets applied. This happened both with the pre-compiled binary downloaded from the site and with a custom-compiled build. Once it hangs, the only way to bring the server down is to kill -9 the mysqld processes. On a hunch, I suspected NPTL, since we have had similar problems with Sun's JVM. Sure enough, adding export LD_ASSUME_KERNEL=2.4.19 to the start script made it work with no lockups (it has since worked through about two weeks of backlog). I can provide more information if needed, but this looks like an NPTL problem to me.
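One way to test the NPTL hunch is to ask glibc which threads implementation a process gets, with and without LD_ASSUME_KERNEL. This is a minimal sketch, assuming a glibc-based system whose getconf supports GNU_LIBPTHREAD_VERSION (its output typically starts with "NPTL" or "linuxthreads"); the helper name thread_lib_kind is hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes glibc's getconf understands GNU_LIBPTHREAD_VERSION.
# thread_lib_kind is a hypothetical helper that maps the getconf output
# to "nptl", "linuxthreads" or "unknown".
thread_lib_kind() {
  case "$1" in
    NPTL*) echo nptl ;;
    linuxthreads*) echo linuxthreads ;;
    *) echo unknown ;;
  esac
}

# Compare the default selection with the one made under LD_ASSUME_KERNEL.
# Errors (e.g. an unsupported variable) are discarded and reported as "unknown".
echo "default: $(thread_lib_kind "$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null)")"
echo "with LD_ASSUME_KERNEL=2.4.19: $(thread_lib_kind "$(LD_ASSUME_KERNEL=2.4.19 getconf GNU_LIBPTHREAD_VERSION 2>/dev/null)")"
```

On a system that ships both libraries, the two lines would be expected to differ; if both report nptl, the variable is not changing anything for dynamically linked programs.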

How to repeat:
Run 4.0.20 on Fedora Core 1 on an Opteron box. Replication locked up for me almost immediately.

Suggested fix:
Add export LD_ASSUME_KERNEL=2.4.19 to the start script to disable NPTL.
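The workaround can also be scoped to the mysqld startup alone rather than the whole start script. This is a minimal sketch, assuming a POSIX sh start script; run_with_linuxthreads is a hypothetical name, and the variable only has an effect on glibc installations that ship LinuxThreads alongside NPTL, as FC1 did.

```shell
#!/bin/sh
# Hypothetical helper: run one command with LD_ASSUME_KERNEL=2.4.19 so the
# dynamic loader selects the LinuxThreads libraries instead of NPTL.
# The subshell keeps the variable from leaking into the rest of the script.
run_with_linuxthreads() {
  (
    LD_ASSUME_KERNEL=2.4.19
    export LD_ASSUME_KERNEL
    "$@"
  )
}
```

In a start script this might be called as run_with_linuxthreads /usr/local/mysql/bin/mysqld_safe --user=mysql & (the path assumes the /usr/local/mysql prefix from the configure line in this report). Limiting the setting to one command avoids changing the threads library of any other program the script launches.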
[15 Jul 2004 22:13] Greg Whalin
I mentioned Core 2; that was a typo. We are using Core 1.
[15 Jul 2004 23:22] Guilhem Bichot
You say you used the pre-compiled package: which one exactly, please? Knowing that will tell me whether it is a statically or dynamically linked binary.
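The linking question matters because LD_ASSUME_KERNEL is read by the dynamic loader, so it can only change the threads library of a dynamically linked mysqld. A sketch for checking a binary, assuming the file(1) utility is installed; link_kind is a hypothetical helper, and the default path assumes the /usr/local/mysql prefix used in this report.

```shell
#!/bin/sh
# Hypothetical helper: classify the description printed by file(1).
link_kind() {
  case "$1" in
    *"statically linked"*) echo static ;;
    *"dynamically linked"*) echo dynamic ;;
    *) echo unknown ;;
  esac
}

# Assumed install location; override with MYSQLD=/path/to/mysqld.
MYSQLD=${MYSQLD:-/usr/local/mysql/bin/mysqld}
if [ -e "$MYSQLD" ]; then
  # -L follows symlinks so the real binary is inspected.
  echo "$MYSQLD: $(link_kind "$(file -L "$MYSQLD")")"
fi
```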
[15 Jul 2004 23:30] Greg Whalin
We were using mysql-standard-4.0.20-unknown-linux-x86_64.tar.gz. I have not verified whether disabling NPTL helps with that binary. I am now running a self-compiled version, configured as follows:

CFLAGS="-O3" CXX=gcc CXXFLAGS="-O3 -felide-constructors -fno-exceptions -fno-rtti" ./configure --prefix=/usr/local/mysql --with-extra-charsets=all --enable-thread-safe-client --enable-local-infile  --disable-shared --enable-assembler
[16 Jul 2004 1:04] Greg Whalin
An update on this issue: we decided to move this box to Fedora Core 2 (kernel 2.6.6-1.435.2.3smp) with all the latest RPMs. The pre-compiled version of MySQL mentioned above now works fine, without any problems and without LD_ASSUME_KERNEL set.
[16 Jul 2004 14:38] Guilhem Bichot
Hi Greg,

If upgrading to Fedora Core 2 removed the problem, I'm going to call it a bug in FC1 or in FC1's NPTL. That is quite a relief, by the way.
And I'm happy it's now working fine at your site :)