Bug #38592 | ndb_mgmd doesn't start due to epoll error | ||
---|---|---|---|
Submitted: | 6 Aug 2008 7:05 | Modified: | 5 Oct 2008 16:30 |
Reporter: | Kai Voigt | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
Version: | 5.1.24-ndb-6.3.14-telco | OS: | Linux (Kernel 2.4.21 (Red Hat)) |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
[6 Aug 2008 7:05]
Kai Voigt
[6 Aug 2008 11:35]
Hartmut Holzgraefe
Looking at the code that produces this message

+ m_epoll_events = new struct epoll_event[maxTransporters];
+ m_epoll_fd = epoll_create(maxTransporters);
+ if (m_epoll_fd == -1 || !m_epoll_events)
+ {
+   /* Failure to allocate data or get epoll socket, abort */
+   perror("Failed to alloc epoll-array or calling epoll_create...giving up!");
+   abort();
+ }

and the epoll_create() manpage, the failure can have three different reasons:
- out of memory
- out of file handles
- maxTransporters was negative

I think we can rule out the last one and probably out-of-memory, too, so you may want to check the system's open files limits?
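The failure reasons listed above correspond to distinct errno values, so they can be told apart in code instead of by elimination. The following is only a minimal sketch: epoll_create_failure_reason is a hypothetical helper, not something in the NDB source, and the wording of the returned strings is an assumption made for illustration.

```cpp
#include <cerrno>
#include <string>

// Hypothetical helper (not part of the NDB source): map the errno value left
// behind by a failed epoll_create() call to one of the causes listed in the
// manpage. Anything else is reported as "other" rather than guessed at.
std::string epoll_create_failure_reason(int err)
{
  switch (err) {
    case ENOMEM: return "out of memory";
    case EMFILE: return "out of file handles (per-process limit)";
    case ENFILE: return "out of file handles (system-wide limit)";
    case EINVAL: return "size argument was not positive";
    default:     return "other";
  }
}
```

A caller would save errno immediately after the failing epoll_create() call and pass that saved value in, since later library calls may overwrite errno.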
[6 Aug 2008 11:45]
Jonas Oreland
Untested patch, for the case where the OS doesn't provide the function
Attachment: epoll.patch (text/x-patch), 6.21 KiB.
[6 Aug 2008 11:53]
Hartmut Holzgraefe
Hm ... looking at the error message code again: the message is emitted by perror(), which appends text based on 'errno', so the actual error is ENOSYS (38), "Function not implemented". That is probably simply because epoll was only introduced in Linux 2.5/2.6 and does not exist in the 2.4.21 kernel used here.
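To make the perror() behaviour concrete: perror(prefix) writes the prefix, a colon and a space, and then the text that strerror() gives for the current errno. The sketch below builds the same string for an arbitrary errno value; perror_text is a hypothetical name introduced here for illustration.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: build the text that perror(prefix) would print for a
// given errno value, i.e. "<prefix>: <strerror(err)>" (without the newline).
std::string perror_text(const std::string& prefix, int err)
{
  return prefix + ": " + std::strerror(err);
}
```

Called with the prefix from the quoted source and ENOSYS, this should reproduce the line seen in the report on a glibc system; the exact strerror() wording depends on the C library and locale.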
[7 Aug 2008 1:14]
Kai Voigt
With the patch applied, mysqld now crashes:

080806 18:26:25 mysqld_safe Starting mysqld daemon with databases from /u02/mysql-5.1.24/var
080806 18:26:25 [Warning] option 'thread_stack': unsigned value 65536 adjusted to 131072
080806 18:26:25 InnoDB: Started; log sequence number 0 569278
080806 18:26:25 [Note] Starting MySQL Cluster Binlog Thread
080806 18:26:25 [Note] Event Scheduler: Loaded 0 events
080806 18:26:25 [Note] /u02/mysql-5.1.24/libexec/mysqld: ready for connections.
Version: '5.1.24-ndb-6.3.14-telco' socket: '/u02/mysql-5.1.24/mysql.sock' port: 3308 Source distribution
080806 18:27:52 [ERROR] Invalid (old?) table or database name 'mysql-cluster'
Failed to alloc epoll-array or calling epoll_create...giving up!: Function not implemented
080806 18:28:31 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail.
key_buffer_size=16384
read_buffer_size=262144
max_used_connections=1
max_threads=151
threads_connected=0
It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 49284 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
thd: 0x0
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
/u02/mysql-5.1.24/libexec/mysqld(print_stacktrace+0x1d) [0x82f0ee9]
/u02/mysql-5.1.24/libexec/mysqld(handle_segfault+0x1ca) [0x81e803a]
/lib/tls/libpthread.so.0 [0xa8fe48]
/lib/tls/libc.so.6(abort+0x1d5) [0x3174e5]
/u02/mysql-5.1.24/libexec/mysqld [0x84eca56]
/u02/mysql-5.1.24/libexec/mysqld(TransporterFacade::init(unsigned int, ndb_mgm_configuration const*)+0x40) [0x84e1fc0]
/u02/mysql-5.1.24/libexec/mysqld(TransporterFacade::start_instance(int, ndb_mgm_configuration const*)+0x21) [0x84e16dd]
/u02/mysql-5.1.24/libexec/mysqld(Ndb_cluster_connection::connect(int, int, int)+0xa1) [0x848c005]
/u02/mysql-5.1.24/libexec/mysqld(Ndb_cluster_connection_impl::connect_thread()+0x2a) [0x848c15e]
/u02/mysql-5.1.24/libexec/mysqld(run_ndb_cluster_connection_connect_thread+0x24) [0x848a458]
/u02/mysql-5.1.24/libexec/mysqld [0x84c97a4]
/lib/tls/libpthread.so.0 [0xa89de8]
/lib/tls/libc.so.6(__clone+0x5a) [0x3ca93a]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash.
080806 18:28:31 mysqld_safe Number of processes running now: 0
080806 18:28:31 mysqld_safe mysqld restarted
080806 18:28:31 [Warning] option 'thread_stack': unsigned value 65536 adjusted to 131072
080806 18:28:32 InnoDB: Started; log sequence number 0 569278
Failed to alloc epoll-array or calling epoll_create...giving up!: Function not implemented
080806 18:28:32 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail.
key_buffer_size=16384
read_buffer_size=262144
max_used_connections=0
max_threads=151
threads_connected=0
It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 49284 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
thd: 0x0
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
/u02/mysql-5.1.24/libexec/mysqld(print_stacktrace+0x1d) [0x82f0ee9]
/u02/mysql-5.1.24/libexec/mysqld(handle_segfault+0x1ca) [0x81e803a]
/lib/tls/libpthread.so.0 [0x304e48]
/lib/tls/libc.so.6(abort+0x1d5) [0x15c4e5]
/u02/mysql-5.1.24/libexec/mysqld [0x84eca56]
/u02/mysql-5.1.24/libexec/mysqld(TransporterFacade::init(unsigned int, ndb_mgm_configuration const*)+0x40) [0x84e1fc0]
/u02/mysql-5.1.24/libexec/mysqld(TransporterFacade::start_instance(int, ndb_mgm_configuration const*)+0x21) [0x84e16dd]
/u02/mysql-5.1.24/libexec/mysqld(Ndb_cluster_connection::connect(int, int, int)+0xa1) [0x848c005]
/u02/mysql-5.1.24/libexec/mysqld(ndbcluster_connect(int (*)())+0x176) [0x8346932]
/u02/mysql-5.1.24/libexec/mysqld [0x8332599]
/u02/mysql-5.1.24/libexec/mysqld(ha_initialize_handlerton(st_plugin_int*)+0x142) [0x82a58de]
/u02/mysql-5.1.24/libexec/mysqld [0x8319e62]
/u02/mysql-5.1.24/libexec/mysqld(plugin_init(int*, char**, int)+0x358) [0x8318248]
/u02/mysql-5.1.24/libexec/mysqld [0x81ebb33]
/u02/mysql-5.1.24/libexec/mysqld(main+0x1d5) [0x81e8849]
/lib/tls/libc.so.6(__libc_start_main+0xda) [0x14878a]
/u02/mysql-5.1.24/libexec/mysqld(__fxstat64+0x81) [0x814d745]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash.
080806 18:28:32 mysqld_safe mysqld from pid file /u02/mysql-5.1.24/var/dbserver.pid ended
[7 Aug 2008 1:17]
Kai Voigt
Small side note from epoll man page: epoll(4) is a new API introduced in Linux kernel 2.5.44. Its interface should be finalized in Linux kernel 2.5.66
[7 Aug 2008 5:53]
Kai Voigt
After patching, ndbd still doesn't work:

[root@dbserver mysql-5.1.24]# libexec/ndbd
Failed to alloc epoll-array or calling epoll_create...!: Function not implemented
Segmentation fault (core dumped)
[7 Aug 2008 9:01]
Chooi Ting Goon
I still get the same error:

=======================================================================
-bash-2.05b$ libexec/ndbd --initial
Failed to alloc epoll-array or calling epoll_create...!: Function not implemented
Segmentation fault (core dumped)
=======================================================================

The steps that I have done:
1) Patch TransporterRegistry.cpp
[root@dbserver mysql-5.1.24-ndb-6.3.14-telco]# patch -p0 < epoll.diff
patching file storage/ndb/src/common/transporter/TransporterRegistry.cpp
2) # ./configure --with-plugins=max --prefix=/u02/mysql-5.1.24
3) # make
4) # make install
5) start mysql server
6) start ndb_mgmd
7) start ndbd --initial ---> where the problem occurred.

Note: could it be that multiple versions of MySQL are running on this machine? I have several config files (my.cnf, my-5.0.cnf and my-5.1.24.cnf), and I start each server by pointing it to the respective cnf file.

Please advise.
[7 Aug 2008 9:19]
Bernd Ocklin
Chooi Ting Goon, thank you for trying the patch. Can you deliver a core file and the exact version you have been using?
[7 Aug 2008 10:00]
Chooi Ting Goon
ndbd core file
Virus scan engine found a threat. This file might be infected. Attachment: core.2318.gz (application/x-gzip-compressed, text), 83.16 KiB.
[7 Aug 2008 10:00]
Chooi Ting Goon
Hi Bernhard, Please refer to the attached core.2318.gz. Thanks. rgds
[7 Aug 2008 13:27]
Bernd Ocklin
Chooi Ting Goon, can you upload the backtrace directly?
[8 Aug 2008 2:02]
Chooi Ting Goon
(gdb) bt
#0 0x082592b1 in TransporterRegistry::TransporterRegistry ()
#1 0x0824ceac in __static_initialization_and_destruction_0 ()
#2 0x0824d722 in global constructors keyed to theEmulatedJam ()
#3 0x082ad515 in __do_global_ctors_aux ()
#4 0x080ae48d in _init ()
#5 0x082ad48a in __libc_csu_init ()
#6 0x0095873b in __libc_start_main () from /lib/tls/libc.so.6
#7 0x080af061 in _start ()
[8 Aug 2008 10:22]
Bernd Ocklin
Red Hat's 2.4.21 release has an epoll_create function in glibc even though the kernel doesn't support the epoll API. The header file exists as well. This confuses ./configure.

The easiest workaround is to edit include/config.h and include/my_config.h _after_ running ./configure and comment out this line:

#define HAVE_EPOLL_CREATE 1

Run make again and it works.
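The mismatch described here is between a build-time check (the function and header exist, so HAVE_EPOLL_CREATE gets defined) and what the running kernel actually implements. A runtime probe closes that gap. The sketch below is a hypothetical illustration, not code from the MySQL tree; kernel_has_epoll is an invented name.

```cpp
#include <cerrno>
#include <sys/epoll.h>
#include <unistd.h>

// Hypothetical runtime probe: returns true if the running kernel implements
// epoll, regardless of what the headers and libc promised at build time.
bool kernel_has_epoll()
{
  int fd = epoll_create(1);   // the size hint only needs to be positive
  if (fd >= 0) {
    close(fd);                // we only wanted to know whether the call works
    return true;
  }
  // Only ENOSYS means "not implemented"; other errors (for example EMFILE)
  // mean the call exists but failed for an unrelated reason.
  return errno != ENOSYS;
}
```

On a 2.4 kernel with an epoll-aware glibc, as described in this report, such a probe would be expected to return false even though the program compiled and linked without complaint.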
[11 Aug 2008 8:51]
Bernd Ocklin
http://lists.mysql.com/commits/51196
[13 Aug 2008 19:45]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/51562

2667 Jonas Oreland 2008-08-13
ndb - bug#38592 - handle epoll_create returning ENOSYS by falling back on select
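The commit message describes the approach: treat ENOSYS from epoll_create as "not available" and fall back on select. The following is a minimal sketch of that idea under stated assumptions, not the committed patch. wait_readable and its force_select parameter are hypothetical and exist only to make the two code paths easy to exercise; EINTR handling is omitted, and the select() path requires fd to be below FD_SETSIZE.

```cpp
#include <cerrno>
#include <sys/epoll.h>
#include <sys/select.h>
#include <unistd.h>

// Hypothetical sketch, not the committed patch: wait until fd is readable,
// using epoll when the kernel has it and select() otherwise.
// Returns 1 if readable, 0 on timeout, -1 on error.
int wait_readable(int fd, int timeout_ms, bool force_select = false)
{
  if (!force_select) {
    int ep = epoll_create(1);
    if (ep >= 0) {
      struct epoll_event ev = {};
      ev.events = EPOLLIN;
      ev.data.fd = fd;
      int rc = epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
      if (rc == 0) {
        struct epoll_event out;
        rc = epoll_wait(ep, &out, 1, timeout_ms);
      }
      close(ep);
      return rc < 0 ? -1 : (rc > 0 ? 1 : 0);
    }
    if (errno != ENOSYS)
      return -1;              // a real failure, not a missing syscall
    // ENOSYS: the kernel has no epoll, fall through to select()
  }
  fd_set readfds;
  FD_ZERO(&readfds);
  FD_SET(fd, &readfds);
  struct timeval tv;
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;
  int rc = select(fd + 1, &readfds, nullptr, nullptr, &tv);
  return rc < 0 ? -1 : (rc > 0 ? 1 : 0);
}
```

A real transporter would create the epoll descriptor once and keep it rather than per call; the point here is only that the ENOSYS branch degrades to select() instead of calling abort().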
[13 Aug 2008 20:27]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/51573

2668 Jonas Oreland 2008-08-13 [merge]
merge 6.2 to 6.3
[13 Aug 2008 20:29]
Jonas Oreland
pushed to 63 and 64
[14 Aug 2008 7:15]
Chooi Ting Goon
Hi,

ndbd does not seem to start up after I followed the steps that you mentioned previously. Below are the steps I used to recompile mysql-5.1.24-ndb:
1) # ./configure --with-plugins=max --prefix=/u02/mysql-5.1.24
2) # make
3) # make install
4) start mysql server
5) start ndb_mgmd
6) start ndbd --initial ---> waits for a very long time.

Below is the log from ndb_1_cluster.log:

2008-08-14 15:11:35 [MgmSrvr] INFO -- Node 2: Initial start, waiting for 0000000000000008 to connect, nodes [ all: 000000000000000c connected: 0000000000000004 no-wait: 0000000000000000 ]
2008-08-14 15:11:38 [MgmSrvr] INFO -- Node 2: Initial start, waiting for 0000000000000008 to connect, nodes [ all: 000000000000000c connected: 0000000000000004 no-wait: 0000000000000000 ]
2008-08-14 15:11:41 [MgmSrvr] INFO -- Node 2: Initial start, waiting for 0000000000000008 to connect, nodes [ all: 000000000000000c connected: 0000000000000004 no-wait: 0000000000000000 ]
2008-08-14 15:11:44 [MgmSrvr] INFO -- Node 2: Initial start, waiting for 0000000000000008 to connect, nodes [ all: 000000000000000c connected: 0000000000000004 no-wait: 0000000000000000 ]
2008-08-14 15:11:47 [MgmSrvr] INFO -- Node 2: Initial start, waiting for 0000000000000008 to connect, nodes [ all: 000000000000000c connected: 0000000000000004 no-wait: 0000000000000000 ]

rgds,
Chooi Ting
[11 Sep 2008 19:40]
Jon Stephens
Documented bugfix in the NDB 6.2.16 and 6.3.17 changelogs as follows: ndb_mgmd failed to start on older Linux distributions (2.4 kernels) that did not support e-polling.
[5 Oct 2008 16:30]
Jon Stephens
Already documented; closed.