Bug #24745 | InnoDB semaphore wait timeout/crash - deadlock waiting for itself | ||
---|---|---|---|
Submitted: | 1 Dec 2006 7:00 | Modified: | 4 Jan 2007 16:17 |
Reporter: | Arjen Lentz | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 4.0.26, 5.0.38 | OS: | Linux (Linux) |
Assigned to: | Heikki Tuuri | CPU Architecture: | Any |
Tags: | crash, deadlock, innodb, Semaphore |
[1 Dec 2006 7:00]
Arjen Lentz
[1 Dec 2006 13:19]
Heikki Tuuri
It may be that thread 196621 has initiated a read of the page from the data file to the buffer pool. Then the situation would look sensible. InnoDB's X-latch is not re-entrant for an S-latch request of the same thread.
[1 Dec 2006 14:45]
Mark Callaghan
I don't understand this code, but in what cases would a thread acquire an X-lock on the latch? This thread is only doing queries.
[1 Dec 2006 14:55]
Heikki Tuuri
Mark, it was X-latched in this function: MySQL-4.1.16, buf0buf.c: /************************************************************************ Function which inits a page for read to the buffer buf_pool. If the page is (1) already in buf_pool, or (2) if we specify to read only ibuf pages and the page is not an ibuf page, or (3) if the space is deleted or being deleted, then this function does nothing. Sets the io_fix flag to BUF_IO_READ and sets a non-recursive exclusive lock on the buffer frame. The io-handler must take care that the flag is cleared and the lock released later. This is one of the functions which perform the state transition NOT_USED => FILE_PAGE to a block (the other is buf_page_create). */ buf_block_t* buf_page_init_for_read( /*===================*/ /* out: pointer to the block or NULL */ ulint* err, /* out: DB_SUCCESS or DB_TABLESPACE_DELETED */ ulint mode, /* in: BUF_READ_IBUF_PAGES_ONLY, ... */ ulint space, /* in: space id */ ib_longlong tablespace_version,/* in: prevents reading from a wrong version of the tablespace in case we have done DISCARD + IMPORT */ ulint offset) /* in: page number */ { ... block->io_fix = BUF_IO_READ; buf_pool->n_pend_reads++; /* We set a pass-type x-lock on the frame because then the same thread which called for the read operation (and is running now at this point of code) can wait for the read to complete by waiting for the x-lock on the frame; if the x-lock were recursive, the same thread would illegally get the x-lock before the page read is completed. The x-lock is cleared by the io-handler thread. */ rw_lock_x_lock_gen(&(block->lock), BUF_IO_READ); mutex_exit(&(buf_pool->mutex)); if (mode == BUF_READ_IBUF_PAGES_ONLY) { mtr_commit(&mtr); } return(block); } We use a 'non-recursive' X-latch to protect the buffer frame while the page is being read in. From this analysis I can make a guess: the page read is starved, or Linux has hung in a pread() call. Please email me the whole .err log gzipped, if you have it :). Regards, Heikki
[1 Dec 2006 23:57]
Mark Callaghan
OK. Does this mean that the session made a prefetch request for this page and then waits for a background thread to handle the request? This mysqld is unmodified with respect to the background IO threads. In one of the *.err files I have checked, the read background thread was idle. I will soon provide the err files.
[12 Dec 2006 12:41]
Heikki Tuuri
Some output from the customer: " -------- FILE I/O -------- I/O thread 0 state: waiting for i/o request (insert buffer thread) I/O thread 1 state: waiting for i/o request (log thread) I/O thread 2 state: waiting for i/o request (read thread) ev set I/O thread 3 state: waiting for i/o request (write thread) Pending normal aio reads: 51, aio writes: 0, ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0 Pending flushes (fsync) log: 0; buffer pool: 0 >>> " The above suggests that this is a Linux bug in condition variables: the I/O thread does not proceed though its 'event' is set. Or a bug in InnoDB's 'event' implementation. Or a memory corruption bug in InnoDB that corrupts a thread stack or the event struct.
[3 Jan 2007 19:33]
Daniel Fiske
I am getting same behaviour in v5.0.26, Win32. Attached are my error logs and variable settings.
[3 Jan 2007 19:44]
Daniel Fiske
Could someone please attach the following file http://www.filefactory.com/file/155af8/ auto_increment_increment,1 auto_increment_offset,1 automatic_sp_privileges,ON back_log,50 basedir,C:\\mysql\\ binlog_cache_size,32768 bulk_insert_buffer_size,8388608 character_set_client,latin1 character_set_connection,latin1 character_set_database,latin1 character_set_filesystem,binary character_set_results,latin1 character_set_server,latin1 character_set_system,utf8 character_sets_dir,C:\\mysql\\share\\charsets\\ collation_connection,latin1_swedish_ci collation_database,latin1_swedish_ci collation_server,latin1_swedish_ci completion_type,0 concurrent_insert,1 connect_timeout,5 datadir,C:\\mysql\\Data\\ date_format,%Y-%m-%d datetime_format,%Y-%m-%d %H:%i:%s default_week_format,0 delay_key_write,ON delayed_insert_limit,100 delayed_insert_timeout,300 delayed_queue_size,1000 div_precision_increment,4 engine_condition_pushdown,OFF expire_logs_days,0 flush,OFF flush_time,1800 ft_boolean_syntax,+ -><()~*:""&| ft_max_word_len,84 ft_min_word_len,4 ft_query_expansion_limit,20 ft_stopword_file,(built-in) group_concat_max_len,1024 have_archive,YES have_bdb,NO have_blackhole_engine,NO have_compress,YES have_crypt,NO have_csv,NO have_dynamic_loading,YES have_example_engine,NO have_federated_engine,NO have_geometry,YES have_innodb,YES have_isam,NO have_merge_engine,YES have_ndbcluster,NO have_openssl,DISABLED have_query_cache,YES have_raid,NO have_rtree_keys,YES have_symlink,YES init_connect, init_file, init_slave, innodb_additional_mem_pool_size,10485760 innodb_autoextend_increment,8 innodb_buffer_pool_awe_mem_mb,0 innodb_buffer_pool_size,536870912 innodb_checksums,ON innodb_commit_concurrency,0 innodb_concurrency_tickets,500 innodb_data_file_path,ibdata1:10M:autoextend innodb_data_home_dir, innodb_doublewrite,ON innodb_fast_shutdown,1 innodb_file_io_threads,4 innodb_file_per_table,ON innodb_flush_log_at_trx_commit,1 innodb_flush_method, innodb_force_recovery,0 innodb_lock_wait_timeout,50 innodb_locks_unsafe_for_binlog,OFF innodb_log_arch_dir, innodb_log_archive,OFF innodb_log_buffer_size,5242880 innodb_log_file_size,268435456 innodb_log_files_in_group,2 innodb_log_group_home_dir,.\\ innodb_max_dirty_pages_pct,90 innodb_max_purge_lag,0 innodb_mirrored_log_groups,1 innodb_open_files,300 innodb_support_xa,ON innodb_sync_spin_loops,20 innodb_table_locks,ON innodb_thread_concurrency,8 innodb_thread_sleep_delay,10000 interactive_timeout,28800 join_buffer_size,131072 key_buffer_size,13631488 key_cache_age_threshold,300 key_cache_block_size,1024 key_cache_division_limit,100 language,C:\\mysql\\share\\english\\ large_files_support,ON large_page_size,0 large_pages,OFF lc_time_names,en_US license,GPL local_infile,ON log,OFF log_bin,ON log_bin_trust_function_creators,OFF log_error,.\error.log log_queries_not_using_indexes,OFF log_slave_updates,OFF log_slow_queries,ON log_warnings,1 long_query_time,10 low_priority_updates,OFF lower_case_file_system,ON lower_case_table_names,1 max_allowed_packet,1048576 max_binlog_cache_size,4294967295 max_binlog_size,1073741824 max_connect_errors,10 max_connections,800 max_delayed_threads,20 max_error_count,64 max_heap_table_size,16777216 max_insert_delayed_threads,20 max_join_size,4294967295 max_length_for_sort_data,1024 max_prepared_stmt_count,16382 max_relay_log_size,0 max_seeks_for_key,4294967295 max_sort_length,1024 max_sp_recursion_depth,0 max_tmp_tables,32 max_user_connections,0 max_write_lock_count,4294967295 multi_range_count,256 myisam_data_pointer_size,6 myisam_max_sort_file_size,107374182400 myisam_recover_options,OFF myisam_repair_threads,1 myisam_sort_buffer_size,30408704 myisam_stats_method,nulls_unequal net_buffer_length,16384 net_read_timeout,30 net_retry_count,10 net_write_timeout,60 new,OFF old_passwords,OFF open_files_limit,2048 optimizer_prune_level,1 optimizer_search_depth,62 port,3306 preload_buffer_size,32768 prepared_stmt_count,0 protocol_version,10 query_alloc_block_size,8192 query_cache_limit,1048576 query_cache_min_res_unit,4096 query_cache_size,84934656 query_cache_type,ON query_cache_wlock_invalidate,OFF query_prealloc_size,8192 range_alloc_block_size,2048 read_buffer_size,61440 read_only,OFF read_rnd_buffer_size,258048 relay_log_purge,ON relay_log_space_limit,0 rpl_recovery_rank,0 secure_auth,OFF shared_memory,OFF shared_memory_base_name,MYSQL server_id,1 skip_external_locking,ON skip_networking,OFF skip_show_database,OFF slave_compressed_protocol,OFF slave_load_tmpdir,C:\\WINDOWS\\TEMP\\ slave_net_timeout,3600 slave_skip_errors,OFF slave_transaction_retries,10 slow_launch_time,2 sort_buffer_size,262136 sql_big_selects,ON sql_mode, sql_notes,ON sql_warnings,OFF ssl_ca, ssl_capath, ssl_cert, ssl_cipher, ssl_key, storage_engine,InnoDB sync_binlog,0 sync_frm,ON system_time_zone,GMT Standard Time table_cache,619 table_lock_wait_timeout,50 table_type,InnoDB thread_cache_size,38 thread_stack,196608 time_format,%H:%i:%s time_zone,SYSTEM timed_mutexes,OFF tmp_table_size,30408704 tmpdir,C:\\WINDOWS\\TEMP\\ transaction_alloc_block_size,8192 transaction_prealloc_size,4096 tx_isolation,REPEATABLE-READ updatable_views_with_limit,YES version,5.0.26-community-log version_comment,MySQL Community Edition (GPL) version_compile_machine,ia32 version_compile_os,Win32 wait_timeout,28800
[3 Jan 2007 19:46]
Daniel Fiske
Essentially the behaviour is not a crash for me, but there appears to be a VERY long lock on queries that should normally take fractions of a second to run.
[4 Jan 2007 16:17]
Heikki Tuuri
Daniel, long queries do not fit the description of this bug. A crash should occur. I am setting this as 'Can't repeat' until someone can repeat this inside gdb. Then we might be able to see if this is a Linux bug. Regards, Heikki
[8 Mar 2007 16:29]
Mark Callaghan
This has occurred once in 2007, so it isn't a problem for us and the status should remain as 'Can't repeat'. A few details from the *.err log: 1) there is pending io, see 'Pending preads 1' 2) the event is set for the read thread, see 'ev set' 3) there is pending background IO, see 'Pending normal aio reads' 4) the pending IO doesn't get done, the many dumps for INNODB MONITOR OUTPUT show the same values for pending IO I have enhanced 'show innodb status' and the background IO thread code, so this output is slightly different than regular output. This problem occurs on our modified code and on unmodified code. This problem also occurs on MySQL compiled with gcc -O2 and -O3. InnoDB: Warning: a long semaphore wait: --Thread 294931 has waited at btr0cur.c line 401 for 241.00 seconds the semaphore: S-lock on RW-latch at 0x5513c99c created in file buf0buf.c line 436 a writer (thread id 294931) has reserved it in mode exclusive number of readers 0, waiters flag 1 Last time read locked in file btr0sea.c line 767 Last time write locked in file buf0buf.c line 1399 InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info: InnoDB: Pending preads 1, pwrites 0 ===================================== 070226 4:15:31 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 30 seconds ---------- BACKGROUND THREAD ---------- srv_master_thread loops: 804690 1_second, 804690 sleeps, 79256 10_second, 36773 background, 34148 flush srv_master_thread log flush: 804703 sync, 484270 async fsync callers: 370602 buffer pool, 263 other, 3 checkpoint, 155301 log aio, 759723 log sync, 0 archive ---------- SEMAPHORES ---------- Lock wait timeouts 0 OS WAIT ARRAY INFO: reservation count 367152, signal count 366736 --Thread 294931 has waited at btr0cur.c line 401 for 251.00 seconds the semaphore: S-lock on RW-latch at 0x5513c99c created in file buf0buf.c line 436 a writer (thread id 294931) has reserved it in mode exclusive number of readers 0, waiters flag 1 Last time read locked in file btr0sea.c line 767 Last time write locked in file buf0buf.c line 1399 Mutex spin waits 2477333, rounds 9970695, OS waits 136483 RW-shared spins 285179, OS waits 138400; RW-excl spins 113374, OS waits 65099 -------- FILE I/O -------- I/O thread 0 state: waiting for i/o request (insert buffer thread) reads 2884 writes 0 requests 1447 io secs 0.455815 io msecs/request 0.315007 max_io_wait 50.997000 I/O thread 1 state: waiting for i/o request (log thread) reads 0 writes 106259 requests 106259 io secs 1.652383 io msecs/request 0.015551 max_io_wait 0.085000 I/O thread 2 state: doing file i/o (read thread) reads 2021051 writes 0 requests 712265 io secs 1609.268539 io msecs/request 2.259368 max_io_wait 2886.951000 ev set I/O thread 3 state: waiting for i/o request (write thread) reads 0 writes 11921132 requests 8166003 io secs 323.549647 io msecs/request 0.039622 max_io_wait 155.304000 Pending normal aio reads: 43, aio writes: 0, ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0 Pending flushes (fsync) log: 1; buffer pool: 0 6071976 OS file reads, 17207657 OS file writes, 1285888 OS fsyncs 1 pending preads, 0 pending pwrites 0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s ---- output from a later version of INNODB MONITOR OUTPUT ---------- SEMAPHORES ---------- Lock wait timeouts 0 OS WAIT ARRAY INFO: reservation count 367152, signal count 366736 --Thread 294931 has waited at btr0cur.c line 401 for 347.00 seconds the semaphore: S-lock on RW-latch at 0x5513c99c created in file buf0buf.c line 436 a writer (thread id 294931) has reserved it in mode exclusive number of readers 0, waiters flag 1 Last time read locked in file btr0sea.c line 767 Last time write locked in file buf0buf.c line 1399 Mutex spin waits 2477333, rounds 9970695, OS waits 136483 RW-shared spins 285179, OS waits 138400; RW-excl spins 113374, OS waits 65099 -------- FILE I/O -------- I/O thread 0 state: waiting for i/o request (insert buffer thread) reads 2884 writes 0 requests 1447 io secs 0.455815 io msecs/request 0.315007 max_io_wait 50.997000 I/O thread 1 state: waiting for i/o request (log thread) reads 0 writes 106259 requests 106259 io secs 1.652383 io msecs/request 0.015551 max_io_wait 0.085000 I/O thread 2 state: doing file i/o (read thread) reads 2021051 writes 0 requests 712265 io secs 1609.268539 io msecs/request 2.259368 max_io_wait 2886.951000 ev set I/O thread 3 state: waiting for i/o request (write thread) reads 0 writes 11921132 requests 8166003 io secs 323.549647 io msecs/request 0.039622 max_io_wait 155.304000 Pending normal aio reads: 43, aio writes: 0, ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0 Pending flushes (fsync) log: 1; buffer pool: 0 6071976 OS file reads, 17207657 OS file writes, 1285888 OS fsyncs 1 pending preads, 0 pending pwrites 0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
[10 May 2007 10:31]
Davide Ferrari
Part of the error log, there are many many more semaphore locks
Attachment: error_log (application/octet-stream, text), 12.29 KiB.
[10 May 2007 10:33]
Davide Ferrari
Soory for the attachment without comment, I thought that this bugzilla were smarter ;) Anyway, here it has just happened a similiar semaphore lock (as you can see in the attachment), the platform is MySQL 5.0.2 on 2.6.16-gentoo-r7 #1 SMP Mon May 15 17:08:44 CEST 2006 x86_64 AMD Opteron(tm) Processor 275 GNU/Linux
[18 Apr 2009 19:54]
Alberto Termanini
I have the same problem under MAX OS X 10.6, with PHP 5.2.8 and MySQL 5.1.30. I have a script that to do all queries (all SELECTS, repeated many times, with different values) take about 10 minutes, with CPU usage at 30%. But, if i the same script, i insert new query to do (the same INSERT repeated many times with different values), the script take 5 HOURS to complete, with CPU USAGE AT 5% !! Why this happen? In addition, i have other scritps that do only SELECTS that olso take very long time and the CPU USAGE (Intel Core 2 Duo 1.9 GHZ, 1GB RAM) IS ALWAYS BELOW 5-7 %. The php script is launched via web browser on the same machine, so there are no other users that use the same script or files, and there is only my user connected to mysql. Why this happen? It's a VERY big problem, becouse we arn't able to open the use of this service to other users becouse no users can wait for 5 hours... Please help , below i report my "SHOW ENGINE INNODB STATUS" when the script is running. Many Thanks and Best Regards Alberto Termanini Molecolar and Regenerative Medicine University of Modena and Reggio Emilia ===================================== 090418 21:41:18 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 29 seconds ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 1086, signal count 1077 Mutex spin waits 0, rounds 38376, OS waits 473 RW-shared spins 259, OS waits 128; RW-excl spins 254, OS waits 235 ------------ TRANSACTIONS ------------ Trx id counter 0 215149356 Purge done for trx's n:o < 0 205433694 undo n:o < 0 0 History list length 4 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 0 0, not started, OS thread id 2958069760 MySQL thread id 743, query id 147366 localhost root SHOW ENGINE INNODB STATUS ---TRANSACTION 0 215032778, not started, OS thread id 2957864960 MySQL thread id 142, query id 29910 localhost root ---TRANSACTION 0 215112761, not started, OS thread id 2957660160 MySQL thread id 23, query id 110631 localhost root ---TRANSACTION 0 215149355, ACTIVE 1 sec, OS thread id 2957455360 inserting, thread declared inside InnoDB 500 mysql tables in use 1, locked 1 5 lock struct(s), heap size 320, 2 row lock(s), undo log entries 1 MySQL thread id 19, query id 147365 localhost root update INSERT INTO USER_CHIP_PROBE_VALUES (ID_USER_CHIP,ID_PROBE,PM,MM) VALUES (33,109514,14.0,10.0) -------- FILE I/O -------- I/O thread 0 state: waiting for i/o request (insert buffer thread) I/O thread 1 state: waiting for i/o request (log thread) I/O thread 2 state: waiting for i/o request (read thread) I/O thread 3 state: waiting for i/o request (write thread) Pending normal aio reads: 0, aio writes: 0, ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0 Pending flushes (fsync) log: 1; buffer pool: 1 199084 OS file reads, 114552 OS file writes, 75018 OS fsyncs 203.20 reads/s, 16714 avg bytes/read, 120.86 writes/s, 65.51 fsyncs/s ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX ------------------------------------- Ibuf: size 264, free list len 139, seg size 404, 79621 inserts, 15616 merged recs, 3072 merges Hash table size 34679, node heap has 1 buffer(s) 243.75 hash searches/s, 410.54 non-hash searches/s --- LOG --- Log sequence number 2 1157752940 Log flushed up to 2 1157752775 Last checkpoint at 2 1157573190 1 pending log writes, 0 pending chkp writes 73759 log i/o's done, 63.76 log i/o's/second ---------------------- BUFFER POOL AND MEMORY ---------------------- Total memory allocated 17403288; in additional pool allocated 1048576 Dictionary memory allocated 123776 Buffer pool size 512 Free buffers 0 Database pages 511 Modified db pages 190 Pending reads 0 Pending writes: LRU 106, flush list 0, single page 0 Pages read 200053, created 616, written 54790 207.34 reads/s, 0.34 creates/s, 81.86 writes/s Buffer pool hit rate 913 / 1000 -------------- ROW OPERATIONS -------------- 1 queries inside InnoDB, 0 queries in queue 1 read views open inside InnoDB Main thread id 2957045760, state: flushing log Number of rows inserted 72493, updated 0, deleted 0, read 178973 62.38 inserts/s, 0.00 updates/s, 0.00 deletes/s, 62.38 reads/s ---------------------------- END OF INNODB MONITOR OUTPUT ============================
[6 May 2009 2:43]
Jesper Wisborg Krogh
We saw this today running MySQL 5.0.45 with PHP 5.1.6 on CentOS 5.2. I will attach the relevant information from the error log as well as the result of "SHOW FULL PROCESSLIST" shortly before the crash. Our monitoring shows that the CPU utilization dropped to close to zero during the incident. The memory and disk usage in the partition containing temporary storage did not change during the period except for what could be expected from the piling up of Apache threads waiting for the response from MySQL. The statistics from MySQL also shows that the number of queries per second and the InnoDB rows read/secs went towards zero.
[6 May 2009 11:47]
Alberto Termanini
Please add at OS the MAC OS X 10.5.6 Leopard. In addition i would like to report that a friend using MySQL 4.x and Linux present the same problem: CPU go max to 5% , disk usage is like null, memory usage is normal, but the PHP script take hours and hours to complete. Thanks very much Alberto Termanini
[12 Aug 2010 6:17]
Jesper Wisborg Krogh
We had a server that kept running into this issue. It happened 6-8 times over a period of a month with at most a week between two crashes. We then tried to run OPTIMIZE TABLE on some of the most fragmented tables. Three weeks have now passed without a crash.
[6 Jan 2012 9:00]
liu hickey
error log of mysqld
Attachment: alert.log (application/octet-stream, text), 202.47 KiB.
[6 Jan 2012 9:03]
liu hickey
We still hit this bug on MySQL-5.1.48 on Linux-2.6.18 platform. Error log file is attached. I doubt it might be a problem of pread of linux, but I am more concern whether there is a solution instead of aborting of mysqld while just rollback there problematic transaction or kill that thread? Or some other ideas?