Bug #25422 | Hang with log tables | ||
---|---|---|---|
Submitted: | 4 Jan 2007 18:37 | Modified: | 15 Jul 2008 16:21 |
Reporter: | Peter Gulutzan | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Logging | Severity: | S3 (Non-critical) |
Version: | 5.1.15-beta-debug-log | OS: | Linux (SUSE 10.0 / 64-bit) |
Assigned to: | Marc ALFF | CPU Architecture: | Any |
[4 Jan 2007 18:37]
Peter Gulutzan
[4 Jan 2007 22:39]
MySQL Verification Team
Thank you for the bug report. Verified on FC 6.0 32-bit.
[25 Jan 2007 16:51]
Peter Gulutzan
Here is another test case, suggested by Brian Aker: Again, start the server with slow queries: /usr/local/mysql/libexec/mysqld --user=root --skip-networking --log-slow-queries --log-slow-admin-statements --log-output=TABLE,FILE --log --log-queries-not-using-indexes Make two connections, T1 and T2: On T1, say: delimiter // create table ta (s1 int, s2 varchar(1000)); create index ia on ta (s1); create procedure p () begin declare v int default 0; while v < 1000000 do select v; insert into ta values (v,repeat('a',rand()*1000)); update ta set s1 = s1 - 1 where s1 = v - 1; update ta set s1 = s1 + 1 where s1 = v - 1; truncate mysql.general_log; set v = v + 1; end while; end// call p()// On T2, say: call p();
[7 Feb 2007 13:58]
Kristofer Pettersson
Prel. reporting: Thread 1: 983:T@6 : | | | | | | | | | >open_log_table 984:T@6 : | | | | | | | | | | >simple_open_n_lock_tables 985:T@6 : | | | | | | | | | | | >open_tables 986:T@6 : | | | | | | | | | | | | >init_alloc_root 987:T@6 : | | | | | | | | | | | | | enter: root: 0x4bcd738 988:T@6 : | | | | | | | | | | | | | >_mymalloc 989:T@6 : | | | | | | | | | | | | | | enter: Size: 8040 990:T@6 : | | | | | | | | | | | | | | exit: ptr: 0x1afc368 991:T@6 : | | | | | | | | | | | | | <_mymalloc 992:T@6 : | | | | | | | | | | | | <init_alloc_root 993:T@6 : | | | | | | | | | | | | >open_table 994:T@6 : | | | | | | | | | | | | | >hash_first 995:T@6 : | | | | | | | | | | | | | | exit: found key at 5 996:T@6 : | | | | | | | | | | | | | <hash_first 997:T@6 : | | | | | | | | | | | | | note: Found table 'mysql.slow_log' with different refresh version 998:T@6 : | | | | | | | | | | | | | >close_old_data_files 999:T@6 : | | | | | | | | | | | | | <close_old_data_files 1000:T@6 : | | | | | | | | | | | | | >wait_for_condition Thread dead locked waiting for LOCK_open on condition COND_refresh Function: open_table(THD *thd, TABLE_LIST *table_list={.. mysql.slow_log ..}, MEM_ROOT *mem_root) Thread 2: 922:T@8 : | | | | | | | | | | <lock_table_name 923:T@8 : | | | | | | | | | | >remove_table_from_cache 924:T@8 : | | | | | | | | | | | enter: Table: 'mysql.slow_log' flags: 0 925:T@8 : | | | | | | | | | | | >hash_first 926:T@8 : | | | | | | | | | | | | exit: found key at 5 927:T@8 : | | | | | | | | | | | <hash_first 928:T@8 : | | | | | | | | | | | info: Table was in use by current thread. db_stat: 0 929:T@8 : | | | | | | | | | | | info: Table was in use by other thread 930:T@8 : | | | | | | | | | | | info: Removing table from table_def_cache 931:T@8 : | | | | | | | | | | | >hash_first 932:T@8 : | | | | | | | | | | | <hash_first 933:T@8 : | | | | | | | | | | <remove_table_from_cache 934:T@8 : | | | | | | | | | | >start_waiting_global_read_lock 935:T@8 : | | | | | | | | | | <start_waiting_global_read_lock 936:T@8 : | | | | | | | | | <lock_and_wait_for_table_name <next instruction is LOGGER:lock()> Thread dead locked on mutex LOCK_logger
[13 Feb 2007 9:49]
Kristofer Pettersson
Windows stack traces becomes corrupted for unknown reason, linux stack trace looks like this for two threads: #0 0x003a5402 in __kernel_vsyscall () #1 0x00435876 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x083e0051 in safe_cond_wait (cond=0x85f8ba0, mp=0x85f9080, file=0x84996ac "sql_base.cc", line=1711) at thr_mutex.c:202 #3 0x0820c4dc in wait_for_condition (thd=0x9f3c168, mutex=0x85f9080, cond=0x85f8ba0) at sql_base.cc:1711 #4 0x0821585d in open_table (thd=0x9f3c168, table_list=0x9f3a4f4, mem_root=0xb7e866c8, refresh=0xb7e866ff, flags=0) at sql_base.cc:2026 #5 0x0821623e in open_tables (thd=0x9f3c168, start=0xb7e86764, counter=0xb7e86754, flags=0) at sql_base.cc:2926 #6 0x082167dc in simple_open_n_lock_tables (thd=0x9f3c168, tables=0x9f3a4f4) at sql_base.cc:3180 #7 0x08263677 in Log_to_csv_event_handler::open_log_table (this=0x9f3a340, log_table_type=1) at log.cc:284 #8 0x08263bcd in Log_to_csv_event_handler::reopen_log_table (this=0x9f3a340, log_table_type=1) at log.cc:377 #9 0x08264876 in LOGGER::reopen_log_table (this=0x85fe380, log_table_type=1) at log.cc:815 #10 0x0825ff93 in mysql_truncate (thd=0x9fc36b0, table_list=0x9fe2ca0, dont_send_ok=false) at sql_delete.cc:960 #11 0x081e42d2 in mysql_execute_command (thd=0x9fc36b0) at sql_parse.cc:3564 #12 0x0832fe99 in sp_instr_stmt::exec_core (this=0x9fe2e48, thd=0x9fc36b0, nextp=0xb7e87444) at sp_head.cc:2559 #13 0x0832fc57 in sp_lex_keeper::reset_lex_and_exec_core (this=0x9fe2e70, thd=0x9fc36b0, nextp=0xb7e87444, open_tables=false, instr=0x9fe2e48) at sp_head.cc:2428 #14 0x083339bd in sp_instr_stmt::execute (this=0x9fe2e48, thd=0x9fc36b0, nextp=0xb7e87444) at sp_head.cc:2510 #15 0x0832d823 in sp_head::execute (this=0x9fe2238, thd=0x9fc36b0) at sp_head.cc:1098 #16 0x08332149 in sp_head::execute_procedure (this=0x9fe2238, thd=0x9fc36b0, args=0x9fc3c0c) at sp_head.cc:1729 #17 0x081e7a25 in mysql_execute_command (thd=0x9fc36b0) at sql_parse.cc:4691 #18 0x081e984e in mysql_parse (thd=0x9fc36b0, inBuf=0x9fd9778 "call p1()", length=9) at sql_parse.cc:6155 #19 0x081ea263 in dispatch_command (command=COM_QUERY, thd=0x9fc36b0, packet=0x9fcd729 "", packet_length=10) at sql_parse.cc:1857 #20 0x081eb2ba in do_command (thd=0x9fc36b0) at sql_parse.cc:1626 #21 0x081eb763 in handle_one_connection (arg=0x9fc36b0) at sql_parse.cc:1232 #22 0x00433bd4 in start_thread () from /lib/libpthread.so.0 #23 0x001db4fe in clone () from /lib/libc.so.6 ****************************************************************************** #0 0x003a5402 in __kernel_vsyscall () #1 0x00437f5e in __lll_mutex_lock_wait () from /lib/libpthread.so.0 #2 0x00434c7d in _L_mutex_lock_36 () from /lib/libpthread.so.0 #3 0xb7eb77d8 in ?? () #4 0xb7e88bb0 in ?? () #5 0x00000003 in ?? () #6 0x083dfcf5 in safe_mutex_lock (mp=0x85fe380, file=0x8454982 "log.h", line=520) at thr_mutex.c:115 #7 0x081d6520 in LOGGER::lock (this=0x85fe380) at log.h:520 #8 0x0825fdff in mysql_truncate (thd=0x9f7b728, table_list=0x9fee860, dont_send_ok=false) at sql_delete.cc:923 #9 0x081e42d2 in mysql_execute_command (thd=0x9f7b728) at sql_parse.cc:3564 #10 0x0832fe99 in sp_instr_stmt::exec_core (this=0x9feea08, thd=0x9f7b728, nextp=0xb7eb8444) at sp_head.cc:2559 #11 0x0832fc57 in sp_lex_keeper::reset_lex_and_exec_core (this=0x9feea30, thd=0x9f7b728, nextp=0xb7eb8444, open_tables=false, instr=0x9feea08) at sp_head.cc:2428 #12 0x083339bd in sp_instr_stmt::execute (this=0x9feea08, thd=0x9f7b728, nextp=0xb7eb8444) at sp_head.cc:2510 #13 0x0832d823 in sp_head::execute (this=0x9feddf8, thd=0x9f7b728) at sp_head.cc:1098 #14 0x08332149 in sp_head::execute_procedure (this=0x9feddf8, thd=0x9f7b728, args=0x9f7bc84) at sp_head.cc:1729 #15 0x081e7a25 in mysql_execute_command (thd=0x9f7b728) at sql_parse.cc:4691 #16 0x081e984e in mysql_parse (thd=0x9f7b728, inBuf=0x9fc76e8 "call p1()", length=9) at sql_parse.cc:6155 #17 0x081ea263 in dispatch_command (command=COM_QUERY, thd=0x9f7b728, packet=0x9fbf689 "", packet_length=10) at sql_parse.cc:1857 #18 0x081eb2ba in do_command (thd=0x9f7b728) at sql_parse.cc:1626 #19 0x081eb763 in handle_one_connection (arg=0x9f7b728) at sql_parse.cc:1232 #20 0x00433bd4 in start_thread () from /lib/libpthread.so.0 #21 0x001db4fe in clone () from /lib/libc.so.6
[13 Feb 2007 17:25]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/19778 ChangeSet@1.2379, 2007-02-13 18:25:43+01:00, thek@kpdesk.mysql.com +1 -0 Bug#25422 Hang with log tables - Truncate table can dead lock - Moved mutex LOGGER:lock to include the table lock.
[14 Feb 2007 8:20]
Kristofer Pettersson
Note: Peters example with ALTER TABLES in function p2() doesn't dead lock, but end up in a tight look consuming 100% CPU (or in the case of my windows can; 110% CPU resulting in a hard reboot)
[14 Feb 2007 11:27]
Kristofer Pettersson
Stack trace on running halt: THD1: ntdll.dll!7c90110a() mysqld.exe!pthread_cond_timedwait(pthread_cond_t * cond=0x00c59d88,<snip> mysqld.exe!remove_table_from_cache(THD * thd=0x01a30ad0, const char <snip> mysqld.exe!wait_while_table_is_used(THD * thd=0x01a30ad0, st_table <snip> mysqld.exe!close_cached_table(THD * thd=0x01a30ad0, st_table * tabl <snip> mysqld.exe!mysql_alter_table(THD * thd=0x01a30ad0, char * new_db=0x0<snip> mysqld.exe!mysql_execute_command(THD * thd=0x01a30ad0) Line 3225 + <snip> mysqld.exe!sp_instr_stmt::exec_core(THD * thd=0x01a30ad0, unsigned i<snip> mysqld.exe!sp_lex_keeper::reset_lex_and_exec_core(THD * thd=0x01a30a<snip> mysqld.exe!sp_instr_stmt::execute(THD * thd=0x01a30ad0, unsigned int<snip> mysqld.exe!sp_head::execute(THD * thd=0x01a30ad0) Line 1078 + 0x20 <snip> mysqld.exe!sp_head::execute_procedure(THD * thd=0x01a30ad0, List<Ite<snip> mysqld.exe!mysql_execute_command(THD * thd=0x01a30ad0) Line 4710 + <snip> mysqld.exe!mysql_parse(THD * thd=0x01a30ad0, char * inBuf=0x01ab9460<ship> mysqld.exe!dispatch_command(enum_server_command command=COM_QUERY, T<snip> mysqld.exe!do_command(THD * thd=0x01a30ad0) Line 1638 + 0x31 bytes <snip> mysqld.exe!handle_one_connection(void * arg=0x01a30ad0) Line 1239 +<snip> mysqld.exe!pthread_start(void * param=0x0039f3d8) Line 62 + 0x9 byt<snip> mysqld.exe!_callthreadstart() Line 293 + 0xf bytes C mysqld.exe!_threadstart(void * ptd=0x01a93238) Line 277 C THD2: mysqld.exe!wait_for_tables(THD * thd=0x01a35408) Line 2457 C++ mysqld.exe!mysql_lock_tables(THD * thd=0x01a35408, st_table * * table<snip> mysqld.exe!open_ltable(THD * thd=0x01a35408, st_table_list * table_li<snip> mysqld.exe!mysql_alter_table(THD * thd=0x01a35408, char * new_db=0x01<snip> mysqld.exe!mysql_execute_command(THD * thd=0x01a35408) Line 3225 + 0<snip> mysqld.exe!sp_instr_stmt::exec_core(THD * thd=0x01a35408, unsigned in<snip> mysqld.exe!sp_lex_keeper::reset_lex_and_exec_core(THD * thd=0x01a3540<snip> mysqld.exe!sp_instr_stmt::execute(THD * thd=0x01a35408, unsigned int <snip> mysqld.exe!sp_head::execute(THD * thd=0x01a35408) Line 1078 + 0x20 b<snip> mysqld.exe!sp_head::execute_procedure(THD * thd=0x01a35408, List<Item<snip> mysqld.exe!mysql_execute_command(THD * thd=0x01a35408) Line 4710 + 0<snip> mysqld.exe!mysql_parse(THD * thd=0x01a35408, char * inBuf=0x01ade0f8,<snip> mysqld.exe!dispatch_command(enum_server_command command=COM_QUERY, TH<snip> mysqld.exe!do_command(THD * thd=0x01a35408) Line 1638 + 0x31 bytes <snip> mysqld.exe!handle_one_connection(void * arg=0x01a35408) Line 1239 + <snip> mysqld.exe!pthread_start(void * param=0x01a2fd18) Line 62 + 0x9 byte<snip> mysqld.exe!_callthreadstart() Line 293 + 0xf bytes C mysqld.exe!_threadstart(void * ptd=0x01a45590) Line 277 C
[28 Jun 2007 22:14]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/29909 ChangeSet@1.2516, 2007-06-28 16:13:43-06:00, malff@weblab.(none) +32 -0 Bug#25422 (Hang with log tables) Bug 29129 (Resetting general_log while the GLOBAL READ LOCK is set causes a deadlock) NOTE TO REVIEWER: There is still an opened issue with this patch, see mysql-tests/t/disabled.def The test rpl_optimize dead locks, but it's unclear if this is because: - a new bug was introduced by this patch - an existing unrelated bug was exposed by this patch To discuss during review, this point needs further investigation. ------ Prior to this fix, the server would hang when performing concurrent ALTER TABLE or TRUNCATE TABLE statements against the LOG TABLES, which are mysql.general_log and mysql.slow_log. The root cause traces to the following code: in sql_base.cc, open_table() if (table->in_use != thd) { /* wait_for_condition will unlock LOCK_open for us */ wait_for_condition(thd, &LOCK_open, &COND_refresh); } The problem with this code is that the current implementation of the LOGGER creates 'fake' THD objects, like - Log_to_csv_event_handler::general_log_thd - Log_to_csv_event_handler::slow_log_thd which are not associated to a real thread running in the server, so that waiting for these non-existing threads to release table locks cause the dead lock. In general, the design of Log_to_csv_event_handler does not fit into the general architecture of the server, so that the concept of general_log_thd and slow_log_thd has to be abandoned: - this implementation does not work with table locking - it will not work with commands like SHOW PROCESSLIST - having the log tables always opened does not integrate well with DDL operations / FLUSH TABLES / SET GLOBAL READ_ONLY With this patch, the fundamental design of the LOGGER has been changed to: - always open and close a log table when writing a log - remove totally the usage of fake THD objects - clarify how locking of log tables is implemented in general. Changing the locking design for log tables had some impact in several areas. The total change consist of: Part I: Technical changes 1) bool TABLE_SHARE::system was too close to bool TABLE_SHARE::system_table, and has been renamed bool TABLE_SHARE::one_record_table 2) bool TABLE_SHARE::system_table and bool TABLE_SHARE::log_table are related. Instead of multiplying various "is_X" "is_Y" boolean flags, the attribute TABLE_CATEGORY TABLE_SHARE::table_category has been introduced, that better describe the nature of the table. 3) The enum TABLE_CATEGORY has been introduced, see the comments there. Special exceptions regarding FLUSH TABLES WITH READ LOCK or SET GLOBAL READ_ONLY=ON, in lock.cc, are now implemented with more clarity with the use of honor_global_locks(). In general, helpers like honor_XYZ() are encouraged, as this breaks the correlation between - how a table behaves with regards to 1 property (global read lock, ...) - what the table is used for (user, system, ...) 4) my_bool TABLE_SHARE::locked_by_logger has been removed, this was an artificial hack. 5) The previous implementation confused: - 'system table' with ENGINE = MYISAM - 'log table' with ENGINE = CSV so that logic related to enforcing constraints for what the SQL layer considers 'system' or 'log' tables was not implemented in the SQL layer but in the storage engines. The methods: - handler::check_if_locking_is_allowed - handler::check_if_log_table_locking_is_allowed - ha_tina::check_if_locking_is_allowed - ha_myisam::check_if_locking_is_allowed have been removed, and the logic implemented in lock.cc instead. 6) The fake threads, Log_to_csv_event_handler::general_log_thd and Log_to_csv_event_handler::slow_log_thd, are removed 7) In general, a lot of code in log.h / log.cc dealing with opening / closing tables has been removed. 8) Writing to the log tables is still a privileged operation. Before, it was enforced by testing that the 'current thread is the logger thread'. Now, it's enforced by the use of the internal flag OPTION_INTERNAL_PRIVILEGE 9) TRUNCATE TABLE does not interfere with the logger implementation any more, all the code used to 'shutdown the logger, truncate, then restart the logger' is removed, and the operation is online. TRUNCATE TABLE uses OPTION_INTERNAL_PRIVILEGE, since it may need to use write locks. 10) RENAME TABLE does not interfere with the logger implementation any more, regular table locking rules should apply even with log tables. 11) Special rules for log tables in ALTER TABLES have been implemented in a dedicated helper/wrapper, for clarity. mysql_alter_table() uses OPTION_INTERNAL_PRIVILEGE, then invokes the ALTER TABLE implementation: mysql_alter_table_impl() This ensures that -- given the complexity of mysql_alter_table_impl() -- every path of execution properly unset OPTION_INTERNAL_PRIVILEGE at the end. Note that the operation is online, as the logger is not locked. Part II: functional changes 12) The previous implementation enforced that : - LOCK TABLE mysql.general_log WRITE - LOCK TABLE mysql.general_log READ was prohibited (even for SUPER), while - LOCK TABLE mysql.general_log READ LOCAL was allowed. The rational behind this was that WRITE and READ would freeze the server when running with SET GLOBAL GENERAL_LOG=ON, and should (rightfully) be prohibited. Allowing READ LOCAL was assuming that no dead locks can occur because of this lock, so that it's not harmful. However, with the following scenario: THD1: LOCK TABLE mysql.general_log READ LOCAL; THD2: TRUNCATE TABLE mysql.general_log; THD1: <any query, even "select 1 from dual> --> deadlock. The dead lock occurs because TRUNCATE uses a name lock and waits for *all* locks to disappear. THD1 dead locks while writing to the general_log, which can't get a write lock because of TRUNCATE. So, allowing even READ LOCAL on log tables can lead to a complete dead lock of the server, where 'complete' means : - new clients can't connect (stuck while printing in the general log) - new queries from existing clients can't execute (same) - the server does not respond to kill -15 (as seen during testing) The functional change introduced by this patch is to deny *any* form of locking what-so-ever on log tables, so that : LOCK TABLE mysql.general_log READ LOCAL is now denied. 13) FLUSH LOGS does not flush the log tables any more. 14) FLUSH TABLES does flush the log tables, it used to ignore them. Part III: more technical changes 15) Implementing the restriction 12) above exposed a bug in mysqldump, which would ignore some tables in dump_all_tables_in_db(), but forget to ignore the same in dump_all_views_in_db(). This has been fixed. 16) mysqldump would also issue an empty "LOCK TABLE" command when all the tables to lock are to be ignored (numrows == 0), instead of not issuing the query. This has been fixed. 17) Internal errors handlers could intercept errors but not warnings (see sql_error.cc), this has been fixed. It was found because writing to the general log or the slow log needs to filter out any error or warning, to avoid affecting the client performing the query with unrelated noise.
[20 Jul 2007 1:45]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/31209 ChangeSet@1.2516, 2007-07-19 19:43:15-06:00, malff@weblab.(none) +31 -0 WL#3984 (Revise locking of mysql.general_log and mysql.slow_log) Bug#25422 (Hang with log tables) Bug 29129 (Resetting general_log while the GLOBAL READ LOCK is set causes a deadlock) NOTE TO REVIEWER: There is still an opened issue with this patch, see mysql-tests/t/disabled.def The test rpl_optimize dead locks, but it's unclear if this is because: - a new bug was introduced by this patch - an existing unrelated bug was exposed by this patch To discuss during review, this point needs further investigation. ------ Prior to this fix, the server would hang when performing concurrent ALTER TABLE or TRUNCATE TABLE statements against the LOG TABLES, which are mysql.general_log and mysql.slow_log. The root cause traces to the following code: in sql_base.cc, open_table() if (table->in_use != thd) { /* wait_for_condition will unlock LOCK_open for us */ wait_for_condition(thd, &LOCK_open, &COND_refresh); } The problem with this code is that the current implementation of the LOGGER creates 'fake' THD objects, like - Log_to_csv_event_handler::general_log_thd - Log_to_csv_event_handler::slow_log_thd which are not associated to a real thread running in the server, so that waiting for these non-existing threads to release table locks cause the dead lock. In general, the design of Log_to_csv_event_handler does not fit into the general architecture of the server, so that the concept of general_log_thd and slow_log_thd has to be abandoned: - this implementation does not work with table locking - it will not work with commands like SHOW PROCESSLIST - having the log tables always opened does not integrate well with DDL operations / FLUSH TABLES / SET GLOBAL READ_ONLY With this patch, the fundamental design of the LOGGER has been changed to: - always open and close a log table when writing a log - remove totally the usage of fake THD objects - clarify how locking of log tables is implemented in general. Changing the locking design for log tables had some impact in several areas. The total change consist of: Part I: Technical changes 1) bool TABLE_SHARE::system was too close to bool TABLE_SHARE::system_table, and has been renamed bool TABLE_SHARE::one_record_table 2) bool TABLE_SHARE::system_table and bool TABLE_SHARE::log_table are related. Instead of multiplying various "is_X" "is_Y" boolean flags, the attribute TABLE_CATEGORY TABLE_SHARE::table_category has been introduced, that better describe the nature of the table. 3) The enum TABLE_CATEGORY has been introduced, see the comments there. Special exceptions regarding FLUSH TABLES WITH READ LOCK or SET GLOBAL READ_ONLY=ON, in lock.cc, are now implemented with more clarity with the use of honor_global_locks(). In general, helpers like honor_XYZ() are encouraged, as this breaks the correlation between - how a table behaves with regards to 1 property (global read lock, ...) - what the table is used for (user, system, ...) 4) my_bool TABLE_SHARE::locked_by_logger has been removed, this was an artificial hack. 5) The previous implementation confused: - 'system table' with ENGINE = MYISAM - 'log table' with ENGINE = CSV so that logic related to enforcing constraints for what the SQL layer considers 'system' or 'log' tables was not implemented in the SQL layer but in the storage engines. The methods: - handler::check_if_locking_is_allowed - handler::check_if_log_table_locking_is_allowed - ha_tina::check_if_locking_is_allowed - ha_myisam::check_if_locking_is_allowed have been removed, and the logic implemented in lock.cc instead. 6) The fake threads, Log_to_csv_event_handler::general_log_thd and Log_to_csv_event_handler::slow_log_thd, are removed 7) In general, a lot of code in log.h / log.cc dealing with opening / closing tables has been removed. 8) Writing to the log tables is still a privileged operation. Before, it was enforced by testing that the 'current thread is the logger thread'. Now, it's enforced by the use of the internal flag OPTION_INTERNAL_PRIVILEGE 9) TRUNCATE TABLE does not interfere with the logger implementation any more, all the code used to 'shutdown the logger, truncate, then restart the logger' is removed, and the operation is online. TRUNCATE TABLE uses OPTION_INTERNAL_PRIVILEGE, since it may need to use write locks. 10) RENAME TABLE does not interfere with the logger implementation any more, regular table locking rules should apply even with log tables. 11) Special rules for log tables in ALTER TABLES have been implemented in a dedicated helper/wrapper, for clarity. mysql_alter_table() uses OPTION_INTERNAL_PRIVILEGE, then invokes the ALTER TABLE implementation: mysql_alter_table_impl() This ensures that -- given the complexity of mysql_alter_table_impl() -- every path of execution properly unset OPTION_INTERNAL_PRIVILEGE at the end. Note that the operation is online, as the logger is not locked. Part II: functional changes 12) The previous implementation enforced that : - LOCK TABLE mysql.general_log WRITE - LOCK TABLE mysql.general_log READ was prohibited (even for SUPER), while - LOCK TABLE mysql.general_log READ LOCAL was allowed. The rational behind this was that WRITE and READ would freeze the server when running with SET GLOBAL GENERAL_LOG=ON, and should (rightfully) be prohibited. Allowing READ LOCAL was assuming that no dead locks can occur because of this lock, so that it's not harmful. However, with the following scenario: THD1: LOCK TABLE mysql.general_log READ LOCAL; THD2: TRUNCATE TABLE mysql.general_log; THD1: <any query, even "select 1 from dual> --> deadlock. The dead lock occurs because TRUNCATE uses a name lock and waits for *all* locks to disappear. THD1 dead locks while writing to the general_log, which can't get a write lock because of TRUNCATE. So, allowing even READ LOCAL on log tables can lead to a complete dead lock of the server, where 'complete' means : - new clients can't connect (stuck while printing in the general log) - new queries from existing clients can't execute (same) - the server does not respond to kill -15 (as seen during testing) The functional change introduced by this patch is to deny *any* form of locking what-so-ever on log tables, so that : LOCK TABLE mysql.general_log READ LOCAL is now denied. 13) FLUSH LOGS does not flush the log tables any more. 14) FLUSH TABLES does flush the log tables, it used to ignore them. Part III: more technical changes 15) Implementing the restriction 12) above exposed a bug in mysqldump, which would ignore some tables in dump_all_tables_in_db(), but forget to ignore the same in dump_all_views_in_db(). This has been fixed. 16) mysqldump would also issue an empty "LOCK TABLE" command when all the tables to lock are to be ignored (numrows == 0), instead of not issuing the query. This has been fixed. 17) Internal errors handlers could intercept errors but not warnings (see sql_error.cc), this has been fixed. It was found because writing to the general log or the slow log needs to filter out any error or warning, to avoid affecting the client performing the query with unrelated noise. 18) Implementing a nested call to open tables, for the performance schema tables, exposed an existing bug in remove_table_from_cache(), which would perform: in_use->some_tables_deleted=1; against another thread, without any consideration about thread locking. This call inside remove_table_from_cache() was not required anyway, since calling mysql_lock_abort() takes care of aborting -- cleanly -- threads that might hold a lock on a table. This line (in_use->some_tables_deleted=1) has been removed.
[24 Jul 2007 2:37]
Marc ALFF
Correction about the last patch CSET comments: The paragraph " NOTE TO REVIEWER: There is still an opened issue with this patch, see mysql-tests/t/disabled.def The test rpl_optimize dead locks, but it's unclear if this is because: - a new bug was introduced by this patch - an existing unrelated bug was exposed by this patch To discuss during review, this point needs further investigation. " should be ignored, as the issue with rpl_optimize has been resolved. Also, the patch happen to also fix the following bugs: - Bug#17876 - Bug#23044 for which test cases will be added to the patch.
[25 Jul 2007 20:44]
Konstantin Osipov
Sent a review over email.
[27 Jul 2007 6:31]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/31673 ChangeSet@1.2553, 2007-07-27 00:31:06-06:00, malff@weblab.(none) +39 -0 WL#3984 (Revise locking of mysql.general_log and mysql.slow_log) Bug#25422 (Hang with log tables) Bug 17876 (Truncating mysql.slow_log in a SP after using cursor locks the thread) Bug 23044 (Warnings on flush of a log table) Bug 29129 (Resetting general_log while the GLOBAL READ LOCK is set causes a deadlock) Prior to this fix, the server would hang when performing concurrent ALTER TABLE or TRUNCATE TABLE statements against the LOG TABLES, which are mysql.general_log and mysql.slow_log. The root cause traces to the following code: in sql_base.cc, open_table() if (table->in_use != thd) { /* wait_for_condition will unlock LOCK_open for us */ wait_for_condition(thd, &LOCK_open, &COND_refresh); } The problem with this code is that the current implementation of the LOGGER creates 'fake' THD objects, like - Log_to_csv_event_handler::general_log_thd - Log_to_csv_event_handler::slow_log_thd which are not associated to a real thread running in the server, so that waiting for these non-existing threads to release table locks cause the dead lock. In general, the design of Log_to_csv_event_handler does not fit into the general architecture of the server, so that the concept of general_log_thd and slow_log_thd has to be abandoned: - this implementation does not work with table locking - it will not work with commands like SHOW PROCESSLIST - having the log tables always opened does not integrate well with DDL operations / FLUSH TABLES / SET GLOBAL READ_ONLY With this patch, the fundamental design of the LOGGER has been changed to: - always open and close a log table when writing a log - remove totally the usage of fake THD objects - clarify how locking of log tables is implemented in general. See WL#3984 for details related to the new locking design. Additional changes (misc bugs exposed and fixed): 1) mysqldump which would ignore some tables in dump_all_tables_in_db(), but forget to ignore the same in dump_all_views_in_db(). 2) mysqldump would also issue an empty "LOCK TABLE" command when all the tables to lock are to be ignored (numrows == 0), instead of not issuing the query. 3) Internal errors handlers could intercept errors but not warnings (see sql_error.cc). 4) Implementing a nested call to open tables, for the performance schema tables, exposed an existing bug in remove_table_from_cache(), which would perform: in_use->some_tables_deleted=1; against another thread, without any consideration about thread locking. This call inside remove_table_from_cache() was not required anyway, since calling mysql_lock_abort() takes care of aborting -- cleanly -- threads that might hold a lock on a table. This line (in_use->some_tables_deleted=1) has been removed.
[27 Jul 2007 17:41]
Konstantin Osipov
Sent review over email (second).
[1 Aug 2007 23:27]
Bugs System
Pushed into 5.1.21-beta
[2 Aug 2007 17:21]
Paul DuBois
Noted in 5.1.21 changelog. Log table locking was redesigned, eliminating several lock-related problems: - Truncating mysql.slow_log in a stored procedure after use of a cursor caused the thread to lock. (Bug #17876) - Flushing a log table resulted in warnings. (Bug #23044) - The server would hang when performing concurrent ALTER TABLE or TRUNCATE TABLE statements against the log tables. (Bug #25422) - Changing the value of the general_log system variable while a global read lock was in place resulted in deadlock. (Bug #29129)