Bug #25422 Hang with log tables
Submitted: 4 Jan 2007 18:37 Modified: 15 Jul 2008 16:21
Reporter: Peter Gulutzan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Logging Severity:S3 (Non-critical)
Version:5.1.15-beta-debug-log OS:Linux (SUSE 10.0 / 64-bit)
Assigned to: Marc ALFF CPU Architecture:Any

[4 Jan 2007 18:37] Peter Gulutzan
Description:
If I use log tables i.e. mysql.slow_log or mysql.general_log,
and I have two simultaneous connections, I can cause a hang.

I can show this with two stored procedures.
One of them repeatedly says 'truncate mysql.slow_log'.
The other repeatedly says 'alter table mysql.general_log'.
If I run either of these procedures on two connections,
I see a hang within a minute.

How to repeat:
Start server with:
mysqld --user=root --skip-networking --log-slow-queries --log-slow-admin-statements --log-output=TABLE,FILE --log --log-queries-not-using-indexes

Start two mysql client connections, T1 and T2.

1. To hang with a procedure that says 'truncate':

On T1, say:
/* doesn't matter if it's mysql.slow_log or mysql.general_log */
delimiter //
use test//
create procedure p1 ()
begin
  declare v1 int default 0;
  declare continue handler for sqlexception select 'error';
  while v1 < 100000 do
    select v1;
    truncate mysql.slow_log;
    set v1 = v1 + 1;
    end while;
  end//
delimiter ;
call p1();

On T2, say:
use test;
call p1();

2. To hang with a procedure that says 'alter':

On T1, say:
delimiter //
use test//
create procedure p2 ()
begin
  declare v1 int default 0;
  declare continue handler for sqlexception select 'error';
  while v1 < 100000 do
    select v1;
    SET @old_log_state = @@global.general_log;
    SET GLOBAL general_log = 'OFF';
    ALTER TABLE mysql.general_log ENGINE = CSV;
    SET GLOBAL general_log = @old_log_state;
    set v1 = v1 + 1;
    end while;
  end//
delimiter ;
call p2();

On T2, say:
use test;
call p2();
[4 Jan 2007 22:39] MySQL Verification Team
Thank you for the bug report. Verified on FC 6.0 32-bit.
[25 Jan 2007 16:51] Peter Gulutzan
Here is another test case, suggested by Brian Aker:

Again, start the server with slow queries:
/usr/local/mysql/libexec/mysqld --user=root --skip-networking --log-slow-queries --log-slow-admin-statements --log-output=TABLE,FILE --log --log-queries-not-using-indexes

Make two connections, T1 and T2:

On T1, say:
delimiter //
create table ta (s1 int, s2 varchar(1000));
create index ia on ta (s1);
create procedure p ()
begin
  declare v int default 0;
  while v < 1000000 do
    select v;
    insert into ta values (v,repeat('a',rand()*1000));
    update ta set s1 = s1 - 1 where s1 = v - 1;
    update ta set s1 = s1 + 1 where s1 = v - 1;
    truncate mysql.general_log;
    set v = v + 1;
    end while;
  end//
call p()//

On T2, say:
call p();
[7 Feb 2007 13:58] Kristofer Pettersson
Prel. reporting:

Thread 1:
983:T@6    : | | | | | | | | | >open_log_table
984:T@6    : | | | | | | | | | | >simple_open_n_lock_tables
985:T@6    : | | | | | | | | | | | >open_tables
986:T@6    : | | | | | | | | | | | | >init_alloc_root
987:T@6    : | | | | | | | | | | | | | enter: root: 0x4bcd738
988:T@6    : | | | | | | | | | | | | | >_mymalloc
989:T@6    : | | | | | | | | | | | | | | enter: Size: 8040
990:T@6    : | | | | | | | | | | | | | | exit: ptr: 0x1afc368
991:T@6    : | | | | | | | | | | | | | <_mymalloc
992:T@6    : | | | | | | | | | | | | <init_alloc_root
993:T@6    : | | | | | | | | | | | | >open_table
994:T@6    : | | | | | | | | | | | | | >hash_first
995:T@6    : | | | | | | | | | | | | | | exit: found key at 5
996:T@6    : | | | | | | | | | | | | | <hash_first
997:T@6    : | | | | | | | | | | | | | note: Found table 'mysql.slow_log' 

with different refresh version
998:T@6    : | | | | | | | | | | | | | >close_old_data_files
999:T@6    : | | | | | | | | | | | | | <close_old_data_files
1000:T@6    : | | | | | | | | | | | | | >wait_for_condition

Thread dead locked waiting for LOCK_open on condition COND_refresh
Function: open_table(THD *thd, TABLE_LIST *table_list={.. mysql.slow_log 

..}, MEM_ROOT *mem_root)

Thread 2:
922:T@8    : | | | | | | | | | | <lock_table_name
923:T@8    : | | | | | | | | | | >remove_table_from_cache
924:T@8    : | | | | | | | | | | | enter: Table: 'mysql.slow_log'  flags: 0
925:T@8    : | | | | | | | | | | | >hash_first
926:T@8    : | | | | | | | | | | | | exit: found key at 5
927:T@8    : | | | | | | | | | | | <hash_first
928:T@8    : | | | | | | | | | | | info: Table was in use by current 

thread. db_stat: 0
929:T@8    : | | | | | | | | | | | info: Table was in use by other thread
930:T@8    : | | | | | | | | | | | info: Removing table from 

table_def_cache
931:T@8    : | | | | | | | | | | | >hash_first
932:T@8    : | | | | | | | | | | | <hash_first
933:T@8    : | | | | | | | | | | <remove_table_from_cache
934:T@8    : | | | | | | | | | | >start_waiting_global_read_lock
935:T@8    : | | | | | | | | | | <start_waiting_global_read_lock
936:T@8    : | | | | | | | | | <lock_and_wait_for_table_name
<next instruction is LOGGER:lock()>

Thread dead locked on mutex LOCK_logger
[13 Feb 2007 9:49] Kristofer Pettersson
Windows stack traces becomes corrupted for unknown reason, linux stack trace looks like this for two threads:

#0  0x003a5402 in __kernel_vsyscall ()
#1  0x00435876 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x083e0051 in safe_cond_wait (cond=0x85f8ba0, mp=0x85f9080, 
    file=0x84996ac "sql_base.cc", line=1711) at thr_mutex.c:202
#3  0x0820c4dc in wait_for_condition (thd=0x9f3c168, mutex=0x85f9080, 
    cond=0x85f8ba0) at sql_base.cc:1711
#4  0x0821585d in open_table (thd=0x9f3c168, table_list=0x9f3a4f4, 
    mem_root=0xb7e866c8, refresh=0xb7e866ff, flags=0) at sql_base.cc:2026
#5  0x0821623e in open_tables (thd=0x9f3c168, start=0xb7e86764, 
    counter=0xb7e86754, flags=0) at sql_base.cc:2926
#6  0x082167dc in simple_open_n_lock_tables (thd=0x9f3c168, tables=0x9f3a4f4)
    at sql_base.cc:3180
#7  0x08263677 in Log_to_csv_event_handler::open_log_table (this=0x9f3a340, 
    log_table_type=1) at log.cc:284
#8  0x08263bcd in Log_to_csv_event_handler::reopen_log_table (this=0x9f3a340, 
    log_table_type=1) at log.cc:377
#9  0x08264876 in LOGGER::reopen_log_table (this=0x85fe380, log_table_type=1)
    at log.cc:815
#10 0x0825ff93 in mysql_truncate (thd=0x9fc36b0, table_list=0x9fe2ca0, 
    dont_send_ok=false) at sql_delete.cc:960
#11 0x081e42d2 in mysql_execute_command (thd=0x9fc36b0) at sql_parse.cc:3564
#12 0x0832fe99 in sp_instr_stmt::exec_core (this=0x9fe2e48, thd=0x9fc36b0, 
    nextp=0xb7e87444) at sp_head.cc:2559
#13 0x0832fc57 in sp_lex_keeper::reset_lex_and_exec_core (this=0x9fe2e70, 
    thd=0x9fc36b0, nextp=0xb7e87444, open_tables=false, instr=0x9fe2e48)
    at sp_head.cc:2428
#14 0x083339bd in sp_instr_stmt::execute (this=0x9fe2e48, thd=0x9fc36b0, 
    nextp=0xb7e87444) at sp_head.cc:2510
#15 0x0832d823 in sp_head::execute (this=0x9fe2238, thd=0x9fc36b0)
    at sp_head.cc:1098
#16 0x08332149 in sp_head::execute_procedure (this=0x9fe2238, thd=0x9fc36b0, 
    args=0x9fc3c0c) at sp_head.cc:1729
#17 0x081e7a25 in mysql_execute_command (thd=0x9fc36b0) at sql_parse.cc:4691
#18 0x081e984e in mysql_parse (thd=0x9fc36b0, inBuf=0x9fd9778 "call p1()", 
    length=9) at sql_parse.cc:6155
#19 0x081ea263 in dispatch_command (command=COM_QUERY, thd=0x9fc36b0, 
    packet=0x9fcd729 "", packet_length=10) at sql_parse.cc:1857
#20 0x081eb2ba in do_command (thd=0x9fc36b0) at sql_parse.cc:1626
#21 0x081eb763 in handle_one_connection (arg=0x9fc36b0) at sql_parse.cc:1232
#22 0x00433bd4 in start_thread () from /lib/libpthread.so.0
#23 0x001db4fe in clone () from /lib/libc.so.6

******************************************************************************

#0  0x003a5402 in __kernel_vsyscall ()
#1  0x00437f5e in __lll_mutex_lock_wait () from /lib/libpthread.so.0
#2  0x00434c7d in _L_mutex_lock_36 () from /lib/libpthread.so.0
#3  0xb7eb77d8 in ?? ()
#4  0xb7e88bb0 in ?? ()
#5  0x00000003 in ?? ()
#6  0x083dfcf5 in safe_mutex_lock (mp=0x85fe380, file=0x8454982 "log.h", 
    line=520) at thr_mutex.c:115
#7  0x081d6520 in LOGGER::lock (this=0x85fe380) at log.h:520
#8  0x0825fdff in mysql_truncate (thd=0x9f7b728, table_list=0x9fee860, 
    dont_send_ok=false) at sql_delete.cc:923
#9  0x081e42d2 in mysql_execute_command (thd=0x9f7b728) at sql_parse.cc:3564
#10 0x0832fe99 in sp_instr_stmt::exec_core (this=0x9feea08, thd=0x9f7b728, 
    nextp=0xb7eb8444) at sp_head.cc:2559
#11 0x0832fc57 in sp_lex_keeper::reset_lex_and_exec_core (this=0x9feea30, 
    thd=0x9f7b728, nextp=0xb7eb8444, open_tables=false, instr=0x9feea08)
    at sp_head.cc:2428
#12 0x083339bd in sp_instr_stmt::execute (this=0x9feea08, thd=0x9f7b728, 
    nextp=0xb7eb8444) at sp_head.cc:2510
#13 0x0832d823 in sp_head::execute (this=0x9feddf8, thd=0x9f7b728)
    at sp_head.cc:1098
#14 0x08332149 in sp_head::execute_procedure (this=0x9feddf8, thd=0x9f7b728, 
    args=0x9f7bc84) at sp_head.cc:1729
#15 0x081e7a25 in mysql_execute_command (thd=0x9f7b728) at sql_parse.cc:4691
#16 0x081e984e in mysql_parse (thd=0x9f7b728, inBuf=0x9fc76e8 "call p1()", 
    length=9) at sql_parse.cc:6155
#17 0x081ea263 in dispatch_command (command=COM_QUERY, thd=0x9f7b728, 
    packet=0x9fbf689 "", packet_length=10) at sql_parse.cc:1857
#18 0x081eb2ba in do_command (thd=0x9f7b728) at sql_parse.cc:1626
#19 0x081eb763 in handle_one_connection (arg=0x9f7b728) at sql_parse.cc:1232
#20 0x00433bd4 in start_thread () from /lib/libpthread.so.0
#21 0x001db4fe in clone () from /lib/libc.so.6
[13 Feb 2007 17:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/19778

ChangeSet@1.2379, 2007-02-13 18:25:43+01:00, thek@kpdesk.mysql.com +1 -0
  Bug#25422 Hang with log tables
  
  - Truncate table can dead lock
  - Moved mutex LOGGER:lock to include the table lock.
[14 Feb 2007 8:20] Kristofer Pettersson
Note: Peters example with ALTER TABLES in function p2() doesn't dead lock, but end up in a tight look consuming 100% CPU (or in the case of my windows can; 110% CPU resulting in a hard reboot)
[14 Feb 2007 11:27] Kristofer Pettersson
Stack trace on running halt:
THD1:
  	ntdll.dll!7c90110a() 	
 	mysqld.exe!pthread_cond_timedwait(pthread_cond_t * cond=0x00c59d88,<snip>
 	mysqld.exe!remove_table_from_cache(THD * thd=0x01a30ad0, const char <snip>
 	mysqld.exe!wait_while_table_is_used(THD * thd=0x01a30ad0, st_table <snip>
	mysqld.exe!close_cached_table(THD * thd=0x01a30ad0, st_table * tabl <snip>
 	mysqld.exe!mysql_alter_table(THD * thd=0x01a30ad0, char * new_db=0x0<snip>
 	mysqld.exe!mysql_execute_command(THD * thd=0x01a30ad0)  Line 3225 + <snip>
 	mysqld.exe!sp_instr_stmt::exec_core(THD * thd=0x01a30ad0, unsigned i<snip>
 	mysqld.exe!sp_lex_keeper::reset_lex_and_exec_core(THD * thd=0x01a30a<snip>
 	mysqld.exe!sp_instr_stmt::execute(THD * thd=0x01a30ad0, unsigned int<snip>
 	mysqld.exe!sp_head::execute(THD * thd=0x01a30ad0)  Line 1078 + 0x20 <snip>
 	mysqld.exe!sp_head::execute_procedure(THD * thd=0x01a30ad0, List<Ite<snip>
 	mysqld.exe!mysql_execute_command(THD * thd=0x01a30ad0)  Line 4710 + <snip>
 	mysqld.exe!mysql_parse(THD * thd=0x01a30ad0, char * inBuf=0x01ab9460<ship>
 	mysqld.exe!dispatch_command(enum_server_command command=COM_QUERY, T<snip>
 	mysqld.exe!do_command(THD * thd=0x01a30ad0)  Line 1638 + 0x31 bytes <snip>
 	mysqld.exe!handle_one_connection(void * arg=0x01a30ad0)  Line 1239 +<snip>
 	mysqld.exe!pthread_start(void * param=0x0039f3d8)  Line 62 + 0x9 byt<snip>
 	mysqld.exe!_callthreadstart()  Line 293 + 0xf bytes	C
 	mysqld.exe!_threadstart(void * ptd=0x01a93238)  Line 277	C

THD2:
	mysqld.exe!wait_for_tables(THD * thd=0x01a35408)  Line 2457	C++
 	mysqld.exe!mysql_lock_tables(THD * thd=0x01a35408, st_table * * table<snip>
 	mysqld.exe!open_ltable(THD * thd=0x01a35408, st_table_list * table_li<snip>
 	mysqld.exe!mysql_alter_table(THD * thd=0x01a35408, char * new_db=0x01<snip>
 	mysqld.exe!mysql_execute_command(THD * thd=0x01a35408)  Line 3225 + 0<snip>
 	mysqld.exe!sp_instr_stmt::exec_core(THD * thd=0x01a35408, unsigned in<snip>
 	mysqld.exe!sp_lex_keeper::reset_lex_and_exec_core(THD * thd=0x01a3540<snip>
 	mysqld.exe!sp_instr_stmt::execute(THD * thd=0x01a35408, unsigned int <snip>
 	mysqld.exe!sp_head::execute(THD * thd=0x01a35408)  Line 1078 + 0x20 b<snip>
 	mysqld.exe!sp_head::execute_procedure(THD * thd=0x01a35408, List<Item<snip>
 	mysqld.exe!mysql_execute_command(THD * thd=0x01a35408)  Line 4710 + 0<snip>
 	mysqld.exe!mysql_parse(THD * thd=0x01a35408, char * inBuf=0x01ade0f8,<snip>
 	mysqld.exe!dispatch_command(enum_server_command command=COM_QUERY, TH<snip>
 	mysqld.exe!do_command(THD * thd=0x01a35408)  Line 1638 + 0x31 bytes  <snip>
 	mysqld.exe!handle_one_connection(void * arg=0x01a35408)  Line 1239 + <snip>
 	mysqld.exe!pthread_start(void * param=0x01a2fd18)  Line 62 + 0x9 byte<snip>
 	mysqld.exe!_callthreadstart()  Line 293 + 0xf bytes	C
 	mysqld.exe!_threadstart(void * ptd=0x01a45590)  Line 277	C
[28 Jun 2007 22:14] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/29909

ChangeSet@1.2516, 2007-06-28 16:13:43-06:00, malff@weblab.(none) +32 -0
  Bug#25422 (Hang with log tables)
  Bug 29129 (Resetting general_log while the GLOBAL READ LOCK is set causes
             a deadlock)
  
  NOTE TO REVIEWER:
  There is still an opened issue with this patch, see mysql-tests/t/disabled.def
  The test rpl_optimize dead locks, but it's unclear if this is because:
  - a new bug was introduced by this patch
  - an existing unrelated bug was exposed by this patch
  To discuss during review, this point needs further investigation.
  
  ------
  
  Prior to this fix, the server would hang when performing concurrent
  ALTER TABLE or TRUNCATE TABLE statements against the LOG TABLES,
  which are mysql.general_log and mysql.slow_log.
  
  The root cause traces to the following code:
  in sql_base.cc, open_table()
    if (table->in_use != thd)
    {
      /* wait_for_condition will unlock LOCK_open for us */
      wait_for_condition(thd, &LOCK_open, &COND_refresh);
    }
  The problem with this code is that the current implementation of the
  LOGGER creates 'fake' THD objects, like
  - Log_to_csv_event_handler::general_log_thd
  - Log_to_csv_event_handler::slow_log_thd
  which are not associated to a real thread running in the server,
  so that waiting for these non-existing threads to release table locks
  cause the dead lock.
  
  In general, the design of Log_to_csv_event_handler does not fit into the
  general architecture of the server, so that the concept of general_log_thd
  and slow_log_thd has to be abandoned:
  - this implementation does not work with table locking
  - it will not work with commands like SHOW PROCESSLIST
  - having the log tables always opened does not integrate well with DDL
  operations / FLUSH TABLES / SET GLOBAL READ_ONLY
  
  With this patch, the fundamental design of the LOGGER has been changed to:
  - always open and close a log table when writing a log
  - remove totally the usage of fake THD objects
  - clarify how locking of log tables is implemented in general.
  
  Changing the locking design for log tables had some impact in several areas.
  The total change consist of:
  
  Part I: Technical changes
  
  1)
  
  bool TABLE_SHARE::system was too close to bool TABLE_SHARE::system_table,
  and has been renamed bool TABLE_SHARE::one_record_table
  
  2)
  
  bool TABLE_SHARE::system_table and bool TABLE_SHARE::log_table are related.
  Instead of multiplying various "is_X" "is_Y" boolean flags, the attribute
  TABLE_CATEGORY TABLE_SHARE::table_category has been introduced, that better
  describe the nature of the table.
  
  3)
  
  The enum TABLE_CATEGORY has been introduced, see the comments there.
  
  Special exceptions regarding FLUSH TABLES WITH READ LOCK or
  SET GLOBAL READ_ONLY=ON, in lock.cc, are now implemented with more clarity
  with the use of honor_global_locks().
  
  In general, helpers like honor_XYZ() are encouraged, as this breaks the
  correlation between
  - how a table behaves with regards to 1 property (global read lock, ...)
  - what the table is used for (user, system, ...)
  
  4)
  
  my_bool TABLE_SHARE::locked_by_logger has been removed, this was an artificial
  hack.
  
  5)
  
  The previous implementation confused:
  - 'system table' with ENGINE = MYISAM
  - 'log table' with ENGINE = CSV
  so that logic related to enforcing constraints for what the SQL layer
  considers 'system' or 'log' tables was not implemented in the SQL layer but
  in the storage engines.
  
  The methods:
  - handler::check_if_locking_is_allowed
  - handler::check_if_log_table_locking_is_allowed
  - ha_tina::check_if_locking_is_allowed
  - ha_myisam::check_if_locking_is_allowed
  have been removed, and the logic implemented in lock.cc instead.
  
  6)
  
  The fake threads, Log_to_csv_event_handler::general_log_thd and
  Log_to_csv_event_handler::slow_log_thd, are removed
  
  7)
  
  In general, a lot of code in log.h / log.cc dealing with opening / closing
  tables has been removed.
  
  8)
  
  Writing to the log tables is still a privileged operation.
  Before, it was enforced by testing that the 'current thread is the logger
  thread'. Now, it's enforced by the use of the internal flag
  OPTION_INTERNAL_PRIVILEGE
  
  9)
  
  TRUNCATE TABLE does not interfere with the logger implementation any more,
  all the code used to 'shutdown the logger, truncate, then restart the logger'
  is removed, and the operation is online.
  
  TRUNCATE TABLE uses OPTION_INTERNAL_PRIVILEGE, since it may need to use
  write locks.
  
  10)
  
  RENAME TABLE does not interfere with the logger implementation any more,
  regular table locking rules should apply even with log tables.
  
  11)
  
  Special rules for log tables in ALTER TABLES have been implemented in a
  dedicated helper/wrapper, for clarity.
  
  mysql_alter_table() uses OPTION_INTERNAL_PRIVILEGE, then invokes the
  ALTER TABLE implementation: mysql_alter_table_impl()
  
  This ensures that -- given the complexity of mysql_alter_table_impl() --
  every path of execution properly unset OPTION_INTERNAL_PRIVILEGE at the end.
  
  Note that the operation is online, as the logger is not locked.
  
  Part II: functional changes
  
  12)
  
  The previous implementation enforced that :
  - LOCK TABLE mysql.general_log WRITE
  - LOCK TABLE mysql.general_log READ
  was prohibited (even for SUPER), while
  - LOCK TABLE mysql.general_log READ LOCAL
  was allowed.
  
  The rational behind this was that WRITE and READ would freeze the server
  when running with SET GLOBAL GENERAL_LOG=ON, and should (rightfully) be
  prohibited.
  
  Allowing READ LOCAL was assuming that no dead locks can occur because of this
  lock, so that it's not harmful.
  
  However, with the following scenario:
  THD1: LOCK TABLE mysql.general_log READ LOCAL;
  THD2: TRUNCATE TABLE mysql.general_log;
  THD1: <any query, even "select 1 from dual> --> deadlock.
  
  The dead lock occurs because TRUNCATE uses a name lock and waits for *all*
  locks to disappear.
  THD1 dead locks while writing to the general_log, which can't get a write lock
  because of TRUNCATE.
  
  So, allowing even READ LOCAL on log tables can lead to a complete dead lock of
  the server, where 'complete' means :
  - new clients can't connect (stuck while printing in the general log)
  - new queries from existing clients can't execute (same)
  - the server does not respond to kill -15 (as seen during testing)
  
  The functional change introduced by this patch is to deny *any* form of
  locking what-so-ever on log tables, so that :
  LOCK TABLE mysql.general_log READ LOCAL is now denied.
  
  13)
  
  FLUSH LOGS does not flush the log tables any more.
  
  14)
  
  FLUSH TABLES does flush the log tables, it used to ignore them.
  
  Part III: more technical changes
  
  15)
  
  Implementing the restriction 12) above exposed a bug in mysqldump, which would
  ignore some tables in dump_all_tables_in_db(), but forget to ignore the same
  in dump_all_views_in_db().
  
  This has been fixed.
  
  16)
  
  mysqldump would also issue an empty "LOCK TABLE" command when all the tables
  to lock are to be ignored (numrows == 0), instead of not issuing the query.
  
  This has been fixed.
  
  17)
  
  Internal errors handlers could intercept errors but not warnings
  (see sql_error.cc), this has been fixed.
  
  It was found because writing to the general log or the slow log needs to
  filter out any error or warning, to avoid affecting the client performing
  the query with unrelated noise.
[20 Jul 2007 1:45] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/31209

ChangeSet@1.2516, 2007-07-19 19:43:15-06:00, malff@weblab.(none) +31 -0
  WL#3984 (Revise locking of mysql.general_log and mysql.slow_log)
  Bug#25422 (Hang with log tables)
  Bug 29129 (Resetting general_log while the GLOBAL READ LOCK is set causes
             a deadlock)
  
  NOTE TO REVIEWER:
  There is still an opened issue with this patch, see mysql-tests/t/disabled.def
  The test rpl_optimize dead locks, but it's unclear if this is because:
  - a new bug was introduced by this patch
  - an existing unrelated bug was exposed by this patch
  To discuss during review, this point needs further investigation.
  
  ------
  
  Prior to this fix, the server would hang when performing concurrent
  ALTER TABLE or TRUNCATE TABLE statements against the LOG TABLES,
  which are mysql.general_log and mysql.slow_log.
  
  The root cause traces to the following code:
  in sql_base.cc, open_table()
    if (table->in_use != thd)
    {
      /* wait_for_condition will unlock LOCK_open for us */
      wait_for_condition(thd, &LOCK_open, &COND_refresh);
    }
  The problem with this code is that the current implementation of the
  LOGGER creates 'fake' THD objects, like
  - Log_to_csv_event_handler::general_log_thd
  - Log_to_csv_event_handler::slow_log_thd
  which are not associated to a real thread running in the server,
  so that waiting for these non-existing threads to release table locks
  cause the dead lock.
  
  In general, the design of Log_to_csv_event_handler does not fit into the
  general architecture of the server, so that the concept of general_log_thd
  and slow_log_thd has to be abandoned:
  - this implementation does not work with table locking
  - it will not work with commands like SHOW PROCESSLIST
  - having the log tables always opened does not integrate well with DDL
  operations / FLUSH TABLES / SET GLOBAL READ_ONLY
  
  With this patch, the fundamental design of the LOGGER has been changed to:
  - always open and close a log table when writing a log
  - remove totally the usage of fake THD objects
  - clarify how locking of log tables is implemented in general.
  
  Changing the locking design for log tables had some impact in several areas.
  The total change consist of:
  
  Part I: Technical changes
  
  1)
  
  bool TABLE_SHARE::system was too close to bool TABLE_SHARE::system_table,
  and has been renamed bool TABLE_SHARE::one_record_table
  
  2)
  
  bool TABLE_SHARE::system_table and bool TABLE_SHARE::log_table are related.
  Instead of multiplying various "is_X" "is_Y" boolean flags, the attribute
  TABLE_CATEGORY TABLE_SHARE::table_category has been introduced, that better
  describe the nature of the table.
  
  3)
  
  The enum TABLE_CATEGORY has been introduced, see the comments there.
  
  Special exceptions regarding FLUSH TABLES WITH READ LOCK or
  SET GLOBAL READ_ONLY=ON, in lock.cc, are now implemented with more clarity
  with the use of honor_global_locks().
  
  In general, helpers like honor_XYZ() are encouraged, as this breaks the
  correlation between
  - how a table behaves with regards to 1 property (global read lock, ...)
  - what the table is used for (user, system, ...)
  
  4)
  
  my_bool TABLE_SHARE::locked_by_logger has been removed, this was an artificial
  hack.
  
  5)
  
  The previous implementation confused:
  - 'system table' with ENGINE = MYISAM
  - 'log table' with ENGINE = CSV
  so that logic related to enforcing constraints for what the SQL layer
  considers 'system' or 'log' tables was not implemented in the SQL layer but
  in the storage engines.
  
  The methods:
  - handler::check_if_locking_is_allowed
  - handler::check_if_log_table_locking_is_allowed
  - ha_tina::check_if_locking_is_allowed
  - ha_myisam::check_if_locking_is_allowed
  have been removed, and the logic implemented in lock.cc instead.
  
  6)
  
  The fake threads, Log_to_csv_event_handler::general_log_thd and
  Log_to_csv_event_handler::slow_log_thd, are removed
  
  7)
  
  In general, a lot of code in log.h / log.cc dealing with opening / closing
  tables has been removed.
  
  8)
  
  Writing to the log tables is still a privileged operation.
  Before, it was enforced by testing that the 'current thread is the logger
  thread'. Now, it's enforced by the use of the internal flag
  OPTION_INTERNAL_PRIVILEGE
  
  9)
  
  TRUNCATE TABLE does not interfere with the logger implementation any more,
  all the code used to 'shutdown the logger, truncate, then restart the logger'
  is removed, and the operation is online.
  
  TRUNCATE TABLE uses OPTION_INTERNAL_PRIVILEGE, since it may need to use
  write locks.
  
  10)
  
  RENAME TABLE does not interfere with the logger implementation any more,
  regular table locking rules should apply even with log tables.
  
  11)
  
  Special rules for log tables in ALTER TABLES have been implemented in a
  dedicated helper/wrapper, for clarity.
  
  mysql_alter_table() uses OPTION_INTERNAL_PRIVILEGE, then invokes the
  ALTER TABLE implementation: mysql_alter_table_impl()
  
  This ensures that -- given the complexity of mysql_alter_table_impl() --
  every path of execution properly unset OPTION_INTERNAL_PRIVILEGE at the end.
  
  Note that the operation is online, as the logger is not locked.
  
  Part II: functional changes
  
  12)
  
  The previous implementation enforced that :
  - LOCK TABLE mysql.general_log WRITE
  - LOCK TABLE mysql.general_log READ
  was prohibited (even for SUPER), while
  - LOCK TABLE mysql.general_log READ LOCAL
  was allowed.
  
  The rational behind this was that WRITE and READ would freeze the server
  when running with SET GLOBAL GENERAL_LOG=ON, and should (rightfully) be
  prohibited.
  
  Allowing READ LOCAL was assuming that no dead locks can occur because of this
  lock, so that it's not harmful.
  
  However, with the following scenario:
  THD1: LOCK TABLE mysql.general_log READ LOCAL;
  THD2: TRUNCATE TABLE mysql.general_log;
  THD1: <any query, even "select 1 from dual> --> deadlock.
  
  The dead lock occurs because TRUNCATE uses a name lock and waits for *all*
  locks to disappear.
  THD1 dead locks while writing to the general_log, which can't get a write lock
  because of TRUNCATE.
  
  So, allowing even READ LOCAL on log tables can lead to a complete dead lock of
  the server, where 'complete' means :
  - new clients can't connect (stuck while printing in the general log)
  - new queries from existing clients can't execute (same)
  - the server does not respond to kill -15 (as seen during testing)
  
  The functional change introduced by this patch is to deny *any* form of
  locking what-so-ever on log tables, so that :
  LOCK TABLE mysql.general_log READ LOCAL is now denied.
  
  13)
  
  FLUSH LOGS does not flush the log tables any more.
  
  14)
  
  FLUSH TABLES does flush the log tables, it used to ignore them.
  
  Part III: more technical changes
  
  15)
  
  Implementing the restriction 12) above exposed a bug in mysqldump, which would
  ignore some tables in dump_all_tables_in_db(), but forget to ignore the same
  in dump_all_views_in_db().
  
  This has been fixed.
  
  16)
  
  mysqldump would also issue an empty "LOCK TABLE" command when all the tables
  to lock are to be ignored (numrows == 0), instead of not issuing the query.
  
  This has been fixed.
  
  17)
  
  Internal errors handlers could intercept errors but not warnings
  (see sql_error.cc), this has been fixed.
  
  It was found because writing to the general log or the slow log needs to
  filter out any error or warning, to avoid affecting the client performing
  the query with unrelated noise.
  
  18)
  
  Implementing a nested call to open tables, for the performance schema tables,
  exposed an existing bug in remove_table_from_cache(), which would perform:
    in_use->some_tables_deleted=1;
  against another thread, without any consideration about thread locking.
  This call inside remove_table_from_cache() was not required anyway,
  since calling mysql_lock_abort() takes care of aborting -- cleanly -- threads
  that might hold a lock on a table.
  This line (in_use->some_tables_deleted=1) has been removed.
[24 Jul 2007 2:37] Marc ALFF
Correction about the last patch CSET comments:

The paragraph
"
  NOTE TO REVIEWER:
  There is still an opened issue with this patch, see mysql-tests/t/disabled.def
  The test rpl_optimize dead locks, but it's unclear if this is because:
  - a new bug was introduced by this patch
  - an existing unrelated bug was exposed by this patch
  To discuss during review, this point needs further investigation.
"

should be ignored, as the issue with rpl_optimize has been resolved.

Also, the patch happen to also fix the following bugs:
- Bug#17876
- Bug#23044
for which test cases will be added to the patch.
[25 Jul 2007 20:44] Konstantin Osipov
Sent a review over email.
[27 Jul 2007 6:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/31673

ChangeSet@1.2553, 2007-07-27 00:31:06-06:00, malff@weblab.(none) +39 -0
  WL#3984 (Revise locking of mysql.general_log and mysql.slow_log)
  Bug#25422 (Hang with log tables)
  Bug 17876 (Truncating mysql.slow_log in a SP after using cursor locks the
            thread)
  Bug 23044 (Warnings on flush of a log table)
  Bug 29129 (Resetting general_log while the GLOBAL READ LOCK is set causes
             a deadlock)
  
  Prior to this fix, the server would hang when performing concurrent
  ALTER TABLE or TRUNCATE TABLE statements against the LOG TABLES,
  which are mysql.general_log and mysql.slow_log.
  
  The root cause traces to the following code:
  in sql_base.cc, open_table()
    if (table->in_use != thd)
    {
      /* wait_for_condition will unlock LOCK_open for us */
      wait_for_condition(thd, &LOCK_open, &COND_refresh);
    }
  The problem with this code is that the current implementation of the
  LOGGER creates 'fake' THD objects, like
  - Log_to_csv_event_handler::general_log_thd
  - Log_to_csv_event_handler::slow_log_thd
  which are not associated to a real thread running in the server,
  so that waiting for these non-existing threads to release table locks
  cause the dead lock.
  
  In general, the design of Log_to_csv_event_handler does not fit into the
  general architecture of the server, so that the concept of general_log_thd
  and slow_log_thd has to be abandoned:
  - this implementation does not work with table locking
  - it will not work with commands like SHOW PROCESSLIST
  - having the log tables always opened does not integrate well with DDL
  operations / FLUSH TABLES / SET GLOBAL READ_ONLY
  
  With this patch, the fundamental design of the LOGGER has been changed to:
  - always open and close a log table when writing a log
  - remove totally the usage of fake THD objects
  - clarify how locking of log tables is implemented in general.
  
  See WL#3984 for details related to the new locking design.
  
  Additional changes (misc bugs exposed and fixed):
  
  1)
  
  mysqldump which would ignore some tables in dump_all_tables_in_db(),
   but forget to ignore the same in dump_all_views_in_db().
  
  2)
  
  mysqldump would also issue an empty "LOCK TABLE" command when all the tables
  to lock are to be ignored (numrows == 0), instead of not issuing the query.
  
  3)
  
  Internal errors handlers could intercept errors but not warnings
  (see sql_error.cc).
  
  4)
  
  Implementing a nested call to open tables, for the performance schema tables,
  exposed an existing bug in remove_table_from_cache(), which would perform:
    in_use->some_tables_deleted=1;
  against another thread, without any consideration about thread locking.
  This call inside remove_table_from_cache() was not required anyway,
  since calling mysql_lock_abort() takes care of aborting -- cleanly -- threads
  that might hold a lock on a table.
  This line (in_use->some_tables_deleted=1) has been removed.
[27 Jul 2007 17:41] Konstantin Osipov
Sent review over email (second).
[1 Aug 2007 23:27] Bugs System
Pushed into 5.1.21-beta
[2 Aug 2007 17:21] Paul DuBois
Noted in 5.1.21 changelog.

Log table locking was redesigned, eliminating several
lock-related problems:
  - Truncating mysql.slow_log in a stored procedure
    after use of a cursor caused the thread to lock.
    (Bug #17876) 
  - Flushing a log table resulted in warnings.
    (Bug #23044)
  - The server would hang when performing concurrent
    ALTER TABLE or TRUNCATE TABLE statements against the 
    log tables. (Bug #25422) 
  - Changing the value of the general_log system
    variable while a global read lock was in place
    resulted in deadlock. (Bug #29129)