Bug #59019 question about rentrant use of LOCK_logger
Submitted: 18 Dec 2010 1:25 Modified: 4 Jan 2011 11:09
Reporter: Mark Callaghan Email Updates:
Status: Not a Bug Impact on me:
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.1.52, 5.1, 5.6.1 bzr OS:Any
Assigned to: Marc Alff CPU Architecture:Any
Tags: general, LOCK_logger, log
Triage: Needs Triage: D3 (Medium) / R6 (Needs Assessment) / E6 (Needs Assessment)

[18 Dec 2010 1:25] Mark Callaghan
This isn't a high priority issue for me but I wonder if there is either a design flaw in the logging code or whether the performance schema code used by it is incorrectly raising errors when it should not.

I am working on server-side error logging using MySQL 5.1.  I would use the new audit plugin API were I using 5.5, but I need something right now. My initial code called general_log_write from my_message_sql and hung the server in the main.log_tables test.

The problem is that LOGGER::LOCK_logger is not re-entrant and errors that occur when it is already locked cause a hang. It is locked in my case by LOGGER::activate_log_handler from the call to lock_exclusive.

I wonder if this is a design flaw (or misuse of the code by me) and there isn't a clean way to avoid the problem. The problem is that there are things that lock LOCK_logger without either pushing an error handler or otherwise setting state to indicate that the thread should not attempt to lock it again.

I can't just do this by pushing an error handler in dispatch_command to call general_log_write because by the time my error handler is called, LOCK_logger has already been locked by activate_log_handler.

Is this a problem with the performance schema code that raises errors when it shouldn't (when called by Log_to_csv_event_handler::activate_log?

If all callers to lock_exclusive() or lock_shared() either set a variable in the THD or pushed an error handler to prevent something lower in the call stack from attempting to lock it again or even use the general log then this problem could be avoided.

There is an interesting comment in "bool Log_to_csv_event_handler::log_general"
    1) open_performance_schema_table generates an error of the
    table can not be opened or is corrupted.
    2) "INSERT INTO general_log" can generate warning sometimes.

    Suppress these warnings and errors, they can't be dealt with
    properly anyway.

    QQ: this problem needs to be studied in more detail.
    Comment this 2 lines and run "cast.test" to see what's happening.
  thd->push_internal_handler(& error_handler);
  need_pop= TRUE;

How to repeat:
/s/bld/dev502noalarm/sql/mysqld(LOGGER::lock_shared()+0x44) [0x743de8]
/s/bld/dev502noalarm/sql/mysqld(LOGGER::general_log_write(THD*, enum_server_command, char const*, unsigned int, char const*, int)+0xde) [0x743f2a]
/s/bld/dev502noalarm/sql/mysqld(general_log_write(THD*, enum_server_command, char const*, unsigned int, char const*, int)+0x67) [0x744283]
/s/bld/dev502noalarm/sql/mysqld(Log1_error_handler::handle_error(unsigned int, char const*, MYSQL_ERROR::enum_warning_level, THD*)+0x7b) [0x6385f7]
/s/bld/dev502noalarm/sql/mysqld(THD::handle_error(unsigned int, char const*, MYSQL_ERROR::enum_warning_level)+0x4d) [0x62cc81]
/s/bld/dev502noalarm/sql/mysqld(my_message_sql+0x11d) [0x650505]
/s/bld/dev502noalarm/sql/mysqld(my_error+0x291) [0xa6fdf5]
/s/bld/dev502noalarm/sql/mysqld(open_table_error(st_table_share*, int, int, int)+0xf9) [0x6cd621]
/s/bld/dev502noalarm/sql/mysqld(open_table_def(THD*, st_table_share*, unsigned int)+0x614) [0x6d36f6]
/s/bld/dev502noalarm/sql/mysqld(get_table_share(THD*, TABLE_LIST*, char*, unsigned int, unsigned int, int*)+0x162) [0x6bfff8]
/s/bld/dev502noalarm/sql/mysqld [0x6c0469]
/s/bld/dev502noalarm/sql/mysqld [0x6c07c4]
/s/bld/dev502noalarm/sql/mysqld(open_table(THD*, TABLE_LIST*, st_mem_root*, bool*, unsigned int)+0x109c) [0x6c3996]
/s/bld/dev502noalarm/sql/mysqld(open_ltable(THD*, TABLE_LIST*, thr_lock_type, unsigned int)+0xe1) [0x6c4415]
/s/bld/dev502noalarm/sql/mysqld(open_performance_schema_table(THD*, TABLE_LIST*, Open_tables_state*)+0x90) [0x6c46be]
/s/bld/dev502noalarm/sql/mysqld(Log_to_csv_event_handler::activate_log(THD*, unsigned int)+0x189) [0x7427a7]
/s/bld/dev502noalarm/sql/mysqld(LOGGER::activate_log_handler(THD*, unsigned int)+0x140) [0x743cc8]
/s/bld/dev502noalarm/sql/mysqld(sys_var_log_state::update(THD*, set_var*)+0x157) [0x67f207]
/s/bld/dev502noalarm/sql/mysqld(set_var::update(THD*)+0x6a) [0x674c20]
/s/bld/dev502noalarm/sql/mysqld(sql_set_variables(THD*, List<set_var_base>*)+0xfa) [0x67dce6]
/s/bld/dev502noalarm/sql/mysqld(mysql_execute_command(THD*, unsigned long long*)+0x48b1) [0x6690e5]
/s/bld/dev502noalarm/sql/mysqld(mysql_parse(THD*, char*, unsigned int, char const**, unsigned long long*)+0x335) [0x66e5b7]
/s/bld/dev502noalarm/sql/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0xe0b) [0x670997]
/s/bld/dev502noalarm/sql/mysqld(do_command(THD*)+0x3af) [0x6723a3]
/s/bld/dev502noalarm/sql/mysqld(handle_one_connection+0x156) [0x65c1f8]

Suggested fix:
Prevent rentrant use of LOCK_logger. I think I safely added server-side error logs by:
* pushing an error handler in dispatch_command()  that calls general_log_write only when THD::lock_logger =0
* add THD::lock_logger
* increment THD::lock_logger when my error handler is called and when LOGGER::lock_shared() and LOGGER::lock_exclusive are called
* decrement THD::lock_logger at the right time
[18 Dec 2010 10:35] Sveta Smirnova
Thank you for the report.

Hang verified as described with 5.1-release and 5.6.1-debug versions. 5.1-debug crashes, but does not hang.

Stack trace a bit different. See file attached.

I used following modification in code (against 5.1):

=== modified file 'sql/mysqld.cc'
--- sql/mysqld.cc       2010-11-30 23:32:51 +0000
+++ sql/mysqld.cc       2010-12-18 07:00:21 +0000
@@ -2977,6 +2977,7 @@
       push_warning(thd, MYSQL_ERROR::WARN_LEVEL_ERROR, error, str);
       thd->no_warnings_for_error= FALSE;
+general_log_write(thd, COM_QUERY, str, strlen(str));
   /* When simulating OOM, skip writing to error log to avoid mtr errors */
[18 Dec 2010 10:36] Sveta Smirnova

Attachment: bug59019.trace (application/octet-stream, text), 21.56 KiB.

[18 Dec 2010 10:37] Sveta Smirnova
Regarding to proper usage of LOCK_logger please either wait if one of developers volunteer to answer here or create support ticket.
[4 Jan 2011 11:07] Marc Alff
Bug#59019 question about re entrant use of LOCK_logger

Hi Mark.

Please find here some clarifications about the code:

1) In 5.1, there are functions in sql/sql_base.cc named
open_performance_schema_table(), close_performance_schema_table().

The names refer to "performance schema", because a long time ago
we considered moving the log tables to the performance schema.

This has been abandoned since, but the name in the code remained in 5.1.

Note that in 5.5, these functions are properly named
open_log_table() and close_log_table() to avoid confusion with
the performance schema implementation in 5.5,
which is a completely different feature compared to log tables.

2) my_message_sql(), and related functions or methods like:
- my_error()
- push_warning()
- THD::raise_error() in 5.5
- THD::raise_warning() in 5.5
are all about the SQL diagnostics area.

In the diagnostics area:
- SQL conditions (a warning, a not found, an error) are raised by a statement,
- conditions can be intercepted in SQL by SQL exception handlers in stored programs,
- when not handled, SQL conditions are ultimately sent to the client, in the client/server protocol.

3) general_log_write() and related functions are about the general log.

The general log can be written in various locations (table, file, etc),
but the content of the general log is ultimately:
- intended to be for the DBA or a monitoring application,
- not sent to the client connection doing a query.

4) general_log_write() and the logging code is not intended to be recursive.
If something fails when writing to the log, re entrancy of the logger lock itself
is not the only issue the code execution path is going to stumble on,
a log of things after that will break.
To name a few:
- recursive memory allocation errors
- recursive internal table open / close
- recursive log file open / close
- if writing to the log failed the first time,
  writing more data to the same place is likely to fail.

The code change that was attempted is not the proper way to hack the code.
In particular:
- general_log_write() is a much "higher" level code compared to my_message_sql(),
invoking general_log_write() from my_message_sql() is bound to create re entrancy issues.
- general_log_write() is about the general log, which has nothing to do with
the SQL diagnostics area.
Calling general_log_write() from my_message_sql() mixes different concepts.

For example, assuming a SQL condition is raised in a stored program,
and properly handled in a calling stored program, I expect one would *not* want to
see the error logged in the general log then, since there is no error after all.

Now, with all these clarifications, I still agree with you that the code itself
could be more documented, so that it is easier to modify to implement a new feature,
since I assume this is what you are attempting to do.

There are two issues to resolve here.

A) Where in the code to capture the data about a statement execution

Please look at where existing instrumentation points are located, as done with:
- the DTRACE instrumentation
- the SHOW PROFILE instrumentation

Code to search for:
#if defined(ENABLED_PROFILING) in 5.1 and 5.5

This is a good starting point.

B) Where to put the captured data

The general log is a possibility.
Information schema tables are another.
Performance schema tables are also another.

From a technical point of view, between the following designs:

(i) call general_log_write(), and have the thread executing the query
pay for the overhead of pushing the data all the way thru

(ii) insert the data in memory, like the performance schema does,
and have the reader thread read from memory when "select * from performance_schema.XXX"
is executed

my preference is obviously (ii), but this is not available in 5.1

I think I can guess what kind of feature you are working on with the code change that
was attempted, and I happen to have some knowledge about this area.

As far as Bug#59019 "question about re entrant use of LOCK_logger"
is concerned, the question is valid, but I consider the status to be "not a bug",
since how the code was used is a misuse.

I will be more than happy to continue this discussion in a different forum (not bugs.mysql.com),
since this discussion is really about implementing a new feature and not about a bug in the code.

-- Marc

And Happy New Year ;)
Sorry it took some time to get a reply (end of year).