Bug #72701 mysqlbinlog uses localtime() to print events, causes kernel mutex contention
Submitted: 21 May 2014 0:45 Modified: 29 Oct 2014 22:09
Reporter: Domas Mituzas Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Command-line Clients Severity:S5 (Performance)
Version:5.6, * OS:Linux
Assigned to: CPU Architecture:Any

[21 May 2014 0:45] Domas Mituzas
Description:
when parsing mysqlbinlogs mysql will call localtime(), which stats /etc/localtime on every invocation
that may cause kernel contention like seen here:

Events: 658K cycles                                                                                                                                                                                                                                                                         
-  58.75%      mysqlbinlog  [kernel.kallsyms]                            [k] _raw_spin_lock                                                                                                                                                                                                ◆
   - _raw_spin_lock                                                                                                                                                                                                                                                                        ▒
      - 49.41% dput                                                                                                                                                                                                                                                                        ▒
         - 50.09% path_put                                                                                                                                                                                                                                                                 ▒
            - 97.72% do_path_lookup                                                                                                                                                                                                                                                        ▒
               - 100.00% user_path_at                                                                                                                                                                                                                                                      ▒
                    vfs_fstatat                                                                                                                                                                                                                                                            ▒
                    vfs_stat                                                                                                                                                                                                                                                               ▒
                    sys_newstat                                                                                                                                                                                                                                                            ▒
                  - system_call                                                                                                                                                                                                                                                            ▒
                     - 100.00% __xstat64                                                                                                                                                                                                                                                   ▒
                        - 100.00% tzset_internal                                                                                                                                                                                                                                           ▒
                             __tz_convert                                                                                                                                                                                                                                                  ▒
                             Log_event::print_timestamp(st_io_cache*, long*)                                                                                                                                                                                                               ▒
                           + Log_event::print_header(st_io_cache*, st_print_event_info*, bool)                                                                                                                                                                                             ▒
            + 2.27% vfs_fstatat                                                                                                                                                                                                                                                            ▒
         - 49.91% link_path_walk                                                                                                                                                                                                                                                           ▒
            + do_path_lookup                                                                                                                                                                                                                                                               ▒
      - 25.12% nameidata_drop_rcu                                                                                                                                                                                                                                                          ▒
           link_path_walk                                                                                                                                                                                                                                                                  ▒
         + do_path_lookup                                                                                                                                                                                                                                                                  ▒
      - 24.03% path_get                                                                                                                                                                                                                                                                    ▒
         - 99.99% nameidata_drop_rcu                                                                                                                                                                                                                                                       ▒
              link_path_walk                                                                                                                                                                                                                                                               ▒
            + do_path_lookup                                                                                                                                                                                                                                                               ▒
      + 1.18% __d_lookup                                                                              

How to repeat:
strace -e stat mysqlbinlog ...file...

Suggested fix:
provide an option to use gmtime()
get a fix into libc that makes localtime() usable
[21 May 2014 11:58] MySQL Verification Team
Hello Domas,

Thank you for the bug report.
Verified as described.

Thanks,
Umesh
[21 May 2014 11:59] MySQL Verification Team
// 5.6.17

mysql-5.6.17]# strace -e stat bin/mysqlbinlog /tmp/master/master-bin.000002
...
..

stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=265, ...}) = 0
# at 1215524
#140523  7:22:46 server id 3306  end_log_pos 1215675 CRC32 0x65ac35f9   Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1400809966/*!*/;
insert into help_relation (help_topic_id,help_keyword_id) values (366,484);
/*!*/;
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=265, ...}) = 0
# at 1215675
#140523  7:22:46 server id 3306  end_log_pos 1215757 CRC32 0xebdbd766   Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1400809966/*!*/;
COMMIT
/*!*/;
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=265, ...}) = 0
# at 1215757
#140523  7:22:46 server id 3306  end_log_pos 1215838 CRC32 0x10e2c320   Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1400809966/*!*/;
BEGIN
/*!*/;
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=265, ...}) = 0
# at 1215838
#140523  7:22:46 server id 3306  end_log_pos 1215989 CRC32 0x5aa1c5d7   Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1400809966/*!*/;
insert into help_relation (help_topic_id,help_keyword_id) values (248,485);
/*!*/;
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=265, ...}) = 0
# at 1215989
#140523  7:22:46 server id 3306  end_log_pos 1216071 CRC32 0xf2358c56   Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1400809966/*!*/;
COMMIT
/*!*/;
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=265, ...}) = 0
# at 1216071
#140523  7:22:46 server id 3306  end_log_pos 1216094 CRC32 0x21a99a99   Stop
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[6 Jun 2014 7:42] zhai weixiang
Hi, Domas

Is this bug similar to such stall in mysql server ?

      1 __lll_lock_wait_private(libc.so.6),_L_lock_2164(libc.so.6),__tz_convert(libc.so.6),Time_zone_system::gmt_sec_to_TIME(tztime.cc:1079),gmt_sec_to_TIME(tztime.h:60),set_datetime(tztime.h:60),Item_func_now::fix_length_and_dec(tztime.h:60),Item_func::fix_fields(item_func.cc:231),setup_fields(sql_base.cc:8229),mysql_update(sql_update.cc:326),mysql_execute_command(sql_parse.cc:3690),mysql_parse(sql_parse.cc:6729),dispatch_command(sql_parse.cc:1634),do_handle_one_connection(sql_connect.cc:982),handle_one_connection(sql_connect.cc:898),pfs_spawn_thread(pfs.cc:1858),start_thread(libpthread.so.0),clone(libc.so.6)

and quoted code
[6 Jun 2014 16:50] Domas Mituzas
that is similar albeit different issue, one of workarounds is to use MySQL internal time zone support and not rely on localtime. There is no real way to fix localtime there. 

I wrote about it at https://www.facebook.com/MySQLatFacebook/posts/10152018528871696:0
[8 Jun 2014 13:23] zhai weixiang
Thanks very much. After setting time_zone to a non-default value, I got a much better performance. 

bellow is my test:

mysqlslap --no-defaults -uxx --create-schema=test -h$host  -P$port --number-of-queries=1000000000 --concurrency=100  --query='select now()’

system time_zone: 15W QPS
set time_zone=“Asia/Shangehai”: 34W QPS 

So why the mysql server doesn't set this option automatically. For example:  run mysql_tzinfo_to_sql  when installing,  reading "/etc/sysconfig/clock" and setting time_zone when the mysql server was  starting up. Anyway, this should be another problem. :)
[9 Jul 2014 8:34] liu hickey
Another pt-pmp info with lots of now() in UPDATE/INSERTs without setting MySQL's internal time zone:

Wed Jul  9 10:42:45 CST 2014
     82 __lll_lock_wait_private(libc.so.6),_L_lock_2163(libc.so.6),__tz_convert(libc.so.6),Time_zone_system::gmt_sec_to_TIME(tztime.cc:1079),gmt_sec_to_TIME(tztime.h:60),set_datetime(tztime.h:60),Item_func_now::fix_length_and_dec(tztime.h:60),Item_func::fix_fields(item_func.cc:231),setup_fields(sql_base.cc:8229),mysql_update(sql_update.cc:326),mysql_execute_command(sql_parse.cc:3690),mysql_parse(sql_parse.cc:6729),dispatch_command(sql_parse.cc:1634),threadpool_process_request(threadpool_common.cc:246),handle_event(threadpool_unix.cc:1553),worker_main(threadpool_unix.cc:1553),start_thread(libpthread.so.0),clone(libc.so.6)
     37 __lll_lock_wait_private(libc.so.6),_L_lock_2163(libc.so.6),__tz_convert(libc.so.6),Time_zone_system::gmt_sec_to_TIME(tztime.cc:1079),gmt_sec_to_TIME(tztime.h:60),set_datetime(tztime.h:60),Item_func_now::fix_length_and_dec(tztime.h:60),Item_func::fix_fields(item_func.cc:231),setup_fields(sql_base.cc:8229),mysql_prepare_insert(sql_insert.cc:1529),mysql_insert(sql_insert.cc:750),mysql_execute_command(sql_parse.cc:3840),mysql_parse(sql_parse.cc:6729),dispatch_command(sql_parse.cc:1634),threadpool_process_request(threadpool_common.cc:246),handle_event(threadpool_unix.cc:1553),worker_main(threadpool_unix.cc:1553),start_thread(libpthread.so.0),clone(libc.so.6)
      1 __tzstring(libc.so.6),__tzset_parse_tz(libc.so.6),__tzfile_compute(libc.so.6),__tz_convert(libc.so.6),Time_zone_system::gmt_sec_to_TIME(tztime.cc:1079),gmt_sec_to_TIME(tztime.h:60),set_datetime(tztime.h:60),Item_func_now::fix_length_and_dec(tztime.h:60),Item_func::fix_fields(item_func.cc:231),setup_fields(sql_base.cc:8229),mysql_update(sql_update.cc:326),mysql_execute_command(sql_parse.cc:3690),mysql_parse(sql_parse.cc:6729),dispatch_command(sql_parse.cc:1634),threadpool_process_request(threadpool_common.cc:246),handle_event(threadpool_unix.cc:1553),worker_main(threadpool_unix.cc:1553),start_thread(libpthread.so.0),clone(libc.so.6)
[29 Oct 2014 22:09] David Moss
Thank you for your feedback. The following was added to the 5.5.41, 5.6.22 and 5.7.6 release notes with commit 4541:
A kernel mutex contention was being caused because mysqlbinlog was calling localtime() for every event read, which in turn called stat(/etc/localtime). This fix ensures that mysqlbinlog uses localtime_r(), which is optimized to store the read only timezone internal structure. This also means that mysqlbinlog now establishes the time zone at the beginning of processing and you can not change it during processing. This is the same behavior as MySQL server.
[3 Dec 2014 15:16] Laurynas Biveinis
$ bzr log -r 4724
------------------------------------------------------------
revno: 4724
committer: Venkatesh Duggirala<venkatesh.duggirala@oracle.com>
branch nick: mysql-5.5
timestamp: Wed 2014-10-08 21:54:35 +0530
message:
  Bug #18808072 MYSQLBINLOG USES LOCALTIME() TO PRINT EVENTS, CAUSES KERNEL MUTEX CONTENTION
  
  Problem: For every event read, mysqlbinlog calls localtime() which in turn
  calls stat(/etc/localtime) which is causing kernel mutex contention.
  
  Analysis and Fix:
  localtime() calls stat(/etc/localtime) for every instance of the call
  where as localtime_r() the reentrant version was optimized to store
  the read only tz internal structure. Hence it will not call
  stat(/etc/localtime). It will call only once at the beginning.
  The mysql server is calling localtime_r() and mysqlbinlog tool is
  one place where we are still using localtime().
  
  Once the process (mysqlbinlog) is started if timezone is changed
  it will be not picked up the the process and it will continue
  with the same values as the beginning of the process. This
  behavior is in-lined with mysql server.
  
  Also adding localtime_r() and gmtime_r() support for windows.