Bug #81985 rsync of database files crashes mysql server configured with skip-grant-tables
Submitted: 23 Jun 2016 9:48 Modified: 16 Jun 2017 9:04
Reporter: Pascal KISSIAN Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:5.7 OS:Ubuntu (16.04)
Assigned to: CPU Architecture:Any
Tags: crash, MySQL, rsync

[23 Jun 2016 9:48] Pascal KISSIAN
Description:
I have a rsync cron job that each hour, makes a copy of all (databases and not databases) files to a backup server...

That worked fine in ubuntu 12.04 with mysql 5.5
I recently upgraded to ubuntu 16.04 with mysql 5.7
and now, each our, the mysql server is crashing.

I tested replacing rsync version with the one taht was on ubuntu 12.04: same crash of mysql server.

my specific mysql ini parameters:
(I do not know if changing one of them makes a difference, but I still use root without password as there is no need for security for this server, and it remains compatible with other servers that are still in previous versions)

skip-grant-tables 
net_read_timeout    = 600
net_write_timeout   = 600
query_cache_type    = 1
open_files_limit    = 4096
key_buffer_size		= 512M
max_allowed_packet	= 512M
collation_server	= utf8_unicode_ci
character_set_server	= utf8
default-storage-engine	= MYISAM
sql-mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION

The error message that I get in php after executing the rsync is:
unable to connect to mysql server on 'localhost' server, port:'3306' 
ERROR: Plugin 'auth_socket' is not loaded

How to repeat:

same ini parameters, do a rsync on database files...
[24 Jun 2016 6:10] Lars Tangvald
I'm not sure why copying out the server files would cause this, but in general I'm fairly certain using rsync on a running server isn't a supported backup method. Or does the cron job also shut down the server first?

As for the error: MySQL 5.7 tries to be more secure by default, so if on upgrade it detects a passwordless root account it will install and enable unix socket authentication for the root user. When logging in as root it then needs to load this plugin, causing that error if the plugin is no longer available.

When you upgraded to Ubuntu 16, did you do an rsync from the backup to the database? If you installed 5.7, then copied the 5.5 database over it without deleting the 5.7 database you may have ended up with a database that's part 5.5 and part 5.7, so it has the new user tables from 5.7 but the plugin list from 5.5.

A possible way to fix this is to put the following into a file:
ALTER USER 'root'@'localhost' IDENTIFIED WITH 'mysql_native_password' BY '';
SHUTDOWN;

Make sure the file is readable for the mysql user
Then stop the server, and run sudo mysqld --user=mysql --init-file=[file]

This will disable the auth_socket plugin for the user.
You could also run INSTALL PLUGIN auth_socket SONAME 'auth_socket.so'; instead of ALTER USER.

Then restart the server and see if that fixes it. We would strongly recommend running mysql_upgrade after this, since 5.7 will not work well with a 5.5 database.
[24 Jun 2016 6:23] Lars Tangvald
Also, there's a simple test you can do to check if you have any configuration issues that may prevent the server from starting:
sudo mysqld --verbose --help 2>&1>/dev/null

If that command gives no output, your configuration should be fine.
[24 Jun 2016 8:10] Pascal KISSIAN
please let's speak about facts, not opinions.

mysqld --verbose --help 2>&1>/dev/null  gives no output

my 100% reproductible blocker critical issue:

1) service mysql restart   --->    all is working ok 
2) rsync -avz              --->    mysql server does not work anymore
3) service mysql restart   --->    all is working ok again...

I tested replacing rsync with a tar : all is ok with tar ( some hours later... a rsync takes at most 1 mn...)
I agree with you, a simple read of files SHOULD NOT crash the server... but it does!!! :o(  ... 
( surprisingly, bugs have behaviours that you don't expect)
I suspect file lock conflicts problems......

btw, I experienced 2 more bugs that I did'nt report because I found work arounds.
 - When I upgraded to Ubuntu 16, I tried with:
   "mysql_upgrade --force" and "mysqlcheck --all-databases --check-upgrade" 
   both crashed at about 40% of expected...
   I then tried with a php program doing a mysqldump then "mysql < dump"  for all databases ... it also crashed at about 40%!
   then I included in my php program a "service myql restart" each 25% of the work... and that did the job!!!!

 - The "apt upgrade" of the mysql server update hanged!!!!!
   the mysql update script was trying to execute  a query located in a temporary file in which there was a query like:
   "ALTER USER 'root'@'localhost' .... ; SHUTDOWN;"
   I had to kill the mysqld process in order to have the update scripts of all other packages to be run....

More details about my mysql databases structure:
  - All databases (except core mysql ones) are myisam type and there is a symlink in the datadir directory for each database.
  - the symlink points to a database located in a different directory according to year-month of creation.
  - database size is about 120 Gb , stored in 22000 databases, most databases have about 50 tables within,
    the biggest database has 169 tables and is 4Gb large with about 40,000,000 records.

I do not need, nor want, any kind of security features on the database.
[24 Jun 2016 9:55] Lars Tangvald
When you use rsync and the server crashes, do you get an error in the mysql error log (default /var/log/mysql/error.log)?

How about when mysql_upgrade fails?
[24 Jun 2016 10:13] Pascal KISSIAN
Hi Lars,

There is only one entry in error log file when the server crashes after rsync:
[Warning] The plugin 'auth_socket' used to authenticate user 'root'@'localhost' is not loaded. Nobody can currently login using this account.

for mysql_upgrade failure, there is much more info:
I just tried "mysql_upgrade --force" on the already updated machine, and it  failed with the following message in the console:
Error occurred: Error during call to mysql_check.

and the error.log file contains:
2016-06-24T09:58:21.748761Z 5 [Warning] The plugin 'auth_socket' used to authenticate user 'root'@'localhost' is not loaded. Nobody can currently login using this account.
10:03:41 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=536870912
read_buffer_size=131072
max_used_connections=1
max_threads=151
thread_count=1
connection_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 584285 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7ff1e400eb10
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ff21c042e70 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0xe79e0b]
/usr/sbin/mysqld(handle_fatal_signal+0x489)[0x783379]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x113d0)[0x7ff2788fa3d0]
/usr/sbin/mysqld(_Z29create_table_share_index_statPK11TABLE_SHAREj+0xae)[0xf14bbe]
/usr/sbin/mysqld(_ZN15PFS_table_share25find_or_create_index_statEPK11TABLE_SHAREj+0x50)[0xf151f0]
/usr/sbin/mysqld(_Z26find_or_create_table_shareP10PFS_threadbPK11TABLE_SHARE+0x43f)[0xf1702f]
/usr/sbin/mysqld(_Z15get_table_shareP3THDP10TABLE_LISTPKcmjPij+0x1f6)[0xbe66b6]
/usr/sbin/mysqld(_Z10open_tableP3THDP10TABLE_LISTP18Open_table_context+0x9fa)[0xbe761a]
/usr/sbin/mysqld(_Z11open_tablesP3THDPP10TABLE_LISTPjjP19Prelocking_strategy+0x79b)[0xbee86b]
/usr/sbin/mysqld(_Z20open_and_lock_tablesP3THDP10TABLE_LISTjP19Prelocking_strategy+0x43)[0xbeeee3]
/usr/sbin/mysqld[0xda8c6e]
/usr/sbin/mysqld(_ZN19Sql_cmd_check_table7executeEP3THD+0x97)[0xdaa717]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x233c)[0xc419ac]
/usr/sbin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x3dd)[0xc462fd]
/usr/sbin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0xf6a)[0xc472ba]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x1c7)[0xc48737]
/usr/sbin/mysqld(handle_connection+0x288)[0xd067d8]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0xef4a44]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa)[0x7ff2788f06fa]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7ff277d85b5d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7ff1e4013750): is an invalid pointer
Connection ID (thread ID): 5
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
2016-06-24T10:03:41.283654Z 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4161)
2016-06-24T10:03:41.283688Z 0 [Warning] Changed limits: table_open_cache: 431 (requested 2000)
2016-06-24T10:03:41.415248Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2016-06-24T10:03:41.415735Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.12-0ubuntu1.1) starting as process 5940 ...
2016-06-24T10:03:41.427575Z 0 [Note] InnoDB: PUNCH HOLE support available
2016-06-24T10:03:41.427589Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-06-24T10:03:41.427592Z 0 [Note] InnoDB: Uses event mutexes
2016-06-24T10:03:41.427594Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2016-06-24T10:03:41.427596Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.8
2016-06-24T10:03:41.427598Z 0 [Note] InnoDB: Using Linux native AIO
2016-06-24T10:03:41.427854Z 0 [Note] InnoDB: Number of pools: 1
2016-06-24T10:03:41.428173Z 0 [Note] InnoDB: Using CPU crc32 instructions
2016-06-24T10:03:41.431871Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2016-06-24T10:03:41.435872Z 0 [Note] InnoDB: Completed initialization of buffer pool
2016-06-24T10:03:41.437117Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2016-06-24T10:03:41.450154Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2016-06-24T10:03:41.508605Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 3760109
2016-06-24T10:03:41.508617Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 3760118
2016-06-24T10:03:41.508673Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 3760118
2016-06-24T10:03:41.508675Z 0 [Note] InnoDB: Database was not shutdown normally!
2016-06-24T10:03:41.508677Z 0 [Note] InnoDB: Starting crash recovery.
2016-06-24T10:03:41.630703Z 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2016-06-24T10:03:41.630718Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2016-06-24T10:03:41.630736Z 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2016-06-24T10:03:41.657905Z 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2016-06-24T10:03:41.658342Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2016-06-24T10:03:41.658347Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2016-06-24T10:03:41.658598Z 0 [Note] InnoDB: Waiting for purge to start
2016-06-24T10:03:41.708716Z 0 [Note] InnoDB: 5.7.12 started; log sequence number 3760118
2016-06-24T10:03:41.708877Z 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2016-06-24T10:03:41.709136Z 0 [Note] InnoDB: Buffer pool(s) load completed at 160624 12:03:41
2016-06-24T10:03:41.709347Z 0 [Note] Plugin 'FEDERATED' is disabled.
2016-06-24T10:03:41.710934Z 0 [Warning] Failed to set up SSL because of the following SSL library error: SSL context is not usable without certificate and private key
2016-06-24T10:03:41.710941Z 0 [Note] Server hostname (bind-address): '127.0.0.1'; port: 3306
2016-06-24T10:03:41.710945Z 0 [Note]   - '127.0.0.1' resolves to '127.0.0.1';
2016-06-24T10:03:41.710956Z 0 [Note] Server socket created on IP: '127.0.0.1'.
2016-06-24T10:03:41.714249Z 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.7.12-0ubuntu1.1'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  (Ubuntu)
[24 Jun 2016 11:22] Pascal KISSIAN
Hi Lars,

I made another test...

1) I commented out the skip-grant-tables option in ini file.
2) I created a dummy user with no password and all privileges
CREATE USER 'dummy'@'localhost' ;
GRANT ALL PRIVILEGES ON *.* TO 'dummy'@'localhost' WITH GRANT OPTION;
FLUSH PRIVILEGES;

no more crash with rsync ......

I will have to change within a few hundreds of scripts  "-u root" with "-u dummy" .....

So I have my workaround for now....

Regards,
Pascal
[24 Jun 2016 11:41] Lars Tangvald
Great that you have a workaround. We'll look into the hard crash you saw before (I can't reproduce it with a simple test database). In 5.7 the core tables are innodb, while in 5.5 they were MyISAM. This might be why you didn't see problems with 5.5.

I need to stress, though, that backing up the database with rsync while the server is running is very unsafe and could lead to data corruption, so it would be better if the job stops the server first. If the downtime is too big an issue, you might test doing the rsync like now, then shutting down the server and doing another rsync to cover anything that might have changed while the bigger job was running.
[24 Jun 2016 12:59] Pascal KISSIAN
even without skip-grant-tables, "mysql_upgrade --force" crashes the same way... (exactly same call stack)...

The mysql server has to be alive 24/24, I cannot stop it.

I make snapshots every hour, running the rsync twice and executing "mysqladmin flush-logs flush-privileges flush-tables" before each rsync.. the second one takes a few seconds as all filesystem tree is cached in memory...
(I have a 32Gb RAM system)
[24 Jun 2016 13:55] Lars Tangvald
We'd need some way of reproducing the segfault to find and fix the cause. Is there any chance you could find the data that's causing it? If mysql_upgrade always crashes the server at the same spot, it might be a single database or table that contains data it can't handle.
[24 Jun 2016 14:15] Pascal KISSIAN
I do not think that data is in cause, but amount of data ...
(when I manually did the job splitting upgrades in 4 parts, it worked)

when executing "mysql_upgrade --force" the error occurs a litle bit before than executing "mysqlcheck --all-databases --check-upgrade" : same database but about 10 tables before....

when I execute "mysqlcheck the_database_that_failed --check-upgrade" it works ok....
[24 Jun 2016 14:18] Pascal KISSIAN
the error message is not exactly the same:

mysql_upgrade ->
Error occurred: Error during call to mysql_check.

mysqlcheck --all-databases --check-upgrade -->
mysqlcheck: Got error: 2013: Lost connection to MySQL server during query when executing 'CHECK TABLE ...  FOR UPGRADE'
[24 Jun 2016 14:18] Lars Tangvald
We can try generating a similar setup, then.
[27 Jun 2016 7:16] Lars Tangvald
A possible cause other than the switch to innodb for the system tables is that with 5.7 in Ubuntu 16.04 systemd is used, and it imposes its own limits that may override your server configuration. Some options may also need to be specified for the systemd service, as mentioned here: https://dev.mysql.com/doc/refman/5.7/en/server-management-using-systemd.html

It might be you're hitting a memory or file descriptor limit

Could you try running:
systemctl edit mysqld
Then adding the following
[Service]
LimitNOFILE=4096

Then running

systemctl daemon-reload
systemctl restart mysqld

And see if it helps any? The default file limit in systemd is 1028
Alternately if that has no effect, can you try running mysqld manually instead of with the service system and see if the issue still occurs?
[27 Jun 2016 7:44] Pascal KISSIAN
tested with LimitNOFILE=4096 ... crashing exactly the same way...

running mysqld from command line quits after a few seconds...
tested with mysqld_safe from command line : same crash...
[27 Jun 2016 8:20] Lars Tangvald
What's in the error log after trying to run manually?
Note that it's not possible to run two servers using the same data directory, so the service would need to be stopped first.
[27 Jun 2016 8:25] Pascal KISSIAN
I had stopped the service before running manualy...
the error log is the following:
2016-06-27T07:41:51.793626Z 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.7.12-0ubuntu1.1'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  (Ubuntu)
07:44:05 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=536870912
read_buffer_size=131072
max_used_connections=1
max_threads=151
thread_count=1
connection_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 584285 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fa1c0000ae0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fa1f32d3e70 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0xe79e0b]
/usr/sbin/mysqld(handle_fatal_signal+0x489)[0x783379]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x113d0)[0x7fa2528123d0]
/usr/sbin/mysqld(_Z29create_table_share_index_statPK11TABLE_SHAREj+0xae)[0xf14bbe]
/usr/sbin/mysqld(_ZN15PFS_table_share25find_or_create_index_statEPK11TABLE_SHAREj+0x50)[0xf151f0]
/usr/sbin/mysqld(_Z26find_or_create_table_shareP10PFS_threadbPK11TABLE_SHARE+0x43f)[0xf1702f]
/usr/sbin/mysqld(_Z15get_table_shareP3THDP10TABLE_LISTPKcmjPij+0x1f6)[0xbe66b6]
/usr/sbin/mysqld(_Z10open_tableP3THDP10TABLE_LISTP18Open_table_context+0x9fa)[0xbe761a]
/usr/sbin/mysqld(_Z11open_tablesP3THDPP10TABLE_LISTPjjP19Prelocking_strategy+0x79b)[0xbee86b]
/usr/sbin/mysqld(_Z20open_and_lock_tablesP3THDP10TABLE_LISTjP19Prelocking_strategy+0x43)[0xbeeee3]
/usr/sbin/mysqld[0xda8c6e]
/usr/sbin/mysqld(_ZN19Sql_cmd_check_table7executeEP3THD+0x97)[0xdaa717]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x233c)[0xc419ac]
/usr/sbin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x3dd)[0xc462fd]
/usr/sbin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0xf6a)[0xc472ba]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x1c7)[0xc48737]
/usr/sbin/mysqld(handle_connection+0x288)[0xd067d8]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0xef4a44]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa)[0x7fa2528086fa]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fa251c9db5d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fa1c00054c0): is an invalid pointer
Connection ID (thread ID): 2
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
2016-06-27T07:44:06.050999Z mysqld_safe Number of processes running now: 0
2016-06-27T07:44:06.052431Z mysqld_safe mysqld restarted
[23 Aug 2016 10:28] Pascal KISSIAN
after updating to mysql-server 5.7.13-0ubuntu0.16.04.2

both 
# mysql_upgrade --force
and 
# mysqlcheck --all-databases --check-upgrade

work again without crashing......

did you find and fixed that misterious issue?

... and having a user different than root was a workaround that is enough to me for the problem of crash with rsync...

I think that the issue can be closed for now....

Best Regards,
Pascal
[16 Jun 2017 9:04] MySQL Verification Team
closing the issue as requested by reporter in previous comment on [23 Aug 2016 10:28] Pascal KISSIAN.