Bug #6334 Flush Privileges causes Hang
Submitted: 29 Oct 2004 23:36
Modified: 16 Dec 2004 9:44
Reporter: Erik Perrohe
Status: No Feedback
Category: MySQL Server
Severity: S3 (Non-critical)
Version: 4.1.3-beta
OS: Linux (Linux/Fedora Core 1)
Assigned to:
CPU Architecture: Any

[29 Oct 2004 23:36] Erik Perrohe
Description:
The apparent cause: when a mysql.user.host value refers to an unresolvable host name and --skip-name-resolve is enabled, doing a Flush Privileges; causes the server to hang and tables to be corrupted.

However, attempts to isolate this problem have been frustrating.  I thought I had nailed the exact cause but then inadvertently discovered yet another way to repro the hang while avoiding my isolation scenario.

How to repeat:
The easiest way to reproduce this problem is to:
1) Set up a mysql test server on an alternate port such as 4306, using an alternate socket path.  Something like this...

/path/mysqld --defaults-file=/etc/mysql_tst1.cnf --port=4306 --verbose --basedir=/usr/local/mysql --datadir=/var/db/mysql_tst1 --pid-file=/var/db/mysql_tst1/mysqld.pid --log-error=/var/log/mysql_tst1.err.log --log-warnings --user=mysql --socket=/var/db/mysql_tst1/mysql.sock --concurrent-insert --skip-name-resolve

2) Install http://sourceforge.net/projects/wikipedia/
this will create a bunch of tables and privileges

3) start mysql -P4306 -h127.0.0.1 -p

4) Change all instances of "localhost.localdomain" to x.  (Why?  Because I discovered my DNS was not configured properly and the host was unresolvable.  Later I discovered that *any* unresolvable host will cause this hang, which is why it's a very serious bug.)

update user set host='x' where host='localhost.localdomain';

5) flush privileges;
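
The client-side steps above can also be fed to mysql non-interactively. This is a minimal sketch, assuming the test server from step 1 is listening on port 4306; the helper name repro_sql is made up for illustration, and the USE mysql; line is an assumption about which database was selected. It has not been verified to reproduce the hang on a fresh install.

```shell
# Print the statements from the "update" and "flush" steps.
# repro_sql is a hypothetical helper name, not part of MySQL.
repro_sql() {
    cat <<'EOF'
USE mysql;
UPDATE user SET host='x' WHERE host='localhost.localdomain';
FLUSH PRIVILEGES;
EOF
}

# To run against the test server from step 1 (prompts for the password):
#   repro_sql | mysql -P4306 -h127.0.0.1 -p
```

Only run this against a throwaway test instance, since the report describes table corruption after the hang.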

==========

Result:  Client and Server Hang
Client can be killed with ctrl-c
daemon must be killed with -SIGKILL

Any Tables that were open will be corrupt

for example:

--- mySQLd with id = tst  and port 3306
Fri Oct 29 13:24:43 PDT 2004, Starting mysql tst
...
InnoDB: Unable to lock ./ibdata1, error: 11InnoDB: Error in opening ./ibdata1
041029 13:24:43  InnoDB: Operating system error number 11 in a file operation.
InnoDB: Error number 11 means 'Resource temporarily unavailable'.
InnoDB: See also section 13.2 at http://www.innodb.com/ibman.php
InnoDB: about operating system error numbers.
InnoDB: Could not open or create data files.
InnoDB: If you tried to add new data files, and it failed here,
InnoDB: you should now edit innodb_data_file_path in my.cnf back
InnoDB: to what it was, and remove the new ibdata files InnoDB created
InnoDB: in this failed attempt. InnoDB only wrote those files full of
InnoDB: zeros, but did not yet use them in any way. But be careful: do not
InnoDB: remove old data files which contain your precious data!
041029 13:24:43  Can't init databases
041029 13:24:43  Aborting

041029 13:24:43  /usr/local/mysql/bin/mysqld: Shutdown complete

I also had trouble doing the repairs and resorted to doing a restore from backup.  I did save a copy of the corrupted data if it's needed (but it does have some confidential info so we would have to make arrangements for appropriate handling).

[root@207-234-145-137 mysql_usr]# /usr/local/mysql/bin/myisamchk -s ./*/*.MYI
myisamchk: MyISAM file ./mysql/columns_priv.MYI
myisamchk: warning: 1 client is using or hasn't closed the table properly
MyISAM-table './mysql/columns_priv.MYI' is usable but should be fixed

[root@207-234-145-137 mysql_usr]# /usr/local/mysql/bin/myisamchk -s -r ./*/*.MYI
Segmentation fault

[root@207-234-145-137 mysql_usr]# /usr/local/mysql/bin/myisamchk -s -o ./mysql/columns_priv.MYI

[root@207-234-145-137 mysql_usr]# /usr/local/mysql/bin/myisamchk -s ./*/*.MYI
myisamchk: MyISAM file ./mstore/countries.MYI
myisamchk: error: Found key at page 6144 that points to record outside datafile
myisamchk: error: Checksum for key:  1 doesn't match checksum for records
myisamchk: error: Checksum for key:  2 doesn't match checksum for records
MyISAM-table './mstore/countries.MYI' is corrupted

(Yikes!!!  did a restore instead...)

===================
Here is what my privilege tables look like (OUTFILE format, irrelevant items omitted).  Note that they have been modified from the default wiki install, but this config will cause the hang per the above repro steps of changing the host to 'x' and then doing a Flush.  (Many variations on the theme also cause the hang.)

Note: Because I am using an alternate port & socket 'localhost' is not valid, so everything has been set to 127.0.0.1.  For security reasons, I don't allow non-local access.

<pre>

[root]# cat user
127.0.0.1       wikiSESuser     ***      NN       N       N       N       N       N       N       N       N       N      NN       N       N       N       N       N       N       N       N              00       0
127.0.0.1       wikiSqluser     ***      NN       N       N       N       N       N       N       N       N       N      NN       N       N       N       N       N       N       N       N              00       0

[root]# cat db
127.0.0.1       wikiSESdb       wikiSESuser     Y       Y       Y       Y      NN       N       N       N       N       N       N

[root]# cat tables_priv
127.0.0.1       wikiSESdb       wikiSqluser     user    root@127.0.0.1  2004-10-06 07:16:36             Select
127.0.0.1       wikiSESdb       wikiSqluser     cur     root@127.0.0.1  2004-10-06 07:16:36     Select
127.0.0.1       wikiSESdb       wikiSqluser     old     root@127.0.0.1  2004-10-06 07:16:36     Select
127.0.0.1       wikiSESdb       wikiSqluser     archive root@127.0.0.1  2004-10-06 07:16:36     Select
127.0.0.1       wikiSESdb       wikiSqluser     links   root@127.0.0.1  2004-10-06 07:16:36     Select
127.0.0.1       wikiSESdb       wikiSqluser     brokenlinks     root@127.0.0.1 2004-10-06 07:16:36      Select
127.0.0.1       wikiSESdb       wikiSqluser     imagelinks      root@127.0.0.1 2004-10-06 07:16:36      Select
127.0.0.1       wikiSESdb       wikiSqluser     site_stats      root@127.0.0.1 2004-10-06 07:16:36      Select
127.0.0.1       wikiSESdb       wikiSqluser     ipblocks        root@127.0.0.1 2004-10-06 07:16:36      Select
127.0.0.1       wikiSESdb       wikiSqluser     image   root@127.0.0.1  2004-10-06 07:16:36     Select
127.0.0.1       wikiSESdb       wikiSqluser     oldimage        root@127.0.0.1 2004-10-06 07:16:36      Select
127.0.0.1       wikiSESdb       wikiSqluser     recentchanges   root@127.0.0.1 2004-10-06 07:16:36      Select
127.0.0.1       wikiSESdb       wikiSqluser     watchlist       root@127.0.0.1 2004-10-06 07:16:36      Select
127.0.0.1       wikiSESdb       wikiSqluser     math    root@127.0.0.1  2004-10-06 07:16:36     Select

[root]# cat columns_priv
127.0.0.1       wikiSESdb       wikiSqluser     user    user_id 2004-10-01 16:09:53     Select
127.0.0.1       wikiSESdb       wikiSqluser     user    user_name       2004-10-01 16:09:53     Select
127.0.0.1       wikiSESdb       wikiSqluser     user    user_rights     2004-10-01 16:09:53     Select
127.0.0.1       wikiSESdb       wikiSqluser     user    user_options    2004-10-01 16:09:53     Select

</pre>

Suggested fix:
Apparently the server is going into an endless loop trying to resolve the host name.

In some scenarios, cpu usage gets pegged at 99%.  But in others it goes into an idle loop at 0%.

I have tried letting it sit for an extended amount of time to see if it ever gives up and times out; it did not.

I expect that any unresolvable host name should time out after a reasonable number of attempts.  Of course I could be wrong about the cause; but this is what the symptoms are generally pointing to.
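
To make the expected behavior concrete, here is a rough shell sketch of a lookup that gives up after a fixed number of attempts. It only illustrates the idea and says nothing about how mysqld performs lookups internally. The function name resolve_bounded and its defaults are hypothetical; `timeout` (GNU coreutils) and `getent` are assumed to be installed.

```shell
# Try to resolve a host name at most max_attempts times, allowing each
# attempt at most per_try_secs seconds, then give up with status 1.
# resolve_bounded is a hypothetical helper, not a MySQL tool.
resolve_bounded() {
    host=$1
    max_attempts=${2:-3}
    per_try_secs=${3:-2}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # getent exits non-zero when the name cannot be resolved;
        # timeout kills the lookup if it runs longer than per_try_secs.
        if timeout "$per_try_secs" getent hosts "$host" >/dev/null 2>&1; then
            return 0
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example: resolve_bounded x 3 2 || echo "giving up on x"
```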

Flush Privileges works just fine as long as all of the host names are valid.  This despite the fact that it is supposed to be ignoring name resolution.

Further Note:  After doing a Flush Privileges and just before it Hangs, I do see complaints like this:

041029 13:38:43  Warning: 'user' entry 'wikiSESuser@x' ignored in --skip-name-resolve mode.
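
The warning shows the server skipping the 'x' entry because it is a host name rather than an IP address. The sketch below is one rough way to spot such rows in a tab-separated dump of the user table like the ones shown earlier. The function name named_hosts and the "contains a letter" test are assumptions for illustration; the server's own matching rules may differ.

```shell
# Print "user@host" for every row of a tab-separated mysql.user dump whose
# host column (column 1) contains a letter, i.e. looks like a name rather
# than an IP address. "localhost" is skipped.
# named_hosts is a hypothetical helper, not a MySQL tool.
named_hosts() {
    awk -F '\t' '$1 ~ /[A-Za-z]/ && $1 != "localhost" { print $2 "@" $1 }' "$@"
}

# Usage: named_hosts user      (where "user" is the OUTFILE dump)
```

For a row whose host column is x and whose user column is wikiSESuser, this prints wikiSESuser@x.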
[30 Oct 2004 16:06] MySQL Verification Team
First of all, your defaults file would be needed to repeat the bug.

Second, do we truly need to install the entire Wikipedia or similar large software in order
to reproduce a bug with privilege tables ??
[11 Nov 2004 13:12] Erik Perrohe
Q: do we truly need to install the entire Wikipedia or similar large software in order to reproduce a bug with privilege tables ??

A: No, I said that this would be the easiest way to recreate the environment.
Not the required way.  I think if you just enter the privilege tables as they appear in the bug (they are in dump file format), or a stripped-down version, you should be able to repro.  I am pretty sure the key to this bug is to have an unresolvable domain in the user table.  But I just don't have time available to test all these possibilities.

-----
Q: First of all, your defaults file would be needed to repeat a bug

A: My config file is mostly empty...  but here it is...

[mysqld]
open-files-limit=30000
delayed_insert_timeout=60
max_delayed_threads=20
max_allowed_packet=10000000

[safe_mysqld]

[mysql]

[mysqladmin]
[15 Nov 2004 14:10] Marko Mäkelä
Erik,
the excerpt
"InnoDB: Unable to lock ./ibdata1, error: 11InnoDB: Error in opening ./ibdata1
041029 13:24:43  InnoDB: Operating system error number 11 in a file operation.
InnoDB: Error number 11 means 'Resource temporarily unavailable'."
indicates that you were probably running two instances of mysqld on the same InnoDB tablespace. It doesn't indicate any corruption - the advisory file locking was introduced in order to avoid corruption.
[15 Nov 2004 22:58] Erik Perrohe
This server is running multiple copies of mysqld.  But each has a different port and socket and of course a different db.

The database does not even contain any innodb tables, other than those that mysqld insists upon automatically generating.

As a result of this bug, the mysqld did have a catastrophic shutdown which would have left the advisory lock file behind.

The innodb messages sound pretty dire; glad to know all it's complaining about is a lock file.  A search of the docs did not reveal any apparent way to manually repair a corrupt innodb (a myisamchk equivalent); apparently it is just never supposed to need repair???

I was not able to make the innodb error message go away, and mysqld refused to start.  I found nothing in the docs about repairing this innodb problem.  Since the innodb tables were empty, I resorted to deleting them.

Now I know that all I have to do is delete the lock file.  What is the name of this file?  How about documenting this?  If I had actually been using innodb tables and, in my search of repair howtos, found no docs about the lock file, I would have been up the creek without a paddle.
[15 Nov 2004 23:07] Erik Perrohe
Whooaa How Can You Say this is Not A Bug??!!!

The Bug being reported here is a Hang/Crash of mysqld because of an unresolvable host name in the User Table.

Under no circumstances should doing a Flush Privileges cause a crash, but it does, and that is the primary issue of this bug.  The rest is just documenting what happened after the crash.

The fact that the innodb lock file prevented being able to restart mysqld after the crash is a "side show" and is worthy of a separate bug in its own right.
[15 Nov 2004 23:14] Erik Perrohe
I can reproduce this crash/hang 100% of the time on my server.  In fact, I must be very careful to avoid causing it to happen.

Doing a Flush Privileges with an unresolvable Host name should be considered a normal use scenario.  There is no way to guarantee that a host will always be resolvable.

If needed, I will happily arrange for a dev to have access to my server.
[16 Nov 2004 9:44] Marko Mäkelä
I wouldn't say "not a bug", but we need a reduced, repeatable test case, e.g., commands to type in the mysql client in order to get the crash on a freshly installed mysqld. Maybe you should enable the query log in order to see the statements your application is trying to execute.

At startup, InnoDB tries to acquire locks on the contents of the data files. There's no separate lock file. If any of the data files have been locked by some process (typically mysqld), InnoDB refuses to start. Have you checked with "ps ax" or "ps -fe" that no instance of mysqld is running? When mysqld terminates abnormally, some threads could be left running. I don't think that the InnoDB file locking is relevant to this bug report.
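
A small sketch of that check, for convenience. The helper name only_mysqld is made up for illustration, and the ibdata1 path in the comment is an assumption based on the --datadir given in the original report; adjust it to the instance that refuses to start.

```shell
# Keep only the lines of a process listing (as printed by "ps ax")
# whose command mentions mysqld, ignoring the grep process itself.
# only_mysqld is a hypothetical helper name.
only_mysqld() {
    grep 'mysqld' | grep -v 'grep'
}

# Usage on the affected machine:
#   ps ax | only_mysqld
# To see which process, if any, still has the data file open
# (path assumed, see above):
#   fuser /var/db/mysql_tst1/ibdata1
```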
[14 Feb 2005 22:54] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".