Bug #58625 invalid file descriptor -1 in syscall write() after killing connection
Submitted: 1 Dec 2010 9:23 Modified: 2 Dec 2010 11:24
Reporter: Shane Bester (Platinum Quality Contributor)
Status: Duplicate
Category: MySQL Server: General Severity: S3 (Non-critical)
Version: 5.6.1-debug OS: Any
Assigned to: CPU Architecture: Any
Tags: regression

[1 Dec 2010 9:23] Shane Bester
Description:
After killing a connection, valgrind reports this warning:

Warning: invalid file descriptor -1 in syscall write()
at: ??? (syscall-template.S:82)
by: vio_write (viosocket.c:115)
by: net_real_write (net_serv.cc:642)
by: net_flush (net_serv.cc:348)
by: net_write_command (net_serv.cc:488)
by: net_send_error_packet (protocol.cc:405)
by: Protocol::send_error (protocol.cc:577)
by: Protocol::end_statement (protocol.cc:505)
by: dispatch_command (sql_parse.cc:1427)
by: do_command (sql_parse.cc:812)
by: do_handle_one_connection (sql_connect.cc:745)
by: handle_one_connection (sql_connect.cc:684)
by: start_thread (pthread_create.c:301)

How to repeat:
run mysqld in valgrind:
valgrind -v --leak-check=full --show-reachable=yes --db-attach=yes  --track-origins=yes --tool=memcheck --num-callers=50 ./bin/mysqld --no-defaults --basedir=. --datadir=./data --skip-gr --skip-na --myisam-recover=force  --open-files-limit=20000 --port=3306 --gdb

conn1: select sleep(555);
conn2: kill 1;

Suggested fix:
Don't pass known-invalid handles to write().
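
A minimal sketch of what such a check could look like, assuming a POSIX system. The name guarded_write is hypothetical and this is not the actual vio_write() from viosocket.c; it only illustrates refusing a descriptor that is already known to be negative before the syscall is issued.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper for illustration only, not MySQL code.
   Refuses to call write() when the descriptor is already known
   to be invalid (negative), and reports the same error that
   write() itself would report for a bad descriptor. */
ssize_t guarded_write(int fd, const void *buf, size_t len)
{
    if (fd < 0) {
        errno = EBADF;
        return -1;
    }
    return write(fd, buf, len);
}
```

A check like this only avoids the syscall when the descriptor is already -1 at the time of the check. It does not help if another thread closes the descriptor between the check and the write().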
[1 Dec 2010 10:48] Susanne Ebrecht
Hello Shane,

On which file system do you see this? ext3? ufs? xfs? ntfs?
[1 Dec 2010 10:51] Susanne Ebrecht
Also interesting for Unix systems would be the output of:
ulimit -a
or ulimit -n

We need to see the setting for open files.
[2 Dec 2010 3:16] MySQL Verification Team
Were you not able to repeat it on Linux?

/dev/mapper/vg_levovo-lv_home on /home type ext4 (rw)
Linux levovo 2.6.33.3-85.fc13.x86_64 #1 SMP Thu May 6 18:09:49 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
Fedora release 13 (Goddard)
valgrind-3.5.0
glibc-2.12-3.x86_64
[2 Dec 2010 9:44] Davi Arnaut
Shane,

It's an old race condition caused by KILL closing the socket of the connection being killed.

Make it a duplicate of Bug#37780.
[2 Dec 2010 10:44] Valeriy Kravchuk
Verified just as described on 32-bit Ubuntu:

...
101202 12:40:59 [Note] libexec/mysqld: ready for connections.
Version: '5.6.1-m5-debug'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution
==2407== Warning: invalid file descriptor -1 in syscall write()
==2407==    at 0x4048EDB: ??? (syscall-template.S:82)
==2407==    by 0x82784B9: net_real_write (net_serv.cc:642)
==2407==    by 0x8277CF0: net_flush (net_serv.cc:348)
==2407==    by 0x82799B2: net_send_eof(THD*, unsigned int, unsigned int) (protocol.cc:297)
==2407==    by 0x827A08D: Protocol::send_eof(unsigned int, unsigned int) (protocol.cc:562)
==2407==    by 0x8279E8E: Protocol::end_statement() (protocol.cc:509)
==2407==    by 0x82952E9: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1411)
==2407==    by 0x829388B: do_command(THD*) (sql_parse.cc:796)
==2407==    by 0x8291BED: do_handle_one_connection(THD*) (sql_connect.cc:745)
==2407==    by 0x8291A4D: handle_one_connection (sql_connect.cc:684)
==2407==    by 0x404196D: start_thread (pthread_create.c:300)
==2407==    by 0x4252A4D: clone (clone.S:130)
...

5.1.54 does not have this problem, so it is a regression.
[2 Dec 2010 11:24] Davi Arnaut
It's not a regression, it just depends on the platform.

Closed as a duplicate of Bug#37780.
[2 Dec 2010 11:47] Valeriy Kravchuk
It worked with 5.1.54-debug this morning on the same 32-bit Ubuntu 10.04 where it fails in current mysql-trunk. Platform was the same...
[3 Dec 2010 0:27] Davi Arnaut
KILL closes the socket of the session being killed in order to wake up the session thread in case that thread is waiting for I/O on the socket. This is inherently race prone and can even lead to a thread reading from an unrelated file descriptor (for example, if the file descriptor number is reused quickly). In 5.1, this was only done on some platforms (like Windows), but in 5.5 this was hardwired for all platforms. So, it's not exactly a regression, but it's a problem and it is going to be addressed in the context of Bug#37780.
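
The descriptor-reuse hazard described above can be reproduced in a few lines. This is a single-threaded sketch assuming a POSIX system with /dev/null available; the function name is hypothetical, and a pipe stands in for the session socket. It relies on the POSIX rule that open() returns the lowest-numbered free descriptor.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demonstration, not MySQL code.
   Returns 1 if a descriptor number released by close() is handed
   out again by the next open() and a read through the stale number
   reaches the unrelated file; 0 if not; -1 on setup failure. */
int stale_fd_hits_unrelated_file(void)
{
    int session[2];              /* stands in for the session socket */
    if (pipe(session) != 0)
        return -1;

    int stale = session[0];      /* number the session thread still holds */
    close(session[0]);           /* what KILL does to the socket */

    /* Some other part of the server opens an unrelated file. */
    int unrelated = open("/dev/null", O_RDONLY);
    if (unrelated < 0) {
        close(session[1]);
        return -1;
    }

    /* The session thread now uses its stale number. If the number was
       reused, this read succeeds (EOF from /dev/null) instead of
       failing with EBADF. */
    char byte;
    ssize_t n = read(stale, &byte, 1);
    int hit = (unrelated == stale) && (n == 0);

    close(unrelated);
    close(session[1]);
    return hit;
}
```

In a multi-threaded server the close and the reopen happen in different threads, so whether the stale number fails with EBADF, shows up as -1, or silently reaches another file depends on timing.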