Bug #56893 mysql_store_result() generates a SIGABRT from vio_delete()
Submitted: 21 Sep 2010 13:09
Modified: 15 Oct 2010 21:16
Reporter: Andy Riebs
Status: Not a Bug
Category: MySQL Server: C API (client library)
Severity: S2 (Serious)
Version: 5.1.50
OS: Linux (CentOS 5.3)
Assigned to:
CPU Architecture: Any

[21 Sep 2010 13:09] Andy Riebs
Description:
Even with MySQL compiled with --with-debug=full, Slurm (a cluster resource management program) triggers a SIGABRT that is raised from the free() routine in libc.so.

Here is the backtrace:

#0  0x00000031ab030215 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00000031ab030215 in raise () from /lib64/libc.so.6
#1  0x00000031ab031cc0 in abort () from /lib64/libc.so.6
#2  0x00000031ab06a7fb in __libc_message () from /lib64/libc.so.6
#3  0x00000031ab071ce2 in _int_free () from /lib64/libc.so.6
#4  0x00000031ab07590c in free () from /lib64/libc.so.6
#5  0x00002b535eb07b0f in _myfree (ptr=0x1c019af8, filename=<value optimized out>, lineno=<value optimized out>, 
    myflags=<value optimized out>) at safemalloc.c:326
#6  0x00002b535eb2c60a in vio_delete (vio=0x1b8e20b8) at vio.c:238
#7  0x00002b535eb276e0 in end_server (mysql=0x1b8283b8) at client.c:949
#8  0x00002b535eb279a8 in cli_safe_read (mysql=0x359a) at client.c:702
#9  0x00002b535eb27f0a in cli_read_rows (mysql=0x1b8283b8, mysql_fields=0x2aaaac30cc98, fields=2) at client.c:1389
#10 0x00002b535eb2602b in mysql_store_result (mysql=<value optimized out>) at client.c:2954
#11 0x00002b535e8d5e0c in _get_first_result (mysql_db=0x1b8283b8) at mysql_common.c:59
#12 0x00002b535e8d76fe in mysql_db_query_ret (mysql_db=0x1b8283b8, 
    query=0x2aaaac001978 "select cpu_count, cluster_nodes from cluster_event_table where cluster=\"andytest\" and period_end=0 and node_name='' limit 1", last=false) at mysql_common.c:617
#13 0x00002b535e8ca901 in clusteracct_storage_p_cluster_procs (mysql_conn=0x1b828118, cluster=0x1b824018 "andytest", 
    cluster_nodes=0x2aaaac0016b8 "node[11,13-17]", procs=40, event_time=1285026316) at accounting_storage_mysql.c:10505
#14 0x0000000000526286 in clusteracct_storage_g_cluster_procs (db_conn=0x1b828118, cluster=0x1b824018 "andytest", 
    cluster_nodes=0x2aaaac0016b8 "node[11,13-17]", procs=40, event_time=1285026316) at slurm_accounting_storage.c:8402
#15 0x0000000000425b86 in _accounting_cluster_ready () at controller.c:1057
#16 0x000000000042653c in _slurmctld_background (no_data=0x0) at controller.c:1353
#17 0x0000000000424d7f in main (argc=1, argv=0x7fff4c424098) at controller.c:525
(gdb) up 5
#5  0x00002b535eb07b0f in _myfree (ptr=0x1c019af8, filename=<value optimized out>, lineno=<value optimized out>, 
    myflags=<value optimized out>) at safemalloc.c:326
326	  free((char*) irem);
(gdb) print irem
$1 = (struct st_irem *) 0x1c019ad0
(gdb) print *irem
$2 = {next = 0x1b8e2090, prev = 0x1c01db10, filename = 0x2b535eb3b9e3 "vio.c", datasize = 16384, linenum = 44, 
  SpecialValue = 3957108073}
(gdb) 

How to repeat:
This is a sporadic problem that typically takes 1-3 hours and 60,000 Slurm jobs to reproduce using a test program that puts a heavy load on Slurm.

If I can find an easy way to reproduce, I'll post it here.

Suggested fix:
1. It looks like vio_delete() should do a bit more argument checking, even if it is called with a non-null argument.

2. It looks like safemalloc.c is also missing a check.  (Since the SIGABRT emanates from libc, I assume safemalloc missed the problem.)

[15 Oct 2010 21:16] Andy Riebs
It seems we were violating the rules for multi-threaded applications.

We shall remember to read the fine documentation in the future.
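
For readers who land here with a similar backtrace: the comment above does not say which rule was broken, but the MySQL C API documentation does state that two threads must not send a query on the same connection at the same time, and that a shared connection needs a mutex around each mysql_query() / mysql_store_result() pair. The sketch below shows only that locking pattern. It deliberately avoids the real client library so it can be built with nothing but pthreads: struct fake_conn, fake_query(), fake_store_result() and run_workers() are hypothetical stand-ins, not MySQL or Slurm code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a MYSQL* handle shared between threads. */
struct fake_conn {
    pthread_mutex_t lock;  /* guards the whole query/store pair */
    int in_flight;         /* 1 while a query is waiting for its result */
    long violations;       /* counts pairs that overlapped another pair */
};

/* Stand-in for mysql_query(): marks a request as outstanding. */
static void fake_query(struct fake_conn *c)
{
    if (c->in_flight)
        c->violations++;   /* another thread's query is still pending */
    c->in_flight = 1;
}

/* Stand-in for mysql_store_result(): consumes the outstanding reply. */
static void fake_store_result(struct fake_conn *c)
{
    if (!c->in_flight)
        c->violations++;   /* the reply was taken by someone else */
    c->in_flight = 0;
}

struct worker_arg {
    struct fake_conn *conn;
    int iterations;
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    assert(a->conn != NULL);
    for (int i = 0; i < a->iterations; i++) {
        /* Hold the lock across BOTH calls, not around each one alone. */
        pthread_mutex_lock(&a->conn->lock);
        fake_query(a->conn);
        fake_store_result(a->conn);
        pthread_mutex_unlock(&a->conn->lock);
    }
    return NULL;
}

/* Runs up to 16 workers against one shared handle.
 * Returns the number of overlaps seen, or -1 on a setup failure. */
long run_workers(int nthreads, int iterations)
{
    pthread_t tids[16];
    struct fake_conn conn = { .in_flight = 0, .violations = 0 };
    struct worker_arg arg = { &conn, iterations };
    int started = 0;
    int failed = 0;

    if (nthreads < 1 || nthreads > 16)
        return -1;
    if (pthread_mutex_init(&conn.lock, NULL) != 0)
        return -1;

    for (; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, worker, &arg) != 0) {
            failed = 1;
            break;
        }
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&conn.lock);
    return failed ? -1 : conn.violations;
}
```

With the lock held across the pair, run_workers(4, 10000) should report no overlaps. In a real client the two stand-in calls would be mysql_query() and mysql_store_result() on the shared handle; giving each thread its own connection is the other commonly documented option and needs no such lock.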