Bug #19938 Valgrind error (race) in handle_slave_sql()
Submitted: 19 May 2006 10:46
Modified: 31 May 2006 12:05
Reporter: Kristian Nielsen
Status: Closed
Category: MySQL Server: Replication
Severity: S2 (Serious)
Version: 5.0.22
OS: Linux (Linux/All)
Assigned to: Magnus Blåudd
CPU Architecture: Any

[19 May 2006 10:46] Kristian Nielsen
Description:
There is a race condition at the end of handle_slave_sql() in sql/slave.cc:

  // tell the world we are done
  pthread_mutex_unlock(&rli->run_lock);
#ifndef DBUG_OFF // TODO: reconsider the code below
  if (abort_slave_event_count && !rli->events_till_abort)
    goto slave_begin;
#endif  
  my_thread_end();
  pthread_exit(0);
  DBUG_RETURN(0);				// Can't return anything here

After pthread_mutex_unlock(&rli->run_lock), another thread may call free() on the memory pointed to by rli, causing the rli->events_till_abort expression to reference invalid memory.

The race triggers this Valgrind error:

VALGRIND: 'Invalid read of size 4'
    COUNT: 1
    FUNCTION: handle_slave_sql    FILES:    slave.err
    TESTS:    rpl000013
    STACK: at 0x6D05D7: handle_slave_sql (slave.cc:3955)
             by 0x4C3CC63: start_thread (in /lib64/tls/libpthread-0.60.so)
             by 0x52F8242: clone (in /lib64/tls/libc-2.3.2.so)
           Address 0x550504C is 9,668 bytes inside a block of size 11,832 free'd
             at 0x4A19622: free (vg_replace_malloc.c:235)
             by 0x6CB317: end_slave() (sql_class.h:162)
             by 0x5AC203: close_connections() (mysqld.cc:795)
             by 0x5A94D4: kill_server(void*) (mysqld.cc:978)
             by 0x5A78BF: kill_server_thread (mysqld.cc:1004)
             by 0x4C3CC63: start_thread (in /lib64/tls/libpthread-0.60.so)
             by 0x52F8242: clone (in /lib64/tls/libc-2.3.2.so)

Note that the problem only appears in a debug build, because the offending code is guarded by #ifndef DBUG_OFF.

How to repeat:
See 5.0 pushbuild.

Or run mysql-test-run --valgrind-all rpl000013. Since this is a race, it may not reproduce reliably; it depends on exact timing.

Suggested fix:
Remove the offending code:

#ifndef DBUG_OFF // TODO: reconsider the code below
  if (abort_slave_event_count && !rli->events_till_abort)
    goto slave_begin;
#endif  

It has already been removed in 5.1.
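
For illustration only, and not the committed patch: if the debug-only check were kept instead of removed, the invalid read could be avoided by copying the field into a local while run_lock is still held. The sketch below is a minimal model of that idea. RelayLogInfoSketch and should_restart are hypothetical names, and std::mutex stands in for the pthread mutex used in slave.cc.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the relay-log info object; only the two
// members relevant to the race are modelled here.
struct RelayLogInfoSketch {
  std::mutex run_lock;
  int events_till_abort = 0;
};

// Decide whether the debug-only restart should happen. The shared field is
// copied into a local while run_lock is held, so nothing dereferences rli
// after the unlock, when another thread may already have freed it.
bool should_restart(RelayLogInfoSketch *rli, int abort_slave_event_count) {
  bool restart;
  {
    std::lock_guard<std::mutex> guard(rli->run_lock);
    restart = abort_slave_event_count != 0 && rli->events_till_abort == 0;
  }  // run_lock released here; rli must not be touched past this point
  return restart;
}
```

This only removes the read after the unlock. A restart via goto slave_begin would still use rli afterwards, so the object's lifetime would have to be guaranteed some other way; removing the block, as suggested above, avoids that question entirely.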
[23 May 2006 8:21] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/6753
[23 May 2006 14:20] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/6773
[23 May 2006 18:51] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/6787
[24 May 2006 7:04] Magnus Blåudd
Pushed to 5.0.22
[31 May 2006 12:05] Paul DuBois
No changelog entry needed.