Bug #65296 Segmentation fault; libmysqlclient.so.16.0.0
Submitted: 12 May 2012 19:02
Modified: 19 May 2012 17:19
Reporter: Charles A. Benner
Status: Closed
Category: MySQL Server: C API (client library)
Severity: S2 (Serious)
Version: 5.1.62
OS: Linux (Ubuntu 11.04.1)
Assigned to:
CPU Architecture: Any
Tags: libmysqlclient, segmentation fault

[12 May 2012 19:02] Charles A. Benner
Description:
Ubuntu 11.04, MySQL 5.1.62-0ubuntu0.11.04.1.

During routine use of my C app (nothing at all new about the scenario), a segmentation fault occurred.  The problem is repeatable.

dmesg reported the following:
 
[ 2467.176774] edr3.o[1703]: segfault at 76452820 ip 0054f242 sp bfb5a870 error 4 in libmysqlclient.so.16.0.0[4df000+19b000]

C-App attempting a select query, as follows:

SELECT * FROM hrace3 WHERE trk = "OP" AND horse = "Speed-Par" AND rtype = "MSW" AND level = "5" AND dist = "6F" AND sex = "" AND age = "3U" AND surf = "D".  

Same query executes ok using MYSQL-GUI tool.  One row is selected.  

How to repeat:
Repeatable only by using C-app.
[13 May 2012 7:45] MySQL Verification Team
please run the program in gdb and paste here a stack trace:

gdb /path/to/app
set pagination off
set print pretty on
set print elements 1000
r

when it segfaults, then:
thread apply all bt
bt full
[13 May 2012 18:57] Charles A. Benner
Hello,

gdb trace file attached.

Regards,
[13 May 2012 19:06] Charles A. Benner
Hello,

This post refers to the diagnostic information posted immediately before this post.  After re-installing MySQL and re-compiling C-app, I note that the symptom has changed subtly.  The Select query previously causing the seg fault no longer does so, and the C-app proceeds a little farther before a similar seg fault occurs for a different select query, as seen in the gdb trace.  I have added a diagnostic message of my own to show the select query at the time of error.

Regards,
[14 May 2012 3:24] MySQL Verification Team
Hi!

So the crash is non-deterministic.  It could be memory corruption, or it could be timeout related.  Either way, please get the valgrind output.  This will show us the very first memory-related error.  For example:

valgrind --track-origins=yes --tool=memcheck --db-attach=no --verbose --num-callers=50 --show-reachable=yes --leak-check=full /path/to/app
[14 May 2012 11:20] Charles A. Benner
Hello Shane,

Thanks for picking this one up.  The problem does appear to be deterministic, actually.  A given edr3.o compiled object is definitely behaving in a perfectly repeatable manner against a static database, and does so across reboots.  The gdb traces look identical for repeated runs.  I've used MySQL for many years with minimal trouble operating within my C-app, first under Redhat 8.0, then Ubuntu 10.10, and finally Ubuntu 11.04 as of about a month ago.  The Ubuntu upgrade contained the MySQL upgrade to 5.1.62.  I was running fine for a couple of weeks, inserting thousands of DB rows, etc., and all was well.  Then this segfault issue began occurring for a routine select query which works ok under MySQL-GUI.  I'm hoping this issue is one of my using a deprecated part of the C API or something else on the client side of things, we'll see.  In any case I will get the valgrind output and post it.  Regards.
[15 May 2012 11:08] Charles A. Benner
valgrind for segfault

Attachment: gdb_valgrind_20120514_edr3.o_segfault.txt (text/plain), 193.50 KiB.

[15 May 2012 11:56] MySQL Verification Team
The last straw that triggered the crash was:
1 errors in context 1 of 90:
Invalid read of size 4
   at 0x40AF242: net_clear (in /usr/lib/libmysqlclient.so.16.0.0)
   by 0x40ABF78: cli_advanced_command (in /usr/lib/libmysqlclient.so.16.0.0)
   by 0x40A99A6: mysql_send_query (in /usr/lib/libmysqlclient.so.16.0.0)
   by 0x40A9A3F: mysql_real_query (in /usr/lib/libmysqlclient.so.16.0.0)
   by 0x40786B4: mysql_query (in /usr/lib/libmysqlclient.so.16.0.0)
   by 0x805454E: algor() (edr3.c:4893)
   by 0x804A299: main (edr3.c:1928)
  Address 0x76452820 is not stack'd, malloc'd or (recently) free'd

But before that there are far too many errors.
You need to investigate and fix these earlier valgrind warnings in the app.  The problems there might or might not affect other parts of the code.

Run valgrind with --db-attach=yes to give you the option to break into gdb to examine application code when the error occurs.  Or just look at the filename/line number.

==19512== Conditional jump or move depends on uninitialised value(s)
==19512==    at 0x40270BC: strcpy (mc_replace_strmem.c:311)
==19512==    by 0x8050E60: ftoken(char*) (edr3.c:3741)
==19512==    by 0x8085822: openEntries() (edr3.c:20320)
==19512==    by 0x8049FEF: main (edr3.c:1848)
==19512==  Uninitialised value was created by a stack allocation
==19512==    at 0x8050D6A: ftoken(char*) (edr3.c:3724)

and:

==19512== Source and destination overlap in strcpy(0x809bdc0, 0x809bdc0)
==19512==    at 0x402713C: strcpy (mc_replace_strmem.c:311)
==19512==    by 0x8051983: picktime() (edr3.c:4101)
==19512==    by 0x8086D52: raceParse() (edr3.c:20742)
==19512==    by 0x804A12F: main (edr3.c:1878)

and:

==19512== Conditional jump or move depends on uninitialised value(s)
==19512==    at 0x8050BC5: findstr(char*, int, char*) (edr3.c:3658)
==19512==    by 0x8051FBF: getname(char*) (edr3.c:4249)
==19512==    by 0x80894BE: entryParse() (edr3.c:21224)
==19512==    by 0x804A1A9: main (edr3.c:1898)
==19512==  Uninitialised value was created by a stack allocation
==19512==    at 0x8051D86: getname(char*) (edr3.c:4211)

and:

==19512== Conditional jump or move depends on uninitialised value(s)
==19512==    at 0x8078C5C: gauge_FC(int) (edr3.c:16851)
==19512==    by 0x806DAEC: prepHstats() (edr3.c:13798)
==19512==    by 0x804A305: main (edr3.c:1936)
==19512==  Uninitialised value was created by a stack allocation
==19512==    at 0x8076B4B: updtBestPPspeed(int, int, char*, char*) (edr3.c:16106)

etc.

On a similar note, make sure the app compiles with no warnings when built with -Wall.
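
Two of the warning types quoted above (a strcpy that reads an uninitialised stack buffer, and a strcpy whose source and destination are the same address) are commonly avoided with patterns like the ones below. This is only a sketch: the source of edr3.c is not shown in this report, so copy_token and set_string are hypothetical helpers, not the app's ftoken or picktime functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src up to the first delimiter (or end of
 * string) into dst, never writing more than dst_size bytes, and always
 * NUL-terminate. Because dst is always terminated, a later strcpy()
 * from it never reads uninitialised bytes.
 * Returns the number of bytes copied, excluding the terminator. */
size_t copy_token(char *dst, size_t dst_size, const char *src, char delim)
{
    size_t n = 0;

    if (dst_size == 0)
        return 0;
    while (src[n] != '\0' && src[n] != delim && n + 1 < dst_size) {
        dst[n] = src[n];
        n++;
    }
    dst[n] = '\0';
    return n;
}

/* Hypothetical helper: assign src to dst, tolerating dst == src.
 * strcpy() with overlapping arguments is undefined behaviour, so the
 * self-copy is skipped and memmove() is used for everything else. */
void set_string(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst == src || dst_size == 0)
        return;
    len = strlen(src);
    if (len >= dst_size)
        len = dst_size - 1;   /* truncate rather than overrun */
    memmove(dst, src, len);
    dst[len] = '\0';
}
```

With an 8-byte buffer, copy_token(buf, sizeof buf, "OP,MSW", ',') stores "OP" in buf, and set_string(buf, sizeof buf, buf) leaves it unchanged instead of triggering the overlap warning.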
[15 May 2012 20:18] Charles A. Benner
Hello Shane,

I must admit being alarmed by all the things valgrind pointed out.  Apparently  they never caused a problem before, at least of a segfault magnitude.  Perhaps new code in the C API of MySQL is less tolerant of app sloppiness than before.  I will fix the app issues and re-group.

Regards,
[19 May 2012 17:19] Charles A. Benner
Hello Shane,

Using the valgrind data, I found a storage overlay in my C-app.  When I corrected it, the segfault during mysql_query() no longer occurred.  I haven't fully analyzed the overlay and how it affected MySQL, but it would not be surprising to discover that C-app storage allocated for MySQL use had been corrupted due to the overlay somehow.  Sorry to waste your time on my own problem, latent for four years within my C-app.  I appreciate the help with tools such as gdb/bt/valgrind which I never used before now.  If I ever think of reporting another bug in MySQL, I will do the homework on my end first, using these tools.

Regards,
Charlie Benner
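
The thread does not say which buffer was overrun, so the following is only an illustration of the general idea of a storage overlay and one way to guard against it. The struct layout, the guard field, and the function names are all hypothetical; the guard simply stands in for whatever data happens to sit next to a fixed-size text field, such as a pointer later handed to mysql_query().

```c
#include <stdio.h>
#include <string.h>

#define GUARD_VALUE 0x5AFEC0DEUL

/* Hypothetical layout: a fixed-size text field followed by other state.
 * An unbounded strcpy() into horse[] with a longer string would write
 * past the field and overlay whatever follows it. */
struct record {
    char horse[8];
    unsigned long guard;   /* stands in for neighbouring data */
};

/* Bounded copy into the fixed-size field.
 * Returns 0 on success, -1 if src did not fit and was truncated. */
int record_set_horse(struct record *r, const char *src)
{
    int n = snprintf(r->horse, sizeof r->horse, "%s", src);

    return (n < 0 || (size_t)n >= sizeof r->horse) ? -1 : 0;
}

/* Returns nonzero while the neighbouring data is still untouched. */
int record_intact(const struct record *r)
{
    return r->guard == GUARD_VALUE;
}
```

Copying the 9-character value "Speed-Par" through record_set_horse reports truncation with -1 and leaves the guard intact, whereas a plain strcpy into the same 8-byte field would have written past it.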