Bug #644 | crash in add_key_field when using prepared statements | | |
---|---|---|---|
Submitted: | 12 Jun 2003 14:56 | Modified: | 11 Mar 2004 8:59 |
Reporter: | Dennis Haney | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server | Severity: | S1 (Critical) |
Version: | 4.1 | OS: | Linux (linux 2.4) |
Assigned to: | Oleksandr Byelkin | CPU Architecture: | Any |
[12 Jun 2003 14:56]
Dennis Haney
[18 Jun 2003 4:51]
Dennis Haney
It seems the problem is centered around prepared statements. When I changed my program to use the normal mysql_query, mysql_fetch_row, and mysql_free_result calls, it works fine.
[8 Jul 2003 0:42]
Venu Anuganti
Hi!! Could you please provide me with the table schema to reproduce the bug? Also, is it from the latest BK tree or from the 4.1.0 release version? Thanks, Venu
[8 Jul 2003 5:26]
Dennis Haney
Latest BK of the 4.1 tree. The crash happens with the RUBiS benchmarks (which is where this crash dump comes from), but also with a normal TPC-W benchmark without think time. It happens every time I try to run more than two threads using prepared statements, but not at the same statement (usually the same type, though), and usually within a few seconds of starting.
[19 Aug 2003 14:31]
Venu Anuganti
Will be fixed in the next release.
[20 Feb 2004 8:07]
Min Xu
I have encountered the same problem on Solaris with the latest 4.1.1 release. I wonder if this bug is fixed in the latest development tree? From the behavior of the bug, it looks to me like a race condition. Since I am currently working on an automatic race detector, I applied my detector to the execution and subsequently found a few races in the buggy run. Do you want me to submit that information? I can post the races to this bug, file a new bug report for each race, or simply post them on the internals@ mailing list. Which method do you prefer? Just let me know.

I can think of a few reasons why the race has not been fixed for half a year:
1. Nobody is using the prepared statement feature yet.
2. Venu left MySQL, and the bug was then left unnoticed.
3. This bug is harder to repeat than deterministic ones.
4. This bug is hard to understand without a good tool to help find the race condition.

Which of the four do you think contributed most? Is the bug easy to find just by looking at the stack trace provided by Dennis? Thank you very much! -Min
[21 Feb 2004 7:24]
Sergei Golubchik
It was forgotten mostly because of "latest bk" in the version field: this bug didn't show up when we were trying to make sure that no bugs were left unfixed before doing a release. But after your email to the mailing list (thank you!) I changed the "version" field to the more correct "4.1" and reassigned the bug to a developer who is fixing it.
[21 Feb 2004 13:38]
Min Xu
Thanks very much for the reply. So it looks like it is still a bug. Should I post the potential races I found to this bug report, or somewhere else for people to look at?
[22 Feb 2004 6:45]
Oleksandr Byelkin
Can you provide more information about the bug (schema, data)? I tested it, but how can I be sure that I am not simply using the wrong data/schema to repeat the bug? Alternatively, you might be able to test the latest BK repository version (I have pushed a lot of changes recently).
[22 Feb 2004 15:11]
Min Xu
Hi Oleksandr Byelkin, thanks for your reply. These are going to be rather long comments, since I will put as much as I know about this bug into them. :-)

I am running 4.1.1 on Solaris. I haven't tried the latest BK since I am not familiar with BitKeeper, but I will try when I have more time. In the meantime, let me give you information on when the bug shows up on my system and what my experimental tool says about the potential race conditions (they may or may not be the cause of the bug). I hope this can save you time in finding the bug.

I use a script called "pupulate_db.sh" to create the DB and insert data before my test. The script is based on Dennis's code. You can find it here (I cannot upload the file to the bug database since I am not the submitter or the developer, sorry): http://www.cs.wisc.edu/~files/ [1]

As you can see, the schema is very simple, and the table size is quite small too. I actually found the crash happens more often when the table size is small. The script should generate a query trace file called populate_db.txt (in the same directory [1]). The trace contains simple transactions that only perform a join on a select query. The client is written in C++, in a file called runtrans.cc (also from Dennis, thanks Dennis!). I modified it to use prepared statements for the queries instead of the straight ones. When running the client with multiple threads (I've tested 5 and 9 threads), the server crashed a few seconds after the client started. Here is a sample error message from the error log:

===========================================================================
040212 11:27:00 mysqld started
040212 11:27:00 InnoDB: Started; log sequence number 0 43634
/scratch/mysql/libexec/mysqld: ready for connections.
Version: '4.1.1-alpha' socket: '/tmp/mysql.sock' port: 3306
mysqld got signal 11;
This could be because you hit a bug.
It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail.
key_buffer_size=8388600
read_buffer_size=131072
max_used_connections=8
max_connections=100
threads_connected=9
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 225791 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
-e \nNumber of processes running now: 0
040212 11:30:51 mysqld restarted
040212 11:30:52 InnoDB: Started; log sequence number 0 43634
/scratch/mysql/libexec/mysqld: ready for connections.
...
===========================================================================

BTW, the machine I am running this test on is a Sun SPARC box with 8 processors and 32 GB of main memory.

OK, that's all about how to reproduce the bug. It seems it is not hard to reproduce, since the DB schema is not at all uncommon: in fact, Dennis told me that I used a different one than he did, so the bug doesn't seem to be specific to the schema. It does only happen with prepared statements, though.

Now, if you wish, the following are potential race conditions found by my experimental race detector during execution close to the crash of the server. They may have caused the bug. There are multiple of them, and I am sure not all of them are really harmful. You know, they were found by an automatic tool, and my knowledge of MySQL doesn't allow me to distinguish which are real bugs and which are not.

First of all, let me say a few words about the format of the following report. My detector reports two pieces of information:

* Two statements executed from two threads that raced.
One of the statements is a read from a shared variable, and the other is a write to the same variable.
* A trace of statements (a program slice, if you wish) that was affected by the read statement from the first bullet.

The trace of statements has the format: read A -> write B -> read B -> write C -> etc. Between each read/write pair of statements there is some data dependency, i.e. A's value affects B's value. The trace finally ends with a write statement. The race happens when the write statement from the first bullet occurs just before the final write statement, but after the previous write in the trace. (Notice that these statements are not necessarily from within the same function; they may come from different functions, or from multiple calls of the same function.)

The tool is still experimental, so it can make mistakes. Some races may seem nonsensical, but keep in mind this may be due to a context switch: the tool thinks all accesses from the same _processor_ are from the same _thread_, when in fact they can be from different threads. But don't worry, I have done as much initial screening as I could; the obviously false ones are not reported here. I will sort these races from the most likely to the least likely.

============================================================================
queue_remove <--> queue_insert
============================================================================

At least two races on queue->elements and queue->root[*] were found in the two functions above.

1. The racing statements are:
mysys/queues.c:202 queue->root[idx]=queue->root[queue->elements--];
and
mysys/queues.c:175 idx= ++queue->elements;

After "queue->elements" was loaded, decremented and stored on line 202, it was used to compute half_queue on line 227 (half_queue=(elements=queue->elements) >> 1). Finally the value of half_queue was saved by a kernel context switch. But before the save finished, the original "queue->elements" value was incremented by another thread on line 175.
The context switch should happen before queue_remove returns (_downheap was inlined in my code). The value of "half_queue" may or may not still be useful when the context switch happens. If it is useful, then it is likely a race condition.

2. The racing statements are:
queues.c:202 queue->root[idx]=queue->root[queue->elements--];
and
queues.c:187 queue->root[idx]=element;

After the read of "queue->root[queue->elements--]" on line 202, the value was passed to "element" in the inlined function _downheap on line 226 (element=queue->root[idx]). Then the value was passed to "queue->root[idx]" on line 246 (queue->root[idx]=element). Then another invocation of queue_remove passed the value to "queue->root[idx]" (idx may differ from the previous idx) on line 202, and similarly the value was passed to "queue->root[idx]" at line 246. But before the write on line 246 finished, the original value of "queue->root[queue->elements--]" was changed by a different thread on line 187 (queue->root[idx]=element). This may not be harmful; I don't know if the value of "queue->root[queue->elements--]" should hold across the two calls of queue_remove.

In summary, the race was:
"line 202 read" --> "line 226" --> "line 246 write" --> "line 202 read" --> "line 226" --> "line 246 write"
Another race is quite similar:
"line 202 read" --> "line 202 write" --> "line 202 read" --> "line 226" --> "line 246 write"

===========================================================================
queue_remove <--> queue_remove
===========================================================================

One race on queue->root[] was found. The racing statements are:
201 byte *element=queue->root[++idx]; /* Intern index starts from 1 */
and
246 queue->root[idx]=element;

The value of "queue->root[++idx]" was first copied into a register used for element. Then a context switch happened (this should happen before the function returns; otherwise the register won't be saved, since SPARC has windowed registers and a function return pops a register window).
During the context switch the register was saved, but before the save finished, the original value of "queue->root[++idx]" was updated by a different thread on line 246.

===========================================================================
hash_search <--> my_hash_insert
===========================================================================

The race condition is on the "pos->data" variable. The two racing statements are:
mysys/hash.c:157 DBUG_RETURN (pos->data);
and
hash.c:338 pos->data=(byte*) record;

After pos->data is read, it is used in open_table (line 830 in sql_base.cc):
for (table=(TABLE*) hash_search(&open_cache,(byte*) key,key_length)
Then in the same function, the value of "table" is passed to thd->open_tables on line 897. Later on, in function close_thread_table on line 424, the value is passed to "table"; finally "table" is saved in "unused_table" and in "table->next" and "table->prev" in the same function on line 455. But before the value is saved, another thread modifies pos->data on line 338 of hash.c in function my_hash_insert(). This may or may not be a bug; it depends on whether the value in the hash table has to hold for the time between open_table and close_thread_table.

Similarly, in hash_next, on line 183 of hash.c, the variable "pos->next" is raced on between two threads. The dependency again goes into both open_table and close_thread_table, and the overwriting statement is on line 339 of hash.c in my_hash_insert.

============================================================================
mi_open <--> mi_open
============================================================================

The two racing statements are:
myisam/mi_open.c:57 for (pos=myisam_open_list ; pos ; pos=pos->next)
and
myisam/mi_open.c:583 myisam_open_list=list_add(myisam_open_list,&m_info->open_list)

"myisam_open_list" is used to compute pos, and the data dependency is eventually propagated to line 580 (thr_lock_data_init(&share->lock,&m_info->lock,(void*) m_info);) via the "share->lock" variable.
Finally the dependency is propagated into thr_lock.c:661 pthread_mutex_lock(&lock->mutex); However, in the meantime another thread executes mi_open() and changes myisam_open_list on line 583. This may not be a bug; it is perhaps the case that once myisam_open_list has been used to compute some pointer in the for loop, it can be freely changed.

============================================================================
my_hash_insert <--> my_hash_insert
============================================================================

This is similar to the race between hash_search and my_hash_insert. The two racing statements are:
hash.c:324 idx=hash_mask(rec_hashnr(info,record),info->blength,info->records+1);
and
hash.c:349 info->blength+= info->blength;

"info->blength" was used to compute idx. The value of idx was then passed to empty[0] on line 334 (empty[0]=pos[0];). Then the value was used in hash_next on line 183 (hash->current_record= idx;). But before the value of "hash->current_record" was saved, another thread changed "info->blength" in my_hash_insert on line 349. Again, this might be a false alarm, since info->blength may not have to stay fixed by the time hash_next is called.

============================================================================
_mi_read_rnd_static_record <--> mi_lock_database
============================================================================

The two racing statements are:
myisam/mi_statrec.c:241 if ((! cache_read || share->base.reclength > cache_length) && share->tot_locks == 0)
and
myisam/mi_locking.c:160 share->tot_locks++;

The race is on the variable "share->tot_locks". Once its value was read on line 241, a context switch may occur. Even though the variable only controls the if statement, the value can remain in a register until the context switch happens. I don't know if the switch happens before or after the if statement finishes. But during the register save of the context switch, another thread came and changed "share->tot_locks" on line 160 of mi_locking.c. Anyway, this might be a false alarm.
But if the store of "share->tot_locks" happens before the if statement finishes, the race is likely harmful.

============================================================================

The following are definitely not bugs, but I thought it might be useful to report how often they happen. By default, MySQL compiles without the SAFE_STATISTIC flag being set; therefore I have seen races on the statistics variables in various functions. The following are their relative frequencies:

times line statement
1 sql/net_serv.cc:556 statistic_add(bytes_sent,length,&LOCK_bytes_sent);
3 sql/sql_parse.cc:1800 statistic_increment(com_stat[lex->sql_command],&LOCK_status);
5 sql/ha_myisam.cc:979 statistic_increment(ha_read_next_count,&LOCK_status);
22 sql/net_serv.cc:768 statistic_add(bytes_received,(ulong) length,&LOCK_bytes_received);
24 sql/ha_myisam.cc:1029 statistic_increment(ha_read_rnd_next_count,&LOCK_status);

-Min
[22 Feb 2004 15:13]
Min Xu
Sorry, the url was: http://www.cs.wisc.edu/~xu/files/ not the incorrect one in my last comment.
[23 Feb 2004 10:18]
Sergei Golubchik
Min, could you do the following, please:

1. Move this race-condition report to a separate bug entry (basically copy-paste)? I could do it myself, but it would be nice to have you as the original author, so that you get email notifications, can attach files, etc.

2. Could you extend your tool to include a backtrace for each race condition? Many races you report happen in low-level functions that are not expected to be thread-safe; instead, a caller should take care of the necessary locking. The code to get a stack trace can be found in sql/stacktrace.[ch].

As for statistic_add/statistic_increment, you were right - it's not a bug but an intentional optimization. These macros are used to increment statistical values - e.g. the number of hits/misses in the key cache. So it is allowed for these variables to be slightly "off" the actual value because of the non-thread-safe increment. As the comment says: /* statistics_xxx functions are for not essential statistic */ Still, if atomic_add() is available, it is used. If not, we gain some concurrency by losing a few updates to these variables.
[23 Feb 2004 10:55]
Min Xu
Sergei, thanks for the quick reply. I have opened another bug for these potential race conditions: bug #2926. (As you suggested, it was just copy & paste. :-)

Collecting a stack trace is on my todo list, but I am painfully aware that it might not be easy for me to do soon, since my tool works at the binary level: it runs inside a simulator where everything available is the binary instruction stream. But I will definitely look at the files you pointed out and try to figure out how to add this feature.

It was also my impression that these functions are not intended to be thread-safe. This is quite an uncommon multithreading practice to me; I previously thought the most common way to write multithreaded programs is to write thread-safe functions and use those functions without worrying about races. One thing I thought might be true in MySQL is that the developers sometimes make a data structure (queue, hash, etc.) local to a thread, so that it is only accessible through the current thread pointer. In these cases there is no need for any locking on the data structure at all. Is this what's happening in MySQL as an optimization?

About the statistics updates: I should mention that the frequencies I reported count only the actually harmful races, i.e. each time such a race happens, at least one thread's update to the stats is lost. -Min
[25 Feb 2004 6:31]
Konstantin Osipov
Could you provide us with a repeatable test case for this bug, containing:
- the table definitions
- data to populate the tables
- the query that makes mysqld crash
We were fixing this piece of code recently, but can't check whether the bug is still present. Thank you!
[25 Feb 2004 7:02]
Konstantin Osipov
Sorry, behind the long stacktrace I haven't noticed your URL. Please ignore the previous note.
[26 Feb 2004 6:46]
Oleksandr Byelkin
Thank you for the bug report. I have run your program 3 times (5, 9, 5 threads). It looks like it is fixed already.
[26 Feb 2004 9:23]
Min Xu
Can you tell me what version of the code you are using? Thanks.
[27 Feb 2004 0:28]
Oleksandr Byelkin
If you mean the MySQL server, then it was the latest BK tree sources of version 4.1.
[27 Feb 2004 12:19]
Min Xu
Latest BK doesn't compile:

Making all in libmysql
make[2]: Entering directory `/scratch/mysql/src/mysql-4.1/libmysql'
source='libmysql.c' object='libmysql.lo' libtool=yes \
depfile='.deps/libmysql.Plo' tmpdepfile='.deps/libmysql.TPlo' \
depmode=gcc3 /bin/bash ../depcomp \
--preserve-dup-deps --mode=compile gcc -DDEFAULT_CHARSET_HOME="\"/scratch/mysql\"" -DDATADIR="\"/scratch/mysql/var\"" -DSHAREDIR="\"/scratch/mysql/share/mysql\"" -DUNDEF_THREADS_HACK -DDONT_USE_RAID -I. -I. -I.. -I../include -O3 -DDBUG_OFF -D_FILE_OFFSET_BITS=64 -DHAVE_CURSES_H -I/scratch/mysql/src/mysql-4.1/include -DHAVE_RWLOCK_T -c -o libmysql.lo `test -f libmysql.c || echo './'`libmysql.c
../depcomp: --preserve-dup-deps: command not found
make[2]: *** [libmysql.lo] Error 127
make[2]: Leaving directory `/scratch/mysql/src/mysql-4.1/libmysql'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/scratch/mysql/src/mysql-4.1'
make: *** [all] Error 2
[11 Mar 2004 8:34]
Min Xu
Sorry about my last message; I found an installation error in my automake/autoconf setup. I tested this bug on the latest source from BK, and the crash didn't happen. I think some change after the 4.1.1 release must have fixed the bug. Please feel free to close it, and I really appreciate your time working with me.
[11 Mar 2004 8:59]
Oleksandr Byelkin
Thank you for the bug report, too. IMHO the state "can't repeat" is good enough; it means that the bug was fixed as a side effect of some other fix, but we can't pinpoint that fix exactly.
[11 Mar 2004 9:32]
Min Xu
Thanks! BTW, do you know of any other likely race conditions, fixed or not, that I could try to help track down? I want to test the effectiveness of my detector.
[11 Mar 2004 9:52]
Konstantin Osipov
We are currently working on stabilization of the prepared statements code. Many bugs haven't been fixed yet, so right now there is a good chance that you would spot an already-reported bug with your detector. I would delay your activity until bugs #2274, #2401, #2473, #2654, #2794, #2812, and several others are fixed.
[11 Mar 2004 10:19]
Min Xu
Great! Thanks for all the information. Please remember that I will be more than happy to help hunt down potential bugs. Should you have a bug whose root cause you can't find, I can try to apply my detectors, provided there is a repeatable execution. I currently have two detectors:
- The first one reports pairs of conflicting accesses for which no synchronization is observed in between. (Conflicting means at least one access of the pair is a write, and the two accesses are from two different threads, of course.)
- The second one reports, when a write happens, whether one of the input loads that may affect the store (address & value) has been modified by another thread. It also shows how the load affects the store by giving a trace of instructions from the load to the store.
Just let me know.