Bug #2926 | some potential race conditions | ||
---|---|---|---|
Submitted: | 23 Feb 2004 10:41 | Modified: | 31 Mar 2004 7:34 |
Reporter: | Min Xu | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | 4.1.1 | OS: | Solaris (solaris 8 & 9) |
Assigned to: | Sergei Golubchik | CPU Architecture: | Any |
[23 Feb 2004 10:41]
Min Xu
[23 Feb 2004 14:23]
Sergei Golubchik
> > Collecting a stack trace is always on my todo list. I am > painfully aware of that it might not be easy for me to do > it soon, since my tool works on a binary level. My tool > runs inside a simulator where everything available was > binary instruction stream. But I will definitely look at That's enough, assuming you have access to registers of the analyzed process. > the files you pointed out and try to figure out how to > added this feature. > > It was also my impression these functions are not intended > thread-safe. This was quite an uncommon multithreading > practice to me. I previously thought the most common way > to write multithreaded programs is to write thread-safe > functions and use those functions without worrying about > races. One thing I thought might be true in MySQL is that > it seems the developer some times put a data structure > (queue, hash, etc) local to a thread. Therefore, the data > structure is only accessible through the current thread > pointer. In these cases, there is no need for any locking > to the data structure really. Is this what's happening in > MySQL as an optimization? Not exactly. And a practice, actually, is common and very logical. There is some simple data structure, and a library to manage it, e.g. priority hash, binary tree, or a linked list. Of course such a generic library need not and cannot bother about threads, it can be run in single-threaded or multi-threaded environment. But if such a structure is used to store some shared data then a caller should of course take precausions to avoid race conditions. > About the statistics updates. I should mention the > frequency I was reporting was the actually harmful races > one each of them. It means each time the race happens at > least one thread's update to the stats is lost. Yes, this is exactly what I meant - using statistic_increment() one says "increment a variable really fast, and I don't care if this increment sometimes - but rarely - will be lost".
[26 Feb 2004 11:55]
Min Xu
After two "painful" (my eyes hurt :-)) days to added a rough stack trace feature into my detector. I was able get more information about the races I've reported. By looking at the calling context of each reported race condition here, I think I have ruled out two of them to be not-a-bug. However, there are still two I cannot tell if they are bugs. Basically, for the race conditions on queue_insert and queue_remove, the two functions are called from the context of thr_end_alarm and thr_alarm, which are properly locked. Therefore, this race is not a bug. For the race condition on hash_search and my_hash_insert, these two functions are called from the context of open_tables()->open_table(). The race on pos->data (table) is a false alarm since after the table is found in the hash, it is OK for another thread to modifiy the pos->data pointer (during adding another new table into the hash). Another race I couldn't tell is the race on the field "blength". The racing statements are: mysys/hash.c:324 idx=hash_mask(rec_hashnr(info,record),info->blength,info->records+1); mysys:hash.c:349 info->blength+= info->blength; According the the stack trace, idx is computed based on info->blength, then used to compute pos=data+idx and then empty[0]. Then the same thread called hash_next() in open_table(), which uses the value of empty[0] to compute something. In the mean time, info->blength was doubled when records in the table overflows. Can this be a bug? The trace shows after my_hash_insert, a position variable is compuated based on the old blength, then the position is used in addressing the record in hash_next. However, blength changed, which might affect the compuated pos? Is the hash table here a dynamic hash? Do buckets in the hash move around when the size of the hash is changed? (I assume blength is the number of buckets in the hash table) Finally, the race may really be related to bug #644 is the _mi_read_rnd_static_record vs. mi_lock_database race. The rough stack traces are: _mi_read_rnd_static_record+0xd4 mi_scan _ZN9ha_myisam8rnd_nextEPc _Z13rr_sequentialP14st_read_record+0x14 _Z10sub_selectP4JOINP13st_join_tableb _Z9do_selectP4JOINP4ListI4ItemEP8st_tableP9Procedure _ZN4JOIN4execEv and: mi_lock_database _Z15unlock_externalP3THDPP8st_tablej _Z24mysql_unlock_read_tablesP3THDP13st_mysql_lock _ZN4JOIN9join_freeEb _Z9do_selectP4JOINP4ListI4ItemEP8st_tableP9Procedure _ZN4JOIN4execEv It seems apparently locking was not used in do_select function. The race was on the share->tot_locks variable. After ./myisam/mi_statrec.c:241 if ((! cache_read || share->base.reclength > cache_length) && share->tot_locks == 0) My detector observed the value shared->tot_locks was changed by another thread at: ./myisam/mi_locking.c:160 share->tot_locks++; before the first thread exit the function _mi_read_rnd_static_record. I don't know if the update happens before the if statement finishes though. OK, that's all I know for now. I am not sure if I explained everything well. Please ask for explanation if you need it. Looking forward to your comments! Thanks for your time!
[26 Mar 2004 13:17]
Min Xu
Hi There, I have just tested a version of mysql 4.1 from bk repository using my race detector. This new version of 4.1 doesn't crash on the prepared queries that used to crash 4.1.1 (refer to bug #644 for detail of that bug). Here I want to report what I have found. The new version of mysql doesn't not crash and my detector also didn't report one of the race I have reported here (#2926). The race is the hash_insert and my_hash_insert race I reported. Therefore, this disappearing race may have contributed to the race in #644. On the other hand, the race I reported between _mi_read_rnd_static_record and mi_lock_database is still reported in the new version of the server. This means my detector may have made a mistake. That was not a real bug. In any cases, please feel free to close this bug since the crash doesn't happen anymore with the new version of the server from BK repository. These two potential races reported here are either removed or false. Thanks for your time.
[28 Mar 2004 9:35]
Aleksey Kishkin
Dear Min Xu, thank you for your work. I close this issue.
[30 Mar 2004 5:25]
Michael Widenius
I have take a quick look on the reported things: - In MySQL all queues and hashes that are used by many threads have their own locks. - The code in myisam/mi_open.c (test_if_reopen() and list_add()) is safe because this code is protected by the mutex THR_LOCK_myisam. - The code in myisam/mi_static.c is safe as tot_locks was also incremented by this thread (as all MySQL/MyISAM code are using table locks per statement) in mi_lock.cc and we are only intersted to know if value != 0. The given code is only relevant when you are using MyISAM as a library without using table locks. My guess of what happend was that the prepared statement code used the same table object for both clients, which would lead to a lot of strange things. This issue has already been fixed in 4.1 Regards, Monty
[30 Mar 2004 6:27]
Min Xu
Thanks a lot for your comments. It more useful than you think it is to me. :-) > My guess of what happend was that the prepared statement code used the same > table object for both clients, which would lead to a lot of strange things. What you have said here makes a lot of sense to me. In fact, my detector reported many races. I did the initial screening and I remember there was races involving table structure. Basically, I have seen races to the fields of table or JOIN data structures, especially when one table is in use, it can re-initialized by a different thread. But somehow I didn't report them and I though they are false alarms. :-( > This issue has already been fixed in 4.1 It is after 4.1.1, right? The latest version I tested with the problem is 4.1.1.
[31 Mar 2004 7:34]
Sergei Golubchik
yes. But if you compile MySQL yourself anyway, you can get a daily source snapshot from http://downloads.mysql.com/snapshots.php and give it a try