Bug #41092 | Maria: timeout in waiting_threads-t | ||
---|---|---|---|
Submitted: | 28 Nov 2008 12:56 | Modified: | 2 Apr 2009 7:03 |
Reporter: | Guilhem Bichot | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: Maria storage engine | Severity: | S3 (Non-critical) |
Version: | 6.0-maria | OS: | Linux (itanium) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | pushbuild, sporadic, test failure, timeout |
[28 Nov 2008 12:56]
Guilhem Bichot
[28 Nov 2008 12:57]
Guilhem Bichot
Suggest I4 as this is a unit test, random and rare failure on one Itanium platform.
[8 Dec 2008 10:02]
Guilhem Bichot
Serg said that we don't use in the server the code which is tested by the timing-out portion ("YOUNGEST") so this test failure is low priority, we will disable this portion if requested.
[10 Dec 2008 18:09]
Ingo Strüwing
Would be great, if you can do something about it. I have similar on Debian GNU/Linux unstable x86_64. After today's pull from 6.0-bugteam (revision-id:satya.bn@sun.com-20081210052907-zy3mj8sj4sqejerz) I had an infinite loop for 10 hours (!). I wonder why there is no timeout. The loop happened inside libpthread: #0 0x00002b2d39b7cd29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #1 0x000000000041474a in safe_cond_wait (cond=0x762f40, mp=0x761660, file=0x44950e "thr_template.c", line=47) at thr_mutex.c:249 #2 0x000000000040c385 in test_concurrently (test=0x44959d "waiting_threads", handler=0x40c538 <test_wt>, n=0, m=3000) at thr_template.c:47 #3 0x000000000040cc88 in do_one_test () at waiting_threads-t.c:144 #4 0x000000000040d45b in do_tests () at waiting_threads-t.c:254 #5 0x000000000040c4f5 in main (argc=1, argv=0x7fff71789cf8) at thr_template.c:79 do_tests () is at: test_kill_strategy(YOUNGEST); do_one_test () is at: test_concurrently("waiting_threads", test_wt, THREADS, CYCLES); test_concurrently () is at: pthread_cond_wait(&cond, &mutex); safe_cond_wait () is at: error=pthread_cond_wait(cond,&mp->mutex); When I hit ddd's 'next' button at this line, the CPU goes to 100% again and the function does not return. I have seen hour-long consumption of CPU before, but it used to time out. Casual checks with 'top' used to show waiting_threads-t at the top. So it doesn't seem to be a seldom problem. At least not on my machine. [Why do all problems manifest on my system? Sigh.]
[12 Feb 2009 10:12]
Sergei Golubchik
http://lists.mysql.com/maria/442
[17 Feb 2009 11:46]
Bugs System
Pushed into 6.0.10-alpha (revid:serg@mysql.com-20090217113558-vpsqsyjule7nz0gk) (version source revid:guilhem@mysql.com-20090213163054-rsg204z5qzcekbfe) (merge vers: 6.0.10-alpha) (pib:6)
[4 Mar 2009 13:01]
Jonathan Perkin
This appears to still happen in 6.0.10. I noticed it in 6.0.9 but it seemed limited to solaris 64bit and linux-powerpc platforms, however now it has spread to all Solaris platforms. If I leave the test running for a few days it finally completes (at least on reasonably fast machines), but this is unworkable for release builds. truss gives no output, I have pstack output upon request. I will leave the tests running on a couple of machines for now in case you want to live debug.
[4 Mar 2009 16:14]
Jonathan Perkin
I left truss running for a while and managed to get some output: Base time stamp: 1236172669.3688 [ Wed Mar 4 13:17:49 Europe/Oslo 2009 ] 13885: 2945.2669 lwp_unpark(81, 1) = 0 13885: 2945.2676 lwp_park(0x00000000, 0) = 0 13885: 2945.2683 lwp_unpark(81, 1) = 0 13885: 2945.2686 lwp_park(0x00000000, 0) = 0 13885: 2945.2690 lwp_park(0x00000000, 0) = 0 13885: 2945.2689 lwp_unpark(73, 1) = 0 13885: 2945.2788 lwp_park(0x00000000, 0) = 0 13885: 2945.2791 lwp_park(0x00000000, 0) = 0 13885: 2945.2795 lwp_unpark(91, 1) = 0 13885: 2945.2795 lwp_park(0x00000000, 0) = 0 13885: 2945.2799 lwp_park(0x7EA7BC70, 91) Err#62 ETIME 13885: 2945.2803 lwp_park(0x7F67BC70, 91) Err#62 ETIME 13885: 2945.2806 lwp_unpark(81, 1) = 0 13885: 2945.2809 lwp_park(0x00000000, 0) = 0 13885: 4701.5802 lwp_park(0x00000000, 0) = 0 13885: 4701.5801 lwp_unpark(70, 1) = 0 13885: 4701.5811 lwp_unpark(70, 1) = 0 13885: 4701.5815 lwp_park(0x00000000, 0) = 0 13885: 4701.5820 lwp_park(0x00000000, 0) = 0 13885: 4701.5824 lwp_park(0x00000000, 0) = 0 13885: 4701.5828 lwp_park(0x00000000, 0) = 0 13885: 2945.2799 lwp_park(0x7EA7BC70, 91) Err#62 ETIME 13885: 2945.2803 lwp_park(0x7F67BC70, 91) Err#62 ETIME 13885: 2945.2806 lwp_unpark(81, 1) = 0 13885: 2945.2809 lwp_park(0x00000000, 0) = 0 13885: 4701.5802 lwp_park(0x00000000, 0) = 0 13885: 4701.5801 lwp_unpark(70, 1) = 0 13885: 4701.5811 lwp_unpark(70, 1) = 0 13885: 4701.5815 lwp_park(0x00000000, 0) = 0 13885: 4701.5820 lwp_park(0x00000000, 0) = 0 13885: 4701.5824 lwp_park(0x00000000, 0) = 0 13885: 4701.5828 lwp_park(0x00000000, 0) = 0 13885: 4701.5831 lwp_park(0x7F47BC70, 91) Err#62 ETIME 13885: 4701.5834 lwp_park(0x7D27BC70, 85) Err#62 ETIME 13885: 4701.5837 lwp_park(0x7D87BC70, 72) Err#62 ETIME 13885: 8428.9742 lwp_unpark(81, 1) = 0 13885: 8428.9748 lwp_park(0x00000000, 0) = 0 13885: 8428.9753 lwp_unpark(81, 1) = 0 13885: 8428.9757 lwp_park(0x00000000, 0) = 0 13885: 8428.9760 lwp_unpark(72, 1) = 0 13885: 8428.9763 lwp_park(0x00000000, 0) = 0 13885: 8428.9766 lwp_unpark(72, 1) = 0 13885: 8428.9769 lwp_park(0x00000000, 0) = 0 13885: 9697.6838 lwp_unpark(70, 1) = 0 13885: 9697.6848 lwp_park(0x00000000, 0) = 0 13885: 9697.8719 lwp_unpark(70, 1) = 0 13885: 9697.8723 lwp_park(0x00000000, 0) = 0 13885: 9697.8727 lwp_unpark(81, 1) = 0 13885: 9697.8730 lwp_park(0x00000000, 0) = 0 13885: 9697.8733 lwp_unpark(81, 1) = 0 13885: 9697.8736 lwp_park(0x00000000, 0) = 0 13885: 9697.8739 lwp_park(0x00000000, 0) = 0 13885: 9697.9116 lwp_park(0x7EA7BC70, 85) Err#62 ETIME [...]
[10 Mar 2009 8:04]
Alexander Nozdrin
See also Bug#43359.
[2 Apr 2009 7:03]
Guilhem Bichot
duplicate of BUG#43359