Description:
srv_conc_enter_innodb_with_atomics() has back off heuristics to control number of threads running inside InnoDB. Relevant code:
240
241 if (srv_adaptive_max_sleep_delay > 0) {
242 if (srv_thread_sleep_delay > 20
243 && n_sleeps == 1) {
244
245 --srv_thread_sleep_delay;
246 }
247
248 if (srv_conc.n_waiting == 0) {
249 srv_thread_sleep_delay >>= 1;
250 }
251 }
...
...
...
281 sleep_in_us = srv_thread_sleep_delay;
282
283 /* Guard against overflow when adaptive sleep delay is on. */
284
285 if (srv_adaptive_max_sleep_delay > 0
286 && sleep_in_us > srv_adaptive_max_sleep_delay) {
287
288 sleep_in_us = srv_adaptive_max_sleep_delay;
289 srv_thread_sleep_delay = static_cast<ulong>(sleep_in_us);
290 }
291
292 os_thread_sleep(sleep_in_us);
293
294 trx->op_info = "";
295
296 ++n_sleeps;
297
298 if (srv_adaptive_max_sleep_delay > 0 && n_sleeps > 1) {
299 ++srv_thread_sleep_delay;
300 }
* The amount by which a thread sleeps is controlled by srv_thread_sleep_delay. We start with 10ms and move between 20us - 150ms (or configured value of srv_adaptive_max_sleep_delay). It is incremented if a thread has to sleep more than once (line 299). It is decremented if the thread has to sleep once (line 245). However, it is not decremented if the thread doesn't have to sleep at all. I think we should consider a 'no sleep' as a sign of well oiled system and decrement the count.
* srv_thread_sleep_delay is reduced to half (line 249) if there are no waiting threads. Typically, in a loaded system we'll have a continuous inflow of requests meaning there always will be some waiting thread i.e.: we're unlikely to hit line 249.
If we consider the above two conditions and assume a loaded server forcing threads to sleep and always having some threads in waiting state we'll end up in a spiral where srv_thread_sleep_delay will keep getting incremented and will never get decremented (or even if decrement happens it will be only for the few cases where n_sleeps == 1). Eventually our sleep time will climb to 150ms. Typically our stay inside InnoDB is much less than 150ms. This can potentially cause serious performance regression when compared with 5.5.
How to repeat:
I've tried simple sysbench in following way:
* 2 instances of sysbench. One running 100 threads doing random inserts and other running 200 threads doing random point select.
* Start with an empty table
* Set innodb_thread_concurrency = 8
* Compare results with that of 5.5
Suggested fix:
I think few things need to be considered:
* How much we sleep should be a random selection from a range
* Currently sleep interval is only function of number of sleeps. I think it should be a function of:
-- number of waiting threads
-- srv_thread_concurrency
-- some heuristic value on how much time we spend inside InnoDB during one dive
So, for example, if we have srv_thread_concurrency = 8 and we have 128 waiting threads and we have somehow calculated that threads are spending 500us inside InnoDB on avg. we can then reason that in worst case the thread we are putting to sleep has to wait (128/8) * 500 microseconds. We can then randomize sleep time between zero and this value to get some staggered middle value.