MySQL Bugs: #73633: extra mysql_mutex_lock in pop_jobs

Bug #73633	extra mysql_mutex_lock in pop_jobs_item
Submitted:	19 Aug 2014 7:18	Modified:	7 Aug 2020 8:51
Reporter:	qinglin zhang (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.6, 5.7	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	extra mysql_mutex_lock in pop_jobs_item

Description:
vim -t pop_jobs_item
when slave sql thread wait for new jobitem, it run the following logic:

    head_queue(&worker->jobs, job_item);
    if (job_item->data == NULL)
    {    
      worker->wq_empty_waits++;
      thd->ENTER_COND(&worker->jobs_cond, &worker->jobs_lock,
                               &stage_slave_waiting_event_from_coordinator,
                               &old_stage);
      mysql_cond_wait(&worker->jobs_cond, &worker->jobs_lock);
      thd->EXIT_COND(&old_stage);
      mysql_mutex_lock(&worker->jobs_lock);
    } 
when slave worker get signal of worker->jobs_cond, it will lock worker->jobs_lock again, that is , mysql_mutex_lock(&worker->jobs_lock) is no more needed.

How to repeat:
code reading.

Suggested fix:
remove the extra lock line:
mysql_mutex_lock(&worker->jobs_lock);

Hi, thanks for the report. I tend to agree with your assessment of the code so verifying this. Let's see what the replication team will say.

all best
Bogdan

Posted by developer:
 
In 5.6, thd->EXIT_COND will unlock the mutex. So the mutex is not held at the time of the lock.

The logic is more explicit in 5.7+8.0, where EXIT_COND was changed so it does not unlock the mutex. To compensate for that change, an explicit unlock was added in pop_jobs_item.

So everything works as expected.

However, this code (and several other places in rpl_rli_pdb.cc) follows the wrong pattern. The correct way to wait on a condition variable in MySQL is:

  PSI_stage_info old_stage;
  mysql_mutex_lock(&mutex);
  thd->ENTER_COND(&cond, &lock, &stage, &old_stage);
  while (<predicate> && !thd->killed)
    mysql_cond_wait(&cond, &mutex);
  mysql_mutex_unlock(&mutex);
  thd->EXIT_COND(&old_stage);

The code in rpl_rli_pdb.cc has two disadvantages:
- It does an extra unlock/lock inside the loop. On the normal execution path where it only makes one iteration of the loop, the result is one extra (very brief) lock/unlock.
- It briefly switches to the old thread stage in case of spurious wakeups. This might be observable, and potentially misleading, for a user that checks the thread stage in performance_schema.threads or processlist.

So the bug report is valid and the correct fix in 5.7+8.0 would be to move ENTER_COND before the loop, move EXIT_COND after the unlock that is done after the loop, and remove the unlock/lock pair from the loop.