Bug #30706 SQL thread on slave is allowed to block client queries when slave load is high
Submitted: 29 Aug 2007 20:36 Modified: 6 Nov 2009 14:46
Reporter: Kevin Benton (Candidate Quality Contributor)
Status: No Feedback
Category: MySQL Server: InnoDB storage engine Severity: S4 (Feature request)
Version: 5.1.21 OS: Any
Assigned to: Assigned Account CPU Architecture: Any

[29 Aug 2007 20:36] Kevin Benton
Description:
My understanding of the just-released 5.1.21 notification is that a master server can now further overrun the capability of a slave to keep up during high-load times where InnoDB is concerned.  It seems to me that administrators ought to be given the option to allow or disallow this behavior rather than hard-coding MySQL to allow the replication slave to further compromise query execution.  In general, a replication slave may be used when the concern isn't to have the most up-to-date information (slightly out-of-date is generally considered acceptable); rather, the desire is to run queries as quickly as possible without encumbering the master.  This update takes that ability away from slave users.

The text reports:

   * The SQL thread on a slave now is always allowed to enter
     InnoDB even if this would exceed the limit imposed by the
     innodb_thread_concurrency system variable. In cases of high
     load on the slave server (when innodb_thread_concurrency is
     reached), this change helps the slave stay more up to date
     with the master; in the previous behavior, the SQL thread was
     competing for resources with all client threads active on the
     slave server. (Bug#25078: http://bugs.mysql.com/25078)

How to repeat:
See description

Suggested fix:
Either restore functionality to the way it was before Bug 25078, or add a configuration option to give the "old" behavior back to administrators.
[29 Aug 2007 20:56] Kevin Benton
Changing synopsis to better reflect intent.
[30 Aug 2007 0:49] Sveta Smirnova
Thank you for the reasonable feature request.
[30 Aug 2007 5:15] Vasil Dimov
This is an InnoDB feature request, I will take it.
[14 Sep 2007 13:36] Heikki Tuuri
Let us add a new my.cnf option to 5.1.
[17 Sep 2007 19:11] Vasil Dimov
Implementation

Attachment: replication_delay.diff (application/octet-stream, text), 6.52 KiB.

[17 Sep 2007 19:15] Vasil Dimov
Kevin,

I have attached a patch to this bug report for your early access. It should make it into the next MySQL version.

It adds an innodb_replication_delay=N (milliseconds) configuration parameter. If innodb_thread_concurrency is reached on the slave server, the replication thread is delayed by that number of milliseconds and then allowed to enter InnoDB.
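
As a rough illustration of the behavior described above, here is a toy model of the admission decision. It is only a sketch and is not the code from the attached patch: the function name `admission_wait_ms` and its arguments are made up for this example, and the special meaning of particular innodb_thread_concurrency values is ignored.

```python
from typing import Optional


def admission_wait_ms(active_threads: int,
                      thread_concurrency: int,
                      is_sql_thread: bool,
                      replication_delay_ms: int) -> Optional[int]:
    """Toy model of the decision made when a thread wants to enter InnoDB.

    Returns the number of milliseconds the thread waits before it is let in,
    or None if it has to queue until another thread leaves InnoDB.
    All names here are hypothetical and exist only for this illustration.
    """
    # Below the concurrency limit every thread enters immediately.
    if active_threads < thread_concurrency:
        return 0

    # Limit reached: the replication SQL thread is delayed by the
    # configured number of milliseconds and then allowed in anyway.
    if is_sql_thread:
        return replication_delay_ms

    # Limit reached: ordinary client threads must wait for a free slot.
    return None


if __name__ == "__main__":
    # Slave at its limit (8 of 8 threads inside InnoDB), delay set to 50 ms.
    print(admission_wait_ms(8, 8, is_sql_thread=True, replication_delay_ms=50))
    print(admission_wait_ms(8, 8, is_sql_thread=False, replication_delay_ms=50))
```

In this model a delay of 0 corresponds to the 5.1.21 behavior quoted in the description (the SQL thread always enters at once), while a larger value only postpones its entry. It never makes the SQL thread queue behind client threads the way it did before Bug#25078.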
[17 Sep 2007 19:31] Kevin Benton
Vasil,

Thanks for the contribution; however, it appears that it does not address the issue I posed.  As I mentioned previously, I would hope that there would be a way to restore the old functionality, to allow a slave to be processed as just another thread rather than forcing slaves to have priority over other threads.

If a server (slave) is thrashing already, what I don't want is to allow an update to cause more thrash than what is already there.  By allowing requests (queries and updates) to be processed sequentially, I would hope that this would allow a server to work out the problem causing it to thrash sooner and help prevent an administrator from misinterpreting what is causing the thrash.

From my perspective, if a slave is thrashing, I want to see it process existing queries before attempting to update the underlying data.  There are times when I would really like to see a slave update have lower priority processing than queries coming in from users.  After all - our users know that if they have to have up-to-date information, they need to run queries off the master.  Otherwise, slaves are there to give the quickest responses even if the data is slightly "out-of-date".  This is typically the case when large reports are being run.  I don't want to give out the ability to start and stop the slave process to regular users, but simultaneously, I do want to ensure that users get the results they need back as quickly as they can.
[6 Nov 2007 22:44] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/37230

ChangeSet@1.2604, 2007-11-06 15:42:58-07:00, tsmith@ramayana.hindu.god +31 -0
  Apply snapshot innodb-5.1-ss1989
  
  Fixes the following bugs:
  
  Bug #30706: SQL thread on slave is allowed to block client queries when slave load is high
    Add (innodb|innobase|srv)_replication_delay MySQL config parameter.
  
  Bug #30888: Innodb table + stored procedure + row deletion = server crash
    While adding code for the low level read of the AUTOINC value from the index,
    the case for MEDIUM ints which are 3 bytes was missed triggering an
    assertion.
  
  Bug #30907: Regression: "--innodb_autoinc_lock_mode=0" (off) not same as older releases
    We don't rely on *first_value to be 0 when checking whether
    get_auto_increment() has been invoked for the first time in a multi-row
    INSERT. We instead use trx_t::n_autoinc_rows. Initialize trx::n_autoinc_rows
    inside ha_innobase::start_stmt() too.
  
  Bug #31444: "InnoDB: Error: MySQL is freeing a thd" in innodb_mysql.test
    ha_innobase::external_lock(): Update prebuilt->mysql_has_locked and
    trx->n_mysql_tables_in_use only after row_lock_table_for_mysql() returns
    DB_SUCCESS.  A timeout on LOCK TABLES would lead to an inconsistent state,
    which would cause trx_free() to print a warning.
  
  Bug #31494: innodb + 5.1 + read committed crash, assertion
    Set an error code when a deadlock occurs in semi-consistent read.
[7 Nov 2007 0:59] Timothy Smith
Queued to 5.1-build
[21 Nov 2007 18:54] Bugs System
Pushed into 5.1.23-rc
[21 Nov 2007 18:54] Bugs System
Pushed into 6.0.4-alpha
[19 Dec 2007 17:50] Timothy Smith
It is a mistake that this bug was reported as fixed.  The proposed patch is not yet approved, and it was never included in a released version.  I apologize for any confusion caused by this.
[19 Dec 2007 18:44] Vasil Dimov
Tim, the change was rejected by MySQL because 5.1 is "frozen". The "Patch pending" state is probably not appropriate.
[19 Dec 2007 19:00] Kevin Benton
Actually, my concern is that this really doesn't address the concern I had which is to allow for blocking client queries when slave load is high.  I requested that the ability to specify the "old" functionality be made available by configuration, however, while this is closer, it doesn't implement the requested functionality.  If a slave is blocking because of high load, there are times when I want it to keep blocking for the same reason.

I have to believe that when the server blocks, it's backlogged for a reason and I'd rather let updates wait like everyone else in certain cases because my concern is getting queries executed first before performing updates.  Typically, (in this situation) updates are non-critical while the ability to see trend data is critical.
[29 May 2008 9:14] Vasil Dimov
Reopening since the proposed patch was rejected by MySQL and furthermore it did not address the original reporter's concern. A new patch should be developed.
[5 Oct 2009 13:45] Vasil Dimov
I will now deal with this stalled bug.

Kevin, are you still there?
[6 Oct 2009 14:46] Vasil Dimov
Kevin,

The innodb_replication_delay parameter has made it into the InnoDB Plugin, although it was rejected for MySQL 5.1 at the time it was implemented.

While this is not what you are asking for and is not the old functionality, it is close to it. Specifying a short replication delay should have roughly the same effect as the old functionality.

Can you test the parameter innodb_replication_delay in the InnoDB Plugin and see if it works well enough for you?

It would be easy to restore the old behavior conditionally, but that would introduce yet another parameter, and we are generally trying to reduce their number.

Thank you!
[7 Nov 2009 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".