MySQL Bugs: #81808: Adding a column affects performance on unrelated database

Bug #81808	Adding a column affects performance on unrelated database
Submitted:	10 Jun 2016 19:59	Modified:	24 Jun 2016 19:38
Reporter:	teo teo	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server	Severity:	S5 (Performance)
Version:	5.6.20	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
I launched an ALTER query to add a TINYINT column to an innodb table that has about 1M rows on a database A.

On the same server, there is another unrelated database B used by a live website with pretty high load.

This is on a 32-core server.

Now I knew that all writes to that table would be blocked for the whole duration of the ALTER operation (that is ridiculous by the way, but well documented). However, I wouldn't expect a noticeable impact on the other database.

Surprisingly, operations on the live website's database B, the one not touched by the ALTER query, became more than TEN TIMES slower while the alter query on database A was being performed.

I do understand that it's the same server, and that adding a column involves copying the data to a temporary table which is a pretty heavy operation which consumes CPU and memory, so the overall performance of the server might be affected a little bit. However this is a 32-core computer, and I am altering one database that nobody is using while another unrelated database is under high load. I would expect MySQL to be able to manage priorities when handling resources in such a way that the impact on the other database should be almost unnoticeable. Certainly not a 10x increment in avarage response time in every kind of operations on the unrelated database.

How to repeat:
ALTER TABLE mytable ADD newcolumn TINYINT

Was the operation really CPU bound? This looks more like a IO bound situation to me ...

Yep you're right, definitely.

Doesn't change much. That is, it changes a lot but not the fact that it's unacceptable and should be totally avoidable that a heavy operation on a database (especially one that has zero workload except for that one heavy operation) affects unrelated databases (especially one that is under high work load).

Is it because of the huge amount of writing to disk? If so, shouldn't it be possible to optimize that so that most of the work is done in memory and flushing to disk occurs in more sporadic bursts?

InnoDB storage engine adheres to ACID standard, hence causing immediate writing to disk.

Still, in 5.6 and 5.7 it is possible fine tune the system so that I/O load is reduced. This is all thoroughly explained in our Reference Manual.

Not a bug.

If you have to "fine tune the system" just to get decent performance under obvious circumnstances, it IS a bug (or a poor design which is just a bug of a greater magnitude).

You request an operation that requires a massive rewrite of table rows.

That causes a lot of IO, obviously.

"If so, shouldn't it be possible to optimize that so that most of the work is done in memory and flushing to disk occurs in more sporadic bursts?"

Still the modified memory pages need to be written back at some point, transaction logs have to be written about the changes, too.

Still you expect that this doesn't compete with other IO (for the other database/schema) that's on the same file system? (Assuming that it is on the same FS as that's the default and you didn't mention otherwise).

From the perspective of the InnoDB buffer pool handling that takes care
of writing back dirty pages, and form the perspective of the redo log, it doesn't matter that these are two different databases, they both share the same buffer pool in memory and the same IO path.

You overload your system, you suffer. That's what's going to happen if your system is IO bound already as the working set doesn't fully fit into memory.

Yes, in theory that would be solvable by adding per-schema IO quotas, but that would add *a lot* of extra complexity.

At the same time other solutions exist, like e.g. using a master-slave replication pair, perform the schema change on the slave, then switch over to make it the master, now perform the schema change on the former master that is now a slave. To mention just one option.

Or you could put your two databases files on two different file systems that use different IO channels, so that they don't compete for IO.

Your main mistake is to think that the two databases are unrelated while they physically are.

Agreeing with Sinisa on "not a bug" (which is scary as we don't agree that often ;)