MySQL Bugs: #37976: Replication test failing: table handler changes if disk space is low

Bug #37976	Replication test failing: table handler changes if disk space is low
Submitted:	8 Jul 2008 19:21	Modified:	10 Feb 2011 11:23
Reporter:	Joerg Bruehe
Status:	Won't fix
Category:	Tools: MTR / mysql-test-run	Severity:	S3 (Non-critical)
Version:	5.1.26-rc	OS:	Any
Assigned to:	Bjørn Munch	CPU Architecture:	Any

Description:
This happened in the release build of 5.1.26-rc
on our FreeBSD (64 bit) host,
but I do not think it is platform-dependent:

During one build and its test run, disk space got low;
this was visible from several test failures reporting errno 28 (ENOSPC).

In some replication tests, this caused the slave to silently use the MyISAM table handler even though InnoDB was required.
It could be detected from later test output ("show create table" on the slave),
but was not visible from the "create table" and its log.
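The check described above can be automated. The following is a minimal sketch, assuming the text printed by "show create table" on the slave has been captured as a string; the function names engine_of and assert_engine are hypothetical and not part of MTR.

```python
import re

# Matches the ENGINE=<name> table option that follows the closing
# parenthesis of the column list in SHOW CREATE TABLE output.
_ENGINE_RE = re.compile(r"\)\s*ENGINE\s*=\s*(\w+)", re.IGNORECASE)


def engine_of(show_create_output: str) -> str:
    """Return the storage engine named in SHOW CREATE TABLE output."""
    match = _ENGINE_RE.search(show_create_output)
    if match is None:
        raise ValueError("no ENGINE= clause found")
    return match.group(1)


def assert_engine(show_create_output: str, expected: str) -> None:
    """Raise AssertionError if the table does not use the expected engine."""
    actual = engine_of(show_create_output)
    if actual.lower() != expected.lower():
        raise AssertionError(
            f"expected engine {expected}, but the table uses {actual}"
        )
```

Calling assert_engine(output, "InnoDB") right after the table is created on the slave would turn the silent engine change into an immediate, visible failure.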

One affected test was "rpl_read_only".

I repeated that build + test after cleaning up, and all was ok.

Both the old ("ENOSPC") and the new log are available for examination.

How to repeat:
Run the replication tests in a loop
while slowly filling the disk in parallel.
This should lead to a similar situation.
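One way to do the "slowly filling the disk" part is sketched below. This is only an illustration under stated assumptions: the function name fill_disk_slowly, the file name ballast.bin, and all default values are hypothetical, and the directory passed in is assumed to be on the same filesystem as the test's var directory. Random data is written so that compressing filesystems cannot shrink it.

```python
import os
import shutil
import time


def fill_disk_slowly(directory, chunk_bytes=1 << 20, min_free_bytes=64 << 20,
                     max_total_bytes=None, delay_seconds=1.0):
    """Append random chunks to a ballast file until free space runs low.

    Stops when writing another chunk would leave less than min_free_bytes
    free, or when max_total_bytes (if given) would be exceeded.
    Returns the ballast file path so the caller can delete it afterwards.
    """
    ballast = os.path.join(directory, "ballast.bin")  # hypothetical name
    written = 0
    with open(ballast, "ab") as handle:
        while shutil.disk_usage(directory).free > min_free_bytes + chunk_bytes:
            if max_total_bytes is not None and written + chunk_bytes > max_total_bytes:
                break
            handle.write(os.urandom(chunk_bytes))
            handle.flush()
            os.fsync(handle.fileno())  # force the blocks to be allocated
            written += chunk_bytes
            time.sleep(delay_seconds)
    return ballast
```

Run this in one shell while the replication tests loop in another, and delete the ballast file afterwards. Lowering min_free_bytes pushes the run closer to ENOSPC.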

Suggested fix:
1) Table handlers should not silently be changed because of the disk usage.
   Rather let the system stop with an error
   than use a handler different from the one specified.

2) Test "rpl_read_only" uses an option file which specifies "--loose-innodb".
   AIUI, this means the slave is allowed to use MyISAM if InnoDB is not available.
   As this makes the test fail (it requires InnoDB semantics),
   this option should be changed.

It seems my description was not clear enough, so I elaborate:

- We run replication tests within one machine,
  using the same binary for both the master and the slave server.

- For the slave, this option is set by the test:
      --loose-innodb

- On the master, the test creates an InnoDB table:
      create table t1(a int) engine=InnoDB;

On one machine, disk space got low during the test suite run.
It seems this caused the slave to create its table as a MyISAM table,
and as a consequence the test failed (it relies on transaction semantics).

Re-running the suite after cleanup (more space), this failure did not occur.

My conclusion is that the slave used MyISAM, even though InnoDB was demanded, solely because disk space was scarce.

1) IMO, it is very risky that the slave may use a different table handler because resources are scarce (a runtime decision).
It would be safer if the slave issued warnings (to the system console? the system log?) and then ceased working.
Better to have a visible failure than silently changed semantics.

2) Specifically for this test, InnoDB on the slave is mandatory.
Assuming "--loose-innodb" allowed it to do this change, the option file setting this should be removed.

When a CREATE TABLE statement is executed on the slave, no special semantics are used compared to a normal execution of CREATE TABLE. The default storage engine is substituted when the specified engine is not available, which is normal behavior. Since InnoDB was not loaded because of low memory, this is expected behavior; in reality the test case, which requires InnoDB to operate correctly, should not have been executed at all.

In other words, this is a test problem, not a replication problem.

Changing title and category to reflect the situation.

Re-categorizing bug as test-system bug for the time being. The fact that the test executes despite having "source include/have_innodb.inc" first in the file might indicate a problem with MTR, and hence might require fixing there.

MTR runs the InnoDB test based on the fact that the server supports InnoDB. It will not and should not avoid running the test if InnoDB for some reason becomes unavailable. MTR cannot guess why this happens. Instead, the test should run and fail.
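The "run and fail" behavior could look roughly like the sketch below. It is not how MTR or include/have_innodb.inc is implemented; it assumes the Engine and Support columns of SHOW ENGINES on the slave have already been fetched as (name, support) pairs, and the names engine_available and require_engine are hypothetical.

```python
def engine_available(show_engines_rows, engine="InnoDB"):
    """Return True if the engine is listed with Support YES or DEFAULT."""
    for name, support in show_engines_rows:
        if name.lower() == engine.lower():
            return support.upper() in ("YES", "DEFAULT")
    return False  # engine not listed at all


def require_engine(show_engines_rows, engine="InnoDB"):
    """Fail loudly rather than let CREATE TABLE fall back to another engine."""
    if not engine_available(show_engines_rows, engine):
        raise RuntimeError(
            f"{engine} is required but not available on this server"
        )
```

A check of this kind made at the point where the table is created reports the real cause (the engine is missing) instead of a later, harder to interpret failure in transaction semantics.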