MySQL Bugs: #39502: myisam table crash on update or replace into periodically

Bug #39502	myisam table crash on update or replace into periodically
Submitted:	17 Sep 2008 17:21	Modified:	1 Nov 2008 4:43
Reporter:	Steve Drew	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server: MyISAM storage engine	Severity:	S2 (Serious)
Version:	4.1.22+4.1.20	OS:	Linux (Centos 4.7)
Assigned to:		CPU Architecture:	Any

Description:
We have a setup with 1 master and 3 slaves.

Currently every 3 days or so we are getting problems with the replication dying due to a 1034..

Last_Error: Error 'Incorrect key file for table 'probtable'; try to repair it' on query. Default database: 'db'. Query: 'UPDATE probtable SET field1 = 'golf ryder Half Length Mid Top golfers players cheerful smile smiles defending champs champions display cup trophy cup MANGNB hli' WHERE field2 = '6354440' AND field3 = 2'

With 4.1.20 all the 3 slaves will error at the same time. With 4.1.22 2 of the slaves had the same type of error, and 1 carried on fine.

All 4 of the servers were upgraded to mysql4.1.22 2 days ago, as I initially thought it could be related to bug#22384 <http://bugs.mysql.com/bug.php?id=22384>

When the slave is checked we get the following error:

mysql> check table probtable extended;
+--------------+-------+----------+--------------------------------------+
| Table        | Op    | Msg_type | Msg_text                             |
+--------------+-------+----------+--------------------------------------+
| db.probtable | check | warning  | Table is marked as crashed           |
| db.probtable | check | error    | Key in wrong position at page 367616 |
| db.probtable | check | error    | Corrupt                              |
+--------------+-------+----------+--------------------------------------+
3 rows in set (0.01 sec)

If a repair table is run, normal functionality is resumed for a period (anything from 2days to 2 months). After the repair table is run, the SQL statements that were run will apply fine to the master (and then be successfully replicated to the slaves)

Load on all the machines is around 0.5, and all have plenty of disk space (150Gb+) free.

Also looking at the changelog for 4.1.23 and 4.1.24, I cant see any changes that apply (I can try if there is a change I'm not reading correctly)

There are SELECT, UPDATE, INSERT and REPLACE INTO statements running on this table.

The index file is at 4Gb, with around 3gb of data. There are 18 Fulltext index's and 12 btree indexes.

mysql> show table status like 'probtable'\G;
*************************** 1. row ***************************
           Name: probtable
         Engine: MyISAM
        Version: 9
     Row_format: Dynamic
           Rows: 5727458
 Avg_row_length: 571
    Data_length: 3272638336
Max_data_length: 4294967295
   Index_length: 4295680000
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2008-02-04 16:30:59
    Update_time: 2008-09-17 17:33:53
     Check_time: 2008-09-17 10:02:02
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: pack_keys=1
        Comment: 
1 row in set (0.00 sec)

[root@dc-dbr2 mysql]# myisamchk -dv db/probtable

MyISAM file:         db/probtable
Record format:       Packed
Character set:       latin1_swedish_ci (8)
File-version:        1
Creation time:       2008-02-04 16:30:59
Recover time:        2008-09-17 10:02:02
Status:              changed
Data records:              5727548  Deleted blocks:                 0
Datafile parts:            5728665  Deleted data:                   0
Datafile pointer (bytes):        4  Keyfile pointer (bytes):        4
Datafile length:        3272684764  Keyfile length:        4295779328
Max datafile length:    4294967294  Max keyfile length: 4398046510079
Recordlength:                 1519

Apologies if I've missed any info..

Many Thanks,

steve

How to repeat:
Struggling to replicate it in a test environment as the SQL statements that fail initially will run fine after a repair tale.

As a note, when the slaves have crashed, the position of the wrong key is different across all the slaves.

Thank you for the report.

Have you tried to find any pattern after which error occurs? Like particular DELETE query in bug #22384?

REPLACE INTO are the main culprits (95%), with the very occasional UPDATE causing problems. 

I encountered the problem 11 hours after repairing the master and slaves servers last night. 

When the issue occurs, its appearing that it affects a small range (around 80 items), based around the object_name (first field).. And it records that have been inserted today where the problems occur.

MyISAM file:         db/probtable
Record format:       Packed
Character set:       latin1_swedish_ci (8)
File-version:        1
Creation time:       2008-02-04 16:30:59
Recover time:        2008-09-17 10:02:02
Status:              changed
Data records:              5727548  Deleted blocks:                 0
Datafile parts:            5728665  Deleted data:                   0
Datafile pointer (bytes):        4  Keyfile pointer (bytes):        4
Datafile length:        3272684764  Keyfile length:        4295779328
Max datafile length:    4294967294  Max keyfile length: 4398046510079
Recordlength:                 1519

I think I've found the cause of this.

If the keyfile length is > Max datafile length and the thread_concurrency variable is set, then corruption occurs extremely quickly.

I'll check to see if this is a know bug already reported and update accordingly.

This does not appear to be an issue if the slave is running 5.0.22

probably a duplicate of bug #31305

Please, upgrade to 4.1.24 and check if the problem is repeatable in this newer version.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".