Bug #36256 error 1032 Can't find record in [tablename]
Submitted: 22 Apr 2008 13:18 Modified: 10 May 2010 8:10
Reporter: Horst Pralow Email Updates:
Status: Can't repeat Impact on me:
Category:MySQL Server Severity:S2 (Serious)
Version:5.0.67-community OS:Linux (SLES 10, 64-bit, 8 CPU)
Assigned to: CPU Architecture:Any

[22 Apr 2008 13:18] Horst Pralow
I repeatedly get "error 1032 Can't find record in ..." from an INSERT ... SELECT statement that should insert a set of rows into a temporary table.

My application is a web-based multiuser system written in PHP and running against a MySQL DB. The INSERT ... SELECT statement producing the error is used to construct a user-specific to-do list by applying user-selectable criteria to a join of three tables called 'claim', 'thread' and 'timer', and inserting the matches into a temp table. The source tables are InnoDB; the temporary table 'todolist' was of type HEAP, but during testing I changed it to type InnoDB.
This change dramatically reduced the frequency of the error occurring.

The sequence of events in the application is something like 
1. CREATE TEMPORARY TABLE todolist .... 

2. Perform the statement 
INSERT INTO todolist /* constant column list */ SELECT /* column list */ FROM 
(thread, claim) 
LEFT JOIN timer ON timer.fk_thread_id = thread.parent_thread_id 
WHERE /* complex where clause */ 
AND /* variable additional where_conditions ORed together following a pattern like 
OR (claim.fk_initiator_id=1448 AND thread.fk_user_id=0 AND thread.parent_thread_id=0) 
AND /* additional where conditions */ 

3. Do further processing of the temporary table ...
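Purely as an executable illustration of the three-step pattern above (not a reproduction of the bug), here is a minimal sketch. It uses SQLite in place of MySQL/InnoDB, and the column lists and sample data are invented stand-ins; the real schema is in the attached structure.txt:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Invented stand-ins for the production tables.
cur.executescript("""
CREATE TABLE thread (thread_id INTEGER PRIMARY KEY,
                     parent_thread_id INTEGER, fk_user_id INTEGER);
CREATE TABLE claim  (claim_id INTEGER PRIMARY KEY,
                     fk_initiator_id INTEGER);
CREATE TABLE timer  (timer_id INTEGER PRIMARY KEY,
                     fk_thread_id INTEGER);

INSERT INTO thread VALUES (1, 0, 0), (2, 1, 0);
INSERT INTO claim  VALUES (10, 1448);
INSERT INTO timer  VALUES (100, 0);
""")

# Step 1: create the per-session temporary work table.
cur.execute("CREATE TEMPORARY TABLE todolist "
            "(thread_id INTEGER, claim_id INTEGER)")

# Step 2: populate it with an INSERT ... SELECT over the join.
cur.execute("""
INSERT INTO todolist (thread_id, claim_id)
SELECT thread.thread_id, claim.claim_id
FROM thread
CROSS JOIN claim
LEFT JOIN timer ON timer.fk_thread_id = thread.parent_thread_id
WHERE claim.fk_initiator_id = 1448
  AND thread.fk_user_id = 0
""")

# Step 3: further processing reads from the temporary table.
rows = cur.execute("SELECT COUNT(*) FROM todolist").fetchone()[0]
print(rows)
```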

The problem is: sometimes a statement like the one shown under 2. fails with 
error 1032: Can't find record in 'thread' 

This happens repeatedly (although not frequently: about 100 errors per 10000 statements of the same pattern).
The complexity of the WHERE clause doesn't seem to have any influence on the error. Below I'll post 2 variants of the statement triggering error 1032.
Adding a LOCK IN SHARE MODE to the SELECT part of the statement doesn't fix the problem either. 

Interestingly, I have never been able to trigger the error when performing only the SELECT part of an INSERT ... SELECT copied out of our error logs. 

Modifications tried without success were:
- setting the transaction isolation level to READ COMMITTED
- bracketing the statement with START TRANSACTION ... COMMIT
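For completeness, the two attempted modifications looked roughly like this (statement body elided; the real column lists and WHERE clause are in the attached structure.txt):

```sql
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

START TRANSACTION;
INSERT INTO todolist (/* constant column list */)
SELECT /* column list */ FROM (thread, claim)
LEFT JOIN timer ON timer.fk_thread_id = thread.parent_thread_id
WHERE /* complex where clause */;
COMMIT;
```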

I have never been able to reproduce the error, but in the production system it gets logged frequently. The frequency of the error occurring has increased over the last few months, but this is probably due to a growing user base.

Another observation from monitoring my log files is that the number of records matched by the SELECT part influences the likelihood of the error being triggered. Since in the application this mostly depends on the user's permissions, users with very limited permissions will hardly see the error at all, whereas power users with a broad range of permissions suffer from it frequently.

BTW, I'm running MySQL 5.0.51a-community-log on 64-bit Linux (8 CPU). However, the problem has been present on every older 5.0 version of MySQL. CHECK TABLE on all source tables yielded OK status.

The tables currently hold approximately
thread: 3660000 records (app. 2.1 GB)
claim:   770000 records (app. 565 MB)
timer: 	  10500 records	(app. 4 MB)

I'll supply the structure of the tables and the statement causing the error in a separate file since my post is getting too large.

How to repeat:
No way found to reproduce the error outside of the production system
[22 Apr 2008 13:20] Horst Pralow
INSERT-SELECT Statement and Table structure

Attachment: structure.txt (text/plain), 10.81 KiB.

[13 May 2008 12:25] Sveta Smirnova
Thank you for the report.

Please provide the output of the `df` command. It would be better if you could check the free space at the time the error occurs.
[13 May 2008 12:40] Horst Pralow
In Reply to [13 May 14:25] Sveta Smirnova

>Please provide the output of the `df` command. It would be better if you could
>check the free space at the time the error occurs.

Output of df follows:
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2             15735128   7773060   7962068  50% /
tmpfs                  8177604        12   8177592   1% /dev/shm
/dev/sda3            1133042276 867277636 208209368  81% /sbs

The database files are on /dev/sda3, Temp-Files on /dev/sda2.

The values given above were taken immediately after an error 1032 was reported in the log file.
[15 May 2008 9:48] Susanne Ebrecht

Many thanks for writing a bug report.

For me this sounds more like system overload.

Please monitor I/O. You can also try the deadline I/O scheduler.

Also check the timeout variables and maybe set them to higher values.

If you didn't increase the variable table_cache yet, do it now.

Please let us know if any of the above helps you.

To analyse whether there is a bug, we need to know how to reproduce it.

Looking at your description above, my suggestion is that this is not a bug, only a tuning issue.
[21 May 2008 11:26] Horst Pralow
Some remarks on the recommendations from Susanne Ebrecht:

1. I doubt this is primarily a system overload problem. If it were, I would expect to see erroneous database behaviour with any kind of SQL statement, but that's not the case.

2. I have increased table_cache from 5000 to 10000 with no noticeable change. I had increased it from the default to a fairly large value months ago, but still saw a huge value in 'Opened_tables'. Since the manual suggests this is an indicator of table_cache being too small, I increased it further. Tests have shown, however, that the large value of Opened_tables is caused by the use of temporary tables in my application: every 'CREATE TEMPORARY TABLE ...' increments 'Opened_tables' by 2, so it is not indicative of table_cache being too small.

3. I haven't tried the deadline I/O scheduler yet (no chance to reboot the production server so far).

4. Could you name which timeout variables could possibly have an impact on our problem? There are no timeouts reported in my logs. The value of 'Innodb_row_lock_time_max' is far smaller than innodb_lock_wait_timeout, so I don't see a problem here.

5. If this were a mere tuning problem, I'd expect slow but reliable operation, or timeouts; that is what I would call a tuning problem. And if it were one, the error message would at the very least be totally misleading!

I will try to export the 3 tables involved and then create a test environment, although I doubt it will be possible to reproduce the problem without the concurrency imposed by multiple users in the production environment. To me the problem currently looks more related to read consistency than to I/O performance.
[10 Feb 2009 13:59] MySQL Verification Team
Horst - do you use a transaction isolation level other than repeatable read ?
[10 Feb 2009 14:12] Horst Pralow
In reply to [10 Feb 14:59] Shane Bester

>Horst - do you use a transaction isolation level other than repeatable read ?


In the meantime we have upgraded hardware and OS as well as MySQL. The problem still occurs. 
As a workaround I now repeat the statement immediately (at most one more time) if the first try yields error 1032, hoping that everything needed is now in main memory. Most times the second try succeeds, but a few times a day it still fails.
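The retry workaround described above can be sketched as follows. This is a hypothetical illustration: `run_query` stands in for whatever application function issues the failing INSERT ... SELECT, and `QueryError` is an invented exception carrying the MySQL error number:

```python
# Sketch of the retry workaround: re-run the statement once if it fails
# with MySQL error 1032 ("Can't find record"), in the hope that the
# needed pages are then cached in main memory.

class QueryError(Exception):
    """Hypothetical error type carrying a MySQL error number."""
    def __init__(self, errno):
        super().__init__(f"MySQL error {errno}")
        self.errno = errno

def insert_with_retry(run_query, max_tries=2):
    """Run the statement, retrying only on error 1032."""
    for attempt in range(1, max_tries + 1):
        try:
            return run_query()
        except QueryError as e:
            if e.errno != 1032 or attempt == max_tries:
                raise  # different error, or retries exhausted

# Demo: a query that fails with 1032 on the first call, then succeeds.
calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] == 1:
        raise QueryError(1032)
    return "ok"

result = insert_with_retry(flaky_query)
print(result, calls["n"])
```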
[6 May 2010 7:30] MySQL Verification Team
I have never been able to repeat this exact problem.
However, it is probably some locking bug, similar to the likes of bug #41756

Another cause might have been that when the SELECT part of an INSERT ... SELECT
hit a lock wait timeout, the error wasn't properly handled further up the chain.

I'd be curious whether this still happens with version 5.1.46?
[10 May 2010 8:10] Horst Pralow
>I'd be curious whether this still happens with version 5.1.46?
I can't tell, since we have changed our application's implementation so that the SQL statement causing the errors in question no longer exists.