Bug #36256 error 1032 Can't find record in [tablename]
Submitted: 22 Apr 2008 15:18 Modified: 10 Feb 17:12
Reporter: Horst Pralow
Status: Analyzing
Category:Server Severity:S2 (Serious)
Version:5.0.67-community OS:Linux (SLES 10, 64-bit, 8 CPU)
Assigned to: Shane Bester Target Version:

[22 Apr 2008 15:18] Horst Pralow
Description:
I repeatedly get "error 1032 Can't find record in ..." with an INSERT ... SELECT type of
statement where I would expect a set of rows inserted into a temporary table.  

Details: 
My application is a web-based multiuser system written in PHP and running against a MySQL
DB. The Insert-select-statement producing the error is used to construct a user specific
To-Do-List by applying user-selectable criteria to a join of 3 tables called 'claim',
'thread', 'timer' and inserting the matches into a temp table. The source tables are
InnoDB, the temporary table 'todolist' was of type HEAP but during testing I changed it
to be of type InnoDB.
This change dramatically reduced the frequency of the error occuring.

The sequence of events in the application is something like 
1. CREATE TEMPORARY TABLE todolist .... 

2. Perform the statement 
INSERT INTO todolist /* constant column list */ SELECT /* column list */ FROM 
(thread, claim) 
LEFT JOIN timer ON timer.fk_thread_id = thread.parent_thread_id 
WHERE /* complex where clause */ 
AND /* variable additional where_conditions ORed together following a pattern like 
OR (claim.fk_initiator_id=1448 AND thread.fk_user_id=0 AND thread.parent_thread_id=0) 
*/ 
AND /* additional where conditions */ 

3. Do further processing of the temporary table ...

The problem is: sometimes statements like the one pointed out above under 2. fails with 
error 1032: Can't find record in 'thread' 

This happens repeatedly (although not frequently, about 100 errors within 10000
statements of the same pattern).
The complexety of the Where-clause doesn't seem to have influence on the error. Below
I'll post 2 variants of the statement triggering error 1032.
Adding a LOCK IN SHARE MODE to the select part of the statement also doesn't fix the
problem. 

Interestingly I never have been able to trigger the error when only performing the SELECT
part of an INSERT ... SELECT copied out of our error logs. 

Modificatoins tries without succes were:
- adding LOCK IN SHARE MODE
- setting transaction isolation level to read committed
- bracketing the statemant with START TRANSACTION ... COMMIT

I have never been able to reproduce the error, but in the production system it gets
logged frequently. The frequence of the error occuring has increased the last few months,
but this probably is due to a grown user base.

Another experience from monitoring my log files is that the count of records matched by
the select part has influence on the likelyness the error being triggered. Since in the
application this mostly depends the user's permissions in the application, users with
very limited permissions will hardly see the error at all whereas power-users with a
broad range of permissions do suffer frequently.

BTW. I'm running MySQL 5.0.51a-community-log on 64-bit Linux (8 CPU). However the problem
has been around on any older 5.0 Version of MySQL. CHECK TABLE on all source tables did
yield OK status.

The tables currently hold approximately
thread: 3660000 records (app. 2.1 GB)
claim:   770000 records (app. 565 MB)
timer: 	  10500 records	(app. 4 MB)

I'll supply the structure of the tables and the statemant causing the error in a separate
file since my post is getting too large.

How to repeat:
No way found to reproduce the erreo outside of the production system
[22 Apr 2008 15:20] Horst Pralow
INSERT-SELECT Statemant and Table structure

Attachment: structure.txt (text/plain), 10.81 KiB.

[13 May 2008 14:25] Sveta Smirnova
Thank you for the report.

Please provide output of `df` command. Would be better if you could check free space in
time when error occurs.
[13 May 2008 14:40] Horst Pralow
In Reply to [13 May 14:25] Sveta Smirnova

>Please provide output of `df` command. Would be better if you could
>check free space in time when error occurs.

Output of df follows:
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2             15735128   7773060   7962068  50% /
tmpfs                  8177604        12   8177592   1% /dev/shm
/dev/sda3            1133042276 867277636 208209368  81% /sbs

The database files are on /dev/sda3, Temp-Files on /dev/sda2.

The values given above were take immediately after an error 1032 was reported into the
log-file.
[15 May 2008 11:48] Susanne Ebrecht
Horst,

many thanks for writing a bug report.

For me this sounds more like a system overload.

Please monitor I/O. Also you can try to use I/O scheduler: deadline

Also check timeout variables and maybe set them to higher values.

If you didn't increase the variable table_cash then do it now.

Please, let us know if something of the above is helping you.

To analyse if there is a bug we need to know how to reproduce it.

Looking to your description above my suggestion is that this is not a bug only a tuning
issue.
[21 May 2008 13:26] Horst Pralow
Some remarks on the recommendations from Susanne Ebrecht:

1. I doubt this is primarirly a system overload problem. If it were, I would expect to
see erroneous database behaviour with any kind of SQL statement, but that's not the
case.

2. I have increased table_cache from 5000 to 10000 with no noticeable change. 
I had increased it from default to a fairly large value months ago, but still had a huge
value in 'Opened_tables'. Since the manual suggest this to be an indicator of table_cache
beeing too small, I 
increased i further. Tests however have show that the large value of Opened_tables is
caused by the usage of
temporary tables in my application: Every 'create temporary table ... ' increments
'Opened_tables' by 2 and thus 
it is not indicative for table_cache size being too small.

3. I haven't tried dealine I/O scheduler yet (no chance to reboot the production server
so far).

4. Could you name which timeout variables could possibly have impact on our problem?
There are no timeouts
reported in my logs. The value in 'Innodb_row_lock_time_max' is far smaller than
innodb_lock_wait_timout, so I don't see a problem here.

5. If we had a mere tuning problem I'd expect slow but reliable operation or timeouts.
That is what I would call a tuning problem. And if it were such, then at least the error
message would be totally misleading!

I will try to export the 3 tables involved and then create a test environment, although I
doubt it will be possible to reproduce the problem without the concurrency imposed by
multiple users in production environment. For me the problem currently look more like
being related with read consistency the iwth I/O performance.
[10 Feb 14:59] Shane Bester
Horst - do you use a transaction isolation level other than repeatable read ?
[10 Feb 15:12] Horst Pralow
In replay to [10 Feb 14:59] Shane Bester

>Horst - do you use a transaction isolation level other than repeatable read ?

No.

In the meantime we have upgraded hardware and OS as well as MySQL . The problem still
occurs. 
As a workaround I now repeat the select immediately (at most) a second time if the first
try yields error 1032 hoping everything needed may now be in main memory. Most times the
second try will succeed but a few times a day it still fails.