Bug #32917 detect orphan temp-pool files, and handle gracefully
Submitted: 2 Dec 2007 22:41 Modified: 24 Nov 2014 16:50
Reporter: Sean Pringle Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: DML Severity:S2 (Serious)
Version:5.0.50 OS:Any
Assigned to: CPU Architecture:Any

[2 Dec 2007 22:41] Sean Pringle
Description:
Temp table names coming from the temp-pool have the shorter three part file name:

#sql_277e_0
%s_%lx_%i
<tmp_file_prefix>_<pid>_<temp_pool_slot>

Since these names are re-used, it is easier for clashes to occur if one of these files is orphaned for whatever reason (another bug, presumably), resulting in:

Error 'Can't create/write to file '/tmp/#sql_277e_0.MYD' (Errcode: 17)'

In the case of a replication slave constantly faulting, this can be troublesome.

Obviously any bug causing temp-pool orphans can be worked around by disabling the temp-pool, resulting in longer unique file names being used:

#sql277e_42_17
%s%lx_%lx_%x
<tmp_file_prefix><pid>_<thread_id>_<thread_tmp_table_number++>

...though this defeats the original purpose of the temp-pool and can result in a whole lot more orphans accumulating.

How to repeat:
n/a

Suggested fix:
This feature request is simply to add a self healing option to the temp-pool to:

a) remove the orphan, or
b) archive it elsewhere for investigation, or
c) simply remove that name from the temp-pool

... all on the fly.  The name clash would presumably still be logged as a warning for investigation, regardless.
[3 Dec 2007 5:40] Valeriy Kravchuk
Thank you for a reasonable feature request.
[27 Aug 2010 15:09] Sveta Smirnova
Bug #56288 was marked as duplicate of this one.
[24 Nov 2014 16:50] Paul DuBois
Noted in 5.5.42, 5.6.23, 5.7.6 changelogs.

A file created for an internal temporary table could cause problems
if the file was orphaned for some reason and the file name was reused
for later queries.
[11 Feb 2015 9:59] Laurynas Biveinis
5.5 $ git show -s 5a587b6d
commit 5a587b6d2897e786b515d05a09b37ef81695dab7
Author: Nisha Gopalakrishnan <nisha.gopalakrishnan@oracle.com>
Date:   Mon Nov 24 20:24:18 2014 +0530

    BUG#11747548: DETECT ORPHAN TEMP-POOL FILES, AND HANDLE GRACEFULLY
    
    Analysis:
    --------
    Certain queries using intrinsic temporary tables may fail due to
    name clashes in the file name for the temporary table when the
    'temp-pool' enabled.
    
    'temp-pool' tries to reduce the number of different filenames used for
    temp tables by allocating them from small pool in order to avoid
    problems in the Linux kernel by using a three part filename:
    <tmp_file_prefix>_<pid>_<temp_pool_slot_num>.
    The bit corresponding to the temp_pool_slot_num is set in the bit
    map maintained for the temp-pool when it used for the file name.
    It is cleared after the temp table is deleted for re-use.
    
    The 'create_tmp_table()' function call under error condition
    tries to clear the same bit twice by calling 'free_tmp_table()'
    and 'bitmap_lock_clear_bit()'. 'free_tmp_table()' does a delete
    of the table/file and clears the bit by calling the same function
    'bitmap_lock_clear_bit()'.
    
    The issue reported can be triggered under the timing window mentioned
    below for an error condition while creating the temp table:
    a) THD1: Due to an error clears the temp pool slot number used by it
       by calling 'free_tmp_table'.
    b) THD2: In the process of creating the temp table by using an unused
       slot number in the bit map.
    c) THD1: Clears the slot number used THD2 by calling
      'bitmap_lock_clear_bit()' after completing the call 'free_tmp_table'.
    d) THD3: Uses the slot number used the THD2 since it is freed by THD1.
       When it tries to create the temp file using that slot number,
       an error is reported since it is currently in use by THD2.
       [The error: Error 'Can't create/write to file
       '/tmp/#sql_277e_0.MYD' (Errcode: 17)']
    
    Another issue which may occur in 5.6 and trunk is that:
    When the open temporary table fails after its creation(due to ulimit
    or OOM error), the file is not deleted. Thus further attempts to use
    the same slot number in the 'temp-pool' results in failure.
    
    Fix:
    ---
    a) Under the error condition calling the 'bitmap_lock_clear_bit()'
       function to clear the bit is unnecessary since 'free_tmp_table()'
       deletes the table/file and clears the bit. Hence removed the
       redundant call 'bitmap_lock_clear_bit()' in 'create_tmp_table()'
       This prevents the timing window under which the issue reported
       can be seen.
    
    b) If open of the temporary table fails, then the file is deleted
       thus allowing the temp-pool slot number to be utilized for the
       subsequent temporary table creation.
    
    c) Also if the attempt to create temp table fails since it already
       exists, the temp-pool slot for it is marked as used, to avoid
       the problem from re-appearing.
[12 Feb 2015 13:05] Laurynas Biveinis
5.6 $ git show -s 432078d
commit 432078dc748a79a631c8e54e83ffe548060fc859
Author: Nisha Gopalakrishnan <nisha.gopalakrishnan@oracle.com>
Date:   Tue Nov 25 15:55:34 2014 +0530

    BUG11747548: DETECT ORPHAN TEMP-POOL FILES, AND HANDLE GRACEFULLY
    
    Follow up patch to fix the test case failure.