Bug #33238 mysqlbinlog cannot read binlog with >2000 LOAD DATA INFILE, and is slow
Submitted: 14 Dec 2007 10:23 Modified: 2 Jul 2009 10:30
Reporter: Sven Sandberg
Status: Can't repeat
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.1 OS:Any
Assigned to: Luis Soares CPU Architecture:Any
Tags: replication mysqlbinlog load slow limit

[14 Dec 2007 10:23] Sven Sandberg
Description:
When mysqlbinlog sees a binlog with events corresponding to queries like "LOAD DATA INFILE", it re-creates the files locally. There is an internal limit of 2000 files. When a binlog contains more than 2000 files, mysqlbinlog will print an error message.

Some binlogs contain more than 2000 LOAD DATA INFILE.

Moreover, the algorithm for generating temporary filenames is quadratic in the number of files, whereas linear would suffice. With thousands of files, the quadratic behavior is really slow.
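
To make the quadratic claim concrete, here is a minimal sketch that counts name probes under two strategies. It assumes that all files compete for the same sequence of suffixes and that every search restarts at suffix 0, as the reporter describes. The directory is simulated with a std::set, and the function names probes_restarting and probes_resuming are hypothetical, not taken from mysqlbinlog.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Strategy 1: every new file restarts the search at suffix 0 and walks
// past all suffixes that are already taken.
std::size_t probes_restarting(std::size_t files)
{
  std::set<unsigned> taken;             // simulated directory contents
  std::size_t probes = 0;
  for (std::size_t i = 0; i < files; i++)
  {
    for (unsigned version = 0; ; version++)
    {
      probes++;
      if (taken.insert(version).second) // suffix was free: "file created"
        break;
    }
  }
  return probes;
}

// Strategy 2: remember the last suffix used and resume right after it,
// which is the effect of a counter that persists between calls.
std::size_t probes_resuming(std::size_t files)
{
  std::set<unsigned> taken;
  std::size_t probes = 0;
  unsigned version = 0;                 // survives from one file to the next
  for (std::size_t i = 0; i < files; i++, version++)
  {
    probes++;
    bool inserted = taken.insert(version).second;
    assert(inserted);                   // a resumed suffix is never taken yet
    (void) inserted;
  }
  return probes;
}
```

Under these assumptions, creating n files costs 1 + 2 + ... + n = n(n+1)/2 probes with the first strategy and n probes with the second.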

How to repeat:
Create a binlog with > 2000 LOAD DATA INFILE and feed it to mysqlbinlog.

Suggested fix:
===== client/mysqlbinlog.cc 1.162 vs edited =====
--- 1.162/client/mysqlbinlog.cc 2007-09-15 05:10:31 +02:00
+++ edited/client/mysqlbinlog.cc        2007-12-11 12:54:13 +01:00
@@ -147,8 +147,12 @@ class Load_log_processor
   File create_unique_file(char *filename, char *file_name_end)
     {
       File res;
-      /* If we have to try more than 1000 times, something is seriously wrong */
-      for (uint version= 0; version<1000; version++)
+      static uint version= -1;
+      /*
+        If we have to try more than 0x100000 times, something is
+        seriously wrong.
+      */
+      for (version++; version<0x100000; version++)
       {
-       sprintf(file_name_end,"-%x",version);
+       sprintf(file_name_end,"-%05x",version);
        if ((res= my_create(filename,0,
[31 Mar 2008 12:43] Davi Arnaut
Suggested fix: use a standard function for creating temporary file names, such as: mkstemp, mktemp, etc.
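
A minimal sketch of what this suggestion could look like with POSIX mkstemp. The helper name create_temp_file and the "-XXXXXX" naming scheme are assumptions made for illustration; mysqlbinlog itself creates the file through my_create, and this is not a patch against it.

```cpp
#include <cassert>
#include <cstdlib>    // mkstemp (POSIX, declared in <stdlib.h>)
#include <string>
#include <unistd.h>   // close, unlink

// Hypothetical helper: creates a uniquely named file "<prefix>-XXXXXX",
// where mkstemp replaces the six X's. Returns the open file descriptor,
// or -1 on failure. On success the chosen name is stored in *created_name.
int create_temp_file(const std::string &prefix, std::string *created_name)
{
  assert(created_name != nullptr);
  std::string name_template = prefix + "-XXXXXX";
  // mkstemp rewrites the template in place and creates the file atomically.
  int fd = mkstemp(&name_template[0]);
  if (fd >= 0)
    *created_name = name_template;
  return fd;
}
```

The caller stays responsible for closing the descriptor and removing the file. Note that the generated suffix is unpredictable, so filenames no longer follow a counter order.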
[30 Apr 2008 16:08] Andrei Elkin
Will be fixed with Bug#35546.
[1 May 2008 11:14] Andrei Elkin
Setting it back to verified because the limit of 2000 remains after the patch for
Bug#35546.
Perhaps the limit of 2000 should be turned into a configurable variable.
I agree with the suggestion to use a standard function for generating the temp name.
[26 Sep 2008 18:46] Omer Barnir
triage: Correcting tag from SR51MRU to CHECKED as customer issue is closed
[2 Jul 2009 10:26] Sven Sandberg
After discussion with Luís, we found that the issue is not as serious as I initially thought. There is no limit on the number of LOAD DATA INFILE in a single binlog, because each file's file_id is used to create the file. Assuming the server that created the binlog works correctly, the file_id is guaranteed to be unique.

However, you cannot run mysqlbinlog more than 1000 times on the same binlog without cleaning the temp directory in between. That's a much smaller issue. In fact, it's only healthy if users who don't remove the temp files get some sort of notification that their temp directories are filling up.
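
The behavior described in the two paragraphs above can be sketched as follows. The name format "tmp-<file_id>-<version in hex>" and the functions create_unique_name and replay_binlog are assumptions for illustration. Only the "-%x" suffix and the limit of 1000 attempts come from the code quoted in the suggested fix. The temp directory is simulated with a std::set.

```cpp
#include <cassert>
#include <cstdio>
#include <set>
#include <string>

// Simulated temp directory: the set of names that already exist.
typedef std::set<std::string> Directory;

// Tries suffixes 0..999 for one file_id. Returns the created name,
// or an empty string when every suffix for this file_id is taken.
std::string create_unique_name(Directory &dir, unsigned file_id)
{
  char name[64];
  for (unsigned version = 0; version < 1000; version++)
  {
    int len = std::snprintf(name, sizeof(name), "tmp-%u-%x", file_id, version);
    assert(len > 0 && len < (int) sizeof(name));   // name was not truncated
    (void) len;
    if (dir.insert(name).second)                   // name was free
      return name;
  }
  return std::string();
}

// One run over a binlog with `loads` LOAD DATA INFILE events, whose
// file_ids are assumed to be 1..loads. Returns false on the first failure.
bool replay_binlog(Directory &dir, unsigned loads)
{
  for (unsigned file_id = 1; file_id <= loads; file_id++)
    if (create_unique_name(dir, file_id).empty())
      return false;
  return true;
}
```

In this model a single run over 3000 events succeeds, because each file_id gets its own sequence of suffixes. The failure only appears on run 1001 against the same uncleaned directory.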

There are some arguments above that it would be better to use a standard function to generate temp files; however, that is a (very small) change in semantics. There could be users who rely on the filename order of temp files, or something like that. So I think the bug can be closed.
[2 Jul 2009 10:30] Luis Soares
Since I could not repeat the problem, and given my discussion with Sven (see his [2 Jul 2009 10:26] comment), I am closing this as CAN'T REPEAT.