Bug #34283 mysqlbinlog leaves tmpfile after termination if binlog contains load data infile
Submitted: 4 Feb 2008 15:50 Modified: 16 Oct 2008 13:10
Reporter: Sven Sandberg
Status: Verified
Category:Server: Replication Severity:S3 (Non-critical)
Version:4.1/5.0/5.1 OS:Any
Assigned to: Sven Sandberg Target Version:
Tags: mysqlbinlog, LOAD DATA INFILE, temporary file
Triage: D3 (Medium)

[4 Feb 2008 15:50] Sven Sandberg
Description:
With statement- or mixed-mode logging, "LOAD DATA INFILE" queries are written to the
binlog using special types of log events (Begin_load_query_log_event /
Append_block_log_event / Execute_load_query_log_event in >= 5.0, Create_file_log_event /
Append_block_log_event / Execute_file_log_event before 5.0). When mysqlbinlog reads such
events, it re-creates the file in a temporary directory with a generated filename and
outputs a "LOAD DATA INFILE" query where the filename is replaced by the generated file.
The temporary file is not deleted by mysqlbinlog (it cannot be deleted since users must
be able to pipe the output of mysqlbinlog to a client).

There are several problems with reproducing LOAD DATA INFILE queries in this manner:
 - If the user does not know the internal workings of mysqlbinlog, the temp files will
accumulate on disk and never be removed.
 - There is no easy way to remove the temp files automatically (the safest way is to
parse the output of mysqlbinlog, which is hard and there is still no guarantee that the
format never changes, etc.)
 - Temp files will accumulate on the pushbuild server.

(An associated problem is that the way temporary files are generated, two instances of
mysqlbinlog running in parallel may in theory generate the same temporary filename and
write to it concurrently. Another issue is BUG#33238.)

How to repeat:
1. Execute a "LOAD DATA INFILE" query.
2. Run mysqlbinlog on the generated binlog.

Suggested fix:
Three possible solutions:

(1) Add flags to mysqlbinlog to aid handling temp files. E.g.:
    --no-generate-tempfiles           suppress creating temp files (useful when
                                      mysqlbinlog is used only to list binlog,
                                      not piping to client)
    --output-tempfile-names=FILENAME  write generated files to a given file
                                      (to simplify automatic and manual removal
                                      of generated tempfiles)
    --tempdir=DIR                     put tempfiles in given directory instead
                                      of system tempdir (also to simplify
                                      removal)

(2) In mixed mode logging, use row-based logging for "LOAD DATA INFILE".

(3) Introduce new SQL syntax allowing to include the entire file contents in the query.
E.g.:

  LOAD DATA INSTRING 'file contents' INTO TABLE t1;

The point is that you can replace INFILE 'file name' by INSTRING 'file contents' and get
an equivalent query that can be completely contained in the binlog. Then:
 - mysqlbinlog outputs LOAD DATA INSTRING when it sees Execute_file_query_log_event
 - we can get rid of the special events associated with LOAD DATA INFILE and just log a
LOAD DATA INSTRING query instead.

I think (1) is a bad option. This is a bug in the architecture and should be fixed by a
more robust architecture. (1) is just a workaround. The user still needs to be aware of
this strange internal architecture.

I think (2) and (3) are both ok. If we do (2), it still does not work with pure
statement-based logging, but I'm a mixed-mode advocate who thinks pure statement-based
logging should be considered deprecated. (3) may be an interesting feature in itself but
of course requires much more work than (2).
[4 Feb 2008 23:34] Sveta Smirnova
Thank you for the report.

Verified as described.

See also bug #12834 and bug #1073
[2 May 2008 14:19] Andrei Elkin
While I personally do not think it's too heavy burden for the user to remove
--load-local directory with possible associated files in it (not temporary!) -
after all the user is to remove the main mysqlbinlog produced file,
there is still a room for implementing the offered option (2) to switch to row-based
replication instead of processing the load data event as currently.
Having that said, setting the ticket back to Open.
[27 Sep 2008 19:29] Sveta Smirnova
Bug #39701 was marked as duplicate of this one.
[18 May 21:34] Daniel Fischer
LOAD DATA INSTRING is a nice idea, but there are certain limits on string length that make
it rather unfeasible. (Remember we ignore a tonne of 64 bit warnings because we only
support 1G strings, etc.)