Description:
With statement- or mixed-mode logging, "LOAD DATA INFILE" queries are written to the
binlog using special types of log events (Begin_load_query_log_event /
Append_block_log_event / Execute_load_query_log_event in >= 5.0, Create_file_log_event /
Append_block_log_event / Execute_file_log_event before 5.0). When mysqlbinlog reads such
events, it re-creates the file in a temporary directory with a generated filename and
outputs a "LOAD DATA INFILE" query where the filename is replaced by the generated file.
The temporary file is not deleted by mysqlbinlog (it cannot be deleted since users must
be able to pipe the output of mysqlbinlog to a client).
There are several problems with reproducing LOAD DATA INFILE queries in this manner:
- If the user does not know the internal workings of mysqlbinlog, the temp files will
accumulate on disk and never be removed.
- There is no easy way to remove the temp files automatically (the safest way is to
parse the output of mysqlbinlog, which is hard and there is still no guarantee that the
format never changes, etc.)
- Temp files will accumulate on the pushbuild server.
(An associated problem is that the way temporary files are generated, two instances of
mysqlbinlog running in parallel may in theory generate the same temporary filename and
write to it concurrently. Another issue is BUG#33238.)
How to repeat:
1. Execute a "LOAD DATA INFILE" query.
2. Run mysqlbinlog on the generated binlog.
Suggested fix:
Three possible solutions:
(1) Add flags to mysqlbinlog to aid handling temp files. E.g.:
--no-generate-tempfiles suppress creating temp files (useful when
mysqlbinlog is used only to list binlog,
not piping to client)
--output-tempfile-names=FILENAME write generated files to a given file
(to simplify automatic and manual removal
of generated tempfiles)
--tempdir=DIR put tempfiles in given directory instead
of system tempdir (also to simplify
removal)
(2) In mixed mode logging, use row-based logging for "LOAD DATA INFILE".
(3) Introduce new SQL syntax allowing to include the entire file contents in the query.
E.g.:
LOAD DATA INSTRING 'file contents' INTO TABLE t1;
The point is that you can replace INFILE 'file name' by INSTRING 'file contents' and get
an equivalent query that can be completely contained in the binlog. Then:
- mysqlbinlog outputs LOAD DATA INSTRING when it sees Execute_file_query_log_event
- we can get rid of the special events associated with LOAD DATA INFILE and just log a
LOAD DATA INSTRING query instead.
I think (1) is a bad option. This is a bug in the architecture and should be fixed by a
more robust architecture. (1) is just a workaround. The user still needs to be aware of
this strange internal architecture.
I think (2) and (3) are both ok. If we do (2), it still does not work with pure
statement-based logging, but I'm a mixed-mode advocate who thinks pure statement-based
logging should be considered deprecated. (3) may be an interesting feature in itself but
of course requires much more work than (2).