Bug #24313 Request for non-ACID option for implicit LOAD DATA INFILE periodic commits
Submitted: 14 Nov 2006 17:48 Modified: 25 Sep 2009 3:38
Reporter: Douglas Fischer Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: General Severity:S4 (Feature request)
Version:5.0.26 OS:Any
Assigned to: CPU Architecture:Any

[14 Nov 2006 17:48] Douglas Fischer
Description:
In reference to the change made for bug # 11151, it would be very desirable to have a way to explicitly get the 'old behavior', i.e. a non-ACID manner in which to get LOAD DATA INFILE to implicitly commit every X records, or some other fashion to get the faster performance back, acknowledging that such an option would break ACID. 

In the case where very large loads are being performed and you're willing to allow non-ACID inserts, the performance increase with the old behavior is very useful.

Something along the lines of a NONACID keyword (e.g. LOAD DATA INFILE NONACID) or something like LOAD DATA INFILE IMPLICIT COMMIT INTERVAL 100000, etc.

How to repeat:
Feature request
[25 Sep 2009 3:38] Valeriy Kravchuk
Thank you for the feature request.
[25 Sep 2009 3:42] Valeriy Kravchuk
Bug #43320 was marked as a duplicate of this one.
[14 Jan 2010 7:53] Justin Swanhart
Can you characterize the difference in performance that you expect to see?

Outside of the possibility of long rollbacks, long running transactions aren't necessarily a bad thing in my book.

As I understand it, Oracle originally added periodic commit to the loader because rollback segments were of fixed size and rollback could easily be exceeded by a long loader call.  Since InnoDB has no such limits, periodic commit seems of little use to me, except to avoid long rollbacks.
[22 Jan 2010 15:55] Mark Callaghan
Periodic commits don't make loads faster when run in isolation. But I don't like long running loads on production servers because:
* rollback takes forever
* purge is halted by this as the oldest transaction snapshot does not advance
* this may consume all memory for locks and make InnoDB abort. I noticed this after changing reporting slaves to use InnoDB temp tables -- there were long running INSERT into foo select from bar queries that triggered the crash.
[22 Jan 2010 15:56] Mark Callaghan
InnoDB also does periodic commits during ALTER TABLE, from ha_innobase::write_row

        if ((user_thd->lex->sql_command == SQLCOM_ALTER_TABLE
            || user_thd->lex->sql_command == SQLCOM_OPTIMIZE
            || user_thd->lex->sql_command == SQLCOM_CREATE_INDEX
            || user_thd->lex->sql_command == SQLCOM_DROP_INDEX)
            && num_write_row >= 10000) {
                /* ALTER TABLE is COMMITted at every 10000 copied rows.
                The IX table lock for the original table has to be re-issued.
                As this method will be called on a temporary table where the
                contents of the original table is being copied to, it is
                a bit tricky to determine the source table.  The cursor
                position in the source table need not be adjusted after the
                intermediate COMMIT, since writes by other transactions are
                being blocked by a MySQL table lock TL_WRITE_ALLOW_READ. */
[22 Jan 2010 16:20] Mark Callaghan
ha_innobase::write_row can be extended to do commit-per-N rows during long running loads as it already does this for ALTER TABLE.
[22 Jan 2010 16:35] Harrison Fisk
There was an old bug where InnoDB used to commit transaction automatically for LOAD DATA INFILE:

http://bugs.mysql.com/bug.php?id=11151

This used the ha_enable_transaction() call to enable/disable transactions for LOAD DATA INFILE.  It looks like the call is only used in ALTER TABLE now.  It seems pretty easy to add that into other sections of code as well (OPTIMIZE, CREATE INDEX, DROP INDEX, which are what InnoDB checks).

Rather than having InnoDB check for specific SQL statements, it could then check for thd->transaction.on = on and then commit every 10000 in that case.

Once that is done, it would be trivial to add "transaction-less" mode for InnoDB for many statements (or even a session variable which could be changed).
[22 Jan 2010 21:16] Harrison Fisk
5.1 Patch to add NO TRANSACTION to LOAD DATA INFILE to commit every 10k rows in non-plugin InnoDB

Attachment: load_data_commit.patch (text/x-patch), 5.00 KiB.

[22 Jan 2010 21:21] Harrison Fisk
I made a small patch to add NO TRANSACTION to LOAD DATA INFILE syntax.  It comes right after the CHARACTER SET syntax, such as:

LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name'
    [REPLACE | IGNORE]
    INTO TABLE tbl_name
    [CHARACTER SET charset_name]
    [NO TRANSACTION]
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number LINES]
    [(col_name_or_user_var,...)]
    [SET col_name = expr,...]

This will commit any open transactions and cause InnoDB to automatically commit every 10k rows and at completion of the LOAD.

It does change the ABI for plugins by adding a new external function to access a THD value.  According to Serg, it should be backwards compatible, but I bumped the plugin ABI version just in case.

TODO for it:
* add test cases
* add innodb plugin (should be 1-liner)
* fixup rest of code to use this new API, rather than checking query types
* Fix NDB cluster to use this as well perhaps
[22 Jan 2010 21:41] Harrison Fisk
Oh, I need to check to see if it at all works with replication as well.

It would also make LOAD DATA INFILE not work inside of stored functions (nothing with implicit commits can), but procedures should be fine afaik.
[22 Jan 2010 22:02] Harrison Fisk
It does pass the current test suite though, so it doesn't break anything ;)
[16 Apr 2013 21:22] Len Budney
Any updates on this bug? It was verified back in 2010, but apparently not fixed?