Bug #47937 Appending to fields using CONCAT gets very slow
Submitted: 9 Oct 2009 9:11       Modified: 9 Oct 2009 11:41
Reporter: Marcus Bointon         Status: Verified
Category: MySQL Server: General  Severity: S5 (Performance)
OS: Any                          CPU Architecture: Any

[9 Oct 2009 9:11] Marcus Bointon
Description:
I have a very large number of user records (> 10 million) in InnoDB, each having their own activity log TEXT field. Quite often I need to add activity data to lots of them at once, using a query along the lines of:

UPDATE users SET log = CONCAT(log, '2009-10-09 12:00:00\t sent newsletter') WHERE account_id = 123;

This may be for up to half a million records at a time. To start with, this query works fine, but as the log field grows it slows down dramatically: each UPDATE rewrites the whole field, so the cumulative work is roughly O(n^2) in the amount of data appended. In my case I've seen it grow from a few seconds to over 2 hours. It seems that MySQL reads the whole field, then writes the whole thing back out. As a workaround, I tried this:

UPDATE users SET log = INSERT(log, LENGTH(log), 1, CONCAT(RIGHT(log, 1), '2009-10-09 12:00:00\t sent newsletter')) WHERE account_id = 123;

but it gives exactly the same performance problem.

Assuming that MySQL already knows the length of the field as part of the record storage, it should be possible to append data to a field without reading it first. The original query reads and writes the whole existing string; the optimiser could recognise that the source and destination fields are the same, append the new data in place, and increase the stored length. This would reduce the complexity to a more manageable O(n), and thus give reasonable performance.
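To make the complexity argument concrete, here is a minimal Python sketch of the cost model (hypothetical and illustrative only; the function names and byte counts are not MySQL internals). It compares the total bytes moved by a read-then-rewrite append against a true in-place append:

```python
# Hypothetical cost model: k appends of m bytes each to a field that
# starts empty. Illustrates the O(k^2) vs O(k) argument, nothing more.

def bytes_moved_rewrite(k: int, m: int) -> int:
    """Each append reads the current value and writes it back plus m bytes.

    Total bytes moved works out to m * k**2, i.e. quadratic in k.
    """
    total, length = 0, 0
    for _ in range(k):
        total += length          # read the existing value
        total += length + m      # write it back with the new suffix
        length += m
    return total

def bytes_moved_true_append(k: int, m: int) -> int:
    """Each append writes only the m new bytes: m * k overall, linear in k."""
    return k * m
```

With k = 1000 appends of m = 50 bytes, the rewrite model moves 50,000,000 bytes while a true append writes only 50,000 — the gap that the reporter is seeing as queries stretch from seconds to hours.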

I have looked at numerous other ways of doing this: the ARCHIVE storage engine; a really tall table (I'd need over a billion rows!); and storing links to external files, which runs into file-system issues with that many files, even though appending to a file of any size is a constant-time operation. I suppose one could build a storage engine that maps external files into TEXT fields, but that seems like a very long way around, and I want this to stay part of my existing InnoDB structure.

How to repeat:
create table t (a TEXT NOT NULL);

repeat 100000 times:
insert into t set a = '123456787890890-13-948579ksgkj dfgnkjdsfng kdsjnf\n';

now do this

update t set a = CONCAT(a, 'b lsdu lsuf lsdnf lsdafn eilwhf iaweuf liaeuf lwabeiuf liaweub fl\n');

Repeat the UPDATE a few hundred times and watch the query times get longer and longer. For a fixed number of records, this should be a constant-time operation.
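Rough arithmetic for the repro above shows why each successive UPDATE takes longer (a sketch with approximate sizes: a ~50-byte seed row, a ~66-byte appended suffix, 100,000 rows — these numbers are estimates from the repro strings, not measurements):

```python
# Approximate sizes taken from the repro; illustrative only.
ROWS, SEED, SUFFIX = 100_000, 50, 66

def bytes_rewritten_by_update(r: int) -> int:
    """Bytes read + written by the r-th UPDATE (1-based), assuming every
    row's value is read in full and rewritten in full with the suffix."""
    current = SEED + (r - 1) * SUFFIX   # field length before this UPDATE
    return ROWS * (current + (current + SUFFIX))
```

The per-statement cost grows linearly with r (each UPDATE moves about 13 MB more than the previous one under these assumptions), so the total work over a few hundred repetitions is quadratic — matching the observed "longer and longer" query times.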

Suggested fix:
Enhance the optimiser to spot that the source and destination of the CONCAT are the same column and append the new text directly, avoiding reading and rewriting the existing field value, and thus the performance hit.
[9 Oct 2009 9:21] Valeriy Kravchuk
Please send the results of:

EXPLAIN select * from users WHERE account_id = 123;

I'd say this is a nice feature request anyway...
[9 Oct 2009 10:58] Marcus Bointon
I don't think my indexes have anything to do with this, as updates of other types on the same set of records don't suffer the slowdown. Here's an example EXPLAIN anyway:

explain select * from users where customer_id = 123\G

           id: 1
  select_type: SIMPLE
        table: users
         type: ref
possible_keys: customer_id
          key: customer_id
      key_len: 5
          ref: const
         rows: 1320058
        Extra: Using where

Glad to hear you think it's a feasible feature!
[9 Oct 2009 11:41] Valeriy Kravchuk
It would be nice to have:

update t set blob = concat(blob, 'something') where ...

statements executed without reading the entire blob value and writing it back. Maybe new data types (smart BLOBs, in Informix/Oracle terms) are needed for this.