Bug #73056 Prepared Statements ABSURDLY SLOW due to inefficient query creation for logging
Submitted: 19 Jun 2014 21:34 Modified: 11 Mar 2016 14:52
Reporter: Seth Willits Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Prepared statements Severity:S5 (Performance)
Version:5.7, 5.7.8 OS:Any
Assigned to: CPU Architecture:Any
Tags: logging, prepared, query, slow, statement

[19 Jun 2014 21:34] Seth Willits
Description:
Premise:
---------------------
A significant benefit of using prepared statements is avoiding SQL parsing. Therefore, if I'm going to execute many many INSERT queries with identical structure but different values, a prepared statement makes a lot of sense. The client side does not have to generate SQL when it has the raw buffered data available, and the server side does not have to parse the SQL. There's also less data transmitted from client to server.

Performance Problem:
---------------------
On the server, if the binlog, general log, or slow log is enabled, then during statement execution, the server will generate a full/complete SQL query from the prepared statement's template query and the parameter values available. 

In other words it takes the template query...
    INSERT INTO table (a, b, c) VALUES (?, ?, ?)
    
... and combines with the parameters to produce:
    INSERT INTO table (a, b, c) VALUES ("the value for column a", "value for column b", "etc etc etc")

...so that it can log that full query if necessary.

The problem is that the generation of this query (eg in insert_params_with_log) is extremely slow, so much that it can make using a prepared statement as much as FIFTY TIMES slower than sending the identical SQL from the client to the server in the first place.

How to repeat:
For this particular test I am importing about 350,000 rows into mysql running on the local machine. To fit within the 1 MB max packet size, 4,000 rows are batched together into a single INSERT query. That's about 90 INSERT queries in total.

When sending them as SQL to the server, the import takes about 5 seconds.

When using a prepared statement (again, a single INSERT with 4,000 rows in it) and executing it to import the same total number of rows, it takes 265 seconds. 

50x slower.

CPU profiling of the client shows that nearly all of the time is spent waiting for the server.

CPU profiling of the server shows that 98% of the time is spent in String::replace().

Running Time        Self        Symbol Name
30749.0ms  100.0%       0.0     _pthread_body  0xbac259
30749.0ms  100.0%       0.0      thread_start
30749.0ms  100.0%       0.0       _pthread_start
30749.0ms  100.0%       0.0        _pthread_body
30749.0ms  100.0%       0.0         handle_one_connection
30749.0ms  100.0%       0.0          do_handle_one_connection(THD*)
30749.0ms  100.0%       0.0           do_command(THD*)
30749.0ms  100.0%       0.0            dispatch_command(enum_server_command, THD*, char*, unsigned int)
30749.0ms  100.0%       0.0             mysqld_stmt_execute(THD*, char*, unsigned int)
30748.0ms   99.9%       0.0              Prepared_statement::execute_loop(String*, bool, unsigned char*, unsigned char*)
30333.0ms   98.6%       0.0               Prepared_statement::set_parameters(String*, unsigned char*, unsigned char*)
30333.0ms   98.6%      37.0                insert_params_with_log(Prepared_statement*, unsigned char*, unsigned char*, unsigned char*, String*)
30177.0ms   98.1%       3.0                 String::replace(unsigned int, unsigned int, String const&)
30174.0ms   98.1%       4.0                  String::replace(unsigned int, unsigned int, char const*, unsigned int)
30116.0ms   97.9%   30116.0                   bmove_upp

This is due to the way the algorithm to generate the query works. The algorithm creates a string from the original query template, and then "replaces" the ? parameter indicators with the real SQL values. To replace each one, it reallocates the string's buffer for EACH PARAMETER, and copies the part of the string after the ? being replaced over far enough to make room for the new query, then copies the parameter value into place. With many parameters this is EXTREMELY INEFFICIENT.

Suggested fix:

The insert_params_with_log() parameter function needs to be rewritten to not naively use String::replace and instead efficiently allocate the string once to the final dimensions it needs to be, and likely work from the last parameter to the first so that no repetitive memory copying is needed.

This would be dramatically faster and more efficient meaning prepared statements could be used to do even-faster importing (by avoiding the client having to generate the sql and the server having to parse it).
[11 Aug 2014 10:02] Umesh Shastry
Hello Seth,

Thank you for the report.
I couldn't repeat this with dummy data on my testing box.
Could you please provide complete repeatable test case(schema, scripts, etc), and conf file used during tests?

Thanks,
Umesh
[11 Aug 2014 17:35] Seth Willits
I don't have any particular data and code lying around that I can share. The content of the data really doesn't matter.

What matters are two things:

1) One or more of the binlog, general log, or slow log must be enabled. (See Prepared_statement::setup_set_params() for the path that will use insert_params_with_log).

2) The prepared statement's query must have a lot of parameters: eg, close to the 64K limit. 

Doing that will cause the full query to be generated via 64K inefficient String::replace() calls to fill in the parameters. Obviously the longer the parameter data the longer this replacement will take and the inefficiency will be magnified.
[27 Sep 2014 13:26] Shane Bester
related:  https://bugs.mysql.com/bug.php?id=67676

Maybe that testcase can be used here ?
[24 Apr 2015 12:23] Umesh Shastry
Thank you Shane, Seth.
Observed similar behavior with 5.7.8 with test case from #67676

Thanks,
Umesh
[24 Apr 2015 12:27] Umesh Shastry
test results

Attachment: 73056.txt (text/plain), 16.70 KiB.

[7 May 2015 9:57] Tatjana Nuernberg
There is some degree of improvement for the assembly of the log string (i.e. calculate total size required, allocate, fill in, rather than do it piecemeal). That said, as implicated by the profiling, bmove_upp() performed horribly on modern CPUs; it has however been fixed in Bug#16839824 changeset a/o
2013-06-14, so I'm a bit non-plussed about it showing up in a relatively recent profiling? Does this problem persist (to a similar extent) with a current server?
[7 May 2015 9:58] Tatjana Nuernberg
Should read, "room for improvement," not "degree ..."
[7 May 2015 18:44] Seth Willits
Unless I'm mistaken, 16839824 was fixed in 5.7.2 and Umesh confirmed this in 5.7.8, so the issue seems perfectly relevant still.

Changing the algorithm itself would have a significant positive improvement on performance, so I don't see why it shouldn't happen. It's practically changing it from O(n^2) copies to O(n)... I'd call that a large degree of improvement.
[11 Mar 2016 14:52] Paul Dubois
Noted in 5.8.0 changelog.

Executed prepared statements are logged with ? parameter markers
replaced by data values. Construction of the logged string was
inefficient and has been improved.