Bug #85155 Importing huge statement seems to cause memory problems
Submitted: 23 Feb 2017 16:25 Modified: 20 Jul 2018 18:05
Reporter: Thomas Benkert Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Command-line Clients Severity:S5 (Performance)
Version:5.7.17 OS:CentOS
Assigned to: CPU Architecture:Any
Tags: import, Memory, statement

[23 Feb 2017 16:25] Thomas Benkert
Description:
I have a file with one big INSERT statement, about 1.700.000 rows and I'm importing it like this:

shell> mysql -u user my_database < file_with_huge_insert_stmt

The table looks like this:

CREATE TABLE `customerInformation` (
  `customerNumber` int(11) NOT NULL,
  `company` varchar(255) DEFAULT NULL,
  `salutation` varchar(50) DEFAULT NULL,
  `firstName` varchar(255) DEFAULT NULL,
  `lastName` varchar(255) DEFAULT NULL,
  `street` varchar(255) DEFAULT NULL,
  `additionalAddress` varchar(255) DEFAULT NULL,
  `postcode` varchar(50) DEFAULT NULL,
  `city` varchar(255) DEFAULT NULL,
  `country` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`customerNumber`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

During this operation I get nothing in the error log but this

2017-02-23T15:54:26.884182Z 82 [Note] Aborted connection 82 to db: 'playground' user: 'root' host: 'localhost' (Got timeout reading communication packets)

The processlist says, that the thread is just sleeping.

root@srv-dw03:playground > select * from information_schema.processlist where id = 84\G
*************************** 1. row ***************************
     ID: 84
   USER: root
   HOST: localhost
     DB: playground
COMMAND: Sleep
   TIME: 45
  STATE:
   INFO: NULL

Several minutes after this error message I get this on the command line:

ERROR 2006 (HY000) at line 1: MySQL server has gone away

When I use a 5.6.21 client on another machine to import the statement on the 5.7 server, like this:

shell> mysql -u user -h remote_5_7_host my_database < file_with_huge_insert_stmt

(which means importing it to the same server) this whole statement took about 5 seconds.

So I wanted to see what they are doing differently with strace. The difference is, that the 5.6 client is doing a lot of reads and after a while switching to reading and mremapping.

...
read()
read()
...
mmap()
read()
mremap()
read()
mremap()
...

The 5.7 client on the other hand does something like this:

...
read()
read()
brk()
brk()
read()
read()
brk()
brk()
...

before it switches after a while to

...
mmap()
munmap()
read()
mmap()
munmap()
read()
mmap()
munmap()
read()
...

I hope this stripped down version helps anyway, but I would have to remove a lot of sensitive data.

How to repeat:
Import a huge statment via command line client.

shell> mysql -u user my_database < file_with_huge_insert_stmt
[25 May 2017 19:54] Scott Benkovic
Thomas have you found a solution?  I am having a similar problem replaying binary log files since upgrading to 5.7.  Worked fine in 5.1,5,6.
https://community.oracle.com/thread/4036847

Scott
[26 May 2017 7:10] Thomas Benkert
Yes, I actually found a solution, I'm surprised though, that this also happens with mysqlbinlog. The solution for me was to remove the newlines from the file. I wanted to update this bug report, but haven't found the time yet.
[26 May 2017 18:20] Scott Benkovic
That helped at least figure out what is going on.

When I extract the binary log that contains a 120MB blob:
mysqlbinlog drrsdb1-bin.000001 > drrsdb1-bin.000001.sql

Then use "tr -d '\r\n'" on the blob portion of the sql file to remove CR/LF.
The file imports in a couple of seconds like previous versions:
mysql -u root -pxxxxx < drrsdb1-bin.000001.sql

If I don't remove CR/LF, it takes 20 minutes to import.

The sql file that results from mysqlbinlog in both 5.6 and 5.7 produce the same file.  Looks like its the way the 5.7 mysql client reads it when doing the import.  None of the client options seem to make a difference.

Anyway thanks for the information, Scott
[18 May 2018 15:09] Sinisa Milivojevic
Hi,

Thank you for this report.

We have seen cases like this one many, many times. It is not a bug, but only a case of misconfiguration. Especially when it comes to all the various timeouts ......
[22 May 2018 7:03] Thomas Benkert
Hi  Sinisa,
thanks for your reply, but do you mind explaining that a bit? As I wrote, the solution for me was to remove the newlines from the sql statement. I don't understand how that would be correlated to misconfiguration. The timeout is just a symptom, not the main cause.
[22 May 2018 12:18] Sinisa Milivojevic
Hi,

What BLOB are you writing about ???

Your table does not have any such column. Also, looks like you are writing about two different problems. First description shows the automatic closing of the sleeping connections. That is expected behaviour, regulated by one of the timeouts on the server.

Second problem is totally unrelated and deals with a large BLOB, which does not exist and difference in the speed of importing it with or without \r\n. It is also unclear how can you remove newlines at all. This looks like mysqlbinlog was not used with proper options. 

Two different issues.
[22 May 2018 12:33] Thomas Benkert
Hi Sinisa,

the blob in a binary log thing is a comment from Scott, not from me. His problem seems to be related however. 

About the timeout, again this is just a symptom. The real problem is, that the mysql client has problems managing memory and this is somehow related to reading the newline character. The timeout happens, because the mysql client is too busy handling memory and in this time it doesn't send anything to the mysql server. 

Scott's comment just proves, that it's not an issue with my insert statement, but that it can also happen when the source is the binary log. I don't believe it's a misuse of mysqlbinlog.
[22 May 2018 12:39] Sinisa Milivojevic
Again, these are two different cases. One is a sleeping connection, which timeouts, which is expected and designed behaviour.

The other is the execution of the very large INSERT statement that takes only 5 seconds. Again, expected behaviour.
[22 May 2018 12:49] Thomas Benkert
Sinisa,

I don't know how to make this any more clear. I did not file a bug report because of the timeout. I filed this bug report because the mysql client is not capable of sending certain statements to the server because of memory issues. In the bug report I showed that the mysql client of version 5.6 works fine, whereas that of 5.7 does not. I wanted to provide information how MySQL is behaving, so that others can relate (maybe this helped Scott to identify that his problem may be related to this one). I'm sorry, that this (for you) unnecessary information is causing too much confusion. Next time I'll leave it out, so you can concentrate on the actual problem.

Thanks.
[22 May 2018 12:53] Sinisa Milivojevic
Hi,

So, mysql client, as per your opinion, does not send statements to the server, for some unknown issue.

What you can do is upload that SQL file with INSERT which does not get sent to the server and I will try it out.
[22 May 2018 15:37] Thomas Benkert
Hello Sinisa,

sure, no problem. I reproduced the bug again, although I have limited the test data because I have not much time at hand, so there was no timeout, but with enough data I bet there would be as the times to execute the statement differs hugely.

How to reproduce:

In MySQL create a table like this:

create table reproduce_bug (a varchar(5), b varchar(5), c varchar(5), d varchar(5), e varchar(5), f varchar(5), g varchar(5), h varchar(5));

On the command line produce a test statement like this:

echo "INSERT INTO reproduce_bug VALUES ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h')" > rb
for i in {1..1000000}; do echo ", ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h')" >> rb ; done
echo ';' >> rb

This produces a file with 42MB in size.

Then load the statement like this:

time cat rb | mysql your_playground_schema

In MySQL 5.6 I get this:

real    0m57.942s
user    0m0.464s
sys     0m0.096s

In 5.7 I get 

real    3m46.506s
user    0m48.679s
sys     1m34.243s

VM with 5.6 has 1 core and 1GB RAM
VM with 5.7 has 2 cores and 2GB RAM
Configuration of MySQL is the same on both servers.
[31 May 2018 15:45] Sinisa Milivojevic
Hi,

I can't repeat your results. I used my two year old iMac, with 1 Gb RAM dedicated to MySQL server.

My dump file contained also CREATE TABLE and was optimised for minimal locking, not for speed. Your table and your data were used.

Here you can see the dump file size :

rw-r--r--  1 sinisa  staff  35653667 May 31 16:42 tmp/reproduce.sql

Here are the results on 5.6.36:

$ time mysql  test < tmp/reproduce.sql

real	2m10.343s
user	0m0.928s
sys	0m0.086s

Total number of rows in the table was 1048576.

This is a speed on 5.7.21:

$ time mysql  test < tmp/reproduce.sql

real	1m59.153s
user	0m0.753s
sys	0m0.091s

Dump was not optimised for speed, but as I wrote, for minimum locking. Next, I used our debug binaries and not production binaries. Debug binaries are 20 - 30 % slower then our production binaries.

It should be noted that this is a single-thread operation, so number of cores is irrelevant.
[1 Jun 2018 9:02] Shane Bester
The mysql process ends up going a gazillion memory reallocs...

[x@x ~]$ for i in {1..5000000}; do echo ", ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h')" >> rb ; done
[x@x ~]$ echo ';' >> rb
[x@x ~]$ ls -l rb
-rw-rw-r--. 1 wwww wwww 215000076 Jun  1 10:35 rb
[x@x ~]$ time ./mysql/release/mysql-5.7.17-linux-glibc2.5-x86_64/bin/mysql -uroot -h192.168.1.107 test < rb

[x@x release]$ pstack 2995
#0  __memmove_ssse3 () at ../sysdeps/x86_64/multiarch/memcpy-ssse3.S:2938
#1  0x000000000045704f in my_realloc (key=<optimized out>, ptr=0x7fec36e7b030, size=33437320, flags=<optimized out>) at /export/home/pb2/build/sb_0-21378219-1480347226.17/mysql-5.7.17/mysys/my_malloc.c:100
#2  0x0000000000419943 in String::mem_realloc (this=0xa467c0 <glob_buffer>, alloc_length=33437315, force_on_heap=<optimized out>) at /export/home/pb2/build/sb_0-21378219-1480347226.17/mysql-5.7.17/sql-common/sql_string.cc:115
#3  0x0000000000416f70 in add_line (truncated=false, ml_comment=<optimized out>, in_string=<optimized out>, line_length=42, line=0x27a3ae2 ", ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h')\n, ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h')\n, ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h')\n, ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h')\n, ('a', 'b', 'c', 'd', 'e', "..., buffer=...) at /export/home/pb2/build/sb_0-21378219-1480347226.17/mysql-5.7.17/client/mysql.cc:2795
#4  read_and_execute (interactive=false) at /export/home/pb2/build/sb_0-21378219-1480347226.17/mysql-5.7.17/client/mysql.cc:2367
#5  0x00000000004182da in main (argc=5, argv=0x2792da0) at /export/home/pb2/build/sb_0-21378219-1480347226.17/mysql-5.7.17/client/mysql.cc:1447

Meanwhile...
mysql> show processlist;
+----+------+---------------------+------+---------+------+----------+------------------+
| Id | User | Host                | db   | Command | Time | State    | Info             |
+----+------+---------------------+------+---------+------+----------+------------------+
|  2 | root | localhost:48303     | test | Query   |    0 | starting | show processlist |
|  3 | root | 192.168.1.105:40506 | test | Sleep   |  230 |          | NULL             | <--- waiting around ....
+----+------+---------------------+------+---------+------+----------+------------------+
2 rows in set (0.00 sec)
[1 Jun 2018 9:21] Shane Bester
Seems to be this taking time on long lines:  at bottom of add_line function:

buffer.mem_realloc(buffer.length()+length+IO_SIZE);

I didn't bother testing 5.6.  It seems mysql client simply needs to be optimized to read long lines.  Incrementally adding IO_SIZE (4096) to the memory allocation and recopying everything to new malloc is not optimal.

How about using powers of two up to say 16M at a time to increase buffer size?
[1 Jun 2018 11:06] Shane Bester
Verified.
5.6 worked quickly (seconds) vs over an hour to read the file in 5.7.
[1 Jun 2018 14:17] Shane Bester
Hi Sinisa, the server is not involved in the bug so we can ignore it.

The nature of this bug is the time taken for mysql command line client to read a long multi-line query.

Reading times that grow exponentially and alloc/dealloc crazy number of times are not expected.

In this test: where I double up the number $i range to make larger number of lines in query,

echo "INSERT INTO reproduce_bug VALUES ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h')" > rb
for i in {1..10000}; do echo ", ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h')" >> rb ; done
echo ';' >> rb
time ./mysql-5.7.22-linux-glibc2.12-x86_64/bin/mysql -uroot test < rb

Here are the timings.  The times should increase linearly, but it grows out of control quick.

5.7.22 client:
numlines, real time
--------------------------
10000,    real    0m0.474s
20000,    real    0m0.464s
40000,    real    0m1.385s
80000,    real    0m3.064s
160000,   real    0m10.877s
320000,   real    0m40.425s
640000,   real    2m41.129s
1280000,  real    10m8.779s
2560000,  real    44m11.166s

by contract,  5.6.21 client:

2560000,  real    0m26.464s

Many people like dump-files with newlines for each row (as evidenced in https://bugs.mysql.com/bug.php?id=65465 )
since they are more readable.
[1 Jun 2018 15:49] Sinisa Milivojevic
Setting severity to the correct value, since the cause of the performance degradation has been identified.
[21 Jun 2018 12:18] Shane Bester
Tested an internal preliminary fix from Sinisa.

The standard build 5.7.22:
-----------------------------
E:\mysql-5.7.22-winx64\bin>mysql -uroot test < f:\tmp1\rb
~600 seconds  (and increasingly crazy memory)

5.7.22 with Sinisa’s patch:
----------------------------

E:\mysql-5.7.22-winx64\bin>F:\ade\mysql_archives\5.7\5.7.22\mysql-5.7.22\bld\client\RelWithDebInfo\mysql -uroot test < f:\tmp1\rb
~25 seconds (with steady memory)
[21 Jun 2018 14:54] Sinisa Milivojevic
It should be noted that my patch is VERY preliminary and actually more proof-of-the-concept then anything else.

My patch changes only code in the mysql CLI and nothing else. Server remains the same in code and configuration.

I must also note that the performance improvements on the fully optimised dumps are very small.

I will be sending my patch to the right address, for any (potential) scheduling and (truly) necessary changes.
[20 Jul 2018 18:05] Paul Dubois
Posted by developer:
 
Fixed in 8.0.13.

The mysql client was slow to import large multiple-line statements in
batch mode. Memory allocation is now more efficient for this
situation.