Bug #42115 Restoring from a dump adds "\n" randomly in queries longer than 32Mb
Submitted: 14 Jan 2009 15:19 Modified: 15 Jan 2009 22:09
Reporter: Txema Heredia Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: General Severity:S1 (Critical)
Version:5.1.28-rc-community-log OS:Linux (2.6.16.60-0.27-smp x86_64)
Assigned to: CPU Architecture:Any
Tags: Backup, BLOB, longblob, longtext, restore, text

[14 Jan 2009 15:19] Txema Heredia
Description:
While we were restoring a backup made with mysqldump ( mysql database < dump.sql ), we found that in some longblob values we had some magical "\n" that weren't there before (and it was pretty critical because they were .zip files).

Those "\n" were NOT in the dump.sql file we have, but they seemed to randomly appear in our database. After a little investigation we found that those "\n" didn't only appear when restoring a dump backup, but in the result of every query executed through "mysql database < file.sql" there were appearing "\n" that weren't in the original query, and they were every 33554432 (2^24) bytes

Furthermore, we created a sql file with an extremely long single query ( insert into table values (...),(...),... ) that ended up being a 112Mb file query. We executed it ( mysql db < file , again ) and the same happened: 3 "\n"'s ( 112M query / 32M chunk = 3.5 chunks) were distributed through all the data, but only affected 3 rows: the 3 ones that had the "bad luck" of being in the "breaking spot"

So, our main problem is: our database is ok (at the moment) but our dump backups are useless because despite they are physically ok, if we restore them we'll be adding "random" "\n"'s to 1/32.000.000 bytes

Machine specifications:
cpu: 2x Intel Xeon QuadCore 2,33GHz
ram: 9Gb

db engine: MySam & InnoDB

 

How to repeat:
Create a new MySam table with a blob attribute, as for example:

mysql> CREATE TABLE foo (id INT PRIMARY KEY, file LONGBLOB);

create a 40Mb file (for each 32Mb there will be another "\n" so, the more, the better) containing only a single line full of "a" ( "aaaaaaa......aaa" )

upload it to the table (from mysql)

mysql> INSERT INTO foo VALUES (1, LOAD_FILE('file'));

check it:
mysql> SELECT id FROM foo WHERE file LIKE "%\n%";   <---- 0 results
mysql> SELECT id FROM foo;   <---- 1 results

#> mysqldump database foo > foo_dump.sql <---- has no "\n"

This way it works well, now let's try the bugged way:

clean the table and create a new input.sql like this:

INSERT INTO foo VALUES (1,"aaaaaaaa..........aaaa"); (again, more than 40million a's)

#> mysql database < input.sql

let's check it:

mysql> SELECT id FROM foo WHERE file LIKE "%\n%";   <---- 1 results
mysql> SELECT id FROM foo;   <---- 1 results

#> mysqldump database foo > foo_dump2.sql <---- has one "\n" for each 32Mb

create a new and smaller table:

CREATE TABLE foo2 (id INT PRIMARY KEY, data VARCHAR(100));

and a new input file, this time this way:

INSERT INTO foo2 VALUES (1,"aaaaaaaaaaaaaaaaaaaaaaaa"),(2,"aaaaaaaaaaaaaaaaaaaaaaaaa"),..........,(n,"aaaaaaaaaaaaaaaaaaaaaaaaa");

all in a single line and in a single query, and upload it:

#> mysql database < file.sql

and check it:
mysql> select * from foo2 WHERE data like "%\n%";
+--------+---------------------------------------------------------------+
| id     | data                                                          |
+--------+---------------------------------------------------------------+
| 467577 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaa | 
| 933611 | aaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | 
+--------+---------------------------------------------------------------+
2 rows in set (0.30 sec)

mysql> select count(*) from foo2 WHERE data NOT like "%\n%";
+----------+
| count(*) |
+----------+
|   999997 | 
+----------+
1 row in set (0.32 sec)

mysql> select count(*) from foo2;                           
+----------+
| count(*) |
+----------+
|   999999 | 
+----------+
1 row in set (0.00 sec)

and if you dump the table, you'll have those two "\n" there too

Suggested fix:
Whatever it is the reason to do this in queries longer than 32Mb remove that "\n".

We can tolerate queries that fail, but a query can NEVER modify the data it stores by its own. And this is even worse because this happens in the backup system
[14 Jan 2009 15:32] Txema Heredia
We've found that we do ALREADY have some "/n"'s in our current database due to a backup restore we did in September 2007.

There is an odd thing about this: We restored our database in a support-server, but it was a 32bit CPU and the "\n"'s we've found there are every 16Mb (2^23) instead the 32Mb (2^24) that we have found now.
[14 Jan 2009 17:26] Valeriy Kravchuk
Thank you for a problem report. Please, send my.cnf file content from both servers you had used.
[15 Jan 2009 22:09] Sveta Smirnova
Thank you for the feedback.

Thisis really duplicate of bug #41486