Description:
While we were restoring a backup made with mysqldump ( mysql database < dump.sql ), we found that in some longblob values we had some magical "\n" that weren't there before (and it was pretty critical because they were .zip files).
Those "\n" were NOT in the dump.sql file we have, but they seemed to randomly appear in our database. After a little investigation we found that those "\n" didn't only appear when restoring a dump backup, but in the result of every query executed through "mysql database < file.sql" there were appearing "\n" that weren't in the original query, and they were every 33554432 (2^24) bytes
Furthermore, we created a sql file with an extremely long single query ( insert into table values (...),(...),... ) that ended up being a 112Mb file query. We executed it ( mysql db < file , again ) and the same happened: 3 "\n"'s ( 112M query / 32M chunk = 3.5 chunks) were distributed through all the data, but only affected 3 rows: the 3 ones that had the "bad luck" of being in the "breaking spot"
So, our main problem is: our database is ok (at the moment) but our dump backups are useless because despite they are physically ok, if we restore them we'll be adding "random" "\n"'s to 1/32.000.000 bytes
Machine specifications:
cpu: 2x Intel Xeon QuadCore 2,33GHz
ram: 9Gb
db engine: MySam & InnoDB
How to repeat:
Create a new MySam table with a blob attribute, as for example:
mysql> CREATE TABLE foo (id INT PRIMARY KEY, file LONGBLOB);
create a 40Mb file (for each 32Mb there will be another "\n" so, the more, the better) containing only a single line full of "a" ( "aaaaaaa......aaa" )
upload it to the table (from mysql)
mysql> INSERT INTO foo VALUES (1, LOAD_FILE('file'));
check it:
mysql> SELECT id FROM foo WHERE file LIKE "%\n%"; <---- 0 results
mysql> SELECT id FROM foo; <---- 1 results
#> mysqldump database foo > foo_dump.sql <---- has no "\n"
This way it works well, now let's try the bugged way:
clean the table and create a new input.sql like this:
INSERT INTO foo VALUES (1,"aaaaaaaa..........aaaa"); (again, more than 40million a's)
#> mysql database < input.sql
let's check it:
mysql> SELECT id FROM foo WHERE file LIKE "%\n%"; <---- 1 results
mysql> SELECT id FROM foo; <---- 1 results
#> mysqldump database foo > foo_dump2.sql <---- has one "\n" for each 32Mb
create a new and smaller table:
CREATE TABLE foo2 (id INT PRIMARY KEY, data VARCHAR(100));
and a new input file, this time this way:
INSERT INTO foo2 VALUES (1,"aaaaaaaaaaaaaaaaaaaaaaaa"),(2,"aaaaaaaaaaaaaaaaaaaaaaaaa"),..........,(n,"aaaaaaaaaaaaaaaaaaaaaaaaa");
all in a single line and in a single query, and upload it:
#> mysql database < file.sql
and check it:
mysql> select * from foo2 WHERE data like "%\n%";
+--------+---------------------------------------------------------------+
| id | data |
+--------+---------------------------------------------------------------+
| 467577 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaa |
| 933611 | aaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa |
+--------+---------------------------------------------------------------+
2 rows in set (0.30 sec)
mysql> select count(*) from foo2 WHERE data NOT like "%\n%";
+----------+
| count(*) |
+----------+
| 999997 |
+----------+
1 row in set (0.32 sec)
mysql> select count(*) from foo2;
+----------+
| count(*) |
+----------+
| 999999 |
+----------+
1 row in set (0.00 sec)
and if you dump the table, you'll have those two "\n" there too
Suggested fix:
Whatever it is the reason to do this in queries longer than 32Mb remove that "\n".
We can tolerate queries that fail, but a query can NEVER modify the data it stores by its own. And this is even worse because this happens in the backup system