Bug #98717 generated column causes mysqldump to create extremely big dump file
Submitted: 23 Feb 2020 14:57 Modified: 24 Feb 2020 5:37
Reporter: Christian Roser Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: mysqldump Command-line Client Severity:S1 (Critical)
Version:5.7.x, 5.7.29 OS:Debian
Assigned to: CPU Architecture:x86
Tags: corruption, generated column, invalid data, mysqldump

[23 Feb 2020 14:57] Christian Roser
Description:
On some customer databases we see following issue that mysqldump creates dumpfiles with (almost) infinite sizes. 

The bug seems to be caused by https://bugs.mysql.com/bug.php?id=80790 however the severity of it's impact is too big so I create this new bug here.

Small example:
table with ~250k rows
mysql@database_hostb[data](rdbng)> du -sh t_database/t_test.*
12K     t_database/t_test.frm
80M     t_database/t_test.ibd

using select in pipe with pv to check size when reading everything from table
mysql@database_hostb[~](rdbng)> mysql -BN -e "select * from t_test" t_database | pv > /dev/null
34.2MiB 0:00:01 [32.5MiB/s]

using pv in combination with mysqldump
mysql@database_hostb[~](rdbng)> mysqldump t_database t_test | pv > /dev/null
3.69TiB 0:44:28 [1.42GiB/s] [

The main problem is that dumping the small database runs the target volume out of diskspace at every database backup. It would theoretically need 3.7T to store the (invalid) dump of an 80M table.

https://bugs.mysql.com/bug.php?id=80790 is all about the invalid syntax and not restoreable dumpfile when using generated columns but the problem goes much deeper.

mysqldump seems to get confused by all that weird comma and starts multiplying the rows of the table.

I created a small testcase for that

CREATE SCHEMA test;

use test

CREATE TABLE `t1` (
  `gen_id` varchar(16) GENERATED ALWAYS AS (concat(`col2`,`col3`)) STORED NOT NULL,
  `col2` varchar(10) NOT NULL,
  `col3` int(11) DEFAULT NULL,
  PRIMARY KEY (`gen_id`)
); -- engine and charset don't seem to matter

insert into t1 (col2, col3) values('$(pwgen -s 10 -1)', 08154711123);

-- running mysqldump
mysqldump test t1

LOCK TABLES `t1` WRITE;
/*!40000 ALTER TABLE `t1` DISABLE KEYS */;
INSERT INTO `t1` (`col2`, `col3`) VALUES ,'zaawqR6NZ5',2147483647);
/*!40000 ALTER TABLE `t1` ENABLE KEYS */;
UNLOCK TABLES;

-- so far so good, nothing new compared to bug 80790, snytactically the insert within the dump is broken

-- let's insert another row verify it and run mysqldump again
insert into t1 (col2, col3) values('$(pwgen -s 10 -1)', 08154711123);
select * from t1;
+------------------+------------+------------+
| gen_id           | col2       | col3       |
+------------------+------------+------------+
| WLbSj8YuJV214748 | WLbSj8YuJV | 2147483647 |
| zaawqR6NZ5214748 | zaawqR6NZ5 | 2147483647 |
+------------------+------------+------------+

content of mysqldump looks like that:
/*!40000 ALTER TABLE `t1` DISABLE KEYS */;
INSERT INTO `t1` (`col2`, `col3`) VALUES ,'WLbSj8YuJV',2147483647),,'WLbSj8YuJV',2147483647),'zaawqR6NZ5',2147483647);
/*!40000 ALTER TABLE `t1` ENABLE KEYS */;

weird thing: content of first row is located twice in the dump

10 rows in table:       55 'rows' in dump
100 rows in table:     5050 rows in dump
1000 rows in table:   500500 rows in dump
10000 rows in table: 50005000 rows in dump

How to repeat:
CREATE SCHEMA test;

use test

CREATE TABLE `t1` (
  `gen_id` varchar(16) GENERATED ALWAYS AS (concat(`col2`,`col3`)) STORED NOT NULL,
  `col2` varchar(10) NOT NULL,
  `col3` int(11) DEFAULT NULL,
  PRIMARY KEY (`gen_id`)
); -- engine and charset don't seem to matter

insert into t1 (col2, col3) values('$(pwgen -s 10 -1)', 08154711123);
insert into t1 (col2, col3) values('$(pwgen -s 10 -1)', 08154711123);
insert into t1 (col2, col3) values('$(pwgen -s 10 -1)', 08154711123);
insert into t1 (col2, col3) values('$(pwgen -s 10 -1)', 08154711123);
insert into t1 (col2, col3) values('$(pwgen -s 10 -1)', 08154711123);
insert into t1 (col2, col3) values('$(pwgen -s 10 -1)', 08154711123);
insert into t1 (col2, col3) values('$(pwgen -s 10 -1)', 08154711123);
insert into t1 (col2, col3) values('$(pwgen -s 10 -1)', 08154711123);
insert into t1 (col2, col3) values('$(pwgen -s 10 -1)', 08154711123);
insert into t1 (col2, col3) values('$(pwgen -s 10 -1)', 08154711123);

mysqldump test t1

Suggested fix:
If mysqldump is not able to dump the table properly it should just fail instead of spamming the filesystem with invalid data
[24 Feb 2020 5:37] MySQL Verification Team
Hello Christian,

Thank you for the report and test case.
Imho an existing bug report already describes this very problem. Even if you feel that your issue is somewhat different, the resolution is likely
to be the same. Because of this, we hope you add your comments to the original bug instead.

Thank you for your interest in MySQL.

regards,
Umesh
[24 Feb 2020 5:39] MySQL Verification Team
- In case pwgen don't exists

DROP SCHEMA IF EXISTS test;
CREATE SCHEMA test;
use test

CREATE TABLE `t1` (
  `gen_id` varchar(16) GENERATED ALWAYS AS (concat(`col2`,`col3`)) STORED NOT NULL,
  `col2` varchar(10) NOT NULL,
  `col3` BIGINT DEFAULT NULL,
  PRIMARY KEY (`gen_id`)
); 

insert into t1 (col2, col3) values(LEFT(UUID(), 8), 08154711);
insert into t1 (col2, col3) values(LEFT(UUID(), 8), 08154711);
insert into t1 (col2, col3) values(LEFT(UUID(), 8), 08154711);
insert into t1 (col2, col3) values(LEFT(UUID(), 8), 08154711);
insert into t1 (col2, col3) values(LEFT(UUID(), 8), 08154711);

--
bin/mysqldump -uroot -S /tmp/mysql_ushastry.sock test t1
-- MySQL dump 10.13  Distrib 5.7.29, for linux-glibc2.12 (x86_64)
--
-- Host: localhost    Database: test
-- ------------------------------------------------------
-- Server version       5.7.29

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;

--
-- Table structure for table `t1`
--

DROP TABLE IF EXISTS `t1`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `t1` (
  `gen_id` varchar(16) GENERATED ALWAYS AS (concat(`col2`,`col3`)) STORED NOT NULL,
  `col2` varchar(10) NOT NULL,
  `col3` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`gen_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `t1`
--

LOCK TABLES `t1` WRITE;
/*!40000 ALTER TABLE `t1` DISABLE KEYS */;
INSERT INTO `t1` (`col2`, `col3`) VALUES ,'02be34f3',8154711),,'02be34f3',8154711),'02be5e10',8154711),,'02be34f3',8154711),'02be5e10',8154711),'02be87e4',8154711),,'02be34f3',8154711),'02be5e10',8154711),'02be87e4',8154711),'02beb0df',8154711),,'02be34f3',8154711),'02be5e10',8154711),'02be87e4',8154711),'02beb0df',8154711),'02f5cf5c',8154711);
/*!40000 ALTER TABLE `t1` ENABLE KEYS */;
UNLOCK TABLES;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;

/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;

-- Dump completed on 2020-02-24  6:32:27