Bug #80347 mysqldump backup restore fails due to invalid FTS_DOC_ID (Error 182 and 1030)
Submitted: 12 Feb 2016 5:02 Modified: 8 Nov 2016 13:39
Reporter: Emanuel Calvo (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S2 (Serious)
Version:5.7.9, 5.7.11 OS:Any
Assigned to: CPU Architecture:Any
Tags: mysqldump

[12 Feb 2016 5:02] Emanuel Calvo
Description:

mysqldump/mysql client restore fails when using FTS_DOC_ID in certain cases. 

The table structure, count and data are the following:

master [localhost] {msandbox} (test) > show create table bookContentByLine\G
*************************** 1. row ***************************
       Table: bookContentByLine
Create Table: CREATE TABLE `bookContentByLine` (
  `FTS_DOC_ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `bookid` bigint(20) DEFAULT NULL,
  `content` text,
  PRIMARY KEY (`FTS_DOC_ID`),
  KEY `ftsbookid` (`bookid`),
  FULLTEXT KEY `ftscontent` (`content`)
) ENGINE=InnoDB AUTO_INCREMENT=145296 DEFAULT CHARSET=latin1
1 row in set (0,00 sec)

master [localhost] {msandbox} (test) > select count(*) from bookContentByLine;
+----------+
| count(*) |
+----------+
|   100732 |
+----------+
1 row in set (0,05 sec)

master [localhost] {msandbox} (test) > select FTS_DOC_ID, bookid FROM bookContentByLine WHERE FTS_DOC_ID < 3;
+------------+--------+
| FTS_DOC_ID | bookid |
+------------+--------+
|          1 |  12242 |
|          2 |  12242 |
+------------+--------+
2 rows in set (0,00 sec)

Errors:
ERROR 182 (HY000) at line 41: Invalid InnoDB FTS Doc ID
Error (Code 182): Invalid InnoDB FTS Doc ID
Error (Code 1030): Got error 182 from storage engine

How to repeat:

emanuel@3laptop ~/sandboxes/rsandbox_5_7_9 $ master/my sqldump -c -t -e test bookContentByLine > dumps/bcl_onlydata.dump
emanuel@3laptop ~/sandboxes/rsandbox_5_7_9 $ ./m bug < dumps/bcl_onlydata.dump
ERROR 182 (HY000) at line 42: Invalid InnoDB FTS Doc ID

emanuel@3laptop ~/sandboxes/rsandbox_5_7_9 $ master/my sqldump test bookContentByLine > dumps/bcl.dump
emanuel@3laptop ~/sandboxes/rsandbox_5_7_9 $ ./m bug < dumps/bcl.dump 
ERROR 182 (HY000) at line 41: Invalid InnoDB FTS Doc ID

emanuel@3laptop ~/sandboxes/rsandbox_5_7_9 $ master/my sqldump --no-data test bookContentByLine | ./m bug
emanuel@3laptop ~/sandboxes/rsandbox_5_7_9 $ master/my sqldump --single-transaction test bookContentByLine | ./m bug
ERROR 182 (HY000) at line 41: Invalid InnoDB FTS Doc ID
mysqldump: Got errno 32 on write

Adding the insert (here only used a few docs for the sake of the length of this post):

emanuel@3laptop ~/sandboxes/rsandbox_5_7_9 $ ./m bug

master [localhost] {msandbox} (bug) > INSERT INTO `bookContentByLine` (`FTS_DOC_ID`, `bookid`, `content`) VALUES (1,12242,'\"Project Gutenberg\'s Poems: Three Series, Complete, by Emily Dickinson\"'),(2,12242,'\"This eBook is for the use of anyone anywhere at no cost and with\"'),(3,12242,'\"almost no restrictions whatsoever.  You may copy it, give it away or\"'),(4,12242,'\"re-use it under the terms of the Project Gutenberg License included\"'),(5,12242,'\"with this eBook or online at www.gutenberg.net\"'),(6,12242,'\"Title: Poems: Three Series, Complete\"'),(7,12242,'\"Author: Emily Dickinson\"'),(8,12242,'\"Release Date: May 3, 2004 [EBook #12242]\"'),(9,12242,'\"Language: English\"'),(10,12242,'\"*** START OF THIS PROJECT GUTENBERG EBOOK POEMS: THREE SERIES, COMPLETE ***\"'),(11,12242,'\"Produced by Jim Tinsley <jtinsley@pobox.com>\"'),(12,12242,'\"POEMS\"'),(13,12242,'\"by EMILY DICKINSON\"');
Query OK, 13 rows affected (0,02 sec)
Records: 13  Duplicates: 0  Warnings: 0

Please use https://github.com/3manuek/FTS_bug to see the dumps generated.

Also, if you want to load the data from original sources, use the `load.R` module at https://github.com/3manuek/fts_article/blob/master/load.R .

Suggested fix:

Doing partial inserts or doing manual inserts works (from the mysql client).
[12 Feb 2016 7:28] Umesh Shastry
Hello Emanuel,

Thank you for the report and test case.
Observed this with 5.7.11 build.

Thanks,
Umesh
[13 Feb 2016 22:35] Emanuel Calvo
Hi Umesh,

I dug a little bit on the bug and traced what's causing this error. 

The error happens at innobase/row/row0mysql.cc, when checking the difference.

The problem looks like that mysqldump does not consider smaller chunks of inserts when backuping FTS_DOC_ID columns.

2016-02-13T22:11:53.125300Z 19 [ERROR] InnoDB: Doc ID 10002 is too big. Its difference with largest used Doc ID 1 cannot exceed or equal to 10000

Regards,
[13 Feb 2016 22:37] Emanuel Calvo
From the code at innobase/row/row0mysql.cc is clear:

                        /* Difference between Doc IDs are restricted within
                        4 bytes integer. See fts_get_encoded_len() */
[12 Oct 2016 16:43] Emanuel Calvo
Just to confirm this is actually a serious issue, as it does not allow to restore backups using FTS_DOC_ID. Backups are useless for tables using explicit FTS_DOC_ID and with more than 4 bytes integer (rows>9999 if id is sequentially assigned).

mysqldump should detect if FTS_DOC_ID is explicit and dump in smaller batches in order to do not overflow the permitted difference limitation imposed per fts_get_encoded_len() (innobase/row/row0mysql.cc).
[1 Nov 2016 2:44] Brett Gardner
Smaller batches may not necessarily improve things. eg

1. Inserts record which receives FTS_DOC_ID = 100
2. 100,000 inserts are recorded
3. Records with id 101 -> 80000 are then deleted
4. Attempting to load a backup from mysqldump will fail as the gap between record with id 100 and 80001 is more than the allowed gap
[1 Nov 2016 13:01] Emanuel Calvo
Brett, is expected to have this error if using mysqldump as the split needs to be done at backup time. You can try mydumper/myloader using the queries-per-transaction =< 999. It is a partial workaround, but it should work.
[8 Nov 2016 4:37] Brett Gardner
I'm confused re your last statement.

Isn't it illegal to load the following data

FTS_DOC_ID, TextContent
1,"Some content"
100000,"Some other content"

as there would be a gap of more than 65,535, regardless of how you attempt to load it? Or do you need to ensure that on reload that the FTS_DOC_ID's are reset, in which case they cannot be used as foreign keys in any other table?

This situation is allowed to occur during the initial incremental load / deletion as the rows with the interim values still exist, but are marked as deleted so do not appear in any query results.
[8 Nov 2016 13:39] Emanuel Calvo
Hi Brett,

as specified, the difference between Doc IDs are restricted within 4 bytes integer PER TRANSACTION (the uppercase is not explicit, that's why I added it). You don't need to change the FTS_DOC_ID, although it is a limitation when inserting FTS rows. 

Regards,