Bug #111628 Parallel chunk restoration doesn't preserve order of records from original table
Submitted: 30 Jun 2023 8:41 Modified: 3 Jul 2023 11:56
Reporter: Jyoti Rajai Email Updates:
Status: Not a Bug Impact on me:
None 
Category:Shell Dump & Load Severity:S3 (Non-critical)
Version:8.0.33 OS:Other
Assigned to: MySQL Verification Team CPU Architecture:Other
Tags: mysqlshell

[30 Jun 2023 8:41] Jyoti Rajai
Description:

The restoration of table separated with help of chunk bytes, doesn't restore in the backup taken order.

In the given scenario the chunk ID 1 was started to restore first before the chunk 0, due to which the table has wrong sequence ( after 8k series 4lakh series is started)( This table is without primary key)

Dump : 

 MySQL  localhost  JS >  util.dumpSchemas(["test"],"/root/backup/test4",{threads :8,chunking:1,bytesPerChunk: '20M'})
Acquiring global read lock
Global read lock acquired
Initializing - done
1 schemas will be dumped and within them 15 tables, 0 views.
Gathering information - done
All transactions have been started
Locking instance for backup
Global read lock has been released
Writing global DDL files
Running data dump using 8 threads.
NOTE: Progress information uses estimated values and may not be accurate.
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
103% (15.26M rows / ~14.82M rows), 3.19M rows/s, 156.86 MB/s uncompressed, 6.29 MB/s compressed
Dump duration: 00:00:04s
Total duration: 00:00:04s
Schemas dumped: 1
Tables dumped: 15
Uncompressed data size: 738.17 MB
Compressed data size: 30.23 MB
Compression ratio: 24.4
Rows written: 15264905
Bytes written: 30.23 MB
Average uncompressed throughput: 149.51 MB/s
Average compressed throughput: 6.12 MB/s

Restore : 

 MySQL  localhost  JS > util.loadDump("/root/backup/test4",{ threads: 8 , schema : "test10" , resetProgress: true } )
Loading DDL and Data from '/root/backup/test4' using 8 threads.
Opening dump...
Target is MySQL 8.0.33-0ubuntu0.22.04.2. Dump was produced from MySQL 8.0.33-0ubuntu0.22.04.2
Scanning metadata - done
Checking for pre-existing objects...
Executing common preamble SQL
Executing DDL - done
Executing view DDL - done
Starting data load
6 thds loading / 100% (738.17 MB / 738.17 MB), 320.20 KB/s, 15 / 15 tables done
Recreating indexes - done
Executing common postamble SQL
62 chunks (15.26M rows, 738.17 MB) for 15 tables in 1 schemas were loaded in 21 min 9 sec (avg throughput 586.74 KB/s)
0 warnings were reported during the load.

General Logs : 

2023-06-30T06:28:01.697825Z     268589 Query    /* mysqlsh loadDump(), thread 4, table `test10`.`test_table`, chunk ID: 0 */ LOAD DATA LOCAL INFILE '/root/backup/test4/test@test_table@0.tsv.zst' REPLACE INTO TABLE `test10`.`test_table` CHARACTER SET 'utf8mb4' FIELDS TERMINATED BY ' ' ESCAPED BY '\\' LINES STARTING BY '' TERMINATED BY '\n' (`eobject_test`, `language_test`, `root_test`, `stream_test`, `position`, `token_test`, `weight`)
2023-06-30T06:28:12.251459Z     268587 Query    /* mysqlsh loadDump(), thread 1, table `test10`.`test_table`, chunk ID: 1 */ LOAD DATA LOCAL INFILE '/root/backup/test4/test@test_table@1.tsv.zst' REPLACE INTO TABLE `test10`.`test_table` CHARACTER SET 'utf8mb4' FIELDS TERMINATED BY ' ' ESCAPED BY '\\' LINES STARTING BY '' TERMINATED BY '\n' (`eobject_test`, `language_test`, `root_test`, `stream_test`, `position`, `token_test`, `weight`)
2023-06-30T06:29:28.524389Z     268591 Query    /* mysqlsh loadDump(), thread 6, table `test10`.`test_table`, chunk ID: 2 */ LOAD DATA LOCAL INFILE '/root/backup/test4/test@test_table@2.tsv.zst' REPLACE INTO TABLE `test10`.`test_table` CHARACTER SET 'utf8mb4' FIELDS TERMINATED BY ' ' ESCAPED BY '\\' LINES STARTING BY '' TERMINATED BY '\n' (`eobject_test`, `language_test`, `root_test`, `stream_test`, `position`, `token_test`, `weight`)
2023-06-30T06:29:39.861617Z     268593 Query    /* mysqlsh loadDump(), thread 7, table `test10`.`test_table`, chunk ID: 3 */ LOAD DATA LOCAL INFILE '/root/backup/test4/test@test_table@@3.tsv.zst' REPLACE INTO TABLE `test10`.`test_table` CHARACTER SET 'utf8mb4' FIELDS TERMINATED BY '        ' ESCAPED BY '\\' LINES STARTING BY '' TERMINATED BY '\n' (`eobject_test`, `language_test`, `root_test`, `stream_test`, `position`, `token_test`, `weight`)

Records fetched from Table restored : 
eobject_test     language_test    root_test        stream_test      position        token_test       weight
8923    8923    8923    8923    8923    8923    8923
8924    8924    8924    8924    8924    8924    8924
8925    8925    8925    8925    8925    8925    8925
8926    8926    8926    8926    8926    8926    8926
424038  424038  424038  424038  424038  424038  424038
8927    8927    8927    8927    8927    8927    8927
424039  424039  424039  424039  424039  424039  424039
424040  424040  424040  424040  424040  424040  424040
424041  424041  424041  424041  424041  424041  424041
424042  424042  424042  424042  424042  424042  424042
8928    8928    8928    8928    8928    8928    8928
8929    8929    8929    8929    8929    8929    8929
424043  424043  424043  424043  424043  424043  424043
8930    8930    8930    8930    8930    8930    8930
424044  424044  424044  424044  424044  424044  424044
8931    8931    8931    8931    8931    8931    8931
424045  424045  424045  424045  424045  424045  424045
8932    8932    8932    8932    8932    8932    8932
424046  424046  424046  424046  424046  424046  424046
8933    8933    8933    8933    8933    8933    8933
424047  424047  424047  424047  424047  424047  424047
8934    8934    8934    8934    8934    8934    8934
424048  424048  424048  424048  424048  424048  424048
424049  424049  424049  424049  424049  424049  424049
8935    8935    8935    8935    8935    8935    8935
424050  424050  424050  424050  424050  424050  424050
424051  424051  424051  424051  424051  424051  424051
8936    8936    8936    8936    8936    8936    8936
424052  424052  424052  424052  424052  424052  424052
8937    8937    8937    8937    8937    8937    8937
424053  424053  424053  424053  424053  424053  424053

How to repeat:
1. Table having 999999 records with syntax : 
CREATE TABLE `test_table` (
  `eobject_test` bigint NOT NULL,
  `language_test` bigint NOT NULL,
  `root_test` bigint NOT NULL,
  `stream_test` bigint NOT NULL,
  `position` bigint NOT NULL,
  `token_test` bigint NOT NULL,
  `weight` bigint NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci 

2. create similar 10-20 tables.

3. Take backup with help of mysqlshell & restore it.
[30 Jun 2023 14:37] MySQL Verification Team
Hi,

I am not sure what are you actually reporting here? What order? How do you test for the order? If you have select without order by the order will be random, it is not preserved nor can be nor will be preserved in any way, three identical selects can return differently ordered result from same table? That is expected behavior?

There is no way I can test your report as

SELECT * FROM test_table;

run 10 times can return 10 differently ordered datasets as there is no ORDER BY in that query. There is nothing in SQL standard that defines how the resultset will be ordered so neither MySQL nor any other RDBMS that follows SQL standard will order that data in any way nor will always return same dataset.

if I do SELECT * FROM test_table ORDER BY root_test; then again "order in table is irrelevant" so?

This is why order of data in a table is irrelevant and this is used by the restore to save time as if order of data in table is irrelevant, why waste time preserving it.
[1 Jul 2023 8:31] Jyoti Rajai
On the source server, number of time I execute the SELECT query the output is received in sequential order only. I believe the backup is also taken in sequential order only. But after parallel restore the sequence of rows have been changed ( as you can see the snapshot). If you repeat this case you will see the same difference.
[1 Jul 2023 10:02] MySQL Verification Team
> On the source server, number of time I execute the SELECT query the output is received in sequential order only.

That is not guaranteed by any standard and is not something you can count on.
[3 Jul 2023 11:56] Jyoti Rajai
But I have shown you in the general log that the chunk id 1 restoration has started before chunk id 0 which causes the difference in order of rows while accessing the table.
( I have restored the data on the same server, and the configuration of the server & table is the same, otherwise, what can cause the difference in the sequence of rows between the source table & restored table?