Bug #96944 Default value of "threads" option of "Parallel Table Import Utility"
Submitted: 19 Sep 2019 9:35 Modified: 7 Oct 2019 10:59
Reporter: MAKOTO FUKUMOTO
Status: Not a Bug
Category: MySQL Server: Document Store: MySQL Shell
Severity: S3 (Non-critical)
Version: 8.0.17
OS: CentOS
Assigned to:
CPU Architecture: Any
Tags: mysql shell

[19 Sep 2019 9:35] MAKOTO FUKUMOTO
Description:
Regarding the "Parallel Table Import Utility" in "MySQL Shell Utilities": the following document states that the "threads" option defaults to 8.

https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-utilities-parallel-table.html

> threads: number
> Use this number of parallel threads to send the data in the input file to the target server. The default is 8 threads.

However, when I actually ran the utility, it used 4 threads.

```
 MySQL  localhost  JS > util.importTable("/var/lib/mysql-files/load.csv", {schema: "xxx", table: "t1", dialect: "csv-unix", showProgress: true})
Importing from file '/var/lib/mysql-files/load.csv' to table `xxx`.`t1` in MySQL Server at /var%2Flib%2Fmysql%2Fmysql.sock using 4 threads
[Worker002] blog.t1: Records: 420874  Deleted: 0  Skipped: 0  Warnings: 0
[Worker003] blog.t1: Records: 1515152  Deleted: 0  Skipped: 0  Warnings: 0
[Worker000] blog.t1: Records: 1515152  Deleted: 0  Skipped: 0  Warnings: 0
[Worker001] blog.t1: Records: 1548822  Deleted: 0  Skipped: 0  Warnings: 0
```

Is the content of the document correct?

How to repeat:
Described in "Description".
[19 Sep 2019 11:25] MySQL Verification Team
Hello MAKOTO-San,

Thank you for the report and feedback.

regards,
Umesh
[7 Oct 2019 10:59] Krzysztof Grzadziel
Posted by developer:
 
There is no reason to spawn more import threads than needed.

The relation between threads, bytesPerChunk, and the imported file size is as follows:

min{max{1, threads}, chunks}

where:
 - threads - value of the threads option (the requested number of threads)
 - chunks - (file_size / bytesPerChunk) + 1, where / is integer (floor) division

i.e., we need at least one thread, but no more threads than there are file chunks.
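The formula can be checked with a small calculation. The following is a minimal Python sketch, not MySQL Shell code: `effective_import_threads` is a hypothetical helper, sizes are in bytes, the floor division is inferred from the 150M to 200M example in this thread, and reading "50M" as 50 MiB is an assumption (it does not change the result as long as file size and chunk size use the same unit).

```python
def effective_import_threads(file_size: int,
                             threads: int = 8,
                             bytes_per_chunk: int = 50 * 1024 * 1024) -> int:
    """Worker threads spawned for an import, per min{max{1, threads}, chunks}.

    Hypothetical helper for illustration only; it is not part of MySQL Shell.
    All sizes are in bytes.
    """
    if bytes_per_chunk <= 0:
        raise ValueError("bytes_per_chunk must be positive")
    # chunks = (file_size / bytesPerChunk) + 1, using floor division,
    # which is what the 150M-200M example in this report implies.
    chunks = file_size // bytes_per_chunk + 1
    # At least one thread, but never more threads than there are chunks.
    return min(max(1, threads), chunks)


if __name__ == "__main__":
    MIB = 1024 * 1024  # assumption: "50M" is read as 50 MiB
    for size_mib in (10, 100, 150, 175, 199, 200, 400):
        n = effective_import_threads(size_mib * MIB)
        print(f"{size_mib:>4} MiB file -> {n} thread(s)")
```

Under these assumptions, files from 150 MiB up to, but not including, 200 MiB map to 4 threads, matching what the reporter observed.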

You see 4 spawned threads because your import file is roughly between 150M and 200M in size: with the default bytesPerChunk of 50M, that gives (file_size / 50M) + 1 = 3 + 1 = 4 chunks, so min{max{1, 8}, 4} = 4 threads.
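The same formula can be turned around to answer the practical question behind this report: how large a file must be before all requested threads are used, and how small bytesPerChunk would have to be for a given file. This is a minimal Python sketch derived only from the formula stated above; both function names are hypothetical, sizes are in bytes, and treating "M" as MiB is an assumption. The formula says nothing about whether more threads would actually make a given import faster.

```python
from typing import Optional


def min_file_size_for_all_threads(threads: int = 8,
                                  bytes_per_chunk: int = 50 * 1024 * 1024) -> int:
    """Smallest file size (bytes) that yields at least max(1, threads) chunks.

    chunks >= T  <=>  file_size // bytes_per_chunk >= T - 1
                 <=>  file_size >= (T - 1) * bytes_per_chunk
    """
    t = max(1, threads)
    return (t - 1) * bytes_per_chunk


def max_chunk_size_for_all_threads(file_size: int, threads: int = 8) -> Optional[int]:
    """Largest bytes_per_chunk (bytes) that still yields at least max(1, threads) chunks.

    Returns None when only one thread is requested, since any chunk size works.
    Raises ValueError when even a 1-byte chunk cannot produce enough chunks.
    """
    t = max(1, threads)
    if t == 1:
        return None
    # file_size // B >= t - 1  <=>  B <= file_size / (t - 1)
    limit = file_size // (t - 1)
    if limit == 0:
        raise ValueError("file too small to give every thread a chunk")
    return limit


if __name__ == "__main__":
    MIB = 1024 * 1024  # assumption: "M" sizes are read as MiB
    print(min_file_size_for_all_threads() // MIB)            # 350
    print(max_chunk_size_for_all_threads(175 * MIB) // MIB)  # 25
```

Under these assumptions, the default 8 threads with 50M chunks need a file of at least 350M, and a 175M file (within the range estimated above) would need a bytesPerChunk of at most 25M to produce 8 chunks.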

It is not a bug, but it would be nice to have the relation formula in the documentation.