Bug #79585 mtr limits parallel tests to 50
Submitted: 10 Dec 2015 6:13 Modified: 16 Aug 2016 3:54
Reporter: Daniel Black
Status: Closed
Category:Tools: MTR / mysql-test-run Severity:S3 (Non-critical)
Version:5.6.28, 5.7.10 OS:Any
CPU Architecture:Any

[10 Dec 2015 6:13] Daniel Black
mysql-test-run imposes an artificial limit of 50 parallel workers. Some hardware has more CPUs than that, and running with more workers on such a machine runs out of unique build thread IDs:

build@2b2bb0db98a7:~/build/mysql-test$ export MTR_MAX_PARALLEL=900
build@2b2bb0db98a7:~/build/mysql-test$ export MTR_PARALLEL=auto
build@2b2bb0db98a7:~/build/mysql-test$ ./mtr 2>&1 | tee /tmp/mtr.log
2015-12-10 03:33:53 0 [Note] /build/build/sql/mysqld (mysqld 5.6.27) starting as process 78806 ...
2015-12-10 03:33:53 78806 [Note] Plugin 'FEDERATED' is disabled.
2015-12-10 03:33:53 78806 [Note] Binlog end
2015-12-10 03:33:53 78806 [Note] Shutting down plugin 'MyISAM'
2015-12-10 03:33:53 78806 [Note] Shutting down plugin 'CSV'
MySQL Version 5.6.27
Checking supported features...
 - SSL connections supported
Using suites: main,sys_vars,binlog,federated,rpl,innodb,innodb_fts,innodb_zip,perfschema,funcs_1,opt_trace,parts,auth_sec
Collecting tests...
 - adding combinations for binlog
 - adding combinations for rpl
Checking leftover processes...
Removing old var directory...
Creating var directory '/build/build/mysql-test/var'...
Installing system database...
Using parallel: 161
worker[2] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
worker[3] Using MTR_BUILD_THREAD 304, with reserved ports 13040..13049
worker[4] Using MTR_BUILD_THREAD 303, with reserved ports 13030..13039
worker[14] Using MTR_BUILD_THREAD 305, with reserved ports 13050..13059
worker[6] Using MTR_BUILD_THREAD 302, with reserved ports 13020..13029
worker[1] Using MTR_BUILD_THREAD 301, with reserved ports 13010..13019
worker[15] Using MTR_BUILD_THREAD 306, with reserved ports 13060..13069
worker[38] Using MTR_BUILD_THREAD 348, with reserved ports 13480..13489
worker[35] Using MTR_BUILD_THREAD 345, with reserved ports 13450..13459
worker[44] mysql-test-run: *** ERROR: Could not get a unique build thread id
worker[49] mysql-test-run: *** ERROR: Could not get a unique build thread id
worker[74] mysql-test-run: *** ERROR: Could not get a unique build thread id
worker[75] mysql-test-run: *** ERROR: Could not get a unique build thread id
worker[63] mysql-test-run: *** ERROR: Could not get a unique build thread id

How to repeat:
export MTR_PARALLEL=80

Suggested fix:
Patch as per https://github.com/mysql/mysql-server/pull/33:

diff --git a/mysql-test/mysql-test-run.pl b/mysql-test/mysql-test-run.pl
index 26b31d3..d3c222a 100755
--- a/mysql-test/mysql-test-run.pl
+++ b/mysql-test/mysql-test-run.pl
@@ -1801,9 +1801,12 @@ sub set_build_thread_ports($) {
   if ( lc($opt_build_thread) eq 'auto' ) {
     my $found_free = 0;
     $build_thread = 300;       # Start attempts from here
+    my $build_thread_upper = $build_thread + ($opt_parallel > 49
+                                              ? $opt_parallel
+                                              : 49);
     while (! $found_free)
     {
-      $build_thread= mtr_get_unique_id($build_thread, 349);
+      $build_thread= mtr_get_unique_id($build_thread, $build_thread_upper);
       if ( !defined $build_thread ) {
         mtr_error("Could not get a unique build thread id");
       }
[10 Dec 2015 6:14] Daniel Black
The OCA (Oracle Contributor Agreement) was submitted today.
[10 Dec 2015 7:24] MySQL Verification Team
Hello Daniel,

Thank you for the report and contribution.

[10 Dec 2015 10:00] Bjørn Munch
Your patch looks good but I do have a comment:

The max argument to mtr_get_unique_id() is the highest ID it checks. If your parallel setting is 49 or higher, the number of IDs checked equals the parallel setting (possibly +1). So if any one of those IDs is taken, or if any of the ports in its range are, you may run out. You should probably add some overhead, perhaps $opt_parallel / 10 or something like that.
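A small Perl sketch of the arithmetic behind this comment, assuming only what is stated above: the patched range starts at build thread 300 and mtr_get_unique_id() treats its max argument inclusively. patch_upper_bound() and candidate_ids() are hypothetical helpers for illustration, not functions in mysql-test-run.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: the upper bound computed by the proposed patch.
# 300 is where auto build-thread allocation starts; 49 preserves the old
# 300..349 range for small --parallel values.
sub patch_upper_bound {
    my ($parallel) = @_;
    my $start = 300;
    return $start + ($parallel > 49 ? $parallel : 49);
}

# Hypothetical helper: how many IDs the range offers. The max argument is
# inclusive, hence the +1.
sub candidate_ids {
    my ($parallel) = @_;
    return patch_upper_bound($parallel) - 300 + 1;
}

# For parallel >= 49 there is exactly one spare ID, so a single taken ID
# (or a busy port range) is enough to starve a worker.
for my $p (8, 49, 80, 161) {
    printf "parallel=%-3d upper=%d candidates=%d spare=%d\n",
        $p, patch_upper_bound($p), candidate_ids($p), candidate_ids($p) - $p;
}
```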

Even so, there will have to be an absolute upper limit. I suppose port numbers up to at least 32000 will be valid on any OS; this corresponds to a build_thread_id of 2199. We can reconsider when it becomes realistic to run more than 2000 mtr threads. :-)
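The 2199 figure can be checked against the log in the original report, where worker[2] uses MTR_BUILD_THREAD 300 with ports 13000..13009 and worker[38] uses 348 with ports 13480..13489. That suggests each build thread reserves ten ports starting at 10 * build_thread + 10000. This mapping is inferred from the log rather than taken from the mtr source, and port_range()/max_build_thread() are hypothetical helpers; the sketch below just redoes the arithmetic under that assumption.

```perl
use strict;
use warnings;

# Assumed mapping, inferred from the log rather than from the mtr source:
# each build thread reserves ten consecutive ports starting at
# 10 * build_thread + 10000.
my $PORT_BASE        = 10000;
my $PORTS_PER_THREAD = 10;

# Hypothetical helper: first and last port reserved by a build thread.
sub port_range {
    my ($build_thread) = @_;
    my $first = $PORTS_PER_THREAD * $build_thread + $PORT_BASE;
    return ($first, $first + $PORTS_PER_THREAD - 1);
}

# Hypothetical helper: highest build thread whose whole port range stays
# below $port_limit.
sub max_build_thread {
    my ($port_limit) = @_;
    return int(($port_limit - $PORT_BASE) / $PORTS_PER_THREAD) - 1;
}

my ($first, $last) = port_range(300);
printf "build thread 300 -> ports %d..%d\n", $first, $last;

my $max = max_build_thread(32000);
printf "highest build thread below port 32000: %d (ports %d..%d)\n",
    $max, port_range($max);
```

With these assumptions build thread 2199 reserves ports 31990..31999, the last full block under 32000.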
[10 Dec 2015 10:03] Bjørn Munch
I notice this was reported against 5.6.29 and 5.7.11. Those versions do not exist yet and could even end up including your patch. I suggest changing to 5.6.28 and 5.7.10 which were both released a few days ago.
[11 Dec 2015 7:35] Daniel Black
$opt_parallel / 10 effectively cuts down the maximum, even when it is specified by the user. I'll look at the off-by-one description; however, I didn't see a problem.

I've generally noticed that the tests are full of condition waits and sleeps, so even the "auto" parallel setting, which defaults to the number of cores, doesn't saturate all the CPUs.
[14 Dec 2015 9:03] Bjørn Munch
What I meant was to add e.g. $opt_parallel / 10 in *addition* to what was already suggested.
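Putting the two review comments together, a sketch of a bound that keeps the patch's range, adds $opt_parallel / 10 of headroom on top, and clamps to the absolute ceiling of 2199 mentioned earlier. build_thread_upper_bound() is a hypothetical helper illustrating the suggestion, not the code that was eventually committed.

```perl
use strict;
use warnings;

use constant {
    BUILD_THREAD_START => 300,   # first build thread ID tried in auto mode
    BUILD_THREAD_MAX   => 2199,  # ceiling suggested in review (ports < 32000)
};

# Hypothetical helper: the patch's span plus parallel/10 headroom,
# clamped to the absolute ceiling.
sub build_thread_upper_bound {
    my ($parallel) = @_;
    my $span  = $parallel > 49 ? $parallel : 49;
    my $extra = int($parallel / 10);
    my $upper = BUILD_THREAD_START + $span + $extra;
    return $upper > BUILD_THREAD_MAX ? BUILD_THREAD_MAX : $upper;
}

for my $p (8, 80, 161, 2000) {
    printf "parallel=%-4d upper=%d\n", $p, build_thread_upper_bound($p);
}
```

Small settings still end at 349 as before, while large ones gain roughly 10% spare IDs until the ceiling takes over.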
[17 Dec 2015 22:55] Daniel Black
no bogomips and raise MTR_MAX_PARALLEL default

Attachment: no-bogomips-raise-MTR_MAX_PARALLEL-default.patch (text/x-patch), 686 bytes.

[17 Dec 2015 22:59] Daniel Black
Ah, OK, I follow now. Since the limit is now about 2K, let's raise the default max (should be safe for the next few years). The bogomips check seems to be a bit of a hack, and most build machines exceed that spec anyway, so the patch removes it.

Not sure who disliked Windows VMs when this was written, but I haven't tested it, so I'm leaving it as is :-)
[15 Aug 2016 15:32] Paul DuBois
Posted by developer:
Noted in 5.6.33, 5.7.15 changelogs.

In mysql-test-run.pl, a limit of 50 was imposed on the number of 
workers for parallel testing, which on systems with more than 50 CPUs
resulted in exhaustion of unique thread IDs. The ID-exhaustion
problem has been corrected, and the limit of 50 on number of workers
has been lifted. Additionally, these changes were made:

* To avoid idle workers, the number of parallel workers now is limited
to the number of tests.

* Previously, if --parallel=auto was given and the MTR_MAX_PARALLEL
environment variable was not set, a limit of 8 was imposed on the
number of parallel workers. This limit has been lifted.
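The two behaviour changes in this changelog entry can be sketched as a single worker-count calculation. This is a hypothetical illustration, assuming that "auto" starts from the CPU count, that MTR_MAX_PARALLEL (when set) still caps the auto value, and that the result is clamped to the number of tests; effective_parallel() is not a function in mysql-test-run.pl, and the bogomips and Windows VM adjustments discussed above are deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper. $requested is a number or 'auto'; $max_parallel is
# the MTR_MAX_PARALLEL value or undef when the variable is not set.
sub effective_parallel {
    my ($requested, $num_cpus, $max_parallel, $num_tests) = @_;
    my $parallel = $requested;
    if ( lc($requested) eq 'auto' ) {
        $parallel = $num_cpus;
        # Before the fix an unset MTR_MAX_PARALLEL meant a cap of 8;
        # now the cap only applies when the variable is set.
        $parallel = $max_parallel
            if defined $max_parallel && $parallel > $max_parallel;
    }
    # Never start more workers than there are tests to run.
    $parallel = $num_tests if $parallel > $num_tests;
    return $parallel < 1 ? 1 : $parallel;
}

printf "auto, 161 CPUs, no max, 5000 tests: %d\n",
    effective_parallel('auto', 161, undef, 5000);
printf "80 requested, only 12 tests: %d\n",
    effective_parallel(80, 161, undef, 12);
```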
[15 Aug 2016 16:23] Paul DuBois
Thanks to Daniel Black for the patch on which this change was based.
[16 Aug 2016 3:54] Daniel Black
Thanks for the commit Paul.

On Linux the number of parallel tests is still limited by fs.aio-max-nr; however, I never got time to work out how high it really needs to be.
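As we understand it, each mysqld using InnoDB native AIO on Linux reserves AIO events through io_setup(), and those count against fs.aio-nr up to the fs.aio-max-nr limit. The sketch below reads both values from /proc and estimates how many more workers would fit. The per-worker event count was never measured in this thread, so workers_that_fit() takes it as a caller-supplied assumption; both helpers are hypothetical.

```perl
use strict;
use warnings;

# Read a single integer from a /proc file; returns undef if the file is
# missing or unreadable (for example on non-Linux systems).
sub read_proc_int {
    my ($path) = @_;
    open(my $fh, '<', $path) or return undef;
    my $line = <$fh>;
    close($fh);
    return undef unless defined $line && $line =~ /^\s*(\d+)/;
    return $1;
}

# Hypothetical helper: how many more workers fit in the remaining AIO
# headroom, given an assumed number of events per worker. Returns 0 when
# there is no headroom or the per-worker figure is not positive.
sub workers_that_fit {
    my ($aio_nr, $aio_max_nr, $events_per_worker) = @_;
    my $free = $aio_max_nr - $aio_nr;
    return 0 if $free <= 0 || $events_per_worker <= 0;
    return int($free / $events_per_worker);
}

my $nr  = read_proc_int('/proc/sys/fs/aio-nr');
my $max = read_proc_int('/proc/sys/fs/aio-max-nr');
if ( defined $nr && defined $max ) {
    printf "aio-nr=%d aio-max-nr=%d free=%d\n", $nr, $max, $max - $nr;
} else {
    print "fs.aio-nr / fs.aio-max-nr not available on this system\n";
}
```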