Bug #48302 Major performance regression in RQG when using --mask
Submitted: 26 Oct 2009 11:43 Modified: 26 Oct 2009 13:32
Reporter: Philip Stoev
Status: Closed
Category:Tools: Random Query Generator Severity:S2 (Serious)
Version:2.1 OS:Any
Assigned to: Bernt Marius Johnsen CPU Architecture:Any

[26 Oct 2009 11:43] Philip Stoev
Description:
The new implementation of --mask and --mask-level causes an almost 10-fold performance regression in query generation.

This is because the masking is recalculated for every SQL query, even though --mask and --mask-level do not change during the lifetime of the test.

We should be very careful about performance regressions in the RQG, since such a regression *silently* turns a stress test into a test where Perl is exercised at 100% on all CPUs while mysql remains almost idle.

How to repeat:
[root@fedora10 randgen]# time perl gensql.pl --dsn=dbi:mysql:host=127.0.0.1:port=19300:user=root:database=test --queries=10000 --grammar=conf/WL5004_sql.yy > /dev/null

real    0m4.789s
user    0m4.645s
sys     0m0.126s
[root@fedora10 randgen]# time perl gensql.pl --dsn=dbi:mysql:host=127.0.0.1:port=19300:user=root:database=test --queries=10000 --grammar=conf/WL5004_sql.yy --mask=1 > /dev/null

real    0m30.792s
user    0m30.631s
sys     0m0.122s
[root@fedora10 randgen]# time perl gensql.pl --dsn=dbi:mysql:host=127.0.0.1:port=19300:user=root:database=test --queries=10000 --grammar=conf/WL5004_sql.yy --mask=2 > /dev/null

real    0m32.064s
user    0m31.883s
sys     0m0.157s
[root@fedora10 randgen]# time perl gensql.pl --dsn=dbi:mysql:host=127.0.0.1:port=19300:user=root:database=test --queries=10000 --grammar=conf/WL5004_sql.yy --mask=3 > /dev/null

real    0m30.630s
user    0m30.511s
sys     0m0.103s

Suggested fix:
A better solution would be to perform masking only once and pass the already masked grammar to FromGrammar.pm. Alternatively, some form of caching would also work.
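
As a rough illustration of "mask once" (a language-neutral sketch in Python, not the RQG Perl code): the grammar is modelled as a dict of rule name to alternatives, and `apply_mask`, `Generator` and `mask_calls` are hypothetical names invented for this example. The real masking logic may well differ; the point is only that the masking cost is paid in the constructor rather than per query.

```python
import random


def apply_mask(grammar, mask):
    """Hypothetical masking: keep a seed-determined subset of each rule's alternatives."""
    rng = random.Random(mask)
    masked = {}
    for rule, alternatives in grammar.items():
        kept = [alt for alt in alternatives if rng.random() < 0.5]
        # Never leave a rule without alternatives.
        masked[rule] = kept or alternatives[:1]
    return masked


class Generator:
    def __init__(self, grammar, mask=0):
        self.mask_calls = 0  # counts how often masking was computed
        if mask:
            # Masking happens once here, not on every next() call.
            self.grammar = apply_mask(grammar, mask)
            self.mask_calls += 1
        else:
            self.grammar = grammar
        self.rng = random.Random(0)

    def next(self):
        # Per-query work only picks from the already masked grammar.
        return self.rng.choice(self.grammar["query"])


grammar = {"query": ["create", "ddl", "dml"]}
gen = Generator(grammar, mask=1)
queries = [gen.next() for _ in range(10000)]
```

After 10000 calls to `next()`, `gen.mask_calls` is still 1, which is the property the suggested fix is after.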
[26 Oct 2009 13:14] Bernt Marius Johnsen
Move masking from next() to new(). However, this implies that threadN_init and query_init do not depend on the rules used by query or threadN, but I assume that's OK.
[26 Oct 2009 13:18] Philip Stoev
This grammar is possible and should not be impacted by a fix:

# Issue a lot of creates in the beginning to populate the namespace more densely.

query_init:
 create ; create ; create ; create ; create ; create ; create ;

query:
 create ; ddl ; dml ;
[26 Oct 2009 13:28] Bernt Marius Johnsen
OK. Then we add a masked grammar if a mask is specified. The masked grammar is generated when needed and used for query/threadN, while the original grammar is used for query_init/threadN_init.
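
A minimal sketch of that two-grammar arrangement, again in Python rather than the actual Perl, and with made-up names (`apply_mask`, `Generator`, `grammar_for`) and a made-up masking rule. It assumes the starting rule name is enough to decide which grammar to use: `*_init` rules get the original grammar, everything else gets a masked copy that is built at most once.

```python
def apply_mask(grammar, mask):
    """Hypothetical masking rule: keep alternatives whose index plus mask is even."""
    masked = {}
    for rule, alternatives in grammar.items():
        kept = [alt for i, alt in enumerate(alternatives) if (i + mask) % 2 == 0]
        # Never leave a rule without alternatives.
        masked[rule] = kept or alternatives[:1]
    return masked


class Generator:
    def __init__(self, grammar, mask=0):
        self.grammar = grammar   # original grammar, never modified
        self.mask = mask
        self._masked = None      # masked copy, built lazily

    def grammar_for(self, start_rule):
        # query_init / threadN_init always see the original grammar.
        if not self.mask or start_rule.endswith("_init"):
            return self.grammar
        # query / threadN see the masked grammar, generated only once.
        if self._masked is None:
            self._masked = apply_mask(self.grammar, self.mask)
        return self._masked


grammar = {
    "query_init": ["create ; create ; create ; create ; create ; create ; create ;"],
    "query": ["create ; ddl ; dml ;"],
    "dml": ["INSERT", "UPDATE", "DELETE"],  # illustrative alternatives only
}
gen = Generator(grammar, mask=1)
init_grammar = gen.grammar_for("query_init")
query_grammar = gen.grammar_for("query")
```

Here `init_grammar` is the untouched original, so a `query_init` rule that expands `create` keeps all its alternatives, while repeated calls to `gen.grammar_for("query")` return the same cached masked object.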