Bug #108773 Query causes mysql to hang at 'preparing' and query cannot be stopped.
Submitted: 14 Oct 2022 8:30 Modified: 17 Oct 2022 18:50
Reporter: Doug C Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:8.0.31 OS:Linux (Amazon Linux 2022)
Assigned to: CPU Architecture:x86 (AWS r6i)

[14 Oct 2022 8:30] Doug C
Description:
Almost identical query to working query causes MySQL 8.0.30 SQL node to go into some kind of race condition that cannot be aborted.  The bug does not require any data in the database to duplicate the failure.  I purged the database, re-created empty tables, and the same error occurred.  No apparent memory issues, and CPU load goes to about 1.0 making me think the query processor is stuck in some kind of a loop.  A kill <id> from console causes the query to respawn a 2nd thread. A shutdown command issued from mysqladmin will not stop the service. Only a kill -9 pid will stop the MySQL server.   

OS: Amazon Linux 2022 RC2

MySQL Compiled from Source file:
mysql-cluster-gpl-8.0.30.tar.gz

SQL Server node runs on same physical server as the NDB node.  NDB<>SQL are connected via Shared Memory.

Table Structures and server configuration data are included in the attached file.

How to repeat:
This query works perfectly:

SELECT  xl.title,UNIX_TIMESTAMP(xl.whenTime) AS utime,xl.city,xl.state,xl.auctioneer,xl.message,xl.listingID,xl.catalogTitle,xl.street1,xl.street2,xl.uid,NULL as xs,y.day_views,NULL AS oazliveid,xu.privacy,if (xu.privacy=0,xu.email,NULL) as email,ai.aucid,y.ave_views,y.advtag, 1 AS exp,xl.azliveid FROM ( SELECT c.listingid,if(l.imagecount>=1,if(DATEDIFF(l.whentime,l.creation)<=360,1,0),0) AS xsort ,c.ave_views ,c.iday_views ,c.day_views ,c.advtag ,u.abuse FROM lviewcount AS c LEFT JOIN listings AS l USING (listingid) LEFT JOIN users AS u USING (uid) WHERE c.listingid IN  ( SELECT listingID FROM listings WHERE lactive=1 AND publish=1 AND msglength>10 AND creation<DATE_SUB(now(),INTERVAL 10 DAY) AND published<DATE_SUB(now(),INTERVAL 10 DAY) AND EXTRACT(SECOND from whentime) = 0 ) AND c.iday_views>=0 AND u.abuse=0 ORDER BY xsort DESC,c.ave_views DESC,c.iday_views DESC LIMIT 6 ) AS y LEFT JOIN listings AS xl USING (listingID) LEFT JOIN users AS xu USING (uid) LEFT JOIN auctioneerid AS ai USING (uid)

The modified query below hangs at preparing and cannot be killed. Same query executes fine in Mysql 5.6 on InnoDB engine.  Killing the query will cause it to respawn the same query in a new thread (see processlist below):

SELECT  1 AS site ,'/var/www/html/includes/cache/clist-5t0f3' AS t5key ,DATE_ADD(NOW(),INTERVAL 10 MINUTE) AS expires ,x.listingid AS listingid FROM ( SELECT  xl.title,UNIX_TIMESTAMP(xl.whenTime) AS utime,xl.city,xl.state,xl.auctioneer,xl.message,xl.listingID,xl.catalogTitle,xl.street1,xl.street2,xl.uid,NULL as xs,y.day_views,NULL AS oazliveid,xu.privacy,if (xu.privacy=0,xu.email,NULL) as email,ai.aucid,y.ave_views,y.advtag, 1 AS exp,xl.azliveid FROM ( SELECT c.listingid,if(l.imagecount>=1,if(DATEDIFF(l.whentime,l.creation)<=360,1,0),0) AS xsort ,c.ave_views ,c.iday_views ,c.day_views ,c.advtag ,u.abuse FROM lviewcount AS c LEFT JOIN listings AS l USING (listingid) LEFT JOIN users AS u USING (uid) WHERE c.listingid IN  ( SELECT listingID FROM listings WHERE lactive=1 AND publish=1 AND msglength>10 AND creation<DATE_SUB(now(),INTERVAL 10 DAY) AND published<DATE_SUB(now(),INTERVAL 10 DAY) AND EXTRACT(SECOND from whentime) = 0 ) AND c.iday_views>=0 AND u.abuse=0 ORDER BY xsort DESC,c.ave_views DESC,c.iday_views DESC LIMIT 6 ) AS y LEFT JOIN listings AS xl USING (listingID) LEFT JOIN users AS xu USING (uid) LEFT JOIN auctioneerid AS ai USING (uid) ) AS x 

*************************** 3. row ***************************
     Id: 16
   User: root
   Host: localhost
     db: auctions
Command: Killed
   Time: 1044
  State: preparing
   Info: SELECT  1 AS site ,'/var/www/html/includes/cache/clist-5t0f3' AS t5key ,DATE_ADD(NOW(),INTERVAL 10 MINUTE) AS expires ,x.listingid AS listingid FROM ( SELECT  xl.title,UNIX_TIMESTAMP(xl.whenTime) AS utime,xl.city,xl.state,xl.auctioneer,xl.message,xl.listingID,xl.catalogTitle,xl.street1,xl.street2,xl.uid,NULL as xs,y.day_views,NULL AS oazliveid,xu.privacy,if (xu.privacy=0,xu.email,NULL) as email,ai.aucid,y.ave_views,y.advtag, 1 AS exp,xl.azliveid FROM ( SELECT c.listingid,if(l.imagecount>=10,if(DATEDIFF(l.whentime,l.creation)<=60,1,0),0) AS xsort ,c.ave_views ,c.iday_views ,c.day_views ,c.advtag ,u.abuse FROM lviewcount AS c LEFT JOIN listings AS l USING (listingid) LEFT JOIN users AS u USING (uid) WHERE c.listingid IN  ( SELECT listingID FROM listings WHERE lactive=1 AND publish=1 AND msglength>150 AND creation<DATE_SUB(now(),INTERVAL 5 DAY) AND published<DATE_SUB(now(),INTERVAL 5 DAY) AND EXTRACT(SECOND from whentime) = 0 ) AND c.iday_views>=0 AND u.abuse=0 ORDER BY xsort DESC,c.ave_views DESC,c.iday_views DESC LIMIT 5 ) AS y LEFT JOIN listings AS xl USING (listingID) LEFT JOIN users AS xu USING (uid) LEFT JOIN auctioneerid AS ai USING (uid) ) AS x
*************************** 4. row ***************************
     Id: 46
   User: root
   Host: localhost
     db: auctions
Command: Query
   Time: 0
  State: init
   Info: show full processlist
*************************** 5. row ***************************
     Id: 251
   User: root
   Host: localhost
     db: auctions
Command: Query
   Time: 477
  State: preparing
   Info: SELECT  1 AS site ,'/var/www/html/includes/cache/clist-5t0f3' AS t5key ,DATE_ADD(NOW(),INTERVAL 10 MINUTE) AS expires ,x.listingid AS listingid FROM ( SELECT  xl.title,UNIX_TIMESTAMP(xl.whenTime) AS utime,xl.city,xl.state,xl.auctioneer,xl.message,xl.listingID,xl.catalogTitle,xl.street1,xl.street2,xl.uid,NULL as xs,y.day_views,NULL AS oazliveid,xu.privacy,if (xu.privacy=0,xu.email,NULL) as email,ai.aucid,y.ave_views,y.advtag, 1 AS exp,xl.azliveid FROM ( SELECT c.listingid,if(l.imagecount>=10,if(DATEDIFF(l.whentime,l.creation)<=60,1,0),0) AS xsort ,c.ave_views ,c.iday_views ,c.day_views ,c.advtag ,u.abuse FROM lviewcount AS c LEFT JOIN listings AS l USING (listingid) LEFT JOIN users AS u USING (uid) WHERE c.listingid IN  ( SELECT listingID FROM listings WHERE lactive=1 AND publish=1 AND msglength>150 AND creation<DATE_SUB(now(),INTERVAL 5 DAY) AND published<DATE_SUB(now(),INTERVAL 5 DAY) AND EXTRACT(SECOND from whentime) = 0 ) AND c.iday_views>=0 AND u.abuse=0 ORDER BY xsort DESC,c.ave_views DESC,c.iday_views DESC LIMIT 5 ) AS y LEFT JOIN listings AS xl USING (listingID) LEFT JOIN users AS xu USING (uid) LEFT JOIN auctioneerid AS ai USING (uid) ) AS x

Suggested fix:
Find looping code and fix to continue execution or fail with appropriate error message.
[15 Oct 2022 19:31] Doug C
I ran into this again in a more complex version of this query that I was testing.  I noticed that the trigger appears be pulling a few columns from a group of nested  sub selects/IN selects that slowly grow to a larger number of columns in the final sub select.
[17 Oct 2022 18:50] MySQL Verification Team
Hi,
Thanks a lot for the report and the test case.

Verified with 8.0.31 

*************************** 3. row ***************************
     Id: 10
   User: msandbox
   Host: localhost
     db: test
Command: Query
   Time: 46
  State: preparing
   Info: SELECT  1 AS site ,'/var/www/html/includes/cache/clist-5t0f3' AS t5key ,DATE_ADD(NOW(),INTERVAL 10 MINUTE) AS expires ,x.listingid AS listingid FROM ( SELECT  xl.title,UNIX_TIMESTAMP(xl.whenTime) AS utime,xl.city,xl.state,xl.auctioneer,xl.message,xl.listingID,xl.catalogTitle,xl.street1,xl.street2,xl.uid,NULL as xs,y.day_views,NULL AS oazliveid,xu.privacy,if (xu.privacy=0,xu.email,NULL) as email,ai.aucid,y.ave_views,y.advtag, 1 AS exp,xl.azliveid FROM ( SELECT c.listingid,if(l.imagecount>=1,if(DATEDIFF(l.whentime,l.creation)<=360,1,0),0) AS xsort ,c.ave_views ,c.iday_views ,c.day_views ,c.advtag ,u.abuse FROM lviewcount AS c LEFT JOIN listings AS l USING (listingid) LEFT JOIN users AS u USING (uid) WHERE c.listingid IN  ( SELECT listingID FROM listings WHERE lactive=1 AND publish=1 AND msglength>10 AND creation<DATE_SUB(now(),INTERVAL 10 DAY) AND published<DATE_SUB(now(),INTERVAL 10 DAY) AND EXTRACT(SECOND from whentime) = 0 ) AND c.iday_views>=0 AND u.abuse=0 ORDER BY xsort DESC,c.ave_views DESC,c.iday_views DESC LIMIT 6 ) AS y LEFT JOIN listings AS xl USING (listingID) LEFT JOIN users AS xu USING (uid) LEFT JOIN auctioneerid AS ai USING (uid) ) AS x

NOTE: works ok on regular setup with innodb, fails only on cluster with ndbcluster tables
[20 Oct 2022 13:39] Ole John Aske
Posted by developer:
 
Bug reproduced as described, with empty tables

Note that tables are created with the default timestamp values '0000-00-00 00:00:00'.
This is not a legal timestamp value as such, and not allowed with the default
sql_mode which set 'NO_ZERO_DATE'. Thus the config need to override this, like
in the supplied config which specifies:

sql_mode= 'ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION'
('NO_ZERO_DATE' is not set)
[20 Oct 2022 17:49] Ole John Aske
Posted by developer:
 
The root cause for the 'hang' is that we get stuck in a while-loop in ::is_pushable_with_root(). That loop is intended to un-roll the join nest structures as part of analyzing the inner-/outer-join structure of the query.

....Might be some different call stack variants in the uppermost levels.....
#4  0x0000000007038a0f in ndb_pushed_builder_ctx::is_pushable_with_root (this=0x7fffa84ebee8) at /home/oleja/mysql-tmp/storage/ndb/plugin/ha_ndbcluster_push.cc:813
#5  0x0000000007037e0b in ndb_pushed_builder_ctx::make_pushed_join (this=0x7fffa84ebee8, pushed_join=@0x7fffa84ed4a8: 0x0) at /home/oleja/mysql-tmp/storage/ndb/plugin/ha_ndbcluster_push.cc:524
#6  0x000000000703a790 in ndb_pushed_builder_ctx::make_pushed_join (thd_ndb=0x7ffed40c45e0, plan=...) at /home/oleja/mysql-tmp/storage/ndb/plugin/ha_ndbcluster_push.cc:564
#7  0x0000000007007171 in ndbcluster_push_to_engine (thd=0x7ffed4001040, root_path=0x7ffed4c12a40, join=0x7ffed4c01148) at /home/oleja/mysql-tmp/storage/ndb/plugin/ha_ndbcluster.cc:14520
#8  0x00000000051455b6 in JOIN::push_to_engines (this=0x7ffed4c01148) at /home/oleja/mysql-tmp/sql/sql_optimizer.cc:1106
#9  0x000000000513ca9a in JOIN::optimize (this=0x7ffed4c01148, finalize_access_paths=true) at /home/oleja/mysql-tmp/sql/sql_optimizer.cc:1020
#10 0x000000000528ecc9 in Query_block::optimize (this=0x7ffed4b0a848, thd=0x7ffed4001040, finalize_access_paths=true) at /home/oleja/mysql-tmp/sql/sql_select.cc:1822
#11 0x0000000005355998 in Query_expression::optimize (this=0x7ffed4b0a760, thd=0x7ffed4001040, materialize_destination=0x0, create_iterators=true, finalize_access_paths=true)
    at /home/oleja/mysql-tmp/sql/sql_union.cc:1003
#12 0x000000000528c47e in Sql_cmd_dml::execute_inner (this=0x7ffed4bab8b0, thd=0x7ffed4001040) at /home/oleja/mysql-tmp/sql/sql_select.cc:772
#13 0x000000000528b957 in Sql_cmd_dml::execute (this=0x7ffed4bab8b0, thd=0x7ffed4001040) at /home/oleja/mysql-tmp/sql/sql_select.cc:587
#14 0x00000000051ee99b in mysql_execute_command (thd=0x7ffed4001040, first_level=true) at /home/oleja/mysql-tmp/sql/sql_parse.cc:4677
#15 0x00000000051e5be3 in dispatch_sql_command (thd=0x7ffed4001040, parser_state=0x7fffa84f2a78) at /home/oleja/mysql-tmp/sql/sql_parse.cc:5312
#16 0x00000000051e18e5 in dispatch_command (thd=0x7ffed4001040, com_data=0x7fffa84f3a28, command=COM_QUERY) at /home/oleja/mysql-tmp/sql/sql_parse.cc:2032
#17 0x00000000051e439f in do_command (thd=0x7ffed4001040) at /home/oleja/mysql-tmp/sql/sql_parse.cc:1435
#18 0x0000000005474634 in handle_connection (arg=0xc55f270) at /home/oleja/mysql-tmp/sql/conn_handler/connection_handler_per_thread.cc:302
#19 0x0000000007399f94 in pfs_spawn_thread (arg=0xa65ffa0) at /home/oleja/mysql-tmp/storage/perfschema/pfs.cc:2986
#20 0x00007ffff764d947 in start_thread (arg=<optimized out>) at pthread_create.c:435
#21 0x00007ffff76dda44 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
[31 Oct 2022 12:56] MySQL Verification Team
Thanks a lot, Mr. Aske.