Bug #94911 | ndbcluster shutdown when dropping large table | |
---|---|---|---
Submitted: | 4 Apr 2019 18:26 | Modified: | 16 May 2019 22:25 |
Reporter: | Shawn Hogan | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | 7.6.8 | OS: | SUSE |
Assigned to: | MySQL Verification Team | CPU Architecture: | x86 |
[4 Apr 2019 18:26]
Shawn Hogan
[5 Apr 2019 16:52]
MySQL Verification Team
Hi Shawn,

ndbcluster is an in-memory system, and all transactions need to happen in memory. I doubt you have configured the system to handle a 3.5G transaction (which is what happens when you delete 3.5G of data in a single statement), so that's why you have a crash. Yes, you are right, the proper way to do it is to delete in smaller chunks, but even then you have to use separate transactions. If you do begin, delete, delete, delete... commit, you will crash again! You have to do begin, delete, commit, begin, delete, commit...

Now with regard to the reported issue, I'm a bit concerned about the actual error reported, so I might log this as a bug in that respect (the error you see in the log should say that you don't have a proper configuration rather than just segfaulting). But for further analysis of this report I will need a full ndb_error_report, or at least your config file.

kind regards
Bogdan
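The safe pattern described above can be sketched as follows (the table name and batch size are illustrative, not from this report):

```sql
-- Each batch runs in its own transaction, so ndbcluster's per-transaction
-- memory stays bounded by the batch size instead of the full 3.5G.
-- Repeat until the DELETE reports 0 rows affected.
BEGIN;
DELETE FROM big_table LIMIT 10000;
COMMIT;

-- NOT this: one transaction accumulating every delete until the end,
-- which crashes just like the single large statement.
-- BEGIN; DELETE ...; DELETE ...; DELETE ...; COMMIT;
```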
[5 Apr 2019 20:42]
Shawn Hogan
Got the cluster back online and it happened again. Except this time it was caused by an ALTER TABLE on a small table (4 columns, zero records). At that point I brought the cluster back online again, then upgraded to 7.6.9 in case it was some bug that had been fixed. I also went ahead and did --initial restarts of the data nodes for good measure. It *seemed* to be okay for a while... it was even fine dropping the 3.2GB table as well as another 1.6GB one (figured it was time to clean up and purge some stuff).

A few hours later the entire cluster came down again, although I'm not sure what triggered it exactly that time (maybe it was a process that TRUNCATEs a small table once an hour, less than 50 small records in it). Brought the cluster back online again and started digging through logs. It appears it was the LCP watchdog process that killed all data nodes at the same instant. I'm not entirely sure of the internals of how that works, but I assume it's a per-node watchdog. If that's the case, I have a hard time believing that all 8 data nodes had I/O problems at the same second of the day.

Each node has 6x Seagate Savvio drives in a hardware RAID-6. Not SSDs, but enterprise-class drives: https://www.seagate.com/support/internal-hard-drives/enterprise-hard-drives/savvio-10k/#sp...

Checked all 48 drives to see if any were giving errors at the hardware level, and all were fine. Did a real-world test of reading large files from one node and writing to another through the network (everything is interconnected with 56Gbit InfiniBand) and got throughput of around 1.2GB/sec. Watching the servers' I/O processes, I don't see anything other than ndbmtd doing its normal stuff (LCP writing is the most).

After all 8 data nodes were killed by the LCP watchdog 3x in a single day, I ended up disabling the watchdog (LcpScanProgressTimeout=0 now). Nothing has changed with the servers or config or load, and stuff had been humming along fine for years. Servers are under light load.
Each server has 2 CPU sockets, each running an 8-core Xeon. lscpu output:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 45
Model name:            Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
Stepping:              7
CPU MHz:               1239.574
CPU max MHz:           3500.0000
CPU min MHz:           1200.0000
BogoMIPS:              5401.54
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-7,16-23
NUMA node1 CPU(s):     8-15,24-31

Server load never goes over 2.00, and typically hovers between 0.50 and 1.25.
[5 Apr 2019 20:47]
Shawn Hogan
Current config file. The only thing I changed since the crashes was adding the LcpScanProgressTimeout=0 option. (I left the commented-out stuff in... some of it is from temporary changes to the redo logs when doing backup restores or loading a ton of data at once.)

------
[tcp default]
SendBufferMemory=8M
ReceiveBufferMemory=8M

[ndbd default]
NoOfReplicas= 2
datadir=/var/mysql-cluster
DataMemory=100G
#IndexMemory=10G
MaxNoOfFiredTriggers=12000
TransactionDeadlockDetectionTimeout=10000
##TimeBetweenLocalCheckpoints = 10
TimeBetweenGlobalCheckpoints=10000

### Params for REDO LOG
#FragmentLogFileSize=256M
#InitFragmentLogFiles=SPARSE
#NoOfFragmentLogFiles=756
#RedoBuffer=48M
FragmentLogFileSize=64M
InitFragmentLogFiles=SPARSE
NoOfFragmentLogFiles=64
#NoOfFragmentLogFiles=256
#TimeBetweenLocalCheckpoints=10
RedoBuffer=64M

## MySQL Cluster 7.3
## DiskCheckpointSpeed=20M
## DiskCheckpointSpeedInRestart=250M

## MySQL Cluster 7.4
MinDiskWriteSpeed=40M
MaxDiskWriteSpeed=80M
MaxDiskWriteSpeedOtherNodeRestart=200M
MaxDiskWriteSpeedOwnRestart=400M

## watchdog disabled
LcpScanProgressTimeout=0

ODirect=1
MaxNoOfConcurrentScans=500
MaxNoOfLocalScans=100000
LongMessageBuffer=32M
MaxNoOfExecutionThreads=8
LockPagesInMainMemory=1
MaxNoOfConcurrentOperations=1000000
MaxNoOfConcurrentTransactions=65536
MaxNoOfTables=4096
## MaxAllocate=64M
MaxNoOfOrderedIndexes=4096
MaxNoOfUniqueHashIndexes=2048
MaxNoOfAttributes=40480
MaxNoOfTriggers=10240
BuildIndexThreads=4
## RealTimeScheduler=1
[5 Apr 2019 20:48]
Shawn Hogan
Oh, forgot to mention that each node has 256GB RAM, with more than 100GB free (even after allocating 100GB to data nodes).
[11 Apr 2019 19:54]
MySQL Verification Team
Hi,

ndb_error_reporter would give us some additional info, especially about that crash on the ALTER of a miniature table. I believe your cluster is not properly configured; for a start, SharedGlobalMemory is at its default, and that's small for a cluster of this size. https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-ndbd-definition.html#ndbparam-ndbd-s...

I also really think you would benefit from a session with the MySQL Cluster Support team.

all best
Bogdan
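Raising that parameter would be a one-line change in config.ini; the value below is purely illustrative, not a sizing recommendation from this thread:

```ini
[ndbd default]
# SharedGlobalMemory defaults to 128M; the suggestion above is that the
# default is small for a cluster of this size. 512M here is an example
# value only, not a recommendation from this report.
SharedGlobalMemory=512M
```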
[13 Apr 2019 2:31]
Shawn Hogan
Is a 128M SharedGlobalMemory small on a cluster with no disk data, tablespaces, etc.? I figured the default was a good starting point, since the documentation seems to imply it's mostly for stuff we don't use.

mysql> SELECT * FROM information_schema.TABLESPACES;
Empty set (0.00 sec)

We also don't have any log files defined. The only thing in information_schema.FILES is the default stuff (all 23 records in there are for ENGINE InnoDB, which we don't use for anything beyond whatever mysql sets up by default).

mysql> SELECT * FROM information_schema.FILES WHERE ENGINE != 'InnoDB';
Empty set (0.00 sec)

Truth be told, ndbcluster does what it does pretty well (when the watchdog doesn't kill all the data nodes concurrently... haha), so we've stayed away from trying to use disk tables with it. Although we do have about 30GB of rarely used/accessed binary data being stored in the cluster that we would love to store on disk. But it is what it is... it hasn't been a priority to try and make ndbcluster do what it really wasn't designed for.

As far as the error logs and trace files, I'll see about getting them.
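For reference, moving that rarely used binary data to disk would look roughly like the sketch below. All names and sizes are made up for illustration; NDB disk data tables need an undo logfile group and a tablespace before the table itself:

```sql
-- Illustrative only: group/tablespace names, file names, and sizes
-- are hypothetical, not taken from this bug report.
CREATE LOGFILE GROUP lg_1
    ADD UNDOFILE 'undo_1.log'
    INITIAL_SIZE 512M
    ENGINE NDBCLUSTER;

CREATE TABLESPACE ts_1
    ADD DATAFILE 'data_1.dat'
    USE LOGFILE GROUP lg_1
    INITIAL_SIZE 2G
    ENGINE NDBCLUSTER;

-- Non-indexed columns of a STORAGE DISK table live in the tablespace;
-- indexed columns (here, id) still reside in memory.
CREATE TABLE rarely_used_blobs (
    id INT NOT NULL PRIMARY KEY,
    payload LONGBLOB
) TABLESPACE ts_1 STORAGE DISK ENGINE NDBCLUSTER;
```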
[17 May 2019 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".