| Bug #90606 | Segmentation fault in ndbd.cpp | ||
|---|---|---|---|
| Submitted: | 24 Apr 2018 14:03 | Modified: | 5 Aug 2018 20:19 | 
| Reporter: | Daniël van Eeden (OCA) | Email Updates: | |
| Status: | No Feedback | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) | 
| Version: | 7.6.4 | OS: | Any | 
| Assigned to: | MySQL Verification Team | CPU Architecture: | Any | 
   [24 Apr 2018 14:03]
   Daniël van Eeden        
  
 
   [26 Apr 2018 7:30]
   Mikael Ronström        
  The most likely cause of this segfault is that the variable m_dirty_pages contains an illegal value. To be able to track this easier if it happens again I have added more information about code running in this case at crash time. I have also added a bunch of new asserts in production code to ensure that we hit any anomalies in the code at the root rather than at its usage. Will also see if we can reproduce this bug by combining node restarts with drops of disk data tables.
   [30 Apr 2018 14:39]
   MySQL Verification Team        
  Changing to feedback for the comments done by Mikael.
   [1 Jun 2018 1:00]
   Bugs System        
  No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
   [4 Jul 2018 10:46]
   Daniël van Eeden        
  The cluster failed because we lost network connectivity for a bit.
Then starting the cluster failed.
Not 100% sure this is the same bug, but it looks related.
Time: Wednesday 4 July 2018 - 12:34:04
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 11 received; Segmentation fault
Error object: /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/ndbd.cpp
Program: ndbmtd
Pid: 10895 thr: 4
Version: mysql-5.7.22 ndb-7.6.6
Trace file name: ndb_28_trace.log.12_t4
Trace file path: /var/lib/mysql-cluster/ndb_28_trace.log.12 [t1..t17]
***EOM***
2018-07-04 12:34:04 [ndbd] INFO     -- LDM instance 3: Restored T288F301 LCP 103423 rows, 184485 row operations, 1291 millis, 142900 row operations/sec)
2018-07-04 12:34:04 [ndbd] INFO     -- Received signal 11. Running error handler.
2018-07-04 12:34:04 [ndbd] INFO     -- Signal 11 received; Segmentation fault
2018-07-04 12:34:04 [ndbd] INFO     -- /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/ndbd.cpp
2018-07-04 12:34:04 [ndbd] INFO     -- Error handler signal shutting down system
2018-07-04 12:34:04 [ndbd] INFO     -- Error handler shutdown completed - aborting
2018-07-04 12:34:19 [ndbd] INFO     -- Child process terminated by signal 6
2018-07-04 12:34:19 [ndbd] ALERT    -- Node 28: Forced node shutdown completed. Occured during startphase 4. Initiated by signal 6. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
(gdb) bt
#0  0x00007f9f22c03277 in raise () from /lib64/libc.so.6
#1  0x00007f9f22c04968 in abort () from /lib64/libc.so.6
#2  0x00000000004056b3 in childAbort (error_code=error_code@entry=6000, currentStartPhase=4, exit_code=-1)
    at /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/ndbd.cpp:615
#3  0x000000000041338d in NdbShutdown (error_code=error_code@entry=6000, type=NST_ErrorHandlerSignal, restartType=restartType@entry=NRT_Default)
    at /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/ndbd.cpp:1273
#4  0x00000000006d678c in ErrorReporter::handleError (messageID=messageID@entry=6000, problemData=problemData@entry=0x7f896914a350 "Signal 11 received; Segmentation fault", 
    objRef=objRef@entry=0x7a0ce8 "/export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/ndbd.cpp", nst=nst@entry=NST_ErrorHandlerSignal)
    at /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/error/ErrorReporter.cpp:280
#5  0x0000000000412ed5 in handler_error (signum=11) at /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/ndbd.cpp:663
#6  <signal handler called>
#7  0x000000000043ea28 in Restore::prepare_parts_for_execution (this=this@entry=0x18b8f30, signal=signal@entry=0x7f896914b4c0, file_ptr=...)
    at /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/blocks/restore.cpp:2061
#8  0x000000000043ec7c in Restore::open_data_file (this=0x18b8f30, signal=0x7f896914b4c0, file_ptr=...)
    at /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/blocks/restore.cpp:2148
#9  0x000000000043f205 in Restore::execFSREMOVECONF (this=<optimized out>, signal=<optimized out>)
    at /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/blocks/restore.cpp:697
#10 0x0000000000774b60 in executeFunction (f=<optimized out>, signal=0x7f896914b4c0, gsn=<optimized out>, this=0x18b8f30)
    at /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/vm/SimulatedBlock.hpp:1585
#11 executeFunction_async (signal=0x7f896914b4c0, gsn=<optimized out>, this=0x18b8f30)
    at /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/vm/SimulatedBlock.hpp:1552
#12 execute_signals (max_signals=262080, sig=0x7f896914b4c0, r=0x7f9f21eb1220, h=0x7f9f21ea8100, q=0x7f9f21ea8000, selfptr=0x7f9f21ea7f80)
    at /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/vm/mt.cpp:5559
#13 run_job_buffers (selfptr=selfptr@entry=0x7f9f21ea7dc0, sig=sig@entry=0x7f896914b4c0, send_sum=@0x7f896914ab80: 0, flush_sum=@0x7f896914ab90: 0, pending_send=@0x7f896914ab70: true)
    at /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/vm/mt.cpp:5606
#14 0x0000000000776f6e in mt_job_thread_main (thr_arg=<optimized out>)
    at /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/vm/mt.cpp:6576
#15 0x000000000070a2ef in ndb_thread_wrapper (_ss=0xe86600)
    at /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/common/portlib/NdbThread.c:258
#16 0x00007f9f23bcde25 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f9f22ccbbad in clone () from /lib64/libc.so.6
 
   [4 Jul 2018 14:04]
   Daniël van Eeden        
  (gdb) frame
#7  0x000000000043ea28 in Restore::prepare_parts_for_execution (this=this@entry=0x1878940, signal=signal@entry=0x7f2d94ad02c0, file_ptr=...)
    at /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/blocks/restore.cpp:2061
2061    in /export/home2/pb2/build/sb_1-28710439-1526985206.38/rpm/BUILD/mysql-cluster-gpl-7.6.6/mysql-cluster-gpl-7.6.6/storage/ndb/src/kernel/blocks/restore.cpp
(gdb) info locals
j = 0
partPair = {
  startPart = 17486, 
  numParts = 16962
}
part_id = 17486
i = 2003
lcpCtlFilePtr = 0x1894130
(gdb) print *lcpCtlFilePtr
$26 = {
  fileHeader = {
    Magic = "NDBBCKUP", 
    BackupVersion = 460292, 
    SectionType = 1, 
    SectionLength = 9, 
    FileType = 6, 
    BackupId = 0, 
    BackupKey_0 = 0, 
    BackupKey_1 = 0, 
    ByteOrder = 305419896, 
    NdbVersion = 460294, 
    MySQLVersion = 329494
  }, 
  Checksum = 1313227893, 
  ValidFlag = 1, 
  TableId = 298, 
  FragmentId = 133, 
  CreateTableVersion = 1, 
  CreateGci = 574830, 
  MaxGciCompleted = 5578265, 
  MaxGciWritten = 5578266, 
  LcpId = 347481, 
  LocalLcpId = 0, 
  MaxPageCount = 1055, 
  MaxNumberDataFiles = 2064, 
  LastDataFileNumber = 921, 
  MaxPartPairs = 2048, 
  NumPartPairs = 2048, 
  RowCountLow = 1434713, 
  RowCountHigh = 0, 
  FutureUse = {0 <repeats 16 times>}, 
  partPairs = {{
      startPart = 1847, 
      numParts = 1
    }}
}
(gdb) print file_ptr.p->m_part_state[part_id]
Cannot access memory at address 0x7f2d583d8dc8
(gdb) print partPair.startPart
$27 = 17486
(gdb) print part_id
$28 = 17486
(gdb) print lcpCtlFilePtr->partPairs[i]
$29 = {
  startPart = 17486, 
  numParts = 16962
}
(gdb) print lcpCtlFilePtr->partPairs[i-1]
$30 = {
  startPart = 1801, 
  numParts = 1
}
(gdb) print file_ptr.p->m_part_state[1801]
$31 = Restore::File::PART_IGNORED
(gdb) print file_ptr.p->m_part_state[1802]
$32 = Restore::File::PART_ALL_CHANGES
 
   [5 Jul 2018 20:19]
   MySQL Verification Team        
  Hi Daniel, Mikael is lookin at this bug in the restore.cpp but this is not the same as the original issue reported. The original issue, if it happens again we'll hopefully have more data. Mikael will update you when he has more data about the restore.cpp one all best Bogdan
   [6 Jul 2018 11:40]
   Mikael Ronström        
  A fix for the restore issue reported in this bug is now on its way into the next 7.6 release.
   [6 Aug 2018 1:00]
   Bugs System        
  No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".

