Bug #56890 Calling typedef my_thread_id in ndbd.cpp
Submitted: 21 Sep 2010 10:18
Modified: 23 Sep 2010 9:53
Reporter: Magnus Blåudd
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 7.0.19
OS: Any
Assigned to: Magnus Blåudd
CPU Architecture: Any

[21 Sep 2010 10:18] Magnus Blåudd
Description:
We are calling a typedef in ndbd.cpp :)

In my_pthread.h:

typedef ulong my_thread_id;

and in ndbd.cpp

handler_error(int signum)
{
  static long thread_id = 0;

  if (thread_id != 0 && thread_id == my_thread_id()) // <<< "calls" the typedef
  {
    // Shutdown thread received signal:
    // kill own process, then enter endless loop
  }

  thread_id = my_thread_id(); // <<< "calls" the typedef again

}

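It's interesting to see that this code compiles. :) Since my_thread_id names a type, the expression my_thread_id() is a value-initialized temporary ulong, i.e. zero. The printouts below show that zero is assigned to thread_id, which means the guard against getting a signal in the signal handler does not work.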
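To make the compile-but-wrong behaviour concrete, here is a minimal standalone sketch. It assumes nothing from the MySQL tree: the typedef is re-declared locally with unsigned long standing in for ulong, and typedef_call_value and guard_can_fire are hypothetical helpers that only mirror the expressions in handler_error.

```cpp
// Local stand-in for the typedef in my_pthread.h (ulong spelled out).
typedef unsigned long my_thread_id;

// my_thread_id names a type, so my_thread_id() is not a function call:
// it value-initializes a temporary unsigned long, which is always 0.
unsigned long typedef_call_value()
{
  return my_thread_id();
}

// Hypothetical helper mirroring the guard condition in handler_error().
// The right-hand side is always 0, so the condition can never be true.
bool guard_can_fire(long thread_id)
{
  return thread_id != 0 &&
         thread_id == static_cast<long>(my_thread_id());
}
```

Whatever value is passed in, guard_can_fire returns false: either thread_id is 0 and the first test fails, or it is non-zero and cannot equal 0.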

handler_error, signum: 6
thread_id: 0
thread_id: 0
2010-09-21 12:48:21 [ndbd] INFO     -- Received signal 6. Running error handler.
2010-09-21 12:48:21 [ndbd] INFO     -- Signal 6 received; Aborted
2010-09-21 12:48:21 [ndbd] INFO     -- ndbd.cpp 

How to repeat:
Manual code inspection and printouts.

Suggested fix:
OK, so how to solve this so that only one shutdown takes place?

1) Install the default signal handler for all signals as the first step in 'handler_error'. If the shutdown code then triggers another signal, the process would exit() or abort(), meaning we might get a core indicating where the second problem occurred.

2) Use the already existing theShutdownMutex. We should move all code for it into ndbd.cpp and let the creation and destruction be handled in the same place where it's used. This makes it possible to create the mutex before allowing shutdown or installing the signal handler, which removes the uncertainty about whether the mutex has been created and therefore the need to know which thread id is running handler_error or shutdown. I think we could actually give the mutex file-scope (static) storage and remove the need for it to be created/destroyed.
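The two suggestions above can be sketched roughly as follows. This is not the committed patch. The list of trapped signals and the names restore_default_handlers, try_begin_shutdown and handler_error_sketch are made up for illustration. For option 2 the sketch swaps in a std::atomic_flag with static storage instead of theShutdownMutex, because testing the flag again from the thread that already set it is well defined, whereas re-locking a mutex from its owning thread may not be.

```cpp
#include <atomic>
#include <csignal>
#include <cstddef>

// Hypothetical set of trapped signals; the report does not list the real one.
static const int trapped_signals[] =
  { SIGABRT, SIGSEGV, SIGFPE, SIGILL, SIGTERM };
static const std::size_t trapped_count =
  sizeof(trapped_signals) / sizeof(trapped_signals[0]);

// Option 1: put every trapped signal back to its default disposition.
// Returns how many signals were reset successfully.
int restore_default_handlers()
{
  int reset = 0;
  for (std::size_t i = 0; i < trapped_count; i++)
  {
    if (std::signal(trapped_signals[i], SIG_DFL) != SIG_ERR)
      reset++;
  }
  return reset;
}

// Option 2 (swapped-in technique): a flag with static storage replaces the
// thread id comparison. It needs no create/destroy call.
static std::atomic_flag shutdown_started = ATOMIC_FLAG_INIT;

// True for the first caller only; every later caller gets false.
bool try_begin_shutdown()
{
  return !shutdown_started.test_and_set();
}

// Hypothetical handler combining both options.
extern "C" void handler_error_sketch(int signum)
{
  (void)signum;
  restore_default_handlers();   // a second signal now takes the default action
  if (!try_begin_shutdown())
    return;                     // shutdown already in progress elsewhere
  // ... the single shutdown sequence would run here (elided) ...
}
```

With this shape the handler no longer needs to know which thread is running it: the first entry wins, and a signal raised during shutdown falls through to the default action.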
[22 Sep 2010 9:56] Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.47-ndb-6.3.38 (revid:magnus.blaudd@sun.com-20100922091646-jn4yk85oaufflxhi) (version source revid:magnus.blaudd@sun.com-20100922091646-jn4yk85oaufflxhi) (merge vers: 5.1.47-ndb-6.3.38) (pib:21)
[22 Sep 2010 9:56] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.47-ndb-7.0.19 (revid:magnus.blaudd@sun.com-20100922092208-uwlok5g3i83urdwt) (version source revid:magnus.blaudd@sun.com-20100922092208-uwlok5g3i83urdwt) (merge vers: 5.1.47-ndb-7.0.19) (pib:21)
[22 Sep 2010 11:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/118794
[22 Sep 2010 11:28] Magnus Blåudd
Pushed to 6.3.38, 7.0.19 and 7.1.8
[22 Sep 2010 11:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/118800
[23 Sep 2010 9:53] Jon Stephens
Documented in the NDB-6.3.38, 7.0.19, and 7.1.8 changelogs, as follows:

        An error in program flow could result in data node shutdown
        routines being called multiple times.

Closed.
[29 Sep 2010 10:55] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/119379

3288 Martin Skold	2010-09-29 [merge]
      Merge
      removed:
        cluster_change_hist.txt
      modified:
        mysql-test/collections/default.experimental
        mysql-test/suite/ndb/r/ndb_database.result
        mysql-test/suite/ndb/t/ndb_database.test
        sql/ha_ndbcluster.cc
        sql/ha_ndbcluster.h
        sql/ha_ndbcluster_binlog.cc
        sql/handler.cc
        sql/handler.h
        sql/sql_show.cc
        sql/sql_table.cc
        storage/ndb/include/kernel/GlobalSignalNumbers.h
        storage/ndb/include/kernel/signaldata/FsReadWriteReq.hpp
        storage/ndb/include/mgmapi/mgmapi.h
        storage/ndb/include/ndbapi/NdbDictionary.hpp
        storage/ndb/src/kernel/blocks/ERROR_codes.txt
        storage/ndb/src/kernel/blocks/dbdict/Dbdict.cpp
        storage/ndb/src/kernel/blocks/dbdih/DbdihMain.cpp
        storage/ndb/src/kernel/blocks/dblqh/Dblqh.hpp
        storage/ndb/src/kernel/blocks/dblqh/DblqhMain.cpp
        storage/ndb/src/kernel/blocks/dbtup/Dbtup.hpp
        storage/ndb/src/kernel/blocks/dbtup/DbtupIndex.cpp
        storage/ndb/src/kernel/blocks/dbtup/DbtupMeta.cpp
        storage/ndb/src/kernel/blocks/dbtux/Dbtux.hpp
        storage/ndb/src/kernel/blocks/dbtux/DbtuxBuild.cpp
        storage/ndb/src/kernel/blocks/dbtux/DbtuxMaint.cpp
        storage/ndb/src/kernel/blocks/dbtux/DbtuxNode.cpp
        storage/ndb/src/kernel/blocks/dbtux/DbtuxTree.cpp
        storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp
        storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.hpp
        storage/ndb/src/kernel/blocks/ndbfs/Ndbfs.cpp
        storage/ndb/src/kernel/blocks/ndbfs/Ndbfs.hpp
        storage/ndb/src/kernel/blocks/ndbfs/VoidFs.cpp
        storage/ndb/src/kernel/blocks/suma/Suma.cpp
        storage/ndb/src/kernel/blocks/suma/Suma.hpp
        storage/ndb/src/kernel/main.cpp
        storage/ndb/src/ndbapi/DictCache.cpp
        storage/ndb/src/ndbapi/DictCache.hpp
        storage/ndb/src/ndbapi/NdbDictionary.cpp
        storage/ndb/src/ndbapi/NdbDictionaryImpl.cpp
        storage/ndb/src/ndbapi/NdbDictionaryImpl.hpp
        storage/ndb/test/include/NdbRestarter.hpp
        storage/ndb/test/ndbapi/testIndex.cpp
        storage/ndb/test/ndbapi/testRestartGci.cpp
        storage/ndb/test/ndbapi/testSystemRestart.cpp
        storage/ndb/test/run-test/daily-basic-tests.txt
        storage/ndb/test/src/NdbRestarter.cpp