Bug #47496 Got error 4009 'Cluster Failure' from NDB - information_schema corrupt.
Submitted: 21 Sep 23:30 Modified: 7 Dec 16:13
Reporter: theblue Sage
Status: Need Feedback
Category:Server: ClusterDD Severity:S1 (Critical)
Version:mysql-5.1-telco-7.0 OS:Linux (CentOS 5.3)
Assigned to: Jonas Oreland Target Version:
Tags: disk storage, tablespaces, 5.1.35-ndb-7.0.7-cluster-gpl
Triage: Triaged: D1 (Critical) / R6 (Needs Assessment) / E6 (Needs Assessment)

[21 Sep 23:30] theblue Sage
Description:
Cluster crashed completely (all SQL/API nodes disconnect, and ndbd shuts down on the
datanode due to using the last extent on a disk write, and then trying to do another
write. the ERRMSG on the ndbd node was :

Time: Monday 21 September 2009 - 10:27:25
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error
message, please report a bug)
Error: 6000
Error data: Signal 11 received; Segmentation fault
Error object: ndbd.cpp
Program: ndbd
Pid: 32624
Trace: /var/lib/mysql/ndb_4_trace.log.3
Version: mysql-5.1.35 ndb-7.0.7
***EOM***

At the time my cluster looked like this :
Admin : 
Cluster Configuration
---------------------
[ndbd(NDB)]	1 node(s)
id=4	@10.6.1.76  (mysql-5.1.35 ndb-7.0.7, Nodegroup: 0, Master)

[ndb_mgmd(MGM)]	1 node(s)
id=1	@10.6.1.70  (mysql-5.1.34 ndb-7.0.6)

[mysqld(API)]	2 node(s)
id=2	@10.6.1.77  (mysql-5.1.35 ndb-7.0.7)
id=3	@10.6.1.78  (mysql-5.1.35 ndb-7.0.7)

Data node had a tablespace called called session_table_space. It had 4 datafiles in it,
of size, 32M, 2G, 2G, 2G. I tried to add about 1.3 million records. as I saw the
free_extents disappearing, I tried to add a new datafile, but (root) got permission
denied (I assume this is because I cannot add a new datafile to a tablespace that is
concurrently having records added to it, which makes sense to me.) The entire tablespace
filled up at record 792160, and then the whole system crashed when I tried to add one
more.

When I restarted the cluster, (ndbd -v on the datanode) and waited for it to come up
before starting the SQL/API nodes, it did come back with what I though was the correct
number of records, and the right amount of data/index usage ...

Cluster Configuration
---------------------
[ndbd(NDB)]	1 node(s)
id=4	@10.6.1.76  (mysql-5.1.35 ndb-7.0.7, Nodegroup: 0, Master)

[ndb_mgmd(MGM)]	1 node(s)
id=1	@10.6.1.70  (mysql-5.1.34 ndb-7.0.6)

[mysqld(API)]	2 node(s)
id=2	@10.6.1.77  (mysql-5.1.35 ndb-7.0.7)
id=3	@10.6.1.78  (mysql-5.1.35 ndb-7.0.7)

ndb_mgm>  Node 4: Started (version 7.0.7)
Node 4: Data usage is 9%(12250 32K pages of total 131072)
Node 4: Index usage is 2%(3274 8K pages of total 131104)

-----
I log into one of hte SQL/API nodes , switch to the session table, and can get a result
from :
mysql> select count(*) from session_store;
+----------+
| count(*) |
+----------+
|   792161 | 
+----------+
1 row in set (0.00 sec)

however, when I then try to add a new datafile to the tablespace, I am now getting :
 
mysql> ALTER TABLESPACE session_table_space ADD DATAFILE 'session_data_five.dat'
INITIAL_SIZE 2G ENGINE NDBCLUSTER;
ERROR 1044 (42000): Access denied for user 'root'@'localhost' to database
'information_schema'
mysql> grant all on *.* to root@localhost;
Query OK, 0 rows affected (0.00 sec)

mysql> ALTER TABLESPACE session_table_space ADD DATAFILE 'session_data_five.dat'
INITIAL_SIZE 2G ENGINE NDBCLUSTER;
ERROR 1044 (42000): Access denied for user 'root'@'localhost' to database
'information_schema'
mysql> 

now nothing I can do can alter the information_schema.

How to repeat:
create a tablespace in a cluster. add a datafile for disk storage. Fill it with rows
until the datafile is full. Try to add one more row.

Suggested fix:
I expected the database to return "table full".
[22 Sep 7:47] Jonas Oreland
tracefiles ? (see ndb_error_reporter)
schema ?
[22 Sep 15:14] theblue Sage
ndb_Error_report documents

Attachment: ndb_error_report_20090922051001.tar.bz2 (application/x-bzip, text), 155.02 KiB.

[22 Sep 15:17] theblue Sage
I would happily upload the schema that the bug relates to, but any accessing of hte schema
'information_schema' (which is the one with the problem), causes he cluster to crash. It
is also my understanding (though I am likely to be quite wrong) that the information
schema is not actually a real table as such so cant/shouldnt be dumped.
[23 Sep 9:28] Jonas Oreland
tracefiles shows that it crashes during an update on table 124.
I.e not strictly information schema related.
so please upload your schema, maybe some data, and some typical queries

/Jonas
[23 Sep 21:23] theblue Sage
schema of table 124 (database fe_sessions)
CREATE TABLE `session_store` (
  `session_id` varchar(40) NOT NULL,
  `create_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `last_updated` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `session_data` longtext NOT NULL,
  PRIMARY KEY (`session_id`)
) 
TABLESPACE session_tablespace STORAGE DISK
ENGINE=ndbcluster DEFAULT CHARSET=latin1;

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
about hte data :
the sessions_id field is a valid created UUID, examples :

mysql> select session_id, create_Date, last_updated from session_store limit 5;
+--------------------------------------+---------------------+---------------------+
| session_id                           | create_Date         | last_updated        |
+--------------------------------------+---------------------+---------------------+
| f547aed4-8f18-3964-85b0-427878da9735 | 2009-09-23 10:36:14 | 2009-09-23 10:36:15 | 
| 00a5d0cb-ec7e-b944-cd5d-e44b4a29b947 | 2009-09-23 10:37:10 | 2009-09-23 10:37:10 | 
| 00efe7cf-ed17-bfb4-5d74-c307370fee86 | 2009-09-23 10:37:10 | 2009-09-23 10:37:10 | 
| 00e288ad-cd8f-9694-690b-d1664769a9cb | 2009-09-23 10:37:10 | 2009-09-23 10:37:10 | 
| 00dfee06-92a3-8fa4-e521-5f8f9837363e | 2009-09-23 10:37:10 | 2009-09-23 10:37:10 | 
+--------------------------------------+---------------------+---------------------+
5 rows in set (0.00 sec)

The session_data field is a serialized php session pertaining to our product. It ranges
from about 2.5K to 6K in size, with the average at around 3.5K . Here is one example :

sessionTimeStamp|i:1253408957;UUID|s:36:"00dfee06-92a3-8fa4-e521-5f8f9837363e";Wurfl8-28-2009|b:1;wurfl_device_id|s:11:"Yahoo_Slurp";isWap|b:0;imageType|s:3:"gif";screenWidth|i:770;tables|b:0;phoneModel|s:14:"Bot
or
Crawler";phoneBrand|s:5:"Robot";isWirelessDevice|b:0;crumbs|a:0:{}heldSection|s:10:"setcollege";lastHeldSection|N;brandExp|i:0;partnerid|s:0:"";carrierid|s:0:"";brandid|s:0:"";affiliateid|s:0:"";carrierDetected|b:0;carrierId|b:0;debugFlag|N;locationCookie|s:153:"One
Sherborn St.|Boston|MA|02215|42.34939|71.10054|boston|Boston
University|bostonuniversity|BU||516|Terriers|bostonuniversity|54809|America/New_York|est";savedLocations|a:2:{i:0;s:53:"San+Francisco+CA+%2894154%29|San
Francisco CA (94154)";i:1;s:52:"Boston+University|Boston
University|bostonuniversity";}blankLog|a:36:{s:6:"domain";s:0:"";s:6:"server";s:0:"";s:9:"partnerId";s:0:"";s:9:"carrierId";s:0:"";s:7:"brandId";s:0:"";s:11:"affiliateId";s:0:"";s:6:"userId";s:0:"";s:6:"street";s:0:"";s:4:"city";s:0:"";s:5:"state";s:0:"";s:3:"zip";s:0:"";s:10:"pickMarket";s:0:"";s:3:"lat";s:0:"";s:4:"long";s:0:"";s:17:"activateTimeStamp";s:0:"";s:10:"clientType";s:0:"";s:15:"userAgentString";s:0:"";s:8:"clientIp";s:0:"";s:10:"deviceMake";s:0:"";s:11:"deviceModel";s:0:"";s:9:"timeStamp";s:0:"";s:7:"referer";s:0:"";s:7:"pageUrl";s:0:"";s:9:"sourceUrl";s:0:"";s:14:"destinationUrl";s:0:"";s:12:"previousArea";s:0:"";s:11:"currentArea";s:0:"";s:10:"categoryId";s:0:"";s:13:"subCategoryId";s:0:"";s:9:"contentId";s:0:"";s:6:"viewId";s:0:"";s:9:"sectionId";s:0:"";s:8:"marketId";s:0:"";s:7:"pubDate";s:0:"";s:5:"title";s:0:"";s:13:"urlParameters";s:0:"";}sourceIds|a:6:{s:3:"web";s:1:"2";s:6:"mobile";s:1:"3";s:3:"sms";s:1:"6";s:3:"xml";s:1:"4";s:3:"rss";s:1:"5";s:4:"atom";s:1:"7";}markupIds|a:10:{s:4:"none";s:1:"0";s:5:"other";s:1:"1";s:4:"hdml";s:1:"2";s:3:"wml";s:1:"3";s:3:"xml";s:1:"4";s:4:"html";s:1:"5";s:5:"xhtml";s:1:"6";s:7:"xhtmlmp";s:1:"7";s:3:"rss";s:1:"8";s:4:"atom";s:1:"9";}pageUrlTypes|a:23:{s:4:"none";s:1:"0";s:5:"other";s:1:"1";s:7:"section";s:1:"2";s:7:"content";s:1:"3";s:6:"search";s:1:"4";s:7:"sitemap";s:1:"5";s:6:"legacy";s:1:"6";s:11:"redirection";s:1:"7";s:9:"marketing";s:1:"8";s:6:"mahele";s:1:"8";s:11:"sweepstakes";s:1:"8";s:5:"error";s:1:"9";s:5:"debug";s:2:"10";s:10:"directions";s:2:"11";s:5:"yesno";s:2:"12";s:18:"termsandconditions";s:2:"13";s:8:"tracking";s:2:"14";s:6:"movies";s:2:"15";s:6:"dining";s:2:"16";s:7:"weather";s:1:"2";s:8:"forecast";s:1:"2";s:8:"location";s:1:"2";s:11:"setlocation";s:1:"1";}clientTypes|a:5:{s:4:"none";s:1:"0";s:5:"other";s:1:"1";s:3:"web";s:1:"2";s:9:"mobileweb";s:1:"3";s:10:"webservice";s:1:"4";}statsLogDetails|a:38:{s:6:"domain";s:23:"mobile.sportingnews.com";s:6:"server";s:4:"fe06";s:9:"partnerId";s:0:"";s:9:"carrierId";s:0:"";s:7:"brandId";s:0:"";s:11:"affiliateId";s:0:"";s:6:"userId";s:36:"00dfee06-92a3-8fa4-e521-5f8f9837363e";s:6:"street";s:12:"<![CDATA[]]>";s:4:"city";s:13:"San
Francisco";s:5:"state";s:2:"CA";s:3:"zip";s:5:"94154";s:10:"pickMarket";s:12:"sanfrancisco";s:3:"lat";s:6:"37.775";s:4:"long";s:8:"122.4183";s:17:"activateTimeStamp";s:0:"";s:10:"clientType";s:1:"2";s:15:"userAgentString";s:99:"<![CDATA[Mozilla/5.0
(compatible; Yahoo! Slurp/3.0;
http://help.yahoo.com/help/us/ysearch/slurp)]]>";s:8:"clientIp";s:13:"72.30.142.215";s:10:"deviceMake";s:5:"Robot";s:11:"deviceModel";s:14:"Bot
or Crawler";s:9:"timeStamp";s:19:"2009-09-20
01:09:18";s:7:"referer";s:12:"<![CDATA[]]>";s:7:"pageUrl";s:0:"";s:9:"sourceUrl";s:23:"<![CDATA[/setcollege]]>";s:14:"destinationUrl";s:23:"<![CDATA[/setcollege]]>";s:12:"previousArea";s:0:"";s:11:"currentArea";s:0:"";s:10:"categoryId";s:0:"";s:13:"subCategoryId";s:0:"";s:9:"contentId";s:0:"";s:6:"viewId";s:0:"";s:9:"sectionId";s:0:"";s:8:"marketId";s:0:"";s:7:"pubDate";s:0:"";s:5:"title";s:0:"";s:13:"urlParameters";s:96:"<urlParameters>
                <c><![CDATA[bostonuniversity]]></c>
           
</urlParameters>";s:9:"sessionId";s:36:"00dfee06-92a3-8fa4-e521-5f8f9837363e";s:8:"sourceId";s:1:"2";}currentDisplayLocation|s:17:"Boston
University";currPickMarket|s:6:"boston";

The main problem is that when the final extent of the final DATAFILE for the sessions
TABLESPACE was used up, the cluster crashed when trying to add one more record. IT is
more about it crashing instead of it saying "sorry we are full", or "go make more space
before continuing". The result of the crash once we got the cluster back up and runnign
is that the information_schema was unreadable and un-alterable even by root (could not
add DATAFILES, TABLESPACES, or anything.
Again, the cluster was magic until I tried to add too many records to a database that as
"storage DISK engine=nbd" and there was no more space in the TABLESPACE

Thanks for looking at this for me!
[12 Oct 3:51] theblue Sage
any word on this ? The silence makes me nervous ... :)
[7 Dec 16:13] Jonas Oreland
Hi,

(no) news: I can't repeat the problem neither with 7.07 nor with 7.0.9
I'm using the session_store table with a 64M tablespace and fill it up,
delete, fill it up etc...(with parallel threads)

No problems...

Suggestion: Can you try a newer version to see if problem persists?

/Jonas