MySQL Bugs: #47496: Got error 4009 'Cluster Failure' from NDB

Bug #47496	Got error 4009 'Cluster Failure' from NDB - information_schema corrupt.
Submitted:	21 Sep 2009 21:30	Modified:	7 Jan 2010 15:13
Reporter:	theblue Sage	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Cluster: Disk Data	Severity:	S1 (Critical)
Version:	mysql-5.1-telco-7.0	OS:	Linux (CentOS 5.3)
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	5.1.35-ndb-7.0.7-cluster-gpl, disk storage, tablespaces

Description:
Cluster crashed completely (all SQL/API nodes disconnect, and ndbd shuts down on the datanode due to using the last extent on a disk write, and then trying to do another write. the ERRMSG on the ndbd node was :

Time: Monday 21 September 2009 - 10:27:25
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 11 received; Segmentation fault
Error object: ndbd.cpp
Program: ndbd
Pid: 32624
Trace: /var/lib/mysql/ndb_4_trace.log.3
Version: mysql-5.1.35 ndb-7.0.7
***EOM***

At the time my cluster looked like this :
Admin : 
Cluster Configuration
---------------------
[ndbd(NDB)]	1 node(s)
id=4	@10.6.1.76  (mysql-5.1.35 ndb-7.0.7, Nodegroup: 0, Master)

[ndb_mgmd(MGM)]	1 node(s)
id=1	@10.6.1.70  (mysql-5.1.34 ndb-7.0.6)

[mysqld(API)]	2 node(s)
id=2	@10.6.1.77  (mysql-5.1.35 ndb-7.0.7)
id=3	@10.6.1.78  (mysql-5.1.35 ndb-7.0.7)

Data node had a tablespace called called session_table_space. It had 4 datafiles in it, of size, 32M, 2G, 2G, 2G. I tried to add about 1.3 million records. as I saw the free_extents disappearing, I tried to add a new datafile, but (root) got permission denied (I assume this is because I cannot add a new datafile to a tablespace that is concurrently having records added to it, which makes sense to me.) The entire tablespace filled up at record 792160, and then the whole system crashed when I tried to add one more.

When I restarted the cluster, (ndbd -v on the datanode) and waited for it to come up before starting the SQL/API nodes, it did come back with what I though was the correct number of records, and the right amount of data/index usage ...

Cluster Configuration
---------------------
[ndbd(NDB)]	1 node(s)
id=4	@10.6.1.76  (mysql-5.1.35 ndb-7.0.7, Nodegroup: 0, Master)

[ndb_mgmd(MGM)]	1 node(s)
id=1	@10.6.1.70  (mysql-5.1.34 ndb-7.0.6)

[mysqld(API)]	2 node(s)
id=2	@10.6.1.77  (mysql-5.1.35 ndb-7.0.7)
id=3	@10.6.1.78  (mysql-5.1.35 ndb-7.0.7)

ndb_mgm>  Node 4: Started (version 7.0.7)
Node 4: Data usage is 9%(12250 32K pages of total 131072)
Node 4: Index usage is 2%(3274 8K pages of total 131104)

-----
I log into one of hte SQL/API nodes , switch to the session table, and can get a result from :
mysql> select count(*) from session_store;
+----------+
| count(*) |
+----------+
|   792161 | 
+----------+
1 row in set (0.00 sec)

however, when I then try to add a new datafile to the tablespace, I am now getting :
 
mysql> ALTER TABLESPACE session_table_space ADD DATAFILE 'session_data_five.dat' INITIAL_SIZE 2G ENGINE NDBCLUSTER;
ERROR 1044 (42000): Access denied for user 'root'@'localhost' to database 'information_schema'
mysql> grant all on *.* to root@localhost;
Query OK, 0 rows affected (0.00 sec)

mysql> ALTER TABLESPACE session_table_space ADD DATAFILE 'session_data_five.dat' INITIAL_SIZE 2G ENGINE NDBCLUSTER;
ERROR 1044 (42000): Access denied for user 'root'@'localhost' to database 'information_schema'
mysql> 

now nothing I can do can alter the information_schema.

How to repeat:
create a tablespace in a cluster. add a datafile for disk storage. Fill it with rows until the datafile is full. Try to add one more row.

Suggested fix:
I expected the database to return "table full".

tracefiles ? (see ndb_error_reporter)
schema ?

ndb_Error_report documents

Attachment: ndb_error_report_20090922051001.tar.bz2 (application/x-bzip, text), 155.02 KiB.

I would happily upload the schema that the bug relates to, but any accessing of hte schema 'information_schema' (which is the one with the problem), causes he cluster to crash. It is also my understanding (though I am likely to be quite wrong) that the information schema is not actually a real table as such so cant/shouldnt be dumped.

tracefiles shows that it crashes during an update on table 124.
I.e not strictly information schema related.
so please upload your schema, maybe some data, and some typical queries

/Jonas

schema of table 124 (database fe_sessions)
CREATE TABLE `session_store` (
  `session_id` varchar(40) NOT NULL,
  `create_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `last_updated` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `session_data` longtext NOT NULL,
  PRIMARY KEY (`session_id`)
) 
TABLESPACE session_tablespace STORAGE DISK
ENGINE=ndbcluster DEFAULT CHARSET=latin1;

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
about hte data :
the sessions_id field is a valid created UUID, examples :

mysql> select session_id, create_Date, last_updated from session_store limit 5;
+--------------------------------------+---------------------+---------------------+
| session_id                           | create_Date         | last_updated        |
+--------------------------------------+---------------------+---------------------+
| f547aed4-8f18-3964-85b0-427878da9735 | 2009-09-23 10:36:14 | 2009-09-23 10:36:15 | 
| 00a5d0cb-ec7e-b944-cd5d-e44b4a29b947 | 2009-09-23 10:37:10 | 2009-09-23 10:37:10 | 
| 00efe7cf-ed17-bfb4-5d74-c307370fee86 | 2009-09-23 10:37:10 | 2009-09-23 10:37:10 | 
| 00e288ad-cd8f-9694-690b-d1664769a9cb | 2009-09-23 10:37:10 | 2009-09-23 10:37:10 | 
| 00dfee06-92a3-8fa4-e521-5f8f9837363e | 2009-09-23 10:37:10 | 2009-09-23 10:37:10 | 
+--------------------------------------+---------------------+---------------------+
5 rows in set (0.00 sec)

The session_data field is a serialized php session pertaining to our product. It ranges from about 2.5K to 6K in size, with the average at around 3.5K . Here is one example :

sessionTimeStamp|i:1253408957;UUID|s:36:"00dfee06-92a3-8fa4-e521-5f8f9837363e";Wurfl8-28-2009|b:1;wurfl_device_id|s:11:"Yahoo_Slurp";isWap|b:0;imageType|s:3:"gif";screenWidth|i:770;tables|b:0;phoneModel|s:14:"Bot or Crawler";phoneBrand|s:5:"Robot";isWirelessDevice|b:0;crumbs|a:0:{}heldSection|s:10:"setcollege";lastHeldSection|N;brandExp|i:0;partnerid|s:0:"";carrierid|s:0:"";brandid|s:0:"";affiliateid|s:0:"";carrierDetected|b:0;carrierId|b:0;debugFlag|N;locationCookie|s:153:"One Sherborn St.|Boston|MA|02215|42.34939|71.10054|boston|Boston University|bostonuniversity|BU||516|Terriers|bostonuniversity|54809|America/New_York|est";savedLocations|a:2:{i:0;s:53:"San+Francisco+CA+%2894154%29|San Francisco CA (94154)";i:1;s:52:"Boston+University|Boston University|bostonuniversity";}blankLog|a:36:{s:6:"domain";s:0:"";s:6:"server";s:0:"";s:9:"partnerId";s:0:"";s:9:"carrierId";s:0:"";s:7:"brandId";s:0:"";s:11:"affiliateId";s:0:"";s:6:"userId";s:0:"";s:6:"street";s:0:"";s:4:"city";s:0:"";s:5:"state";s:0:"";s:3:"zip";s:0:"";s:10:"pickMarket";s:0:"";s:3:"lat";s:0:"";s:4:"long";s:0:"";s:17:"activateTimeStamp";s:0:"";s:10:"clientType";s:0:"";s:15:"userAgentString";s:0:"";s:8:"clientIp";s:0:"";s:10:"deviceMake";s:0:"";s:11:"deviceModel";s:0:"";s:9:"timeStamp";s:0:"";s:7:"referer";s:0:"";s:7:"pageUrl";s:0:"";s:9:"sourceUrl";s:0:"";s:14:"destinationUrl";s:0:"";s:12:"previousArea";s:0:"";s:11:"currentArea";s:0:"";s:10:"categoryId";s:0:"";s:13:"subCategoryId";s:0:"";s:9:"contentId";s:0:"";s:6:"viewId";s:0:"";s:9:"sectionId";s:0:"";s:8:"marketId";s:0:"";s:7:"pubDate";s:0:"";s:5:"title";s:0:"";s:13:"urlParameters";s:0:"";}sourceIds|a:6:{s:3:"web";s:1:"2";s:6:"mobile";s:1:"3";s:3:"sms";s:1:"6";s:3:"xml";s:1:"4";s:3:"rss";s:1:"5";s:4:"atom";s:1:"7";}markupIds|a:10:{s:4:"none";s:1:"0";s:5:"other";s:1:"1";s:4:"hdml";s:1:"2";s:3:"wml";s:1:"3";s:3:"xml";s:1:"4";s:4:"html";s:1:"5";s:5:"xhtml";s:1:"6";s:7:"xhtmlmp";s:1:"7";s:3:"rss";s:1:"8";s:4:"atom";s:1:"9";}pageUrlTypes|a:23:{s:4:"none";s:1:"0";s:5:"other";s:1:"1";s:7:"section";s:1:"2";s:7:"content";s:1:"3";s:6:"search";s:1:"4";s:7:"sitemap";s:1:"5";s:6:"legacy";s:1:"6";s:11:"redirection";s:1:"7";s:9:"marketing";s:1:"8";s:6:"mahele";s:1:"8";s:11:"sweepstakes";s:1:"8";s:5:"error";s:1:"9";s:5:"debug";s:2:"10";s:10:"directions";s:2:"11";s:5:"yesno";s:2:"12";s:18:"termsandconditions";s:2:"13";s:8:"tracking";s:2:"14";s:6:"movies";s:2:"15";s:6:"dining";s:2:"16";s:7:"weather";s:1:"2";s:8:"forecast";s:1:"2";s:8:"location";s:1:"2";s:11:"setlocation";s:1:"1";}clientTypes|a:5:{s:4:"none";s:1:"0";s:5:"other";s:1:"1";s:3:"web";s:1:"2";s:9:"mobileweb";s:1:"3";s:10:"webservice";s:1:"4";}statsLogDetails|a:38:{s:6:"domain";s:23:"mobile.sportingnews.com";s:6:"server";s:4:"fe06";s:9:"partnerId";s:0:"";s:9:"carrierId";s:0:"";s:7:"brandId";s:0:"";s:11:"affiliateId";s:0:"";s:6:"userId";s:36:"00dfee06-92a3-8fa4-e521-5f8f9837363e";s:6:"street";s:12:"<![CDATA[]]>";s:4:"city";s:13:"San Francisco";s:5:"state";s:2:"CA";s:3:"zip";s:5:"94154";s:10:"pickMarket";s:12:"sanfrancisco";s:3:"lat";s:6:"37.775";s:4:"long";s:8:"122.4183";s:17:"activateTimeStamp";s:0:"";s:10:"clientType";s:1:"2";s:15:"userAgentString";s:99:"<![CDATA[Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)]]>";s:8:"clientIp";s:13:"72.30.142.215";s:10:"deviceMake";s:5:"Robot";s:11:"deviceModel";s:14:"Bot or Crawler";s:9:"timeStamp";s:19:"2009-09-20 01:09:18";s:7:"referer";s:12:"<![CDATA[]]>";s:7:"pageUrl";s:0:"";s:9:"sourceUrl";s:23:"<![CDATA[/setcollege]]>";s:14:"destinationUrl";s:23:"<![CDATA[/setcollege]]>";s:12:"previousArea";s:0:"";s:11:"currentArea";s:0:"";s:10:"categoryId";s:0:"";s:13:"subCategoryId";s:0:"";s:9:"contentId";s:0:"";s:6:"viewId";s:0:"";s:9:"sectionId";s:0:"";s:8:"marketId";s:0:"";s:7:"pubDate";s:0:"";s:5:"title";s:0:"";s:13:"urlParameters";s:96:"<urlParameters>
                <c><![CDATA[bostonuniversity]]></c>
            </urlParameters>";s:9:"sessionId";s:36:"00dfee06-92a3-8fa4-e521-5f8f9837363e";s:8:"sourceId";s:1:"2";}currentDisplayLocation|s:17:"Boston University";currPickMarket|s:6:"boston";

The main problem is that when the final extent of the final DATAFILE for the sessions TABLESPACE was used up, the cluster crashed when trying to add one more record. IT is more about it crashing instead of it saying "sorry we are full", or "go make more space before continuing". The result of the crash once we got the cluster back up and runnign is that the information_schema was unreadable and un-alterable even by root (could not add DATAFILES, TABLESPACES, or anything.
Again, the cluster was magic until I tried to add too many records to a database that as "storage DISK engine=nbd" and there was no more space in the TABLESPACE

Thanks for looking at this for me!

any word on this ? The silence makes me nervous ... :)

Hi,

(no) news: I can't repeat the problem neither with 7.07 nor with 7.0.9
I'm using the session_store table with a 64M tablespace and fill it up,
delete, fill it up etc...(with parallel threads)

No problems...

Suggestion: Can you try a newer version to see if problem persists?

/Jonas

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".