Bug #37335 Problems with Mysql Cluster Disk Data when importing big databases
Submitted: 11 Jun 2008 10:00 Modified: 11 Jun 2008 15:41
Reporter: David Cruz Email Updates:
Status: Closed Impact on me: None
Category:MySQL Cluster: Disk Data Severity:S2 (Serious)
Version:5.1.23-6.2.15 OS:Linux (Ubuntu Server x64)
Assigned to: CPU Architecture:Any
Tags: cluster, disk data, import

[11 Jun 2008 10:00] David Cruz
Description:
I am importing a large set of data into a new database created in a MySQL Cluster, 6.2.15, compiled with compile-max under Ubuntu Server x64.

I had no problem creating databases, tables, tablespaces, and so on... until this:

Message: Temporary on access to file (Internal error, programming error or missing error message, please report a bug)
Error: 2809
Error data: DBLQH: File system write failed during LogFileOperationRecord state 17. OS errno: 5
Error object: DBLQH (Line: 12931) 0x0000000e
Program: /usr/local/mysql/libexec/ndbd
Pid: 6044
Trace: /var/lib/mysql-cluster/ndbd2/ndb_3_trace.log.1
Version: mysql-5.1.23 ndb-6.2.15
***EOM***

This happened while importing a 106 MB SQL file to fill the newly created database with data. The INSERT statement is somewhat big, but similar to prior updates, which worked well.

There was plenty of tablespace available: I was monitoring it, and there were more than enough free extents. Only 2 GB of memory was in use on the system; since I'm storing all of this on disk, I don't need much memory.

Every table is stored on disk using tablespaces and a big logfile group.

--------------------------------------

Structure:
A 2-data-node cluster with 2 SQL nodes and 1 management node.

Config:
FragmentLogFileSize=32M
NoOfFragmentLogFiles=128
RedoBuffer=128M

Tablespaces are 1 GB in size.
The LogFileGroup file is 512 MB.
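Those settings map onto the data node section of config.ini roughly as follows (a sketch: the [NDBD DEFAULT] section placement follows standard MySQL Cluster configuration conventions; only the three values are from this report):

```ini
[NDBD DEFAULT]
# Redo log: 128 fragment log files x 32 MB = 4 GB of redo log per node
FragmentLogFileSize=32M
NoOfFragmentLogFiles=128
# In-memory buffer for redo log writes
RedoBuffer=128M
```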

----------------------------------------

Behaviour after crash:

- 1 node is still alive but barely responsive: it takes more than a minute to answer a query.

- The crashed node cannot be restarted again.
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: 
Error object: DBLQH (Line: 12923) 0x0000000a
Program: /usr/local/mysql/libexec/ndbd
Pid: 7963
Trace: /var/lib/mysql-cluster/ndbd2/ndb_3_trace.log.2
Version: mysql-5.1.23 ndb-6.2.15
***EOM***

So I had to restore that storage node from backup.

I can provide the trace file if needed.

How to repeat:
Create database.
Create a logfile group and some big tablespaces, 1 GB or more.

Run big inserts for several minutes.
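The setup steps above can be sketched in SQL like this (object names, file names, and the table definition are illustrative, not taken from the report; the sizes match the 512 MB logfile group and 1 GB tablespaces described earlier):

```sql
-- Undo logging required for NDB disk data tables
CREATE LOGFILE GROUP lg1
    ADD UNDOFILE 'undo1.log'
    INITIAL_SIZE 512M
    ENGINE NDBCLUSTER;

-- A 1 GB tablespace backed by that logfile group
CREATE TABLESPACE ts1
    ADD DATAFILE 'data1.dat'
    USE LOGFILE GROUP lg1
    INITIAL_SIZE 1024M
    ENGINE NDBCLUSTER;

-- A disk-based table; non-indexed columns live in the tablespace
CREATE TABLE t1 (
    id INT NOT NULL PRIMARY KEY,
    payload VARCHAR(255)
) TABLESPACE ts1 STORAGE DISK ENGINE NDBCLUSTER;
```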
[11 Jun 2008 10:42] Hartmut Holzgraefe
"OS errno: 5" is just "Input/Output error" so this could be almost anything :(

Are you using local disks for the table space or is it located on some remote storage like a SAN, NAS or network file system?

Trace files would be helpful so please provide them.

It is also not clear from your description whether this is a one-time incident or whether the failure is reproducible when importing the same dump again.

If the failure when importing that dump is consistent, would you be able to provide us with the dump (and any other required CREATE statements)?
[11 Jun 2008 11:08] David Cruz
Hi there.

The failure happens every time I run the import sequence.

Storage is a local disk. Something like this:
/dev/sda3            115353572   1604596 107889268   2% /var/lib/mysql-cluster

Traces are attached now to this bug report.

I'm trying to reproduce it with dummy data. I cannot provide the actual imports because it's private data from my company.

David
[11 Jun 2008 15:41] David Cruz
Ok, solved.

In the end, while I was trying to use random data to reproduce the error, I got this message in /var/log/messages.

Jun 10 13:02:38 d17 kernel: [1712205.526090] sda3: rw=1, want=234375144, limit=234372285
Jun 10 13:02:38 d17 kernel: [1712205.526092] attempt to access beyond end of device

So: errors on the disk, in the partition, or in the filesystem.
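Plugging the kernel's numbers into a quick calculation shows how far past the device the write landed (sector numbers taken from the log above; 512-byte sectors assumed, as is standard for these kernel messages):

```shell
want=234375144    # sector the kernel was asked to write (rw=1, want=...)
limit=234372285   # last addressable sector of /dev/sda3
overshoot=$(( want - limit ))
echo "write aimed ${overshoot} sectors (~$(( overshoot * 512 / 1024 )) KiB) beyond the device end"
```

That is roughly 1.4 MB of addresses beyond the partition boundary, consistent with a filesystem that ended up larger than the underlying partition.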

I recreated the whole partition and filesystem, and the imports work just fine now.

I guess the error message in MySQL was just not clear enough. Maybe a message along the lines of "I just cannot write to this tablespace" would be enough!

;-)

David