Bug #52696 ndb_restore with --print_data does not reference hidden blob tables
Submitted: 8 Apr 2010 16:26
Modified: 17 May 2010 13:06
Reporter: Tom Farvour
Status: Won't fix
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.1-telco-7.0
OS: FreeBSD (8.0-RELEASE)
Assigned to: Andrew Hutchings
CPU Architecture: Any
Tags: 7.0.13, export, flat file, ndb_restore, print_data, tab

[8 Apr 2010 16:26] Tom Farvour
Description:
I wasn't sure whether to classify this as a bug or a feature, but I think it's a bug, since this is not the output one expects when running ndb_restore --print_data --tab

# $d (the database name) must be set before this loop runs.
for s in 10 11 12 13; do
	ndb_restore \
	-b 10 \
	-n "$s" \
	--backup_path=/backup/ndbbackup-history/BACKUP-10 \
	--verbose=0 \
	--print_data \
	--append \
	--tab="./$d/" \
	--hex \
	--fields-terminated-by="tjf_fterm" \
	--lines-terminated-by="tjf_lterm" \
	--include-databases="$d"
done

Once the flat files are created, they contain all the appropriate data, except that only the first 256 characters of each blob column are in them, because the ndb_restore utility does not reference the hidden BLOB table.

I am aware that using ndb_restore to restore into a real cluster would avoid this problem, since the hidden table would just be re-created and referenced through the NDB engine; however, when exporting to a flat file, the data is worthless since it doesn't contain the entire blob column.

-- answers --
Version: 1
Fragment type: 9
K Value: 6
Min load factor: 78
Max load factor: 80
Temporary table: no
Number of attributes: 5
Number of primary keys: 1
Length of frm data: 343
Row Checksum: 1
Row GCI: 1
SingleUserMode: 0
ForceVarPart: 1
FragmentCount: 16
TableStatus: Retrieved
-- Attributes --
id Unsigned PRIMARY KEY DISTRIBUTION KEY AT=FIXED ST=MEMORY AUTO_INCR
ticket Int NULL AT=FIXED ST=MEMORY
message Text(256,2000,0;latin1_swedish_ci) NULL AT=MEDIUM_VAR ST=MEMORY BV=2 BT=NDB$BLOB_300_2
timestamp Int NOT NULL AT=FIXED ST=MEMORY
rep Int NOT NULL AT=FIXED ST=MEMORY

-- Indexes --
PRIMARY KEY(id) - UniqueHashIndex
rep(rep) - OrderedIndex
ticket(ticket) - OrderedIndex
PRIMARY(id) - OrderedIndex

NDBT_ProgramExit: 0 - OK

------------
^^

As you can see above, the ndb_restore utility should be referencing the NDB$BLOB_300_2 hidden blob table for the `message` column, and not using the Text(256) data when using the --print_data --tab flags.
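
To find every column affected in this way, the attribute lines of the ndb_desc output can be filtered for the BT= token. The following is only a sketch: list_blob_columns is a hypothetical helper name, and it assumes the output layout shown above, with the first number in the type (256 here) taken to be the size stored inline in the main table.

```shell
# List columns backed by a hidden blob table, given ndb_desc output on stdin.
# Prints one line per column: name, assumed inline size, hidden blob table.
# (list_blob_columns is a hypothetical helper, not part of MySQL Cluster.)
list_blob_columns() {
    awk '
        # Attribute lines for blob/text columns carry a BT=<table> token.
        /BT=NDB\$BLOB_/ {
            inline = ""
            # The type looks like Text(256,2000,0;...); take the first number.
            if (match($2, /\([0-9]+/))
                inline = substr($2, RSTART + 1, RLENGTH - 1)
            for (i = 3; i <= NF; i++)
                if ($i ~ /^BT=/)
                    print $1, inline, substr($i, 4)
        }
    '
}
```

Piping the ndb_desc output for the table above into list_blob_columns would report the `message` column together with NDB$BLOB_300_2.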

Obviously it should not do so when restoring into a real cluster using the -c connect string.

How to repeat:
*Perform a normal ndb backup using START BACKUP
*Move all backup files into a single directory to do an ndb_restore on them per node.
*Use the --print_data and --tab flags to export the data into separate CSV data files.
*Once all the flat files are created, check them and you will see that only the first 256 characters are actually in the files... the hidden blob table was not referenced to obtain the rest of the data.
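
The check in the last step can be roughly automated. This is a sketch only: count_fields_at_limit is a hypothetical helper, the terminators are assumed to contain no regular-expression metacharacters (true for tjf_fterm and tjf_lterm), and the length to look for is passed in because the width of a truncated value in the file may differ from 256 when --hex is used.

```shell
# Count fields whose length is exactly LIMIT in a flat file written with
# custom field and line terminators. A high count for a blob column's
# inline size suggests truncated values.
# (count_fields_at_limit is a hypothetical helper name.)
count_fields_at_limit() {
    file=$1 limit=$2 fterm=$3 lterm=$4
    awk -v limit="$limit" -v fterm="$fterm" -v lterm="$lterm" '
        # Rebuild the file as one string, since values may contain newlines.
        { content = (NR > 1 ? content "\n" : "") $0 }
        END {
            hits = 0
            nrows = split(content, rows, lterm)
            for (r = 1; r <= nrows; r++) {
                nfields = split(rows[r], fields, fterm)
                for (f = 1; f <= nfields; f++)
                    if (length(fields[f]) == limit)
                        hits++
            }
            print hits
        }
    ' "$file"
}
```

A field that happens to be exactly that long will also be counted, so the result is a hint rather than proof of truncation.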

Suggested fix:
Either add a flag, or have the hidden blob table referenced automatically, when doing a --print_data --tab.

The --print_data flag is practically worthless without the hidden blob tables' data being included.
[17 May 2010 13:06] Andrew Hutchings
After some discussion over this we are going to set this as Won't Fix for now.

Basically the backup format has the blob tables separate from the main tables.  This means that every blob tuple would require a scan of the backup file for the rest of the blob data.  It would be quite complex and messy, as well as incredibly slow.
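
The cost of that per-tuple scan can be pictured with a toy model. This is not the real backup format, which is binary: it assumes two hypothetical tab-separated text files, one with a key and inline data per row, the other with a key, part number and part data per row. reassemble_naive is an illustrative name only.

```shell
# Toy model: rebuild each full value by rescanning the parts file once per
# main-table row, so the work grows as (main rows) x (part rows).
# main file:  <pk> TAB <inline data>
# parts file: <pk> TAB <part number> TAB <part data>
reassemble_naive() {
    main=$1 parts=$2
    while IFS="$(printf '\t')" read -r pk inline; do
        # One full scan of the parts file for this single row.
        rest=$(awk -F '\t' -v pk="$pk" '$1 == pk { print $2 "\t" $3 }' "$parts" |
            sort -n | cut -f2 | tr -d '\n')
        printf '%s\t%s%s\n' "$pk" "$inline" "$rest"
    done < "$main"
}
```

Even in this simplified form, every row with blob data triggers a complete pass over the parts file, which is the behaviour described above as incredibly slow for a large backup.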

The alternative is to radically change the backup format, which would be very difficult and would have an impact on many other things.
[19 Aug 2010 16:31] Jon Stephens
See also BUG#56123, to which this bug is related.