Bug #8321 myisampack bug in compression algorithm
Submitted: 4 Feb 2005 16:47    Modified: 10 Jul 2005 15:52
Reporter: SINISA MILIVOJEVIC
Status: Closed
Category: MySQL Server: Command-line Clients
Severity: S2 (Serious)
Version: 4.0 and above
OS: Any
Assigned to: Ingo Strüwing
CPU Architecture: Any
Tags: corruption, myisam

[4 Feb 2005 16:47] SINISA MILIVOJEVIC
Description:
myisampack makes an error when packing some TEXT/BLOB columns with short data lengths and specific content.

The error is not discovered during packing, but during index building, when a record has to be unpacked.

As the data belongs to the customer and contains highly confidential information, please contact us in Support about the whereabouts of the data and the debugging environment.

How to repeat:
Contact support for further info.

Also read our internal info.
[11 Mar 2005 20:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/22950
[12 Mar 2005 10:54] Ingo Strüwing
The bug database is defective.
It set this bug to the "Patch pending" state, although no bug fix is pending yet.
While it is true that I found a bug, believe that it is "the" bug, and have a fix for it,
I am still working on tests to gain reasonable confidence that the fix is correct.
During my work on this bug I inserted a lot of comments, trace messages,
additional checks, and even a decoding function. Since this is much more than
the "real" bug fix, I committed it independently of the bug fix for the reviewers'
convenience. The bug fix will follow after my tests, and I plan to commit a third
changeset with suggestions for further improvements to the code.
[12 Mar 2005 19:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/22966
[12 Mar 2005 20:16] Ingo Strüwing
MyISAM compression is based on Huffman encoding. It assigns codes of different lengths to the byte values, depending on how often each byte value occurs in the data. A perfectly even distribution of bytes would lead to eight-bit Huffman codes (no compression at all, as for ideal random data). The further apart the byte frequencies are, the longer the codes for the least frequently occurring bytes become.
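
As a toy illustration (with made-up counts, not from the customer's data): for five byte values occurring 8, 4, 2, 1 and 1 times, Huffman coding assigns code lengths of 1, 2, 3, 4 and 4 bits. The more lopsided the counts become, the deeper the tree grows on the rare side, which is how code lengths of 32 bits and more can arise in very large, strongly skewed data.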

In the customer's data at hand, I found a column which contained 202 distinct byte values (characters). The total number of bytes in that column over all rows was 2917027827 (< 3GB). The distribution of the bytes was such that 34 byte values were assigned a Huffman code of 32 bits in length (while the 4 most frequently occurring bytes got 4-bit codes).

The problem was that the MyISAM compression algorithm failed on Huffman codes of 32 bits in length. I guess that this situation has not been triggered before, and it explains why it was not possible to reproduce the problem with less data. Though it is possible to get 32-bit codes with somewhat less data, there is still a minimum amount of data required. It will definitely not be possible with a couple of megabytes. This explains why I cannot provide a small test case.
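
One plausible way such a limit shows up (a minimal hypothetical sketch in C, not the MySQL source): a bit writer that accumulates pending bits in a 32-bit register silently loses bits as soon as a code no longer fits next to the bits that are already pending, and a shift by the full register width is undefined in C, so a code table containing 32-bit codes breaks it.

  #include <stdint.h>
  #include <stdio.h>

  /* Hypothetical bit writer, NOT the MyISAM code:
   * 'buf' holds the pending (not yet flushed) bits, right-aligned. */
  static uint32_t buf  = 0;
  static unsigned used = 0;    /* number of pending bits in 'buf' */

  static void put_code(uint32_t code, unsigned len)
  {
      /* Only correct while used + len <= 32 AND len < 32:
       *  - if used + len > 32, the oldest pending bits fall off the top;
       *  - if len == 32, 'buf << len' is undefined behaviour in C. */
      buf   = (buf << len) | code;
      used += len;
      /* flushing of complete bytes omitted for brevity */
  }

  int main(void)
  {
      put_code(0x5, 4);    /* a frequent byte with a 4-bit code: fine */
      /* put_code(0xFFFFFFFFu, 32);  a rare byte with a 32-bit code
       *                             would already break the writer   */
      printf("pending bits: %u, value: 0x%X\n", used, buf);
      return 0;
  }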
[18 Mar 2005 17:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/23186
[21 Mar 2005 19:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/23240
[22 Mar 2005 14:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/23288
[23 Mar 2005 17:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/23321
[28 Apr 2005 11:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/24444
[28 Apr 2005 13:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/24455
[29 Apr 2005 18:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/24489
[5 May 2005 13:24] Ingo Strüwing
The pure bugfix has already been pushed. It will be in 4.0.25, 4.1.12, and 5.0.6. Issue 4155 can be closed. The bug report is kept open until the two other changes (more improvements) are pushed.
[8 Jun 2005 18:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/25768
[24 Jun 2005 19:20] Ingo Strüwing
The final patch for this went into 5.0.9. (The pure bugfix already went into 4.1.12 and 5.0.6; this patch contains the 64-bit extensions and code comments.)
[10 Jul 2005 15:52] Jon Stephens
Thank you for your bug report. This issue has already been fixed
in the latest released version of that product, which you can download at 
http://www.mysql.com/downloads/

Additional info:

Documented bugfix in 4.0.25, 4.1.12, 5.0.6, and 5.0.9 change history; closed.
[17 Jul 2005 20:03] Serhiy Polyakov
I am running MySQL 5.0.9-beta-Max on Fedora Core 4. Processor is AMD Opteron. I used Linux AMD64 / Intel EM64T RPMs.

I am using the MyISAM engine and have tables from 3GB to 73GB. I was able to index tables < 10GB, but when I run
ALTER TABLE `TableName` ADD INDEX (ColumnName);
on tables where the column I want to index is > 3GB, I get this error after eight hours or so:

ERROR 1034 (HY000): Number of rows changed from 522360787 to 996010215.

996010215 is the real number of rows in that table.

The column type is VARCHAR(20). Every value in that column is exactly 14 characters long.

I can see that MySQL starts building the file TableName.TMD, but then, when the error happens, it quits.
I had the same error with MySQL 5.0.6.

From:
http://dev.mysql.com/doc/mysql/en/news-5-0-9.html
“D.1.2. Changes in release 5.0.9 (15 July 2005)
Bugs fixed:
There was a compression algorithm issue with myisampack for very large datasets (where the total size of all records in a single column was on the order of 3 GB or more) on 64-bit platforms. (A fix for other platforms was made in MySQL 5.0.6.) (Bug #8321)”

Either my problem is not related to Bug #8321, or the bug was not fixed?

Thanks,
Serhiy
[18 Jul 2005 7:15] Ingo Strüwing
I must admit that the changelog entry does not reflect precisely what was fixed. The problem of the original bug reporter was that his data bytes had a distribution which required Huffman codes of 32 bits in length. This did not work and was fixed before. The new patch made changes which allow for Huffman codes above 32 bits, up to 64 bits. This is independent of the platform (not specific to 64-bit platforms). All these changes affect myisampack only. If the distribution of the data bytes is such that too-big Huffman codes are generated, the resulting table produces table crashes in the server, because it is wrongly compressed.
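
As a purely illustrative sketch of that idea (my own example, not the actual changeset): keeping the pending bits in a 64-bit accumulator and flushing complete bytes before every append makes room for codes wider than 32 bits, independent of the platform word size.

  #include <stdint.h>
  #include <stdio.h>

  /* Hypothetical 64-bit bit writer, NOT the actual patch. */
  static uint64_t acc   = 0;   /* pending bits, right-aligned                */
  static unsigned nbits = 0;   /* number of pending bits (< 8 after a flush) */

  static void flush(FILE *out)
  {
      while (nbits >= 8) {
          fputc((int)((acc >> (nbits - 8)) & 0xFF), out);
          nbits -= 8;
      }
  }

  static void put_code(uint64_t code, unsigned len, FILE *out)
  {
      flush(out);                    /* afterwards nbits <= 7           */
      if (len > 56) {                /* append very long codes in two   */
          unsigned hi = len - 32;    /* halves so no shift reaches 64   */
          acc = (acc << hi) | (code >> 32);
          nbits += hi;
          flush(out);
          code &= 0xFFFFFFFFu;
          len = 32;
      }
      acc = (acc << len) | (code & ((1ULL << len) - 1));  /* len <= 56 here */
      nbits += len;
  }

  int main(void)
  {
      put_code(0x5, 4, stdout);              /* short code                      */
      put_code(0x123456789ULL, 33, stdout);  /* a code wider than 32 bits       */
      flush(stdout);                         /* last partial byte stays pending */
      return 0;
  }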

From your description I cannot see the relation to this bugfix. But let's try to get to the bottom of it. Can you please add a --debug flag when compressing your table (i.e. myisampack --debug=d,info:O,myisampack.log ...) and email me the resulting myisampack.log? This should tell me whether your data byte distribution tries to create Huffman codes >= 32 bits in length. You need a mysql*-debug* version for this.
[20 Jul 2005 1:53] Serhiy Polyakov
Ingo,

Thanks for your reply. Actually, I am not sure now whether the problem I described ([17 Jul 2005 20:03] Serhiy Polyakov) is related to the very large size of the column or table, or to the large number of rows.

Anyway, I installed MySQL-debuginfo*.rpm and ran:
myisampack --debug=d,info:O,myisampack.log TableName.MYD
It packed TableName.MYD from 27GB down to 10GB without errors. No log was created.

Then, again, when I ran
ALTER TABLE `TableName` ADD INDEX (ColumnName);
it first copied TableName.MYD of 10GB to a temporary #sql-7ec-21.MYD of 27GB (unpacked back?) and also copied the *.frm and *.MYI files to #sql-7ec-21.frm and #sql-7ec-21.MYI.

Then it started to build the #sql-7ec-21.MYI and #sql-7ec-21.TMD files. When the size of #sql-7ec-21.TMD reached 27GB, it quit with this error (after ~17 hours):

ERROR 1034 (HY000): Number of rows changed from 522360787 to 996010215.

The #sql-7ec-21.MYI file was about 1.7GB at that moment (initially it was 1K, because I did not have any indexes).

Please advise; maybe I should post this problem report under a different topic.

Thanks,
Serhiy
[20 Jul 2005 8:29] Ingo Strüwing
I tried to add an index to a compressed table and got "ERROR 1036 (HY000): Table 't1' is read only".

Can you confirm that you followed the advice of myisampack:
"Remember to run myisamchk -rq on compressed tables"
before you tried to do anything else with the compressed table?
See also the reference manual section "8.2. myisampack, the MySQL Compressed Read-only Table Generator".

You will need to create all indexes before compressing the table. This is a waste of time, I know, since myisamchk rebuilds them anew. But you could create the indexes on an empty table, run ALTER TABLE ... DISABLE KEYS, load the data, run myisampack, and then myisamchk -rq (which re-enables the keys and rebuilds the indexes).
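
For example (database, table, column, and file names below are placeholders, not taken from this report), the sequence could look like this:

  mysql> CREATE TABLE t1 (c1 VARCHAR(20), KEY (c1)) ENGINE=MyISAM;
  mysql> ALTER TABLE t1 DISABLE KEYS;
  mysql> LOAD DATA INFILE '/path/to/data.txt' INTO TABLE t1;
  mysql> FLUSH TABLES;
  shell> myisampack /var/lib/mysql/test/t1
  shell> myisamchk -rq /var/lib/mysql/test/t1
  mysql> FLUSH TABLES;

The first FLUSH TABLES makes sure the server has closed the table files before packing, and the second one makes it reopen the now compressed table.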
[23 Jul 2005 4:02] Serhiy Polyakov
(1)
I was able to add an index on compressed tables because I changed the file permissions from the default:
-rw-rw---- 1 mysql mysql
to
-rw-r----- 1 mysql mygroupname
Otherwise I get the same error "ERROR 1036 (HY000): Table 't1' is read only".
Actually, it is not clear to me why changing the permissions like that allowed indexing, but I need it for other reasons.

(2) I did not run myisamchk -rq on the compressed table before creating the index.

I now think my problem has nothing to do with compression. I originally thought it did because, when I add an index on a big uncompressed table, MySQL creates a *.TMD file, which the documentation refers to as a temporary file for compression. So I thought that creating an index on big tables requires MySQL to perform some kind of intermediate compression (I do not need compressed tables myself).

But no: I have the original table of 27GB (if I had compressed it, it would be 10GB). When I add an index on the original 27GB table, MySQL creates a *.TMD file; it reaches 26.9GB and then the error happens.

And here is an additional piece of info. Today I tried to add an index on a TINYINT column in a table of 21.5GB (uncompressed) which contains 1,062,844,989 rows (the column size is < 1GB). I got the same error: ERROR 1034 (HY000): Number of rows changed from 1006632866 to 1062844989.

What I will do is create an empty table, add the indexes, and load the data from the old table into the new one. Maybe indexing while loading the data will not give me that error. Previously I was trying to index tables after loading the data.

Thanks,
Serhiy
[25 Jul 2005 13:29] Ingo Strüwing
TMD files are temporary files that are created during compression and during repair. Their existence does not necessarily indicate a compression operation, and there is no intermediate compression for creating indexes. Also, there should be no full repair done by ADD INDEX, so no TMD file should be created.

So I agree: it does not have to do with compression, and hence it does not have to do with _this_ bug report. Please file a new bug report. What you described suggests the existence of another bug. Because of the size of your data it might be difficult for us to repeat the bug at our site, but we will try anyway. If you see any chance to repeat the problem with less data, then please help us by describing it as well as you can.

Regards,
Ingo