Bug #8321 myisampack bug in compression algorithm
Submitted: 4 Feb 2005 17:47 Modified: 10 Jul 2005 17:52
Reporter: Sinisa Milivojevic
Status: Closed
Category:Client Severity:S2 (Serious)
Version:4.0 and above OS:Any (any)
Assigned to: Ingo Strüwing Target Version:
Tags: corruption, myisam

[4 Feb 2005 17:47] Sinisa Milivojevic
Description:
myisampack  makes error in packing some text / blob columns with short data length of 
specific content.

The error is not discovered during packing, but during building index , when a record has
to  be unpacked.

As data is belonging to the client and has highly confidential info, please contact us
from the support on the whereabouts of data and debugging environment.

How to repeat:
Contact support for further info.

Also read our internal info.
[11 Mar 2005 21:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/22950
[12 Mar 2005 11:54] Ingo Strüwing
The bug database is defect.
It set the bug to the "Patch pending" state, though there is no bug fix pending yet.
While it is true that I found a bug and I believe that it is "the" bug and I have a fix
for it,
I am still working on tests to gain a comfortable certitude that the fix is correct.
During my work on this bug, I inserted a lot of comments, trace messages,
additional checks, and even a decoding function. Since this stuff is much more than
the "real" bug fix, I committed it independently from the bug fix for the reviewers
convenience. The bug fix will follow after my tests. And I plan to commit a third
changeset with suggestions for further improvements of the code.
[12 Mar 2005 20:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/22966
[12 Mar 2005 21:16] Ingo Strüwing
MyISAM compression is based on Huffman encoding. It assigns codes of different lengths to
the bytes depending on the incidence of every byte value in the data. A full equal
distribution of bytes would lead to eight-bit long Huffman codes (no compression at all
for ideal random data). The more the byte incidences are apart from each other, the
longer the codes (for the least often occuring bytes) will be.

In the customers data at hand, I found a column which contained 202 distinct byte values
(characters). The total number of all the bytes of that column over all rows was
2917027827 (< 3GB). The distribution of the bytes was such that there were 34 bytes which
became a Huffman code of 32 bits in length assigned (while the 4 most often occuring bytes
got 4-bit codes assigned).

The problem was that the MyISAM compression algorithm failed on Huffman codes with 32
bits in length. I guess that this situation has not been triggered before. And it
explains why it was not possible, to reproduce the problem with less data. Though it is
possible to get 32 bit codes with even somewhat less data, there is still a minimum
amount of data required. It will definitely not be possible with a couple of megabytes.
This explains why I cannot provide a small test case.
[18 Mar 2005 18:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/23186
[21 Mar 2005 20:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/23240
[22 Mar 2005 15:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/23288
[23 Mar 2005 18:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/23321
[28 Apr 2005 13:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/24444
[28 Apr 2005 15:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/24455
[29 Apr 2005 20:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/24489
[5 May 2005 15:24] Ingo Strüwing
The pure bugfix has already been pushed. It will be in 4.0.25, 4.1.12, and 5.0.6. Issue
4155 can be closed. The bug report is kept open until the two other changes (more
improvements) are pushed.
[8 Jun 2005 20:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/25768
[24 Jun 2005 21:20] Ingo Strüwing
The final patch for this went into 5.0.9. (The pure bugfix did already go into 4.1.12 and
5.0.6, this patch contains 64-bit extensions and code comments.)
[10 Jul 2005 17:52] Jon Stephens
Thank you for your bug report. This issue has already been fixed
in the latest released version of that product, which you can download at 
http://www.mysql.com/downloads/

Additional info:

Documented bugfix in 4.0.25, 4.1.12, 5.0.6, and 5.0.9 change history; closed.
[17 Jul 2005 22:03] Serhiy Polyakov
I am running MySQL 5.0.9-beta-Max on Fedora Core 4. Processor is AMD Opteron. I used Linux
AMD64 / Intel EM64T RPMs.

I am using MyISAM engine and have tables from 3GB to 73GB. I was able to index tables <
10GB but when I run
ALTER TABLE ‘TableName’ ADD INDEX (ColumnName);
on tables where column size I want to index is > 3GB I have this error after eight hours
or so:

ERROR 1034 (HY000): Number of rows changed from 522360787 to 996010215.

996010215 is real number of rows in that table.

Column Type is Varchar(20). Every record in that column is exactly 14 characters long.

I can see that MySQL starts building file TableName.TMD but then, when error happens, it
quits.
I had the same error with MySQL 5.0.6.

From:
http://dev.mysql.com/doc/mysql/en/news-5-0-9.html
“D.1.2. Changes in release 5.0.9 (15 July 2005)
Bugs fixed:
There was a compression algorithm issue with myisampack for very large datasets (where
the total size of of all records in a single column was on the order of 3 GB or more) on
64-bit platforms. (A fix for other platforms was made in MySQL 5.0.6.) (Bug #8321)”

Either my problem is not related to Bug #8321 of bug was not fixed???

Thanks,
Serhiy
[18 Jul 2005 9:15] Ingo Strüwing
I must admit that the changelog entry does not reflect precisely what was fixed. The
problem of the original bug reporter was that his data bytes have a distribution, which
required Huffman codes of 32 bits in length. This did not work and was fixed before. The
new patch made changes which allows for Huffman codes above 32 bits, up to 64 bits. This
was independent of the platform (not 64-bit platform specific). All these changes did
affect myisampack only. If the distribution of the data bytes are so that too big huffman
codes were generated, the resulting table would produce table crashes in the server,
because it is wrongly compressed.

From your description I cannot see the relation to this bugfix. But lets try to get at
it. Can you please add a --debug flag when compressing your table (i.e. myisampack
--debug=d,info:O,myisampack.log ...) and email me the resulting myisampack.log? This
should tell me if your data bytes distribution tries to create Huffman codes >= 32 bits
in length. You need a mysql*-debug* version for this.
[20 Jul 2005 3:53] Serhiy Polyakov
Ingo,

Thanks for your reply. Actually, I am not sure now weather my problem I described ([17
Jul 22:03] Serhiy Polyakov) is related to very large size of the column, table or large
number of number of rows.

Anyway, I installed MySQL-debuginfo*.rpm and ran:
myisampack --debug=d,info:O,myisampack.log TableName.MYD
It packed TableName.MYD of 27GB --> 10GB without errors. No log was created.

Then, again, when I ran
ALTER TABLE ‘TableName’ ADD INDEX (ColumnName);
First it copied TableName.MYD of 10GB to temporary #sql-7ec-21.MYD of 27GB (unpaked
back?) and also copied *.frm and *.MYI files to #sql-7ec-21.frm, #sql-7ec-21.MYI.

Than it started to build #sql-7ec-21.MYI and #sql-7ec-21.TMD files. When size of
#sql-7ec-21.TMD reached 27GB it quit with the error (after ~17 hours):

ERROR 1034 (HY000): Number of rows changed from 522360787 to 996010215.

#sql-7ec-21.MYI files was about 1.7GB at this moment (initially it was 1K because I did
not have any indexes).

Please advise, may be I have to post this problem report under different topic.

Thanks,
Serhiy
[20 Jul 2005 10:29] Ingo Strüwing
I tried to add an index to a compressed table and got "ERROR 1036 (HY000): Table 't1' is
read only".

Can you confirm that you followed the advice of myisampack:
"Remember to run myisamchk -rq on compressed tables"
before you tried to do anything else with the compressed table?
See also reference manual "8.2. myisampack, the MySQL Compressed Read-only Table
Generator".

You will need to create all indexes before compressing the table. This is a waste of
time, I know, since myisamchk rebuilds them anew. But you could create the indexes on an
empty table, ALTER TABLE ... DISABLE KEYS, load the data, myisampack, and myisamchk -rq
(which re-enables the keys and creates the indexes).
[23 Jul 2005 6:02] Serhiy Polyakov
(1)
I was able to add index on compressed tables because I changed files permissions from
default:
-rw-rw---- 1 mysql mysql
to
-rw-r----- 1 mysql mygroupname
Otherwise I have the same error “ERROR 1036 (HY000): Table t1 is read only”
Actually, not clear why changing permissions like that allowed me indexing but I need
that for other reasons.

(2) I did not run myisamchk -rq on compressed table before creating the index.

I think my problem has nothing to do with compression. I thought so because when I add
index on big uncompressed table mysql creates *.TMD file which is referred in
documentation as temporary for compression. And I thought that creating index on big
tables requires mysql performing some kind of intermediate compression (I do not need
compressed tables myself).

But not, I have original table of 27GB (if I had compressed it would be 10GB). When I add
index on original 27GB table mysql creates *.TMD file, it reaches 26.9GB size and then
error happens.

And here is additional piece of info. Tiday I tried to add index on TINYINT column, table
of 21.5GB (uncompressed) which contains 1,062,844,989 rows (column size is < 1GB). I had
the same error: ERROR 1034 (HY000): Number of rows changed from 1006632866 to
1062844989.

What I will do is create empty table, add indexes and load data from old table to new.
May be indexing while loading data will not give me that error. Previously I was trying
to index tables after loading data.

Thanks,
Serhiy
[25 Jul 2005 15:29] Ingo Strüwing
TMD files are temporary files, which are created during compression and during repair.
Their existence does not necessarily indicate a compression operation. And there is no
intermediate compression for creating indexes. Also their should be no full repair done
by ADD INDEX. Thus no TMD should be created.

So I agree. It does not have to do with compression. Hence it does not have to do with
_this_ bug report. Please file a new bug report. What you described suggests the
existence of another bug. Because of the size of your data it might be difficult for us
to repeat the bug at our site. But we will try anyway. If you see any chance to repeat
the problem with less data, then please help us by describing it as good as you can.

Regards,
Ingo