Bug #34701 | Restore Hangs for TPCB database | ||
---|---|---|---|
Submitted: | 20 Feb 2008 19:24 | Modified: | 29 Aug 2008 9:11 |
Reporter: | Hema Sridharan | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: Backup | Severity: | S1 (Critical) |
Version: | mysql-6.0-backup | OS: | Linux |
Assigned to: | Øystein Grøvlen | CPU Architecture: | Any |
[20 Feb 2008 19:24]
Hema Sridharan
[21 Feb 2008 11:03]
MySQL Verification Team
Thank you for the bug report.
[4 Mar 2008 12:39]
Rafal Somla
I managed to repeat the problem and found the possible reason. DETAILED PROBLEM DESCRIPTION RESTORE hangs because of a bug in the backup stream library code causing it to fail to detect end of the backup stream. Backup stream is divided into fixed size blocks. The size of the block is stored at the beginning of the stream. It must be correctly read before the stream can be processed. It happens in this place inside load_buffer() function in stream_v1_transport.c: > for (i= 0; i<4; ++i) > { > block_size >>= 8; > block_size |= (*(s->buf.begin++)) << 3*8; > } Here block_size is of type "long unsigned int" and s->buf.begin is a pointer to "unsigned char". The problem is caused by implicit cast to "int" type which happens inside the expression "(*(s->buf.begin++)) << 3*8". If the highest bit becomes 1, compiler thinks that this is a negative integer. When assigning negative integer to block_size which is of unsigned type, if the size of block_size is bigger than the size of int type, the overflowing bits are set to 1, not to 0! This leads to wrong value of block_size. A fix is to use an explicit cast to "long unsigned int" like this > for (i= 0; i<4; ++i) > { > block_size >>= 8; > block_size |= ((unsigned long int)*(s->buf.begin++)) << 3*8; > } Interestingly, the problem manifests itself only on 64bit platform, because only there "long int" is bigger thant "int". Also, due to the logic of reading backup stream, the problem shows up only when more than one input block is read and that required restoring a big database. There are two problems to be fixed: - verify that no such incorrect implicit casts are done in the code, - make backup stream library more robust, so that it terminates gracefully in case stream ends, even if it is confused about block size or other parameters of the stream. Note that completely satisfactory solution of this problem is to introduce checksums in the stream which is planned to be done in the future.
[13 Mar 2008 11:50]
Sander de Boer
is it possible that 5.0.54a-enterprise-gpl-log has the same bug? a restore of my database hangs too!
[9 May 2008 12:44]
Rafal Somla
There are two related bugs: BUG#36624 and BUG#36586. After closing these bugs it should be checked if the issue was resolved.
[21 May 2008 12:13]
Rafal Somla
This is probably duplicate of BUG#36571.