Bug #79400 Uncompress function could be much better
Submitted: 24 Nov 2015 20:15 Modified: 17 Dec 2018 13:29
Reporter: Kyle Lahnakoski Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: DML Severity:S3 (Non-critical)
Version:14.14 Distrib 5.6.27 OS:Any
Assigned to: CPU Architecture:Any

[24 Nov 2015 20:15] Kyle Lahnakoski
Description:
The UNCOMPRESS function is perfectly capable of unzipping blobs of data put into a table by Python's zlib.compress() function, except for those first four bytes containing the little-endian `length` of the uncompressed string.  I do not know what role the `length` plays while decompressing, but it is preventing me from accessing the uncompressed values!

How to repeat:

For some reason, this works on every example I have so far:

select 
  id, 
  length(`blob`) `length`, 
  coalesce(  
    uncompress(concat(unhex('0900000'), `blob`))
  ) `bytes`
from job_artifact

Mind you, I only have 34K examples, and none very big.  

Suggested fix:

Please have UNCOMPRESS ignore those first four bytes so magic number 9 is not needed.
[30 Nov 2015 8:58] MySQL Verification Team
Hello Kyle,

Thank yo for the report.
Could you please provide exact repeatable test case(schema and subset of data)? You may want to mark it as private after uploading here.

If you can provide more information, feel free to add it to this bug and change the status back to 'Open'.

Thank you for your interest in MySQL.

Thanks,
Umesh
[5 Dec 2015 17:38] Kyle Lahnakoski
Sorry for the delay.

For the breaking test:  I expect 

    SELECT
      hex(uncompress(concat(unhex('0000000'), substring(compress(unhex('65792F81F95AD61F1A25')), 5))))

to return '65792F81F95AD61F1A25', not null.   This test is compressing a string, and replacing the first four bytes with 0x0000, then decompressing.

For some reason this works (notice the '09...')

    SELECT
      hex(uncompress(concat(unhex('0900000'), substring(compress(unhex('65792F81F95AD61F1A25')), 5))))

Here is a more comprehensive, but random, test I use to find particular problem sequences:

    drop table if exists test_numbers;
    create table test_numbers (
      value integer
    );
    INSERT INTO TEST_NUMBERS VALUES (0);
    INSERT INTO TEST_NUMBERS VALUES (1);
    INSERT INTO TEST_NUMBERS VALUES (2);
    INSERT INTO TEST_NUMBERS VALUES (3);
    INSERT INTO TEST_NUMBERS VALUES (4);
    INSERT INTO TEST_NUMBERS VALUES (5);
    INSERT INTO TEST_NUMBERS VALUES (6);
    INSERT INTO TEST_NUMBERS VALUES (7);
    INSERT INTO TEST_NUMBERS VALUES (8);
    INSERT INTO TEST_NUMBERS VALUES (9);

    SELECT
      substring(a.data, 1) expected,
      uncompress(concat(unhex('0000000'), substring(compress(data), 5))) result
    FROM
      (
      select
        group_concat(char(round(rand()*256)) separator '') data
      from
        test_numbers
      ) a
[13 Nov 2018 20:04] MySQL Verification Team
miguel@luz:~/dbs $ ./57c
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.25 Source distribution BUILD: 2018-NOV-11

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.7 > CREATE DATABASE e;
Query OK, 1 row affected (0,03 sec)

mysql 5.7 > USE e
Database changed
mysql 5.7 > create table TEST_NUMBERS (
    ->       value integer
    ->     );
Query OK, 0 rows affected (0,86 sec)
mysql 5.7 >     INSERT INTO TEST_NUMBERS VALUES (1);
Query OK, 1 row affected (0,13 sec)

mysql 5.7 >     INSERT INTO TEST_NUMBERS VALUES (2);
Query OK, 1 row affected (0,39 sec)

mysql 5.7 >     INSERT INTO TEST_NUMBERS VALUES (3);
Query OK, 1 row affected (0,13 sec)

mysql 5.7 >     INSERT INTO TEST_NUMBERS VALUES (4);
Query OK, 1 row affected (0,13 sec)

mysql 5.7 >     INSERT INTO TEST_NUMBERS VALUES (5);
Query OK, 1 row affected (0,14 sec)

mysql 5.7 >     INSERT INTO TEST_NUMBERS VALUES (6);
Query OK, 1 row affected (0,13 sec)

mysql 5.7 >     INSERT INTO TEST_NUMBERS VALUES (7);
Query OK, 1 row affected (0,13 sec)

mysql 5.7 >     INSERT INTO TEST_NUMBERS VALUES (8);
Query OK, 1 row affected (0,14 sec)

mysql 5.7 >     INSERT INTO TEST_NUMBERS VALUES (9);
Query OK, 1 row affected (0,11 sec)

mysql 5.7 > SELECT
    ->       hex(uncompress(concat(unhex('0000000'), substring(compress(unhex('65792F81F95AD61F1A25')), 5))));
+--------------------------------------------------------------------------------------------------+
| hex(uncompress(concat(unhex('0000000'), substring(compress(unhex('65792F81F95AD61F1A25')), 5)))) |
+--------------------------------------------------------------------------------------------------+
| NULL                                                                                             |
+--------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0,00 sec)

mysql 5.7 > SHOW WARNINGS;
+---------+------+----------------------------+
| Level   | Code | Message                    |
+---------+------+----------------------------+
| Warning | 1259 | ZLIB: Input data corrupted |
+---------+------+----------------------------+
1 row in set (0,00 sec)

mysql 5.7 > SELECT
    ->       hex(uncompress(concat(unhex('0900000'), substring(compress(unhex('65792F81F95AD61F1A25')), 5))));
+--------------------------------------------------------------------------------------------------+
| hex(uncompress(concat(unhex('0900000'), substring(compress(unhex('65792F81F95AD61F1A25')), 5)))) |
+--------------------------------------------------------------------------------------------------+
| 65792F81F95AD61F1A25                                                                             |
+--------------------------------------------------------------------------------------------------+
1 row in set (0,00 sec)
[14 Nov 2018 12:38] MySQL Verification Team
Hi,

The length prefix is required for the uncompressing function to know how much memory is required for the uncompressed data. That is how it functions.

All that we can do is to document that fact.
[15 Dec 2018 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".