Bug #1576 Problems with large (>16MB) BLOBs
Submitted: 16 Oct 2003 10:29 Modified: 19 Nov 2003 9:51
Reporter: [ name withheld ] Email Updates:
Status: Closed Impact on me:
Category: Connector/J Severity: S1 (Critical)
Version: 3.0.x OS: Linux (Red Hat Linux 9)
Assigned to: Mark Matthews CPU Architecture:Any

[16 Oct 2003 10:29] [ name withheld ]
Using MySQL server 4.0.15, I am trying to insert and then extract a large BLOB (a file of approx. 22MB) into an InnoDB table. I have set the value of max_allowed_packet to a value much greater than 16MB (tried several settings - right now it's set at 128MB). I have run tests against all Connector/J releases from 3.0.6 thru 3.0.9 and have not had success with any, though the behavior differs in pretty much every case. Here are the results:

- 3.0.6: Can't insert large blob ... PacketTooBigException (I understand that this release of the driver did not set the client-side max_allowed_packet value?)

- 3.0.7: Inserts/Extracts the blob ok, but has getInt() issues (more on this below).

- 3.0.8: Extracted blob ok, but resulting file differs from original file (though the sizes are the same)

- 3.0.9: Extracted file is truncated to be smaller than the original. Extracted file is only 246 bytes.
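
For reference, the server-side max_allowed_packet setting mentioned above is raised in the server configuration; the fragment below is a sketch mirroring the value from this report (the file location and section depend on the installation), and can be verified afterwards with SHOW VARIABLES LIKE 'max_allowed_packet':

```ini
# my.cnf -- server section; max_allowed_packet must exceed the largest
# single value transferred, i.e. the largest BLOB (here: well above 22MB)
[mysqld]
max_allowed_packet=128M
```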

If simply inserting and then extracting the file, the 3.0.7 release works correctly. If, however, there is also a ResultSet.getInt() call involved after the select statement to get the table row, then the following error occurs:

java.sql.SQLException: Invalid value for getInt() - 'o'

This getInt() problem occurs when using both the 3.0.7 and 3.0.9 drivers.

I have seen other bugs and references in the mailing lists related to both the large BLOB and the getInt() issues. Based on the information in these sources, both problems should be fixed in 3.0.9. I can't, however, seem to make it work.

How to repeat:
1. insert blob
2. extract blob
3. use ResultSet.getInt() to see the SQLException Invalid value for getInt()
4. don't use ResultSet.getInt() to see problems with the extracted file

Here's my table create:

create table blobtest (id int, name varchar(64), content longblob) type = innodb;

Here's a smallish Java class I used to run the tests:

import java.io.*;
import java.sql.*;

public class Blob {
  private static final String HOST = "localhost";
  private static final String DB = "jblobtest";
  private static final String USER = "blobtest";
  private static final String PASSWORD = "blobtest";

  private Connection _connection;

  private Blob() throws Exception {
    _connection = DriverManager.getConnection(
        "jdbc:mysql://" + HOST + "/" + DB +
        "?user=" + USER + "&password=" + PASSWORD);
  }

  private void insert(String filename) throws Exception {
    PreparedStatement ps = _connection.prepareStatement("delete from blobtest");
    ps.executeUpdate();
    ps = _connection.prepareStatement(
        "insert into blobtest (id, name, content) values (?,?,?)");
    ps.setInt(1, 1);
    ps.setString(2, "blobtest.in");
    ps.setBinaryStream(3, new FileInputStream(filename), -1);
    ps.executeUpdate();
  }

  private void extract(String filename) throws Exception {
    PreparedStatement ps =
      _connection.prepareStatement("select * from blobtest");
    ResultSet rs = ps.executeQuery();
    if ( ! rs.next() ) {
      System.out.println("No rows...");
      return;
    }

    // Use 'int id = 0;' to get past getInt() problem...

    //int id = 0;
    int id = rs.getInt("id");
    String name = rs.getString("name");
    InputStream is = rs.getBinaryStream("content");
    BufferedOutputStream bos =
      new BufferedOutputStream(new FileOutputStream(filename));
    int c;
    while ( (c = is.read()) != -1 ) {
      bos.write(c);
    }
    bos.close();
  }

  public static void main(String[] args) throws Exception {
    if ( args.length != 2 ) {
      System.out.println("usage: Blob <insert|extract> filename");
      return;
    }
    Blob blob = new Blob();
    if ( args[0].equals("insert") ) blob.insert(args[1]);
    else if ( args[0].equals("extract") ) blob.extract(args[1]);
    else System.out.println("error: args[0] must be 'insert' or 'extract'");
  }
}

1. java Blob insert <some input file larger than 16MB>
2. java Blob extract <some output file>
3. if no getInt() exception, diff input file and output file
[22 Oct 2003 13:49] Johan Backlund
FWIW, I just tried the very same Java program on a RedHat 9 (2.4.20) with 4.1.0-alpha and 3.0.9-stable and I get problems as well at the 16MB limit.
All tests below 16MB are working fine, with insert/extract producing identical blob files. However, at the 16MB mark, insert seems to work fine, but extract hangs the Java program on the executeQuery() call. No apparent activity on the system according to top.
According to strace, the Java program is waiting on recv() for fd 3, which is the TCP connection to mysqld. I am a bit new to debugging MySQL, but it looks like there is a thread stuck on a futex() call.
I am running with max_allowed_packet=100M and -Xmx100m on the Java program.
[23 Oct 2003 7:23] Johan Backlund
I'm slowly getting into debugging MySQL and the JDBC driver, and now I think I have some more on the problem. It was the JDBC driver hanging, waiting for more data even though mysqld had finished sending. I have come to the conclusion that the multi-packet loop in MysqlIO.reuseAndReadPacket is expecting a special end-message (of length 0 or 1 byte), although the comment in sql/net_serv.cc says that the final packet is recognized by having a length less than MAX_PACKET_LENGTH.
I added an extra break condition at the end of the loop and now the Java program runs through. However, I am now getting a small result file (108 bytes instead of 16778240). I am in the process of tracking down the reason for this.
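
The termination rule described above can be sketched as follows, assuming each wire packet carries a 3-byte little-endian length and a 1-byte sequence number (the class and method names here are hypothetical stand-ins, not the actual MysqlIO code):

```java
import java.io.*;

public class MultiPacket {
    static final int MAX_PACKET_LENGTH = 0xFFFFFF; // 2^24 - 1

    // Reassemble one logical packet from a stream of wire packets.
    // Per the comment in sql/net_serv.cc, a payload larger than
    // MAX_PACKET_LENGTH is split into chunks of exactly that size,
    // and the final chunk is the one whose length is smaller --
    // there is no separate end-marker packet.
    static byte[] readLogicalPacket(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            int len = in.read() | (in.read() << 8) | (in.read() << 16);
            in.read(); // sequence number, ignored in this sketch
            byte[] chunk = new byte[len];
            new DataInputStream(in).readFully(chunk);
            out.write(chunk);
            if (len < MAX_PACKET_LENGTH) break; // final chunk
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A single small packet: length 3, sequence 0, payload "foo".
        byte[] wire = {3, 0, 0, 0, 'f', 'o', 'o'};
        System.out.println(new String(readLogicalPacket(
            new ByteArrayInputStream(wire)))); // prints foo
    }
}
```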
[23 Oct 2003 8:35] Johan Backlund
I believe I got something working. However, I don't feel I have enough knowledge on the protocol and the JDBC code to say that all problems have been fixed. This should give some hints on what to do:

First, there is code in the beginning of MysqlIO.nextRow() that sets an offset value in case the packet was a multi-packet. This offset is then added to the buffer position for each field (column) that is read from the packet into the resultset. I don't see what should be skipped in the packet, so by leaving the offset to 0 even for multi-packets, the fields are extracted more correctly.

Secondly, there is a Buffer.readFieldLength() method which is supposed to retrieve the next field length. This method uses Buffer.readLong() for when the length code is 254. readLong constructs a length out of the next four bytes from the packet. However, according to the mysql code (function net_store_length in libmysqld/protocol.cc) there is no length encoding consisting of four bytes. Instead, for code 254 a length encoded in eight bytes is generated. This error caused the JDBC driver to include the four 00-bytes from the length in the final blob value.
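
The length encoding at issue can be sketched as follows (readFieldLength and readN here are hypothetical stand-ins, not the driver's actual Buffer methods); the point is that a 254 prefix is followed by eight length bytes, not four:

```java
import java.nio.ByteBuffer;

public class LengthCoded {
    // Decode a length-coded integer as net_store_length produces it:
    // values below 251 are stored as-is in one byte; prefix 251 marks
    // SQL NULL; 252, 253 and 254 are followed by 2, 3 and 8
    // little-endian length bytes respectively. (The 3.0.x driver
    // wrongly read only four bytes after a 254 prefix, so four stray
    // 00-bytes leaked into the blob value.)
    static long readFieldLength(ByteBuffer buf) {
        int first = buf.get() & 0xff;
        if (first < 251) return first;
        switch (first) {
            case 251: return -1;            // SQL NULL
            case 252: return readN(buf, 2);
            case 253: return readN(buf, 3);
            default:  return readN(buf, 8); // 254: eight bytes, not four
        }
    }

    static long readN(ByteBuffer buf, int n) {
        long v = 0;
        for (int i = 0; i < n; i++) {
            v |= (long) (buf.get() & 0xff) << (8 * i); // little-endian
        }
        return v;
    }

    public static void main(String[] args) {
        // 254 prefix followed by eight little-endian bytes: 0x01600000,
        // i.e. a 22 MiB blob length.
        ByteBuffer b = ByteBuffer.wrap(new byte[] {
            (byte) 254, 0, 0, 0x60, 1, 0, 0, 0, 0 });
        System.out.println(readFieldLength(b)); // prints 23068672
    }
}
```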

Should I attach diff files for my changes or?

Final comment: I have turned on the tracing in the JDBC driver, and even though the test code is producing the correct results, I am noticing that the trace says it has fetched 2 rows from the server. The fields for the second row are all set to a length of 0. This behaviour is not seen when retrieving a smaller blob (only one row fetched according to the trace). This leads me to believe there might still be more to fix.
[19 Nov 2003 9:51] Mark Matthews
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at

Please try testing a nightly snapshot from http://downloads.mysql.com/snapshots.php

This issue should be fixed now. Thank you for your bug report.
[24 Feb 2004 8:05] Ralf Hauser
http://bugs.mysql.com/bug.php?id=2916 with mysql-connector-java-3.1-nightly-20040221-bin.jar appears to be a regression of this.