Bug #54367 Prepared statement batch insertion loss
Submitted: 9 Jun 2010 13:44
Modified: 17 Sep 2012 8:10
Reporter: RICHARDS PETER
Status: Not a Bug
Impact on me: None
Category: Connector / J
Severity: S3 (Non-critical)
Version: 5.1.41-community
OS: Linux (Ubuntu 2.6.28-11-generic)
Assigned to: Alexander Soklakov
CPU Architecture: Any

[9 Jun 2010 13:44] RICHARDS PETER
Description:
I have a program that inserts values into the database from a queue using JMS. The table contains a BLOB column into which I insert a byte[].

The speed of the application is very important: if I don't process the messages quickly, the JMS queues might back up with pending messages. So I am trying to insert the values into the database using batch insertions. I configured the application (in the JMS message consumers) to insert the values using prepared statements in such a way that the batch is executed either after a configurable amount of time or when a configurable batch size is reached. For example, the prepared statement batch is executed at the end of every 5 seconds, or as soon as the batch size reaches 100 (i.e. 100 messages have arrived in the JMS message consumers since the last batch insertion). I have coded it so that only one of the two triggers fires at a time, and each execution resets both the count and the timestamp.
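
A minimal sketch of the flush-by-size-or-timeout policy described above. All names here (BatchFlusher, the messages table, the column names) are illustrative assumptions, not taken from the reporter's code:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class BatchFlusher {
    private static final int FLUSH_SIZE = 100;          // configurable batch size
    private static final long FLUSH_INTERVAL_MS = 5000; // configurable timeout

    private final Connection conn;
    private final PreparedStatement ps;
    private int pending = 0;
    private long lastFlush = System.currentTimeMillis();

    BatchFlusher(Connection conn) throws SQLException {
        this.conn = conn;
        conn.setAutoCommit(false); // batches are committed explicitly
        // hypothetical table; the report only mentions an id and a BLOB column
        this.ps = conn.prepareStatement(
                "INSERT INTO messages (id, payload) VALUES (?, ?)");
    }

    synchronized void add(long id, byte[] payload) throws SQLException {
        ps.setLong(1, id);
        ps.setBytes(2, payload); // the byte[] bound as a BLOB parameter
        ps.addBatch();
        pending++;
        // flush on whichever trigger fires first: size or elapsed time
        if (pending >= FLUSH_SIZE
                || System.currentTimeMillis() - lastFlush >= FLUSH_INTERVAL_MS) {
            flush();
        }
    }

    synchronized void flush() throws SQLException {
        if (pending == 0) {
            return;
        }
        int inserted = ps.executeBatch().length; // the count the reporter logs
        conn.commit();
        pending = 0;                             // reset both triggers
        lastFlush = System.currentTimeMillis();
        System.out.println("flushed " + inserted + " rows");
    }
}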

With batchSize configured as 100 messages and batchTimeOut as 5 seconds, I tried inserting 200,000 messages into the database. Each time I add an insert to the batch I log a message ("Added message to the batch"). When the thread inserts the values into the database, I obtain the number of insertions with preStmtSaveRequest.executeBatch().length and log this value in the log file as well.

For every attempt I made, the log shows that 200,000 messages were added to the batch, but the database contained fewer records than were actually batched. In all cases I lost about 1 to 5 messages; the exact count is obtained by summing the logged values of preStmtSaveRequest.executeBatch().length, e.g. 199,997.

I repeated the test with batchSize set to 25, 50, 75, and 1000 messages. I had losses at 75 and 1000. With a batchSize of 1000 I got only 196,187 records in the database when I tried to insert 200,000 messages.

I also used a plain Statement instance to insert values into the database, and there I have no losses at all; it works perfectly. But in this particular case I need to insert a BLOB, so I preferred to use PreparedStatement.
 
My questions:
Is there any property that I have to set when I use PreparedStatement with MySQL?
Why do I experience these losses at the larger batch sizes?

MySQL connector: mysql-connector-java-5.1.12-bin.jar and mysql-connector-java-5.1.8-bin
MySQL engine: MyISAM and InnoDB

How to repeat:
The scenario is well explained in the above description.
[9 Jun 2010 13:47] Tonci Grgin
Hi Peter and thanks for your report.

Would you mind attaching a small but complete test case (incl. connection string, DML/DDL) so I can check?
[9 Jun 2010 14:04] Mark Matthews
Are you sure you're not swallowing an exception somewhere, or ignoring warnings? We know of no cases where the driver is "losing" batch statements. If you're not checking the return values of executeBatch(), or are ignoring exceptions or warnings, it could well be that some of your rows had errors, or were converted to existing rows, depending on how they were inserted.
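
A sketch of the kind of checking suggested here; the helper name checkedExecuteBatch is hypothetical, but the JDBC calls (executeBatch update counts, the warning chain, BatchUpdateException) are standard:

import java.sql.BatchUpdateException;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLWarning;
import java.sql.Statement;

class BatchChecks {
    static int checkedExecuteBatch(PreparedStatement ps) throws SQLException {
        try {
            int[] counts = ps.executeBatch();
            for (int i = 0; i < counts.length; i++) {
                // SUCCESS_NO_INFO is fine; EXECUTE_FAILED means a lost row
                if (counts[i] == Statement.EXECUTE_FAILED) {
                    System.err.println("statement " + i + " in the batch failed");
                }
            }
            // warnings do not throw, so they are easy to miss
            for (SQLWarning w = ps.getWarnings(); w != null; w = w.getNextWarning()) {
                System.err.println("warning: " + w.getMessage());
            }
            return counts.length;
        } catch (BatchUpdateException e) {
            // getUpdateCounts() shows how far the batch got before failing
            System.err.println("batch failed after "
                    + e.getUpdateCounts().length + " statements");
            throw e;
        }
    }
}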
[9 Jun 2010 14:05] RICHARDS PETER
The scenario can be reproduced with a test case like the following; it mirrors the project I work on.

1) Create a JMS queue producer that writes an Object message into a queue. The object message should contain the appropriate parameters for database insertion like the id, the byte[] for insertion, etc.

2) Create a JMS queue consumer that consumes messages from the queue. When a message arrives in the consumer, set the message parameters on the PreparedStatement and add it to the batch, logging a statement so that we can find the total number of additions to the batch. When the batch size reaches the configured value, execute the batch and commit, logging preStmtSaveRequest.executeBatch().length so that we can find the total number of insertions. Set the timestamp to the current time, as described above.

3) Spawn a new thread that performs the same commit operation when the configured time is reached. Set the timestamp to the current time and reset the batch size to 0 at the time of the commit operation (see the sketch after this list).

4) Try pushing messages to the queue at a high rate. Start the listeners also.

Compare the record counts in the logs and in the database.
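
As referenced in step 3, a sketch of the timer thread, reusing the hypothetical BatchFlusher class sketched earlier in this report (conn is assumed to be an open Connection with auto-commit disabled):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final BatchFlusher flusher = new BatchFlusher(conn);
ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
timer.scheduleAtFixedRate(new Runnable() {
    public void run() {
        try {
            flusher.flush(); // synchronized, so it cannot interleave with add()
        } catch (Exception e) {
            e.printStackTrace(); // never swallow exceptions here
        }
    }
}, 5, 5, TimeUnit.SECONDS); // flush every 5 seconds, matching the report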
[9 Jun 2010 14:13] RICHARDS PETER
Yes, I am sure that I am not swallowing any exceptions. If any exceptions are thrown, I log them to the file along with the parameters, and I did not find any exceptions in the log.

I had a similar experience earlier when I tried to insert a String value into the database using a prepared statement; there, instead of the BLOB data type, I used VARCHAR. At that time we switched to a plain Statement object in Java, and due to time constraints we avoided storing the String value in the database. If any of the VARCHAR values contains a ' character, we won't be allowed to insert it using a plain Statement, right? So in that particular case we logged those values to a file instead of inserting them into the database. But this experience with the BLOB data is critical in our project; we need a solution for it.
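
A minimal sketch of the quoting problem described above; the table t and the connection conn are hypothetical:

import java.sql.Connection;
import java.sql.PreparedStatement;

// a value containing the troublesome ' character
String value = "O'Brien";

// With a plain Statement the quote must be escaped by hand, or the SQL breaks:
//   stmt.executeUpdate("INSERT INTO t (name) VALUES ('" + value + "')");

// With a PreparedStatement the value is bound as a parameter; no escaping needed:
PreparedStatement ps = conn.prepareStatement("INSERT INTO t (name) VALUES (?)");
ps.setString(1, value);
ps.executeUpdate();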
[9 Jun 2010 15:17] Mark Matthews
Without seeing exactly how you're writing to the database (your java code), and your schema, it's hard to say what's going on. 

I'm not sure there's a bug in the JDBC driver here, only because I know there are millions of rows inserted into MySQL databases via JDBC this way every hour without issues (I work on an application myself that does just that). If there is a bug, it's specific to your use case, and a simple description doesn't allow us to reproduce the issue so that we can fix it for you.

Can you show us a testcase (in code) that doesn't rely on JMS, etc. and just does the inserts the exact same way your application does which reproduces the problem you're experiencing?
[10 Jun 2010 9:40] RICHARDS PETER
The problem appears in my project when I create a PreparedStatement instance and a Statement instance from the same Connection instance, and then use the PreparedStatement batch and the Statement batch simultaneously to add records to the tables over that same Connection. I am not sure if I am going wrong somewhere in the code. My project's code looks similar to the attachment, and the attached code can reproduce the issue.
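
A sketch of the setup being described, with hypothetical SQL, URL, and credentials; two statement objects from one Connection, each batched from a different thread:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

String url = "jdbc:mysql://localhost/test"; // assumed connection settings
Connection conn = DriverManager.getConnection(url, "user", "pass");
conn.setAutoCommit(false);

// used by the JMS consumer threads for the BLOB inserts
PreparedStatement preStmtSaveRequest =
        conn.prepareStatement("INSERT INTO blob_table (id, data) VALUES (?, ?)");
// used by other threads for plain-SQL inserts
Statement stmt = conn.createStatement();

// Thread A:  preStmtSaveRequest.addBatch();   Thread B:  stmt.addBatch(sql);
// Without external locking these calls can interleave on the shared objects,
// which is what the later comments in this report point to as the problem.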

It could be the problem with the logic I follow to execute the task. Please let me know if I am going wrong somewhere.

Richards.
[10 Jun 2010 9:42] RICHARDS PETER
I am attaching the table structure and the Java code required. Please also put mysql-connector-java-5.1.12-bin.jar on the classpath.

Attachment: Connector.zip (application/zip, text), 354.52 KiB.

[10 Jun 2010 12:14] RICHARDS PETER
I am attaching Java code with reduced threading that can still reproduce the issue. Let me know where I went wrong.

Attachment: PreInsert.java (text/x-java), 7.48 KiB.

[11 Jun 2010 5:03] RICHARDS PETER
Hi Tonci and Mark,

Any update on this issue?
[14 Jun 2010 5:58] RICHARDS PETER
Hi all,

I made a change in my code, and it seems it was my code that created the problem. I added synchronization on the prepared statement and the statement variables at the points where I add queries to the batch and execute the batch.

This issue never appeared with the Statement instance; I never lost any messages even though it was used by multiple threads. Just tell me whether this synchronization is mandatory.
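
A sketch of the kind of synchronization described; preStmtSaveRequest is the variable name used in this report, while id, payload, and conn are illustrative assumptions:

// consumer thread: adding to the batch
synchronized (preStmtSaveRequest) {
    preStmtSaveRequest.setLong(1, id);
    preStmtSaveRequest.setBytes(2, payload);
    preStmtSaveRequest.addBatch();
}

// flushing thread: executing the batch
synchronized (preStmtSaveRequest) {
    int inserted = preStmtSaveRequest.executeBatch().length;
    conn.commit();
}

Sharing one statement object across threads without this kind of locking is generally unsafe; a common alternative is to give each thread its own PreparedStatement.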

I am attaching the src code.
Richards.
[14 Jun 2010 5:59] RICHARDS PETER
Modified code with synchronization that solved the issue.

Attachment: PreInsert.java (text/x-java), 7.65 KiB.