Bug #83304 Queries fail to return results when using Connector/J 5.1.40
Submitted: 7 Oct 2016 20:35
Modified: 12 Jan 2017 5:22
Reporter: monty solomon
Status: Duplicate
Category: Connector / J
Severity: S1 (Critical)
Version: 5.1.40, 5.1.39
OS: CentOS (6.7)
Assigned to: Filipe Silva
CPU Architecture: Any

[7 Oct 2016 20:35] monty solomon
Description:
We upgraded from Connector/J 5.1.38 to Connector/J 5.1.40 and various queries started returning no rows.

How to repeat:
Here is an example:

liquibase.exception.UnexpectedLiquibaseException: liquibase.exception.DatabaseException: Expected single row from select count(*) from DATABASECHANGELOGLOCK but got 0

The table does contain a row:

mysql> select count(*) from DATABASECHANGELOGLOCK;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> select * from DATABASECHANGELOGLOCK\G
*************************** 1. row ***************************
         ID: 1
     LOCKED:  
LOCKGRANTED: NULL
   LOCKEDBY: NULL
1 row in set (0.00 sec)

CREATE TABLE `DATABASECHANGELOGLOCK` (
  `ID` int(11) NOT NULL,
  `LOCKED` bit(1) NOT NULL,
  `LOCKGRANTED` datetime DEFAULT NULL,
  `LOCKEDBY` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
[7 Oct 2016 20:58] monty solomon
We tried version 5.1.39 and verified that the bug does not happen with it.
[8 Oct 2016 2:09] Lars Mikkelsen
This appears to be related to the deprecation of the EOF packet (https://dev.mysql.com/worklog/task/?id=7766).

When query caching is enabled and Connector/J versions 5.1.38 and 5.1.39 are used concurrently, the client may read the result incorrectly.

Version 5.1.38 always reads the EOF packet:
https://github.com/mysql/mysql-connector-j/blob/5.1.38/src/com/mysql/jdbc/MysqlIO.java#L42...

Version 5.1.39 only reads the EOF packet if the server doesn't support the CLIENT_DEPRECATE_EOF capability:
https://github.com/mysql/mysql-connector-j/blob/5.1.39/src/com/mysql/jdbc/MysqlIO.java#L42...

It seems that the server (in our case, Percona Server 5.7.13-6-log) doesn't vary the query cache response based on whether the client set the CLIENT_DEPRECATE_EOF capability.
[8 Oct 2016 2:15] Lars Mikkelsen
To clarify, if version 5.1.38 reads a response cached for version 5.1.39, getResultSet() will call reuseAndReadPacket() to read the EOF packet even though that packet is missing. This advances the stream to an invalid position for nextRowFast().
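
The misread can be modeled in a few lines. This is a toy simulation, not Connector/J code and not the real wire format: the class `EofDesyncSketch`, the packet kinds, and both response layouts are simplified assumptions. It only shows why a reader that unconditionally skips one packet after the column definitions loses the row when the cached response was built for a client that set CLIENT_DEPRECATE_EOF.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy model of the desync; every name here is hypothetical.
class EofDesyncSketch {
    enum Kind { COLUMN_DEF, EOF, ROW, OK }

    // Simplified layout for a client that did not set CLIENT_DEPRECATE_EOF:
    // an EOF packet separates the column definitions from the rows.
    static List<Kind> legacyResponse() {
        return Arrays.asList(Kind.COLUMN_DEF, Kind.EOF, Kind.ROW, Kind.EOF);
    }

    // Simplified layout for a client that set CLIENT_DEPRECATE_EOF:
    // no EOF after the column definitions, and an OK packet ends the rows.
    static List<Kind> deprecateEofResponse() {
        return Arrays.asList(Kind.COLUMN_DEF, Kind.ROW, Kind.OK);
    }

    // Mimics a reader that always consumes one packet after the column
    // definitions, assuming it is EOF, and then counts ROW packets.
    static int countRowsAsLegacyClient(List<Kind> response, int columnCount) {
        Iterator<Kind> in = response.iterator();
        for (int i = 0; i < columnCount; i++) {
            in.next(); // column definition
        }
        in.next(); // assumed to be EOF; never verified
        int rows = 0;
        while (in.hasNext() && in.next() == Kind.ROW) {
            rows++;
        }
        return rows;
    }

    public static void main(String[] args) {
        // prints "legacy layout: 1"
        System.out.println("legacy layout: "
                + countRowsAsLegacyClient(legacyResponse(), 1));
        // prints "deprecate-EOF layout: 0"
        System.out.println("deprecate-EOF layout: "
                + countRowsAsLegacyClient(deprecateEofResponse(), 1));
    }
}
```

In the second case the only ROW packet is consumed in place of the expected EOF, which is consistent with the "Expected single row ... but got 0" error in the original report.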
[10 Oct 2016 12:59] Filipe Silva
Hi Monty and Lars,

Thank you for taking the time to report this and for the great analysis you've done.

So far we've been unable to reproduce the issue as described by Lars.

Anyway, it's not clear to me whether you are both working on the same systems with the same configurations, or even whether the two comments are related, so I need some more information. I'd like to know the server versions you're using, the main configuration settings on both the server and the Connector/J connection string, and the operating systems used. And, please, a working test case.

Taking Lars's analysis as a possibility, I would also like to know whether restarting the server solved the initial problem, and whether you are running different Connector/J versions against the same database simultaneously.
[10 Oct 2016 17:04] Lars Mikkelsen
MySqlTest.java

Attachment: MySqlTest.java (application/octet-stream, text), 742 bytes.

[10 Oct 2016 17:13] Lars Mikkelsen
Filipe,

Monty and I are indeed working on the same system. I'm able to reproduce the issue against 5.7.13-6-log Percona Server using the attached Java file and the schema Monty provided.

$ javac MySqlTest.java
$ java -cp mysql-connector-java-5.1.40.jar:. MySqlTest
id: 1 locked: false
Done.
$ java -cp mysql-connector-java-5.1.38.jar:. MySqlTest
Done.

We are running different Connector/J versions against the same database simultaneously, and restarting the database does resolve the issue as long as we stick to one version of Connector/J.
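
For readers without access to the attachment, a check of this kind might look roughly like the sketch below. This is not the attached MySqlTest.java, whose contents are not shown in this thread. The class name `LockTableCheck`, the connection URL, the credentials, and the exact query are all placeholder assumptions. Only the table, its columns, and the output format come from the report.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical stand-in for the attached test; not the attachment itself.
class LockTableCheck {
    // Placeholder connection settings (assumptions); replace with real values.
    static final String URL = "jdbc:mysql://localhost:3306/test";
    static final String USER = "user";
    static final String PASSWORD = "password";

    // Formats one row the way the output above shows it.
    static String formatRow(int id, boolean locked) {
        return "id: " + id + " locked: " + locked;
    }

    public static void main(String[] args) throws SQLException {
        // Requires a Connector/J jar on the classpath and a reachable server.
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT ID, LOCKED FROM DATABASECHANGELOGLOCK")) {
            while (rs.next()) {
                System.out.println(
                        formatRow(rs.getInt("ID"), rs.getBoolean("LOCKED")));
            }
        }
        System.out.println("Done.");
    }
}
```

Running such a class once with each driver jar on the classpath, as in the commands above, would show the difference: one run prints the row, the other prints only "Done.".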
[10 Oct 2016 18:51] monty solomon
Disabling the query cache prevents the problem from occurring.

When we experienced the problem, we downgraded to version 5.1.38 and executed the statement FLUSH LOCAL TABLES to fix it. After further testing, we verified that it happens when the query cache is enabled on the master. It doesn't seem to happen on the slave.

We verified the problem and can reproduce it using MySQL 5.7.15, Percona 5.7.13, and Percona 5.7.14.

I attached a copy of our my.cnf file.
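
Since the problem depends on the query cache being enabled, a service could check the server's cache settings before relying on mixed driver versions. The sketch below is one hedged way to do that over JDBC. The class `QueryCacheCheck` and its methods are hypothetical. `query_cache_type` and `query_cache_size` are MySQL 5.7 system variables that are not mentioned in this thread, and the rule "active unless the type is OFF or the size is 0" is a simplifying assumption.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; not part of Connector/J.
class QueryCacheCheck {
    // Assumption: the cache is active unless its type is OFF or its size is 0.
    static boolean isQueryCacheActive(String type, long sizeBytes) {
        return !"OFF".equalsIgnoreCase(type) && sizeBytes > 0;
    }

    // Reads the two server variables over an existing connection.
    static boolean isQueryCacheActive(Connection conn) throws SQLException {
        String type = "OFF";
        long size = 0;
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SHOW VARIABLES LIKE 'query_cache_%'")) {
            while (rs.next()) {
                String name = rs.getString(1);
                if ("query_cache_type".equals(name)) {
                    type = rs.getString(2);
                } else if ("query_cache_size".equals(name)) {
                    size = Long.parseLong(rs.getString(2));
                }
            }
        }
        return isQueryCacheActive(type, size);
    }

    public static void main(String[] args) {
        // Exercises only the pure decision rule; no server is needed here.
        System.out.println(isQueryCacheActive("ON", 1048576L));
        System.out.println(isQueryCacheActive("OFF", 1048576L));
    }
}
```

The decision rule is kept separate from the JDBC call so it can be exercised without a running server.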
[10 Oct 2016 18:59] monty solomon
We are running on CentOS 6.7.
[12 Oct 2016 18:36] Filipe Silva
I was able to replicate the bug with the information provided.

However, this needs to be fixed on the server side. I've filed Bug#83346 to report this; you can track progress there from now on.

I'm considering this report a duplicate of Bug#83346, since it should be fixed there.

Thank you again for the excellent report and analysis.
[12 Jan 2017 5:22] monty solomon
The bug is a blocker for upgrading Connector/J from version 5.1.38 to properly use JSON columns.

Does it make sense to roll back the EOF packet fix added in Connector/J version 5.1.39 until the server team fixes Bug#83346?
[21 Mar 2017 10:26] Filipe Silva
I'm afraid we can't roll back the EOF packet deprecation patch.

But still, this is only a problem for you because you use both Connector/J 5.1.38 (or below) and Connector/J 5.1.39 (or above) with the same server at the same time, right? Can't you just update all your Connector/J libraries to the latest version?

As you know, you also have the alternative of turning the query cache off, at least until the bug is fixed in the server. I understand you may not want to do that, but it's still a workaround, right?

Thanks for your understanding.
[23 Mar 2017 15:18] Matt Ball
> Can't you just update all your Connector/J libraries to the latest version?

I don't think it is feasible for us to do this. We run a microservice architecture, with thousands of different JVMs running simultaneously against hundreds of MySQL instances. We deploy without downtime, which means that there is always a short period during which old and new code are running against the same MySQL server. There is no single build, and no single deploy, that would let us upgrade everything at the same time.

Disabling the query cache is a workaround, but given the possibility of a performance regression, I think our best path forward is to wait for the server team to fix https://bugs.mysql.com/bug.php?id=83346.