Bug #3790 Opinion: JDBC begin transaction should not require setAutoCommit(false)
Submitted: 17 May 2004 1:28 Modified: 17 May 2004 15:58
Reporter: Robin Bygrave Email Updates:
Status: Not a Bug Impact on me:
None 
Category:Connector / J Severity:S4 (Feature request)
Version:309 OS:
Assigned to: CPU Architecture:Any

[17 May 2004 1:28] Robin Bygrave
Description:
I believe that with Connector/J the 'Begin Transaction' occurs at the last commit(), rollback() or setAutoCommit(false) method call.  I don't think this is the correct behaviour as per JDBC spec.

I believe there is a different interpretation of the JDBC spec.  In that when setAutoCommit(false) has been set... commit() and rollback() mark the END of a transaction, BUT the begining of the NEXT transaction is not this time, but in fact when the next statement that needs a transaction is executed.... "new transaction is started when the current SQL statement requires one and there is no transaction already in place".

Given a single connection and the following calls at times t0... t6

t0 setAutoCommit(false)
t1 executeUpdate()...
t2 commit()
t3 executeQuery()..
t4 commit()
t5 executeUpdate()..
t6 commit()

3 transactions are commited() and the transaction demarkation is..
JDBC Spec (as I see it):           Connector/J currently works
transaction1  [t1 to t2]           [t0 to t2]
transaction2  [t3 to t4]           [t2 to t4]
transaction3  [t5 to t6]           [t4 to t6]

Actually I read somewhere on this site that developers should demark the beginning of a transaction with setAutoCommit(false).  Doing this, does indeed set the begining of the transaction to the correct time.  This seems to me to be a pretty good workaround.

However, I do believe that with Connector/J transaction demarkation without setAutoCommit() does not match the JDBC spec.

JDBC 3.0 Spec extract:>>>
10.1 Transaction Boundaries and Autocommit
When to start a new transaction is a decision made implicitly by either the JDBC
driver or the underlying data source. Although some data sources implement an
explicit “begin transaction” statement, there is no JDBC API to do so. Typically, a new transaction is started when the current SQL statement requires one and there is no transaction already in place. Whether or not a given SQL statement requires a transaction is also specified by SQL99.
The Connection attribute auto-commit specifies when to end transactions.
<<<

How to repeat:
Two Connections, connection1 and connection2

// BEGIN SETUP..
create table test (ID Integer ,NAME varchar ) type=InnoDb;
insert into test Values (1, 'FIRST');
commit;
//END SETUP

t0    connection1.setAutoCommit(false)
t1    connection2.setAutoCommit(false)

t2    connection1.executeUpdate().. // SET NAME='SECOND' WHERE ID=1
t3    connection1.commit()

t4    connection2.executeQuery()... // sees NAME=FIRST, transaction begins at
                                    // t1 and I'd expect it to begin at t4
t5    connection2.commit()

If this is not clear let me know, and I'll send a proper working test case.

Suggested fix:
I think that when a connection is not using auto-commit, that a transaction begin should occur when the next statement that needs a transaction is executed [Rather than the last call to commit(), rollback() or setAutoCommit() as these times a too early].

Note: the current behaviour does not match that of another vendor - O.  That adds nothing more than to say if someone was to convert to mysql from O, the JDBC driver behaviour is different :(
[17 May 2004 2:06] Mark Matthews
The behavior you observe is an artifact of InnoDB. The JDBC driver can't easily know which queries require a transaction and which do not, it has to defer to the server. What you are seeing is _not_ an artifact of transaction boundary per-se, but the fact that you have implicitly started a transaction by setting autoCommit to false (which is of course, allowed by the JDBC spec, as when a transaction starts is implementation-dependent), which is where InnoDB takes it's snapshot.

If you want 'fresher' data, you should bracket your transactions closer to where you need the data, or commit and begin again to get a newer snapshot (as it states in the InnoDB manual). There is no efficient way for the driver to do this automatically for you, as it would require the driver to parse statements and issue extra statements to toggle the transaction boundary.

I'm turning this into a feature request that will we will look at scheduling when the server supports the ability to 'postpone' transaction demarcations.
[17 May 2004 3:07] Robin Bygrave
OK, I buy most of that with a couple of clarifications.

If you ignore checking to see if the statement requires a transaction, then the Feature request sounds much easier. It becomes...

"Assuming - autocommit false, After a commit() or rollback() the next transaction begins at the next statement executed [rather than the time of the last commit() or rollback()] - regardless of the statement."

To me, I think ignoring the 'does statement require a transaction" a reasonable thing to do.

1. I still think this is an issue of transaction boundary.
   Moreover, the difference in time between Connector/J and
   other JDBC drivers can be HUGE when using a Connection pool.

   Its a case of WHEN the transaction starts after a
   commit(), rollback() or setAutoCommit(false).

   Does it start at that time, or at the next statement?
   The time difference can be large. ie. The time difference
   between t2 and t3 can be very large.

   In my example there where 2 transactions that started with 
   the previous commit() [No setAutoCommit(false) there at all].
   In these cases I still believe that the transaction should start
   with the next statement and not the last commit(). The time between
   which could be significant when using Connection pooling. 

2. I interpret the current behaviour as...
   You need to write your JDBC code so that you EXPLICITLY begin the
   transaction when you want it to begin.  OK fine, but the JDBC spec doesn't
   say this.  AKa existing JDBC code may not do this.

3. With some connection pools there could be a VERY LONG TIME difference between
   when transactions begin with Connector/J and other JDBC drivers.
   That is, the last commit(), rollback() may have occured a long time ago 
   compared with the next statement executed.  Again the time difference 
   between t2 and t3, or t4 and t5.
[17 May 2004 5:14] Mark Matthews
> OK, I buy most of that with a couple of clarifications.
> 
> If you ignore checking to see if the statement requires a transaction,
> then the Feature request sounds much easier. It becomes...
> 
> "Assuming - autocommit false, After a commit() or rollback() the next
> transaction begins at the next statement executed [rather than the time
> of the last commit() or rollback()] - regardless of the statement."
> 
> To me, I think ignoring the 'does statement require a transaction" a
> reasonable thing to do.
> 
> 
> 1. I still think this is an issue of transaction boundary.
>    Moreover, the difference in time between Connector/J and
>    other JDBC drivers can be HUGE when using a Connection pool.

It's not the driver that is necessarily at fault, it is when the database itself decides a transaction starts. If anything, this would be a feature request against InnoDB to defer taking a snapshot until the first statement is issued after setAutoCommit(false). The driver could go through all kinds of machinations to provide the functionality you ask for, but ultimately it would make more sense to have this as a 'mode' for MySQL itself. I will check with the InnoDB team to see whether this is actually already available or not.
 
>    Its a case of WHEN the transaction starts after a
>    commit(), rollback() or setAutoCommit(false).

With a JDBC-compliant connection pool, this should not matter, as there will be no in-progress transaction when the connection is handed to your application (see further discussion below).

> 
>    Does it start at that time, or at the next statement?
>    The time difference can be large. ie. The time difference
>    between t2 and t3 can be very large.
> 
>    In my example there where 2 transactions that started with 
>    the previous commit() [No setAutoCommit(false) there at all].
>    In these cases I still believe that the transaction should start
>    with the next statement and not the last commit(). The time between
>    which could be significant when using Connection pooling. 
> 
> 2. I interpret the current behaviour as...
>    You need to write your JDBC code so that you EXPLICITLY begin the
>    transaction when you want it to begin.  OK fine, but the JDBC spec
> doesn't
>    say this.  AKa existing JDBC code may not do this.
> 
> 3. With some connection pools there could be a VERY LONG TIME
> difference between
>    when transactions begin with Connector/J and other JDBC drivers.

The JDBC spec states that a connection's default autocommit state is 'true'. Therefore it would be your code that is setting the autocommit state to false, thus producing the implicit transaction begin (therefore the transaction begins when your (or alternatively the container if you're using something like CMP) calls setAutoCommit(false) on the connection after retrieving it from the pool). 

Are you using a connection pool that _doesn't_ return connections in autocommit 'true' state?

>    That is, the last commit(), rollback() may have occured a long time
> ago 
>    compared with the next statement executed. 

Which, in my opinion is irrelevent when using a connection pool, because as far as your application should be concerned, the connection _should_ be returned to you in autocommit=true state...If the connection pool _does_ happen to not set that until immediately before your application 'sees' the connection, then yes there is another 'implicit' transaction, but your application should never 'see' it, and new transaction will be started when _your_ application calls setAutoCommit(false).

If you could please let us know what is the case in your application, we might be able to devise a feature to support it, however without knowing what state your connection pool is returning connections to you, we'd be just guessing, unfortunately.
[17 May 2004 8:14] Robin Bygrave
>>>
It's not the driver that is necessarily at fault, it is when the
database itself decides a transaction starts. If anything, this would
be a feature request against InnoDB to defer taking a snapshot until
the first statement is issued after setAutoCommit(false).
<<<
OK, except (with complete ignorance) a current JDBC Driver call to setAutoCommit(false) seems to start a transaction with Connector/J.  So, if you move that underlying 'start transaction' call so that it also gets called just before the first statement then problem solved, maybe?  Ha, I'll have to look at the code now :)

For code where setAutoCommit(false) is very close to the first statement there is no problem.  For code where there is some time between setAutoCommit(false) and the next statement, the result is different from Other JDBC drivers.
 
Note: For me, I CAN make the setAutoCommit(false) close to the first statement and for me that is a workaround that works.  For others it may not.  This is the only JDBC driver I know where the transaction starts with setAutoCommit(false) though [Having used O & MS drivers].

>>>
The JDBC spec states that a connection's default autocommit state is
'true'. Therefore ....
<<<<
I Agree EXCEPT I understood this is for when the Connection is actually Created. That is, I did not assume this is necessarily the case for when a connection is returned from a connection pool. My Connection pool defaulted all the connections to autocommit=false.

To me this is assuming setAutoCommit(false) is called VERY CLOSE to when people expect the transaction to start.  This maybe a reasonable assumption in the real world (works for me) but then again I've had to change JDBC code that worked perfectly fine with other JDBC drivers.

>>>
it would be your code that is setting the autocommit
state to false, thus producing the implicit transaction begin
(therefore the transaction begins when your (or alternatively the
container if you're using something like CMP) calls
setAutoCommit(false) on the connection after retrieving it from the
pool). Are you using a connection pool that _doesn't_ return connections in
autocommit 'true' state?
<<<
Yes, all true. The jdbc code in question assumes the connection pool is returning Connections with autocommit=false, which is set when connections are returned to the pool :)  I have changed this and for me this is fine. 

However, for someone else writing jdbc code, there maybe some time difference between when they call setAutoCommit(false) and when they execute their next statement.  For those people, they WILL get different behaviour between Connector/J and other JDBC drivers.
[17 May 2004 15:23] Heikki Tuuri
Mark,

InnoDB starts its transaction from the first SQL statement of the transaction (SELECT, INSERT, UPDATE, ...).

I do not understand how the JDBC driver makes InnoDB to start the transaction earlier, or does it?

Regards,

Heikki
[17 May 2004 15:34] Heikki Tuuri
Hi!

I tested this with the interactive mysql client, and it worked as it should.

Are you sure JDBC or your application does not run some SELECT ... from InnoDB after

SET AUTOCOMMIT=0
?

http://www.innodb.com/ibman.php#Consistent.read
"
If you are running with the default REPEATABLE READ isolation level, then all consistent reads within the same transaction read the snapshot established by the first such read in that transaction.
"

Regards,

Heikki

Your MySQL connection id is 42 to server version: 4.0.20-debug-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from test;
+------+--------+
| ID   | NAME   |
+------+--------+
|    1 | SECOND |
+------+--------+
1 row in set (0.00 sec)

mysql>
[17 May 2004 15:58] Mark Matthews
> Mark,
> 
> InnoDB starts its transaction from the first SQL statement of the
> transaction (SELECT, INSERT, UPDATE, ...).
> 
> I do not understand how the JDBC driver makes InnoDB to start the
> transaction earlier, or does it?

Heikki, it does not do anything special, so if this is the case, the user should not be seeing this behavior. There would be nothing sent to the database between setAutoCommit(false) and the next query....(unless the connection pool is sending something to test if the connection is alive). 

I will make up a testcase later today, and send you exactly what the driver is sending to the database.
[17 May 2004 17:54] Mark Matthews
Heikki, I observe the same (as expected) behavior as you do with the following testcase, which updates the table on one connection, and then commits, and then immediately selects the data from another connection.

Robin, are you sure that your connection pool (or some other bit of code) isn't issuing a statement on the connection in question _before_ your code sees it? (i.e. a 'liveness check' of 'SELECT ...' or similar?)

Testcase:

public void testBug3790() throws Exception {
    	String field2OldValue = "foo";
    	String field2NewValue = "bar";
    	int field1OldValue = 1;
    	
    	Connection conn1 = null;
    	Connection conn2 = null;
    	Statement stmt1 = null;
    	Statement stmt2 = null;
    	ResultSet rs2 = null;
    	
    	Properties props = new Properties();
    		
    	try {
    		this.stmt.executeUpdate("DROP TABLE IF EXISTS testBug3790");
    		this.stmt.executeUpdate("CREATE TABLE testBug3790 (field1 INT NOT NULL PRIMARY KEY, field2 VARCHAR(32)) TYPE=InnoDB");
    		this.stmt.executeUpdate("INSERT INTO testBug3790 VALUES (" + field1OldValue + ", '" + field2OldValue + "')");
    		
    		conn1 = getConnectionWithProps(props); // creates a new connection
    		conn2 = getConnectionWithProps(props); // creates another new connection
    		conn1.setAutoCommit(false);
    		conn2.setAutoCommit(false);
    		
    		stmt1 = conn1.createStatement();
    		stmt1.executeUpdate("UPDATE testBug3790 SET field2 = '" + field2NewValue + "' WHERE field1=" + field1OldValue);
    		conn1.commit();
    		
    		stmt2 = conn2.createStatement();
    		
    		rs2 = stmt2.executeQuery("SELECT field1, field2 FROM testBug3790");
    		
    		assertTrue(rs2.next());
    		assertTrue(rs2.getInt(1) == field1OldValue);
    		assertTrue(rs2.getString(2).equals(field2NewValue));
    	} finally {
    		this.stmt.executeUpdate("DROP TABLE IF EXISTS testBug3790");
    		
    		if (rs2 != null) {
    			rs2.close();
    		}
    		
    		if (stmt2 != null) {
    			stmt2.close();
    		}
    		
    		if (stmt1 != null) {
    			stmt1.close();
    		}
    		
    		if (conn1 != null) {
    			conn1.close();
    		}
    		
    		if (conn2 != null) {
    			conn2.close();
    		}
    	}
    }
[17 May 2004 23:24] Robin Bygrave
Mark, I think you are dead on the money.  My Pool does check for liveness with a SELECT... and doesn't issue a commit() (Yikes)!!  In this case the transaction potentially begins at this time, long before I expect which is what I'm observing.

Perhaps the other JDBC drivers I've used don't start a transaction at a SELECT statement.  Hmm, seems a bit 'loose' on their part?

I'll make sure I can reproduce & then fix my bug and yet you know.  Thanks for all the help.
[21 May 2004 6:21] Robin Bygrave
Yes, I believe this is not a Bug or Feature Request.  The behaviour of Connector/J is correct and the problem was a result of two bugs in my own jdbc code.

Thanks to Mark for your efforts, and apologies for your time wasted.

For anyone interesting I note my 2 jdbc bugs.
1.  I was using a Connection pool that tested liveness with a select...
    and the associated transaction was not commited().
    This has the effect of starting a transaction well before I would expect
    assuming transaction isolation of repeatable read, or serializable.

2.  I made an assumption of the transaction isolation level.
    I assumed Connections would be created with TRANSACTION_SERIALIZABLE or TRANSACTION_REPEATABLE_READ, but in fact the default is probably jdbc vendor specific.  In short, I was previously testing code that was only using TRANSACTION_READ_COMMITED which meant that I had previously not hit problem 1 with other JDBC drivers.

In fixing my Connection Pool code, and in testing outside a connection pool the behaviour is all as I would expect - yeah!!