Bug #28150 INSERT SELECT accepts invalid SELECT statement
Submitted: 28 Apr 2007 21:14 Modified: 30 Apr 2007 16:42
Reporter: Baron Schwartz (Basic Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.1.15-beta OS:Any
Assigned to: Valeriy Kravchuk CPU Architecture:Any
Tags: insert, qc

[28 Apr 2007 21:14] Baron Schwartz
Description:
The server accepts an invalid SELECT statement after the INSERT statement.  The statement references a table which has no source in any in-scope FROM clause.

How to repeat:
mysql> CREATE TABLE `test` (
  `i` int(11) NOT NULL,
  PRIMARY KEY (`i`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

mysql> insert into test(i) values(1);
Query OK, 1 row affected (0.06 sec)

mysql> insert into test(i) select i+(select max(i) from test);
ERROR 1582 (23000): Duplicate entry '1' for key 'PRIMARY'

-- So the statement did execute, and only had an error because there
-- was a duplicate key.  Adding a constant to the expression to avoid
-- the duplicate key will cause the INSERT to succeed.

-- But the SELECT is invalid, as shown here by simply removing the INSERT part:
mysql> select i+(select max(i) from test);
ERROR 1054 (42S22): Unknown column 'i' in 'field list'

mysql> select VERSION();
+-------------+
| VERSION()   |
+-------------+
| 5.1.15-beta | 
+-------------+
1 row in set (0.00 sec)

Suggested fix:
I am not sure if this is really a bug; I am not an expert on SQL's scoping rules.  but I think the 'i' source column in the outermost SELECT should be an error, because there is no FROM clause from which it can come.
[29 Apr 2007 5:56] Valeriy Kravchuk
Thank you for a problem report. "select i" in INSERT refers to column i of table you INSERT INTO (test). Try to INSERT INTO table without i column...

But, I think, this is at least a reasonable request for a more clear documentation. One can read, for example (http://dev.mysql.com/doc/refman/5.1/en/insert-select.html):

"- Currently, you cannot insert into a table and select from the same table in a subquery.

 - To avoid ambigious column reference problems when the SELECT and the INSERT refer to the same table, provide a unique alias for each table used in the SELECT part, and qualify column names in that part with the appropriate alias."

First item does not seem to be always true, based on your test case. Second item should be more clear about scoping.
[29 Apr 2007 14:36] Baron Schwartz
You said

"select i" in INSERT refers to column i
of table you INSERT INTO (test). Try to INSERT INTO table without i
column...

I don't understand how this works.  It must be selecting i FROM both a table, and from a row.  If it's selecting from the table it's inserting into, it must be selecting from the rows that already exist in the table, I think.  But there are two bizarre things about this:

1) How can it insert into a new row values that exist in some other row in the INSERT target, yet have no source in the SELECT?  What if there are many other rows?  In this case I would think it should fail with an error message similar to what happens when a scalar subquery returns more than one row.

2) it doesn't behave this way anyway.  Here is another test case with two rows in the table to begin with:

mysql> delete from test;
Query OK, 3 rows affected (0.07 sec)

mysql> insert into test(i) values(1),(2);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into test(i) select i + 5 + (select max(i) from test);
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select * from test;
+---+
| i |
+---+
| 1 | 
| 2 | 
| 7 | 
+---+
3 rows in set (0.00 sec)

That is very strange.  It is as though the 'i' part of the addition expression became zero.

Here's what I think is really happening.  It is indeed selecting 'i' from the row it is creating with the insert.  I think it creates the row with default values, then executes the subquery, then evaluates the entire SELECT, then updates the newly created row.

To test this, I changed column 'i' default value:

mysql> alter table test modify i int not null default 50;
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into test(i) select i + 5 + (select max(i) from test);
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select * from test;
+----+
| i  |
+----+
|  1 | 
|  2 | 
|  7 | 
| 62 | 
+----+
4 rows in set (0.00 sec)

That result seems consistent with my theory.

This can be documented and then it'll be fine, but just don't let Joe Celko hear about it or his head will explode ;-)

I would prefer for this query to fail with an error, because no matter how well documented it is, most people will never read that documentation and the behavior is subtle enough that it will confuse them and cause errors they don't notice initially.  Plus, even if they read the documentation, I think it will be hard to understand this behavior.
[30 Apr 2007 5:36] Valeriy Kravchuk
Thank you for a more detailed investigation. Please, try to repeat with a newer version, 5.1.17, and check if it gives the same (wrong!, I agree with you) results.

I tried to repeat your last test with 5.0.37, and here is what I've got:

mysql> CREATE TABLE `test` (
    ->   `i` int(11) NOT NULL,
    ->   PRIMARY KEY (`i`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.13 sec)

mysql> alter table test modify i int not null default 50;
Query OK, 0 rows affected (0.30 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into test(i) values(1),(2);
Query OK, 2 rows affected (0.05 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into test(i) select i + 5 + (select max(i) from test);
ERROR 1110 (42000): Column 'i' specified twice
mysql> insert into test(i) select i + (select max(i) from test);
ERROR 1110 (42000): Column 'i' specified twice
mysql> insert into test(i) select i + (select max(i) from test) from test;
Query OK, 2 rows affected (0.06 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from test;
+---+
| i |
+---+
| 1 |
| 2 |
| 3 |
| 4 |
+---+
4 rows in set (0.00 sec)

mysql> select version();
+---------------------+
| version()           |
+---------------------+
| 5.0.37-community-nt |
+---------------------+
1 row in set (0.00 sec)

So, we have error messages until we add from clause. 

I was wrong in my reasoning yesterday. Sorry.
[30 Apr 2007 14:18] Baron Schwartz
I am getting /slightly/ different error messages on 5.1.17-beta, but it is throwing errors nonetheless, which is good.  I think this bug is fixed, then:

mysql> create table test(i int not null primary key) engine=innodb;
Query OK, 0 rows affected (0.06 sec)

mysql> alter table test modify i int not null default 50;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into test(i) values(1),(2);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into test(i) select i + 5 + (select max(i) from test);
ERROR 1054 (42S22): Unknown column 'i' in 'field list'
mysql> insert into test(i) select i + (select max(i) from test);
ERROR 1054 (42S22): Unknown column 'i' in 'field list'
mysql> insert into test(i) select i + (select max(i) from test) from test;
Query OK, 2 rows affected (0.43 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from test;
+---+
| i |
+---+
| 1 | 
| 2 | 
| 3 | 
| 4 | 
+---+
4 rows in set (0.00 sec)

mysql> select version();
+-------------+
| version()   |
+-------------+
| 5.1.17-beta | 
+-------------+
1 row in set (0.00 sec)
[30 Apr 2007 16:42] Valeriy Kravchuk
Closed as this bug seems fixed in current 5.0.x and 5.1.x versions.