Bug #102757 FIRST_VALUE() function fails with "Subquery returns more than one row"
Submitted: 28 Feb 2021 12:10 Modified: 4 Mar 2021 14:16
Reporter: Anthony Marston Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.23 OS:Windows (Windows 10)
Assigned to: CPU Architecture:Any
Tags: FIRST_VALUE()

[28 Feb 2021 12:10] Anthony Marston
Description:
This function is supposed to return the specified value from the first row in an ordered result set, yet it complains when the result set contains more than one row. This behaviour is illogical.

How to repeat:
I have two tables called PRODUCT and PRODUCT_REVISION which are constructed as follows:

CREATE TABLE product (
  product_id nvarchar(40) NOT NULL,
  product_name nvarchar(80) NOT NULL,
  PRIMARY KEY (product_id)
);
CREATE TABLE product_revision (
  product_id nvarchar(40) NOT NULL,
  revision_id nvarchar(16) NOT NULL,
  effective_date date NOT NULL DEFAULT '2001-01-01',
  discontinue_date date DEFAULT NULL,
  revision_desc nvarchar(80) DEFAULT NULL,
  prod_rev_status_id nvarchar(8) DEFAULT NULL,
  PRIMARY KEY (product_id,revision_id)
);

I then fill them with values as follows:

INSERT INTO product (product_id, product_name) VALUES (‘product1’,’Product #1);
INSERT INTO product (product_id, product_name) VALUES ('product2','Product #2);
INSERT INTO product_revision (product_id, revision_id, effective_date, discontinue_date, revision_desc) VALUES ('product2', 'V-1', '2014-07-01', '9999-12-31', 'Version 1');
INSERT INTO product_revision (product_id, revision_id, effective_date, discontinue_date, revision_desc) VALUES ('product2', 'V-2', '2020-07-01', '9999-12-31', 'Version 2');

I then run the following query:

SELECT product_id, product_name
, (SELECT FIRST_VALUE(revision_id) 
   OVER (ORDER BY effective_date DESC) 
   FROM product_revision AS pr
   WHERE pr.product_id=product.product_id
   ) AS revision_id
FROM product;

This fails with "Subquery returns more than one row"

Suggested fix:
The check on the size of the result set should be removed as it should only deal with the FIRST_ROW in that result set.
[28 Feb 2021 13:55] Frederic Descamps
Hi Anthony,

I'm not an expert in SQL Window Functions, but I think you have wrong assumption here, it will return that first value for each records matching the filter.

Now if you want only one you could limit to 1 or use DISTINCT like this:

 SQL > SELECT product_id, product_name , 
             (SELECT DISTINCT 
                 FIRST_VALUE(revision_id)     
                   OVER (ORDER BY effective_date DESC) 
             FROM product_revision AS pr    
             WHERE pr.product_id=product.product_id    ) AS revision_id 
       FROM product;
+------------+--------------+-------------+
| product_id | product_name | revision_id |
+------------+--------------+-------------+
| product1   | Product #1   | V-1         |
| product2   | Product #2   | V-2         |
+------------+--------------+-------------+
2 rows in set (0.0016 sec)

 SQL > SELECT product_id, product_name , 
             (SELECT  
                 FIRST_VALUE(revision_id)     
                   OVER (ORDER BY effective_date DESC) 
             FROM product_revision AS pr    
             WHERE pr.product_id=product.product_id LIMIT 1    ) AS revision_id 
       FROM product;
+------------+--------------+-------------+
| product_id | product_name | revision_id |
+------------+--------------+-------------+
| product1   | Product #1   | V-1         |
| product2   | Product #2   | V-2         |
+------------+--------------+-------------+
2 rows in set (0.0005 sec)

But of course the MySQL Verification Team, will process your bug and let us know if it's a bug or not.

Cheers,
[28 Feb 2021 21:59] Anthony Marston
The description of this function that I have found in several places on the internet, such as at https://www.sqltutorial.org/sql-window-functions/sql-first_value and https://www.oracletutorial.com/oracle-analytic-functions/oracle-first_value, is that this function returns a value from the first row on an ordered result set. Nowhere does it say that the result set can only consist of a single record, and the very name FIRST_VALUE implies the first of several, not the first of one.
[2 Mar 2021 14:52] MySQL Verification Team
Hi Mr. Marston,

Thank you for your bug report.

However, this is not a bug. MySQL does not follow the behaviour described in the documents you described. You better look at our Reference Manual, where you will note that , for the time being, this function is used only within windowing functions.

It is also notable to mention that sorting of  nested queries is frequently dropped, when it does not make sense. This is case here, since this function is not implemented outside of the windowing scope.

Not a bug.
[2 Mar 2021 19:39] Anthony Marston
So what you are saying is that your implementation is partial and incomplete based on your understanding of what the term "window" means for the various window functions. The most common description I have found is "The term window describes the set of rows on which the function operates" which means that the set of rows is provided by the "FROM ... ORDER BY ..." that is currently in operation. 

The description of this function is that it returns a value from the specified column from the first row in an ordered set of values. That is why the function signature is defined as "FIRST_ROW(column) OVER (ORDER BY ...)". The fact that this function requires an ORDER BY clause indicates that it can operate on several rows as a single row would never need to be sorted.

The SQL specification does not state that this function should operate differently if it is included in a subquery, so I don't see why you think that it should.

The error message "Subquery returns more than one row" is obviously wrong as this function is specifically designed to return a value from the first row in an ordered set, so while the input to the function may contain several rows the output certainly does not. How can it possibly complain that the output from a function which is designed to return a single value from a single row is "more than one"?

I repeat, this error message is totally illogical and therefore wrong.
[3 Mar 2021 15:44] MySQL Verification Team
Hi Mr. Marston,

If you read our Reference Manual, you will find out that nested queries in the SELECT list must return only one value.

Also, since first_value() is implemented only in windowing function and not in SELECT list nested queries, the error message is quite logical.

Now, the most important part. SQL standard specifies first_value() and last_value() only in windowing functions, so we are in full compliant with current SQL standard.
[3 Mar 2021 18:29] Anthony Marston
> nested queries in the SELECT list must return only one value

The sole purpose of the FIRST_VALUE() function is that it returns a value from the first row in an ordered set. It does not matter how many rows are in the input set as only one is selected for output. Therefore, by definition, it can only ever return one value.

> since first_value() is implemented only in windowing function

The fact that none of the examples show this function in a nested query is irrelevant. The manual page at https://dev.mysql.com/doc/refman/8.0/en/subqueries.html does not state that window functions are excluded, neither does the page at https://dev.mysql.com/doc/refman/8.0/en/subquery-restrictions.html nor the one at https://dev.mysql.com/doc/refman/8.0/en/window-functions-usage.html.

The fact that the SQL standard does not say that window functions in general or any window function in particular cannot be used in a nested query surely means that they can, and to say otherwise is a pure fabrication.
[4 Mar 2021 13:14] MySQL Verification Team
Hi Mr. Marston,

You have completely misunderstood what I wrote. 

SQL standard stipulates that first_value() and last_value() can be used ONLY with windowing queries .....