Bug #1008 Error in SQL_CALC_FOUND_ROWS using UNION with LIMIT
Submitted: 7 Aug 2003 1:34 Modified: 20 Aug 2003 6:26
Reporter: David Sancho Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.0.14 OS:Windows (Win2000)
Assigned to: Michael Widenius CPU Architecture:Any

[7 Aug 2003 1:34] David Sancho
Description:
SQL_CALC_FOUND_ROWS returns bad row number if it's used in UNIONs that have a record LIMIT. For example, "SELECT SQL_CALC_FOUND_ROWS * FROM table1 UNION SELECT * FROM table1 LIMIT 0, 10" and SELECT FOUND_ROWS() returns only 10. Table1 contains more than 10 records.

How to repeat:
Just create a query with a UNION and a LIMIT. For example (change table1)
SELECT SQL_CALC_FOUND_ROWS * FROM table1 UNION SELECT * FROM table1 LIMIT 0, 10
SELECT FOUND_ROWS()
[7 Aug 2003 2:16] David Sancho
The example is wrong because MySQL in UNIONS does not duplicate rows (is this a bug?) so use the following example:

CREATE TABLE `directclientusersmessage` (
  `IdUser` int(11) NOT NULL default '0',
  `IdDirectMessage` int(11) NOT NULL default '0',
  `Readed` datetime default NULL,
  PRIMARY KEY  (`IdUser`,`IdDirectMessage`),
  KEY `IdDirectMessage` (`IdDirectMessage`),
);

CREATE TABLE `directclientmessages` (
  `IdDirectMessage` int(11) NOT NULL default '0',
  `MessageData` text NOT NULL,
  `DateOfMessage` datetime default NULL,
  PRIMARY KEY  (`IdDirectMessage`)
);

INSERT INTO `directclientmessages` (`IdDirectMessage`, `MessageData`, `DateOfMessage`) VALUES
  (1,'Texto','2003-08-06 00:00:00'),
  (2,'Texto','2003-08-06 00:00:00'),
  (3,'Texto','2003-08-06 00:00:00'),
  (4,'Texto','2003-08-06 00:00:00'),
  (5,'Texto','2003-08-06 00:00:00'),
  (6,'Texto','2003-08-06 00:00:00'),
  (7,'Texto','2003-08-06 00:00:00'),
  (8,'Texto','2003-08-06 00:00:00'),
  (9,'Texto','2003-08-06 00:00:00'),
  (10,'Texto','2003-08-06 00:00:00'),
  (11,'Texto','2003-08-06 00:00:00'),
  (12,'Texto','2003-08-06 00:00:00'),
  (13,'Texto','2003-08-06 00:00:00'),
  (14,'Texto','2003-08-06 00:00:00'),
  (15,'Texto','2003-08-06 00:00:00'),
  (16,'Texto','2003-08-06 00:00:00'),
  (17,'Texto','2003-08-06 00:00:00'),
  (18,'Texto','2003-08-06 00:00:00'),
  (19,'Texto','2003-08-06 00:00:00'),
  (20,'Texto','2003-08-06 00:00:00'),
  (21,'Texto','2003-08-06 00:00:00'),
  (22,'Texto','2003-08-06 00:00:00');

COMMIT;

INSERT INTO `directclientusersmessage` (`IdUser`, `IdDirectMessage`, `Readed`) VALUES
  (4,1,'2003-08-07 10:10:13'),
  (4,2,'2003-08-07 10:10:13'),
  (4,3,'2003-08-07 10:10:13'),
  (4,4,'2003-08-07 10:10:13'),
  (4,5,'2003-08-07 10:10:13'),
  (4,6,'2003-08-07 10:10:13'),
  (4,7,'2003-08-07 10:10:13'),
  (4,8,'2003-08-07 10:10:13'),
  (4,9,'2003-08-07 10:10:13'),
  (4,10,'2003-08-07 10:10:13'),
  (4,11,'2003-08-07 10:10:13'),
  (4,12,'2003-08-07 10:10:13'),
  (4,13,'2003-08-07 10:10:13'),
  (4,14,'2003-08-07 10:10:13'),
  (4,15,'2003-08-07 10:10:13'),
  (4,16,'2003-08-07 10:10:13'),
  (4,17,'2003-08-07 10:10:13'),
  (4,18,'2003-08-07 10:10:13'),
  (4,19,'2003-08-07 10:10:13'),
  (4,20,'2003-08-07 10:10:13'),
  (4,21,'2003-08-06 16:51:04'),
  (4,22,'2003-08-06 16:51:19');

COMMIT;

SELECT SQL_CALC_FOUND_ROWS DirectClientMessages.* FROM DirectClientMessages INNER JOIN DirectClientUsersMessage ON DirectClientMessages.IdDirectMessage = DirectClientUsersMessage.IdDirectMessage WHERE IdUser = 4 AND Readed Is NULL UNION SELECT DirectClientMessages.* FROM DirectClientMessages INNER JOIN DirectClientUsersMessage ON DirectClientMessages.IdDirectMessage = DirectClientUsersMessage.IdDirectMessage WHERE IdUser = 4 AND NOT (DirectClientUsersMessage.Readed is NULL) ORDER BY DateOfMessage Limit 0,20

SELECT FOUND_ROWS()
Returns 20 instead of 22 (if you executed the same query without "Limit 0,20" then SELECT Found_ROWS() returns 22
[11 Aug 2003 4:58] MySQL Verification Team
Thank you for your bug report. This issue has been fixed in the latest
development tree for that product. You can find more information about
accessing our development trees at 
    http://www.mysql.com/doc/en/Installing_source_tree.html
[20 Aug 2003 6:26] Michael Widenius
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

I have now pushed a fix for SELECT SQL_CALC_FOUND_ROWS and UNION,
which solves some of the problems with this functionality.
The patch also solves a problem with LIMIT #,# when used with
UNION where there is no braces around the selects.

This doesn't however completely solve all issues that you have
described in your bug report.

First an answer to your question about 'duplicated rows'.  By default
UNION removes all duplicated rows (this is according to ANSI SQL). If
you want to keep duplicates you should use UNION ALL.

When using UNION without ALL, which automaticly deletes duplicates, MySQL
can't give an accurate number for FOUND_ROWS() when using LIMIT on
union parts.

The reason for this is that if we stored all possible matching rows
for all UNION parts, this will cause side effects when calculating the
result set in a later UNION part as we would have to distinguish with
duplicates and would-be-duplicates.

Doing an expensive work around for this special case, isn't worthwhile for
us at this point of time.

The intention of SQL_CALC_FOUND_ROWS is that it should work mainly with
global limits like:

(SELECT SQL_CALC_FOUND_ROWS * FROM t1 LIMIT 5) UNION (SELECT * FROM t2) LIMIT 1

in which case FOUND_ROWS() will return the number of rows the result
set would cointain if ONLY the last limit would be removed.

SQL_CALC_FOUND_ROWS works with UNION ALL without braces:

SELECT SQL_CALC_FOUND_ROWS * FROM t1 LIMIT 2 UNION ALL SELECT * FROM t2 LIMIT 1;

In this case FOUND_ROWS() will return the result as if there was no
limits in the query.

When used with UNION without ALL and without braces FOUND_ROWS()
should return an upper limit of the number of rows the query will returns.
(Basicly this is the number of rows without taking all duplicates into
consideration).

In MySQL 4.0.15, which will include the patch, your last query should
work as expected if you just replace UNION with UNION ALL.

If you keep using UNION, then FOUND_ROWS() may return more rows than the
original query, without limit, returns.

Regards,
Monty