Bug #19054 FEDERATED storage engine generates non-efficient queries with LIMIT clause
Submitted: 12 Apr 2006 15:00 Modified: 28 Feb 2007 22:25
Reporter: Manuel Matonin Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Federated storage engine Severity:S4 (Feature request)
Version:5.1.12-BK, 5.0.23-BK, 5.0.17 OS:Any (All)
Assigned to: CPU Architecture:Any
Triage: Triaged: D5 (Feature request)

[12 Apr 2006 15:00] Manuel Matonin
Description:
FEDERATED storage engine generates SQL queries based on local table metainfo.  Current implementation meant for SELECT queries first describes table, then generates new query based on fields info and sends it to server. No LIMIT clause is taken into consideration! 

Imagine what is going to happen on client side if I have a server with 1 million records in it and I have created a federated link to this table - client side is going to fetch all data (1 million records) when I just execute a simple SELECT query with LIMIT clause. Query should return only 1 row:

SELECT 1 FROM federated_table LIMIT 1;

How to repeat:
Follow the description field.
[13 Apr 2006 13:13] Valeriy Kravchuk
Thank you for a problem report. Please, check with a newer version, 5.0.20, and inform about the results.
[17 Apr 2006 7:02] Manuel Matonin
I compared the source code of 5.0.17 vs 5.0.20 ha_federated.cpp and found no changes related to LIMIT clause!
I also tested version 5.0.20 mysqld-max-nt.exe to be 100% sure that the implementation of LIMIT clause is missing!
[17 Apr 2006 7:32] Manuel Matonin
--------------------------------------------------------------
At SERVER SIDE 
--------------------------------------------------------------

mysql> desc person;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(11)     | NO   | PRI | NULL    | auto_increment |
| name  | varchar(32) | NO   |     |         |                |
| zip   | varchar(16) | NO   |     |         |                |
+-------+-------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> select count(*) from person;
+----------+
| count(*) |
+----------+
|   257250 |
+----------+
1 row in set (0.00 sec)

--------------------------------------------------------------
At CLIENT SIDE where FEDERATED table resides
--------------------------------------------------------------

mysql> select * from person limit 3,1;
+----+---------------+------+
| id | name          | zip  |
+----+---------------+------+
|  4 | Kalle Koolpen | 2222 |
+----+---------------+------+
1 row in set (6.36 sec)

--------------------------------------------------------------
At SERVER SIDE, SQL LOG file content related to FEDERATED table
--------------------------------------------------------------

060417 10:13:43      2 Query       SHOW TABLE STATUS LIKE 'person'
                     2 Query       SELECT `id`, `name`, `zip` FROM `person`

As seen from log file no LIMIT clause is sent to server, so all the date 
is fetched into client then. This is confirmed by the memory usage also
as the virtual memory usage grew from 44776K to 139340K on client side
during this simple query which should return only 1 row!
[20 Apr 2006 11:00] Manuel Matonin
While testing Federated engine I found more bugs. Besides LIMIT clause was missing from Federated SQL queries there were missing also the following standard clauses:

1) GROUP BY
2) HAVING
3) ORDER BY

So it means Federated engine is pretty useless until new version comes out (at the moment the latest is 5.0.20). I was wondering what is the purpose of generating new SQL query based on old query - that is the way Federated engine works right now. So far this approach has only caused bugs...
[8 Jun 2006 14:20] Valeriy Kravchuk
Verified just as described with latest 5.0.23-BK on Linux. You can use any table on "server" side, and FEDERATED storage engine will send queries like:

060608 14:53:11       4 Connect     root@toshiba-user.mshome.net on test
060608 14:53:35       4 Query       SHOW TABLE STATUS LIKE 't1'
                      4 Query       SELECT `sku`, `pr` FROM `t1`
060608 14:53:43       4 Query       SHOW TABLE STATUS LIKE 't1'
                      4 Query       SELECT `sku`, `pr` FROM `t1`

to it, both for:

SELECT * FROM federated_t1 LIMIT 1;

and for:

SELECT * FROM federated_t1 ORDER BY pk_column LIMIT 1;
[11 Jun 2006 19:43] Giuseppe Maxia
More specifically, the FEDERATED engine will cause a full table scan if the query does not use an index.

e.g., on the server side:
CREATE TABLE t1 (id int not null primary key, c char(10), key (c));

on the client side
CREATE TABLE t1_fed(id int not null primary key, c char(10))  -- notice: NO index 
engine = federated connection="mysql://user:pass@remote/test/t1";

CREATE TABLE t1_ndx_fed(id int not null primary key, c char(10), key(c)) 
engine = federated connection="mysql://user:pass@remote/test/t1";

Now, issuing these two queries
SELECT * FROM t1_fed WHERE c = 'ABC';
SELECT * FROM t1_ndx_fed WHERE c = 'ABC';

the result on the server will be:

SELECT `id`, `c` FROM t1;
SELECT `id`, `c` FROM t1 WHERE c = 'ABC';

So, the deciding factor is whether there is an index defined on the CLIENT SIDE, and this is true even when the original table does not have an index.

In this scenario, it's quite obvious that LIMIT clauses can't be honored, because the FEDERATED engine does not even honor indexes in the first place!

What's worse, though, is that a simple query like

SELECT COUNT(*) from t1_fed;

will be translated into

SELECT `id`, `c` FROM t1;

regardless of any index on either side. Clearly, *this* needs to be fixed.

Giuseppe Maxia
[26 Aug 2006 12:47] Valeriy Kravchuk
Bug #21472 was marked as a duplicate of this one. The reason of the problem mentioned there should be similar.
[26 Oct 2007 0:46] Artem Russakovskii
Have there been any developments with this bug? I am shocked to find that LIMIT is still not supported, and doing LIMIT 1 will return the whole table (I'm doing it on a table of 15 mil rows).
[26 Oct 2007 1:08] David Andersen
The severity of this should surely not be "Feature Request". As many users will use Federated to connect to tables over the internet this missing feature means that the Federated storage engine is essentially useless once one has just a few thousand rows. I feel really bad for all the users who did not go the extra mile to spend hours reading bug reports like this one before having implemented some system depending on this feature, just to discover that their solution does not scale whatsoever.
[28 Apr 2009 14:11] Daniel Fiske
2009 and still no progress? Specifically GROUP BY is "optimized" out. Can confirm that this is still present in 5.0.77
[28 Apr 2009 14:17] Daniel Fiske
Related to #26697