Bug #25571 EXPLAIN returns incorrect estimation of rows for federated table
Submitted: 12 Jan 2007 7:27
Reporter: Valeriy Kravchuk Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Federated storage engine Severity:S3 (Non-critical)
Version:5.0.34-BK, 5.0.27 OS:Linux (Linux)
Assigned to: CPU Architecture:Any

[12 Jan 2007 7:27] Valeriy Kravchuk
Description:
EXPLAIN shows incorrect rows estimation for federated table (if "ref" access path is used).

Let's create table as follows:

mysql> CREATE TABLE `test` (
    -> `id` int(11) NOT NULL auto_increment,
    -> `j` int(11) NOT NULL,
    -> PRIMARY KEY (`id`),
    -> KEY `j` (`j`)
    -> ) ENGINE=MyISAM AUTO_INCREMENT=9 DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.23 sec)

mysql> insert into test values(null, 1);
Query OK, 1 row affected (0.20 sec)

mysql> insert into test values(null, 2);
Query OK, 1 row affected (0.01 sec)

mysql> insert into test values(null, 3);
Query OK, 1 row affected (0.00 sec)

mysql> insert into test values(null, 4);
Query OK, 1 row affected (0.00 sec)

mysql> insert into test values(null, 5);
Query OK, 1 row affected (0.01 sec)

mysql> select * from test where j=2;
+----+---+
| id | j |
+----+---+
| 10 | 2 |
+----+---+
1 row in set (0.10 sec)

mysql> explain select * from test where j=2;
+----+-------------+-------+------+---------------+------+---------+-------+----
--+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | row
s | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+----
--+-------+
|  1 | SIMPLE      | test  | ref  | j             | j    | 4       | const |
1 |       |
+----+-------------+-------+------+---------------+------+---------+-------+----
--+-------+
1 row in set (0.00 sec)

We have correct estimation of rows for the query. Now, let's create FEDERATED table pointing to this one:

mysql> CREATE TABLE `testf` ( `id` int(11) NOT NULL auto_increment, `j` int(11)
 NOT NULL, PRIMARY KEY (`id`), KEY `j` (`j`) ) ENGINE=FEDERATED DEFAULT CHARSET
=latin1 CONNECTION='mysql://root:root@192.168.0.1:3307/test/test';
Query OK, 0 rows affected (0.02 sec)

mysql> select * from testf;
+----+---+
| id | j |
+----+---+
|  9 | 1 |
| 10 | 2 |
| 11 | 3 |
| 12 | 4 |
| 13 | 5 |
+----+---+
5 rows in set (0.06 sec)

mysql> select * from testf where j=2;
+----+---+
| id | j |
+----+---+
| 10 | 2 |
+----+---+
1 row in set (0.02 sec)

So, data are the same, as expected, but:

mysql> explain select * from testf where j=2;
+----+-------------+-------+------+---------------+------+---------+-------+----
--+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | row
s | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+----
--+-------+
|  1 | SIMPLE      | testf | ref  | j             | j    | 4       | const |
2 |       |
+----+-------------+-------+------+---------------+------+---------+-------+----
--+-------+
1 row in set (0.01 sec)

EXPLAIN now estimates number of rows as 2. For "range" estimation is correct:

mysql> explain select * from testf where j<>2;
+----+-------------+-------+-------+---------------+------+---------+------+----
--+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | row
s | Extra       |
+----+-------------+-------+-------+---------------+------+---------+------+----
--+-------------+
|  1 | SIMPLE      | testf | range | j             | j    | 4       | NULL |
4 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+----
--+-------------+
1 row in set (0.02 sec)

mysql> select * from testf where j<>2;
+----+---+
| id | j |
+----+---+
|  9 | 1 |
| 11 | 3 |
| 12 | 4 |
| 13 | 5 |
+----+---+
4 rows in set (0.02 sec)

And, because of the following:

mysql> analyze table testf;
+------------+---------+----------+---------------------------------------------
-------------+
| Table      | Op      | Msg_type | Msg_text
             |
+------------+---------+----------+---------------------------------------------
-------------+
| test.testf | analyze | note     | The storage engine for the table doesn't sup
port analyze |
+------------+---------+----------+---------------------------------------------
-------------+
1 row in set (0.01 sec)

there is not way to influence this incorrect estimation in case of "ref".

How to repeat:
On one server:

drop table test
CREATE TABLE `test` (
`id` int(11) NOT NULL auto_increment,
`j` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `j` (`j`)
) ENGINE=MyISAM AUTO_INCREMENT=9 DEFAULT CHARSET=latin1;

insert into test values(null, 1);
insert into test values(null, 2);
insert into test values(null, 3);
insert into test values(null, 4);
insert into test values(null, 5);

select * from test where j=2;
explain select * from test where j=2;

On the other:

CREATE TABLE `testf` (
`id` int(11) NOT NULL auto_increment,
`j` int(11) NOT NULL,
PRIMARY KEY (`id`), 
KEY `j` (`j`) ) 
ENGINE=FEDERATED DEFAULT CHARSET = latin1
CONNECTION='mysql://root:root@192.168.0.1:3307/test/test';

select * from testf where j=2;
explain select * from testf where j=2;

Suggested fix:
Get EXPLAIN estimations from remote server? Or, better, give a way to get and store proper statistics in local data dictionary :)
[17 Jan 2007 4:41] Calvin Sun
The federated engine needs to pass EXPLAIN to the remote server and get the result back.