MySQL Bugs: #13709: strange changing values in TABLE_ROWS of INFORMATION

Bug #13709	strange changing values in TABLE_ROWS of INFORMATION_SCHEMA.TABLES
Submitted:	3 Oct 2005 11:30	Modified:	3 Nov 2005 0:59
Reporter:	Laurenz Albe	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Documentation	Severity:	S3 (Non-critical)
Version:	5.0.13-rc, 5.0.14-rc	OS:	Linux (Linux x86 (RHEL 3))
Assigned to:	Paul DuBois	CPU Architecture:	Any

Description:
When I create an InnoDB table test.test and insert enough values, the query
 
  SELECT TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES
     WHERE TABLE_NAME='test' AND SCHEMA_NAME='test';

will not only return wrong values, but the value returned will vary from
invocation to invocation.

There seems to be a connection to the length of the table row:
the longer the row, the less values I have to insert to reproduce the problem.

Running ANALYZE on the table does not make a change.

How to repeat:
The following is a shell script that reproduces the problem.
It assumes that a server is running on localhost and I can connect with
'mysql -s test'

#!/bin/bash

function generate() {
        # create a test table

        echo 'drop table if exists test.test;'
        echo 'create table test.test (id integer not null primary key,' \
                'val char(255) not null) charset=utf8 engine=innodb;'

        # insert some values

        typeset -i i=0;
        while [ $i -lt $limit ]; do
                echo "insert into test.test values ($((i+1)), 'mama')," \
                "($((i+2)), 'mama'), ($((i+3)), 'mama'), ($((i+4)), 'mama')," \
                "($((i+5)), 'mama'), ($((i+6)), 'mama'), ($((i+7)), 'mama')," \
                "($((i+8)), 'mama'), ($((i+9)), 'mama'), ($((i+10)), 'mama');"
                i=i+10
                if [ $((i%100)) -eq 0 ]; then
                        echo 'commit;'
                fi
        done
        echo 'commit;'

        # here comes the bug

        echo "select table_rows from information_schema.tables" \
                " where table_schema='test' and table_name='test';"
        echo "select table_rows from information_schema.tables" \
                " where table_schema='test' and table_name='test';"
        echo "select table_rows from information_schema.tables" \
                " where table_schema='test' and table_name='test';"
        echo "select table_rows from information_schema.tables" \
                " where table_schema='test' and table_name='test';"
        echo "analyze local table test.test;"
        echo "select table_rows from information_schema.tables" \
                " where table_schema='test' and table_name='test';"
        echo "select table_rows from information_schema.tables" \
                " where table_schema='test' and table_name='test';"
        echo "select table_rows from information_schema.tables" \
                " where table_schema='test' and table_name='test';"
        echo "select table_rows from information_schema.tables" \
                " where table_schema='test' and table_name='test';"

        # clean up

        echo 'drop table test.test;'
}

typeset -i limit=100
generate|mysql -s test
echo
echo 'Description of the output:'
echo 'the first four lines are the value TABLE_ROWS of INFORMATION_SCHEMA.TABLES'
echo 'for our test table, queried four times with the same query.'
echo 'Then, an ANALYZE TABLE is issued.'
echo 'Then the value of TABLE_ROWS is queried for more times.'
echo
echo "Note: the actual row count is $limit."

Yes, the shell script provided gives strange results really on my 5.0.14-rc BK build of September 30th, running on Fedora Core 1:

[openxs@Fedora 5.0]$ ./13709.sh
table_rows
163
table_rows
163
table_rows
112
table_rows
82
Table   Op      Msg_type        Msg_text
test.test       analyze status  OK
table_rows
70
table_rows
58
table_rows
58
table_rows
97

Description of the output:
the first four lines are the value TABLE_ROWS of
INFORMATION_SCHEMA.TABLES
for our test table, queried four times with the same query.
Then, an ANALYZE TABLE is issued.
Then the value of TABLE_ROWS is queried for more times.

Note: the actual row count is 100.

At least, this should be documented, even if it is an intended behavior.

Actually, this behaviour of InnoDB tables is documented in http://dev.mysql.com/doc/mysql/en/innodb-restrictions.html:

"ANALYZE TABLE counts cardinality by doing eight random dives to each of the index trees and updating index cardinality estimates accordingly. Note that because these are only estimates, repeated runs of ANALYZE TABLE  may produce different numbers. This makes ANALYZE TABLE fast on InnoDB tables but not 100% accurate as it doesn't take all rows into account."

and later:

"SHOW TABLE STATUS does not give accurate statistics on InnoDB tables, except for the physical size reserved by the table. The row count is only a rough estimate used in SQL optimization."

But I think, the http://dev.mysql.com/doc/mysql/en/tables-table.html page should have this noted in the prominent place too. Changed the category accordingly.

The passages you quoted from the documentation do not explain the observed
behaviour to me.
 
They make me think that a) repeated ANALYZE TABLE will calculate different
row counts and b) the TABLE_ROWS value will be inaccurate as a consequence.
 
What I observe is that the value of TABLE_ROWS changes from query to query
even if I do not issue an ANALYZE TABLE statement between the queries.

Thank you for your bug report. This issue has been addressed in the
documentation. The updated documentation will appear on our website
shortly, and will be included in the next release of the relevant
product(s).