Description:
The value of avg_row_length does not report the average row length.
Looks like it reports the data_size/rows. The problem is that this does not take into consideration the empty space inside the data pages. This makes the returned information misleading.
Having the right information could help detect fragmentation inside the page. Currently, we have no way to detect this fragmentation without manually calculating the avg_row_length.
How to repeat:
I created a table with 2000000 rows and executed the following code. As you can see, the avg_row_length changes with the row density at page level. I used innodb_fill_factor to reduce the number of rows per page and simulate fragmentation.
mysql> select count(*) from avg_row_length_bug;
+----------+
| count(*) |
+----------+
| 2000000 |
+----------+
1 row in set (0,06 sec)
mysql> analyze table avg_row_length_bug;
+------------------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------------------+---------+----------+----------+
| bug.avg_row_length_bug | analyze | status | OK |
+------------------------+---------+----------+----------+
1 row in set (0,01 sec)
mysql> select avg_row_length from information_schema.tables where table_name like 'avg_row_length_bug'\G
*************************** 1. row ***************************
AVG_ROW_LENGTH: 41
1 row in set (0,01 sec)
mysql> set global innodb_fill_factor=75;
Query OK, 0 rows affected (0,00 sec)
mysql> alter table avg_row_length_bug engine innodb;
Query OK, 0 rows affected (3,64 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> analyze table avg_row_length_bug;
+------------------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------------------+---------+----------+----------+
| bug.avg_row_length_bug | analyze | status | OK |
+------------------------+---------+----------+----------+
1 row in set (0,01 sec)
mysql> select avg_row_length from information_schema.tables where table_name like 'avg_row_length_bug'\G
*************************** 1. row ***************************
AVG_ROW_LENGTH: 59
1 row in set (0,00 sec)
mysql> set global innodb_fill_factor=50;
Query OK, 0 rows affected (0,00 sec)
mysql> alter table avg_row_length_bug engine innodb;
Query OK, 0 rows affected (3,02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> analyze table avg_row_length_bug;
+------------------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------------------+---------+----------+----------+
| bug.avg_row_length_bug | analyze | status | OK |
+------------------------+---------+----------+----------+
1 row in set (0,01 sec)
mysql> select avg_row_length from information_schema.tables where table_name like 'avg_row_length_bug'\G
*************************** 1. row ***************************
AVG_ROW_LENGTH: 91
1 row in set (0,01 sec)
mysql> set global innodb_fill_factor=25;
Query OK, 0 rows affected (0,00 sec)
mysql> alter table avg_row_length_bug engine innodb;
Query OK, 0 rows affected (4,35 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> analyze table avg_row_length_bug;
+------------------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------------------+---------+----------+----------+
| bug.avg_row_length_bug | analyze | status | OK |
+------------------------+---------+----------+----------+
1 row in set (0,01 sec)
mysql> select avg_row_length from information_schema.tables where table_name like 'avg_row_length_bug'\G
*************************** 1. row ***************************
AVG_ROW_LENGTH: 186
1 row in set (0,00 sec)
Suggested fix:
The calculation of avg_row_length should not be done by dividing the number of table_rows by data_length. It should analyze the contents of the sampled pages to get the actual row length.
I guess this could be calculated (heap_top - 120) / number_of_records. This would return the avg_row_length for that page.
As we sample several pages, the calculation of the table avg_row_length is a bit more complex. Something like
(avg_row_length_page_sampled_1 * number_of_records_page_sampled_1 + ... avg_row_length_page_sampled_N * number_of_records_page_sampled_N ) / (number_of_records_page_sampled_1 + ... + number_of_records_page_sampled_N)