Bug #107739 Information_schema TABLES.AVG_ROW_LENGTH does not return the average row length.
Submitted: 2 Jul 2022 21:28 Modified: 4 Jul 2022 14:53
Reporter: Pep Pla Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Information schema Severity:S4 (Feature request)
Version: OS:Any
Assigned to: CPU Architecture:Any

[2 Jul 2022 21:28] Pep Pla
Description:
The value of avg_row_length does not report the average row length.

Looks like it reports the data_size/rows. The problem is that this does not take into consideration the empty space inside the data pages. This makes the returned information misleading.

Having the right information could help detect fragmentation inside the page. Currently, we have no way to detect this fragmentation without manually calculating the avg_row_length.

How to repeat:
I created a table with 2000000 rows and executed the following code. As you can see, the avg_row_length changes with the row density at page level. I used innodb_fill_factor to reduce the number of rows per page and simulate fragmentation.

mysql> select count(*) from avg_row_length_bug;
+----------+
| count(*) |
+----------+
|  2000000 |
+----------+
1 row in set (0,06 sec)

mysql> analyze table avg_row_length_bug;
+------------------------+---------+----------+----------+
| Table                  | Op      | Msg_type | Msg_text |
+------------------------+---------+----------+----------+
| bug.avg_row_length_bug | analyze | status   | OK       |
+------------------------+---------+----------+----------+
1 row in set (0,01 sec)

mysql> select avg_row_length from information_schema.tables where table_name like 'avg_row_length_bug'\G
*************************** 1. row ***************************
AVG_ROW_LENGTH: 41
1 row in set (0,01 sec)

mysql> set global innodb_fill_factor=75;
Query OK, 0 rows affected (0,00 sec)

mysql> alter table avg_row_length_bug engine innodb;
Query OK, 0 rows affected (3,64 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> analyze table avg_row_length_bug;
+------------------------+---------+----------+----------+
| Table                  | Op      | Msg_type | Msg_text |
+------------------------+---------+----------+----------+
| bug.avg_row_length_bug | analyze | status   | OK       |
+------------------------+---------+----------+----------+
1 row in set (0,01 sec)

mysql> select avg_row_length from information_schema.tables where table_name like 'avg_row_length_bug'\G
*************************** 1. row ***************************
AVG_ROW_LENGTH: 59
1 row in set (0,00 sec)

mysql> set global innodb_fill_factor=50;
Query OK, 0 rows affected (0,00 sec)

mysql> alter table avg_row_length_bug engine innodb;
Query OK, 0 rows affected (3,02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> analyze table avg_row_length_bug;
+------------------------+---------+----------+----------+
| Table                  | Op      | Msg_type | Msg_text |
+------------------------+---------+----------+----------+
| bug.avg_row_length_bug | analyze | status   | OK       |
+------------------------+---------+----------+----------+
1 row in set (0,01 sec)

mysql> select avg_row_length from information_schema.tables where table_name like 'avg_row_length_bug'\G
*************************** 1. row ***************************
AVG_ROW_LENGTH: 91
1 row in set (0,01 sec)

mysql> set global innodb_fill_factor=25;
Query OK, 0 rows affected (0,00 sec)

mysql> alter table avg_row_length_bug engine innodb;
Query OK, 0 rows affected (4,35 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> analyze table avg_row_length_bug;
+------------------------+---------+----------+----------+
| Table                  | Op      | Msg_type | Msg_text |
+------------------------+---------+----------+----------+
| bug.avg_row_length_bug | analyze | status   | OK       |
+------------------------+---------+----------+----------+
1 row in set (0,01 sec)

mysql> select avg_row_length from information_schema.tables where table_name like 'avg_row_length_bug'\G
*************************** 1. row ***************************
AVG_ROW_LENGTH: 186
1 row in set (0,00 sec)

Suggested fix:
The calculation of avg_row_length should not be done by dividing the number of table_rows by data_length. It should analyze the contents of the sampled pages to get the actual row length.

I guess this could be calculated (heap_top - 120) / number_of_records. This would return the avg_row_length for that page.

As we sample several pages, the calculation of the table avg_row_length is a bit more complex. Something like

(avg_row_length_page_sampled_1 * number_of_records_page_sampled_1 + ... avg_row_length_page_sampled_N * number_of_records_page_sampled_N ) / (number_of_records_page_sampled_1 + ... + number_of_records_page_sampled_N)
[4 Jul 2022 12:27] MySQL Verification Team
Hi Mr. Pla,

Thank you for your bug report.

However, this is not a bug, but a welcome feature request.

Current avg_row_length is useful for many purposes and used by a vast number of applications. We can not change its behaviour, but, we might introduce a new column in the I_S.

Hence, we are verifying your report as a valid feature request.
[4 Jul 2022 14:53] Pep Pla
I don't think this is a feature request.

The documentation states that AVG_ROW_LENGTH is "The average row length." and it is not. If you don't want to fix this and return the correct estimation of "average row length" you'll have to amend the documentation.

What this value is returning currently is DIV(DATA_LENGTH, TABLE_ROWS).

Could you explain what you mean by "Current avg_row_length is useful for many purposes"? Because as a value that is calculated using the integer division of two other values, it does not look beneficial.

I'm also surprised to discover that this is "used by a vast number of applications"... I thought those applications would benefit from getting the correct value instead of the wrong one.
[5 Jul 2022 12:07] MySQL Verification Team
Hi,

Many applications use this value, but not as a measure of the fragmentation. They use it for other purposes, most of all for the re-design of the relational entities. This is widely used for that purpose.

Regarding bug versus feature request, there will be internal discussions on many of verified features and there could be a change of status, in either direction. However, timings and dynamics are unknown and even when known, will remain strictly internal.