Bug #109219 doc: max_sort_length have no effect
Submitted: 28 Nov 2022 10:27 Modified: 29 Nov 2022 2:52
Reporter: Bingxi Wu Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.30 OS:Any
Assigned to: CPU Architecture:Any
Tags: documenation

[28 Nov 2022 10:27] Bingxi Wu
Description:
refer: https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_max_sort_lengt...
```
he server uses only the first max_sort_length bytes of any such value and ignores the rest. Consequently, such values that differ only after the first max_sort_length bytes compare as equal for GROUP BY, ORDER BY, and DISTINCT operations.
```

In my test case: max_sort_length  no effect. 

How to repeat:
```
import random
import string
import pandas as pd

def gen_prefix(n):
    chars = string.ascii_letters + string.digits
    return ''.join(random.choice(chars) for _ in range(n))

def repeat_prefix(s,n):
    l_s = []
    l_i = []
    for i in range(n):
        l_s.append(s+str(i))
        l_i.append(i+1)
    return l_s, l_i

if __name__ == "__main__":
    n = 3

    for i in range(5000):
        s = gen_prefix(i)
        l_s, l_i = repeat_prefix(s,100)
        df = pd.DataFrame({'col1':l_s, 'col2':l_i})
        df['col3']=i
        df.to_csv('gen.csv',mode='a', columns=None, header=False, index=False)
```
Use above generate data

table struct:
```
create table tb_order(
    c1 varchar(10000),
    c2 int,
    c3 int
);
load data infile '/tmp/gen.csv' into table tb_order FIELDS TERMINATED BY ',';
```

select c1, count(*) from tb_order group by c1;

The query  will return 100 rows.

Suggested fix:
I think the result is reasonable. 
But it doesn't match the manual description. And it also takes up more memory.
[28 Nov 2022 10:30] Bingxi Wu
The query:
select  count(*), c1 from tb_order where c3=5000 group by c1;

Return 100 rows
[28 Nov 2022 10:30] Bingxi Wu
The query:
select  count(*), c1 from tb_order where c3=5000 group by c1;

Return 100 rows
[28 Nov 2022 10:33] Bingxi Wu
Please change data generation program

df['col3']=i  -> df['col3']=i+1
[28 Nov 2022 13:13] MySQL Verification Team
Hi Mr. Wu,

Thank you for your bug report.

However, this is not a bug.

First of all, we do not have a test case. You have created a bug for the server, not for some Connector. Hence, you should either provide us with a test case in SQL or upload mysqldump file.

Much more important than the above fact is that your collation and attributes do not use "PAD SPACE" flag at all, so it is not a test case for us.

Not a bug.
[28 Nov 2022 14:18] Bingxi Wu
table struct and data

Attachment: tb01.sql (application/octet-stream, text), 491.56 KiB.

[28 Nov 2022 14:22] Bingxi Wu
Add data file and test case:

MySQL [wubx]> select length(c1) from tb01 limit 10;
+------------+
| length(c1) |
+------------+
|       5000 |
|       5000 |
|       5000 |
|       5000 |
|       5000 |
|       5000 |
|       5000 |
|       5000 |
|       5000 |
|       5000 |
+------------+
10 rows in set (0.000 sec)

set max_sort_length=100;
select c1,count(*) from tb01 group by c1 ;
...
100 rows in set (0.003 sec)
[28 Nov 2022 14:22] Bingxi Wu
Add data file and test case:

MySQL [wubx]> select length(c1) from tb01 limit 10;
+------------+
| length(c1) |
+------------+
|       5000 |
|       5000 |
|       5000 |
|       5000 |
|       5000 |
|       5000 |
|       5000 |
|       5000 |
|       5000 |
|       5000 |
+------------+
10 rows in set (0.000 sec)

set max_sort_length=100;
select c1,count(*) from tb01 group by c1 ;
...
100 rows in set (0.003 sec)
[28 Nov 2022 15:00] MySQL Verification Team
Hi,

This info is not enough.

You should let us know the value of max_sort_length for your installation and the collation that is chosen for that particular table.
[29 Nov 2022 2:49] Bingxi Wu
MySQL [(none)]> set names utf8mb4;
Query OK, 0 rows affected (0.000 sec)

MySQL [(none)]> \s
--------------
mysql  Ver 15.1 Distrib 10.7.3-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2

Connection id:		2029
Current database:
Current user:		root@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server:			MySQL
Server version:		8.0.30 MySQL Community Server - GPL
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	utf8mb4
Db     characterset:	utf8mb4
Client characterset:	utf8mb4
Conn.  characterset:	utf8mb4
UNIX socket:		/tmp/mysql3306.sock
Uptime:			31 days 22 hours 57 min 39 sec

Threads: 1  Questions: 78946  Slow queries: 65  Opens: 1415  Flush tables: 6  Open tables: 360  Queries per second avg: 0.028
--------------

MySQL [(none)]> show create table wubx.tb01;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                  |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tb01  | CREATE TABLE `tb01` (
  `c1` varchar(10000) DEFAULT NULL,
  `c2` int DEFAULT NULL,
  `c3` int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.000 sec)

MySQL [(none)]> show global variables like "max_sort_%";
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_sort_length | 1024  |
+-----------------+-------+
1 row in set (0.001 sec)
[29 Nov 2022 2:52] Bingxi Wu
Can you share with me what the behavior looks like if the field is larger than the length specified by max_sort_length?
[29 Nov 2022 12:44] MySQL Verification Team
Hi Mr. Wu,

If you have read our Reference Manual, you would have found out that  attributes of your table do not affect max_sort_length at all, 

Our Manual also has the answer for the question that you asked.

Do notice that this is not a support forum, but for the reports of the true bug. And this is not a bug ....