Bug #116572 Inconsistent MATCH() Relevance Scores in BOOLEAN MODE When Grouping Results
Submitted: 6 Nov 2024 19:58 Modified: 8 Nov 2024 19:12
Reporter: Noman Waseem Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: DML Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: full-text search

[6 Nov 2024 19:58] Noman Waseem
Description:
When using the MATCH() function in BOOLEAN MODE or in NATURAL LANGUAGE MODE, the relevance scores (tolerance values) returned in individual row queries differ from those obtained when grouping by the MATCH() result. Specifically, the grouped query assigns the same tolerance value to all rows, even when individual queries show different tolerance values.

How to repeat:
I don't know how to repeat this without being able to attach the full search_indices table here. That said, I've tried the following two queries, one up-grouped, and one grouped, in both my local machine (WampServer), as well as my production machine (Ubuntu 20.04). The table rows are different, but the effect is replicated on both, namely, the value of COUNT(DISTINCT si.id) when grouped by MATCH() AGAINST() relevance score does not match the actual number of rows when running the un-grouped query and manually counting the number of distinct si.id's for a given relevance score.

Un-Grouped Query:
-----------------

SELECT
	si.id,
	MATCH(si.dump) AGAINST ("red" IN NATURAL LANGUAGE MODE) AS `tolerance`
FROM
	search_indices si
WHERE
	MATCH(si.dump) AGAINST ("red" IN NATURAL LANGUAGE MODE)

Grouped Query:
--------------

SELECT
	MATCH(si.dump) AGAINST ("red" IN NATURAL LANGUAGE MODE) AS `tolerance`,
	COUNT(DISTINCT si.id) AS `count`
FROM
	search_indices si
WHERE
	MATCH(si.dump) AGAINST ("red" IN NATURAL LANGUAGE MODE)
GROUP BY
	`tolerance`

Here are the set of result tables for both my local machine and production machine:

Un-Grouped Results, Local:
--------------------------

674 rows with relevance score of 1.657546043395996
181 rows with relevance score of 3.315092086791992
2 rows with relevance score of 4.972638130187988
1 row with relevance score of 6.630184173583984

Grouped Results, Local:
-----------------------

674 rows with relevance score of 3.315092086791992
181 rows with relevance score of 4.972638130187988
2 rows with relevance score of 6.630184173583984
1 row with relevance score of 6.630184173583984

Un-Grouped Results, Production:
-------------------------------

1171 rows with relevance score of 1.5533268451690674
6 rows with relevance score of 3.1066536903381348

Grouped Results, Production:
-------------------------------

1171 rows with relevance score of 3.1066536903381348
6 rows with relevance score of 3.1066536903381348

Suggested fix:
I tried to improve the scores by first grouping by another column and then combining via a UNION, but I still see inaccurate grouped counts.
[6 Nov 2024 20:03] Noman Waseem
"...the value of COUNT(DISTINCT si.id) when grouped by MATCH() AGAINST() relevance score does not match the actual number of rows when running the un-grouped query and manually counting the number of distinct si.id's for a given relevance score."

Should have been:

"...a query grouped by MATCH() AGAINST() relevance score does not show the correct relevance score as compared to the same un-grouped query where I manually count the number of rows that have a given relevance score."
[7 Nov 2024 10:16] MySQL Verification Team
HI Mr. Waseem,

Thank you for your bug report.

However, let us inform you that this is a forum for the reports with fully repeatable test cases. Each of those test cases should consist of a set of SQL statements that always shows the problem that you are reporting.

You have not provided any of the relevant test cases which would make visible the problems that you are reporting. 

Hence, we can not continue to process your report.

Also, version 8.2 is no longer supported, so please use either latest 8.0 or 8.4 or 9.1.0.

Can't repeat.
[7 Nov 2024 21:08] Noman Waseem
Thanks for the feedback, and I've attached the full SQL to reproduce this. Here are the results:

--
-- Un-Grouped Select Query Results
--

id	tolerance	
46	1.7975969314575195	
47	1.7975969314575195	
66	1.7975969314575195	
67	1.7975969314575195	
37	1.759332537651062	
34	1.7408047914505005	
9	1.6008177995681763	
11	1.6008177995681763	
13	1.6008177995681763	
15	1.6008177995681763	
17	1.6008177995681763	
19	1.6008177995681763	
36	1.560104489326477

--
-- Grouped Select Query Results
--

tolerance	count	
1.6008177995681763	1	
1.7408047914505005	6	
1.759332537651062	1	
1.7975969314575195	1	
1.7975969314575195	4	

As you can see, the tolerance/count columns of the grouped query do not match up with those of the ungrouped query.
[8 Nov 2024 12:20] MySQL Verification Team
Hi Mr. Waseem,

Thank you for your bug report.

We have repeated your test case on 8.0.40, 8.4.3, 9.0.1 and 9.1.0.

We noticed few other errors as well:

id	tolerance
36	1.560104489326477
9	1.6008177995681763
11	1.6008177995681763
13	1.6008177995681763
15	1.6008177995681763
17	1.6008177995681763
19	1.6008177995681763
34	1.7408047914505005
37	1.759332537651062
46	1.7975969314575195
47	1.7975969314575195
66	1.7975969314575195
67	1.7975969314575195
tolerance	count
1.6008177995681763	1
1.7408047914505005	6
1.759332537651062	1
1.7975969314575195	1
1.7975969314575195	4

Do notice how 1.560104489326477 value simply disappeared .......

Verified for 8.0 and all higher versions.
[8 Nov 2024 19:12] Noman Waseem
Thank you, and I'm not familiar with the submission process. I've subscribed for email updates here, can I expect to be notified if there are any workarounds or patches for this?
[11 Nov 2024 10:05] MySQL Verification Team
Hi Mr. Waseem,

No, it does not work like that.

After a period of time, a team in charge of this category will put this bug in the queue for fixing. Since there a very large number of verified bugs in the queue and since higher severity bugs are coming up, the schedule for fixing any bug is totally unknown.

Also, due to the many new bugs and versions releases, the schedules are changed practically every week. Once a bug is fixed and push to some yet unavailable release, it has to pass a very severe checking by QA team. 

Once a bug has made into a release and when that release is published, then there will be a comment in this bug report where was bug fixed and  in which release. Then this report will be closed. After that, you have to wait that the release is published, which happens after short period of time.

That is a very short description of the very complex process that goes totally internally.