Bug #105285 | comparing a `bigint` value with an identical `bigint unsigned` value fails | ||
---|---|---|---|
Submitted: | 21 Oct 2021 11:18 | Modified: | 25 Oct 2021 14:00 |
Reporter: | Mats Karstad | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 8.0 | OS: | Any (AWS RDS) |
Assigned to: | CPU Architecture: | Any |
[21 Oct 2021 11:18]
Mats Karstad
[21 Oct 2021 13:52]
MySQL Verification Team
Hi Mr. Karstad, Thank you for your bug report and for your kind words. We do not think that this is a bug, but we would like you to check certain causes of the result that you get. First of all, although both columns are big integers, you are not populating your tables with integers, but with strings. Hence, this is first thing that you should check. Next, you use a lot of nested queries, which do not seem necessary, particularly since they are both in the SELECT list. Both could be simplified , one of them to a query without a nested query and second to a single layer nested query. However, this is beyond the scope of the bug report. What you could check is whether the error is in the manner in which query is written , or not. Run only two queries. One that gives "expected" results and the one that does not give those results. However, each of those should be run first without top-level GROUP BY and then with GROUP BY. Then compare two result sets. If those make sense, then the error is in the design of your query. We are waiting on your feedback.
[22 Oct 2021 7:54]
Mats Karstad
Hi and thank you for taking the time to look at the issue we are having. 1. First of all, although both columns are big integers, you are not populating your tables with integers, but with strings. Hence, this is first thing that you should check. A. I have corrected this. The program i was using (TablePlus) added the quotes ' when copying the data as insert statements. See correct data below (I have also been able to reduce the data set needed): INSERT INTO query_issue_sale (id, calendar_id, customer_id) VALUES (12456642, 12664667, 919722), (12457298, 12665293, 919726), (12457309, 12665293, 919722), (12470462, 12678279, 919726); INSERT INTO query_issue_calendar (id) VALUES (12664667), (12665293), (12678279); 2. Next, you use a lot of nested queries, which do not seem necessary, particularly since they are both in the SELECT list. Both could be simplified , one of them to a query without a nested query and second to a single layer nested query. However, this is beyond the scope of the bug report. This is correct, the query is a bit contrived. The reason for the way the query is built up can not been seen in the test case I have created. 3. What you could check is whether the error is in the manner in which query is written , or not. Run only two queries. One that gives "expected" results and the one that does not give those results. However, each of those should be run first without top-level GROUP BY and then with GROUP BY. Then compare two result sets. If those make sense, then the error is in the design of your query. A. As mentioned in my initial bug report, I am able to get the correct result by dropping the outermost group_concat, however this is not a fix that is possible to use on the query that made us aware of this issue. I have simplified the dataset used, and the steps needed to reproduce. Please take a look at the following: select sale.customer_id, ( SELECT group_concat(sale_ids) FROM ( SELECT group_concat(bal_sale.id) as sale_ids FROM query_issue_sale AS bal_sale JOIN query_issue_calendar AS bal_cal ON bal_sale.calendar_id = bal_cal.id WHERE bal_sale.customer_id = sale.customer_id ) AS total ) as grouped_sale_ids from query_issue_sale as sale group by sale.customer_id; This query initially returns the incorrect result: customer_id grouped_sale_ids 919722 12456642,12457309 919726 12470462 If however you change the type of calender_id in query_issue_sale to bigint unsigned the correct result is returned: ALTER TABLE query_issue_sale CHANGE calendar_id calendar_id bigint unsigned NOT NULL COMMENT ''; Now running the same query again gives the expected correct result: customer_id grouped_sale_ids 919722 12456642,12457309 919726 12457298,12470462 I do agree with you that this query is suboptimal and a bit contrived. I do however find it strange and unexpected that changing a type from bigint to bigint unsigned changes the results given by the same query. I am not sure if this is helpful. But when the result is incorrect and the type is bigint, the dependant derived part of the query that matches on calendar_id is as follows: Using where; Using index. While when the type is bigint unsigned it states: Using index Analyse when type is bigint (incorrect result) 1 "PRIMARY" "sale" null "ALL" null null null null 4 100.00 "Using temporary" 2 "DEPENDENT SUBQUERY" "<derived3>" null "ALL" null null null null 2 100.00 null 3 "DEPENDENT DERIVED" "bal_sale" null "ALL" null null null null 4 25.00 "Using where" 3 "DEPENDENT DERIVED" "bal_cal" null "eq_ref" "PRIMARY" "PRIMARY" "8" "test.bal_sale.calendar_id" 1 100.00 "Using where; Using index" Analyse when type is bigint unsigned (correct result) 1 "PRIMARY" "sale" null "ALL" null null null null 4 100.00 "Using temporary" 2 "DEPENDENT SUBQUERY" "<derived3>" null "ALL" null null null null 2 100.00 null 3 "DEPENDENT DERIVED" "bal_sale" null "ALL" null null null null 4 25.00 "Using where" 3 "DEPENDENT DERIVED" "bal_cal" null "eq_ref" "PRIMARY" "PRIMARY" "8" "test.bal_sale.calendar_id" 1 100.00 "Using index" Looking forward to hearing back from you. Best regards, Mats
[22 Oct 2021 13:35]
MySQL Verification Team
Hi Mr. Karstad, Thank you for your feedback and the kind words. You have not tried to run a query with or without GROUP BY ..... However, it is better to analyse the case properly, because it could be a bug. Do note that all your nested queries are dependent one, which could be a case why this might not be a bug. Hence, I would like you to run the following queries twice: 1. SELECT group_concat(bal_sale.id) as sale_ids FROM query_issue_sale AS bal_sale JOIN query_issue_calendar AS bal_cal ON bal_sale.calendar_id = bal_cal.id WHERE bal_sale.customer_id = sale.customer_id; 2. SELECT group_concat(sale_ids) FROM ( SELECT group_concat(bal_sale.id) as sale_ids FROM query_issue_sale AS bal_sale JOIN query_issue_calendar AS bal_cal ON bal_sale.calendar_id = bal_cal.id WHERE bal_sale.customer_id = sale.customer_id ) AS total; 3. select sale.customer_id, ( SELECT group_concat(sale_ids) FROM ( SELECT group_concat(bal_sale.id) as sale_ids FROM query_issue_sale AS bal_sale JOIN query_issue_calendar AS bal_cal ON bal_sale.calendar_id = bal_cal.id WHERE bal_sale.customer_id = sale.customer_id ) AS total ) as grouped_sale_ids from query_issue_sale as sale; 4. select sale.customer_id, ( SELECT group_concat(sale_ids) FROM ( SELECT group_concat(bal_sale.id) as sale_ids FROM query_issue_sale AS bal_sale JOIN query_issue_calendar AS bal_cal ON bal_sale.calendar_id = bal_cal.id WHERE bal_sale.customer_id = sale.customer_id ) AS total ) as grouped_sale_ids from query_issue_sale as sale group by sale.customer_id; Hence, you should run it twice. Once, when both BIGINTs are UNSIGNED and once when one of those is signed. Therefore, we are waiting on 8 (eight) result sets from you. Next, please explain why do you need two GROUP_CONCAT() queries, when single level does the job just fine. Hence, you can do without: ( SELECT group_concat(sale_ids) FROM ( )) as grouped_sale_ids; We are waiting on your feedback.
[25 Oct 2021 9:11]
Mats Karstad
Hi and thanks for the swift respons, please see my results and answer below. Results from running queries with signed and unsigned bigint: 1.; SELECT group_concat(bal_sale.id) as sale_ids FROM query_issue_sale AS bal_sale JOIN query_issue_calendar AS bal_cal ON bal_sale.calendar_id = bal_cal.id; * (removed the condition "WHERE bal_sale.customer_id = sale.customer_id" as table sale is not referenced in this query) sale_ids signed: 12456642,12457298,12457309,12470462 unsigned: 12456642,12457298,12457309,12470462 2.; SELECT group_concat(sale_ids) FROM ( SELECT group_concat(bal_sale.id) sale_ids FROM query_issue_sale AS bal_sale JOIN query_issue_calendar AS bal_cal ON bal_sale.calendar_id = bal_cal.id ) AS total; * (removed the condition "WHERE bal_sale.customer_id = sale.customer_id" as table sale is not referenced in this query) sale_ids signed: 12456642,12457298,12457309,12470462 unsigned: 12456642,12457298,12457309,12470462 3.; select sale.customer_id, ( SELECT group_concat(sale_ids) FROM ( SELECT group_concat(bal_sale.id) as sale_ids FROM query_issue_sale AS bal_sale JOIN query_issue_calendar AS bal_cal ON bal_sale.calendar_id = bal_cal.id WHERE bal_sale.customer_id = sale.customer_id ) AS total ) as grouped_sale_ids from query_issue_sale as sale; customer_id grouped_sale_ids signed: 919722 12456642,12457309 919726 12470462 919722 12456642,12457309 919726 12470462 unsigned: 919722 12456642,12457309 919726 12457298,12470462 919722 12456642,12457309 919726 12457298,12470462 4.; select sale.customer_id, ( SELECT group_concat(sale_ids) FROM ( SELECT group_concat(bal_sale.id) as sale_ids FROM query_issue_sale AS bal_sale JOIN query_issue_calendar AS bal_cal ON bal_sale.calendar_id = bal_cal.id WHERE bal_sale.customer_id = sale.customer_id ) AS total ) as grouped_sale_ids from query_issue_sale as sale group by sale.customer_id; customer_id grouped_sale_ids signed: 919722,12456642,12457309 919726,12470462 unsigned: 919722 12456642,12457309 919726 12457298,12470462 Q. "Next, please explain why do you need two GROUP_CONCAT() queries, when single level does the job just fine." A. First of all, the group_concat is added to better visualise the problem we discovered (shows which record is not fetched). In the original query this is a sum function. The original query returns multiple summations. The outer most query returns the total summation for a user, while the part you have seen with nested group_concat is present due to the complexity of that summation. The next sum has conditions that requires a union query in order to ensure efficient index usage. Due to the union query two rows are returned, converting these two rows to one are the reason for the outer most sum (group_concat). Hope the above query results and an explanation for the reason for the convoluted query is sufficient. Do not hesitate to get back to me if something is unclear or you require additional information. Best regards, Mats
[25 Oct 2021 12:45]
MySQL Verification Team
Hi Mr. Karstad, Thank you for your feedback. We managed to replicate your findings, which means that this is truly a bug in our nested queries code. We do have to notice that will probably make a low priority bug, since queries like that are quite rare and workaround is easy. Verified as reported.
[25 Oct 2021 13:29]
MySQL Verification Team
Changed several fields .......
[25 Oct 2021 14:00]
Mats Karstad
Hi, I am glad the provided feedback gave you what you needed in order to isolate the issue. Thanks for a swift and professional bug reporting experience. In regards to the priority of the bug, I understand and agree with your prioritization. Mats
[26 Oct 2021 12:08]
Knut Anders Hatlen
Posted by developer: The query was rejected by MySQL before WL#461 (Derived tables dependent of outer select) added in MySQL 8.0.14. In MySQL 8.0.13 and earlier, the query failed with: "Unknown column 'sale.customer_id' in 'where clause'" The current behaviour of the query, where the value 12457298 is omitted, seems to have started with the fix for bug#31790217 (REWRITE STREAMING AGGREGATION) in MySQL 8.0.23.