Bug #112372 Select query returns duplicate rows
Submitted: 18 Sep 2023 7:19 Modified: 21 Sep 2023 2:20
Reporter: Charles Sykes Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Parser Severity:S3 (Non-critical)
Version:8.0.33 OS:MacOS
Assigned to: CPU Architecture:x86

[18 Sep 2023 7:19] Charles Sykes
Description:
I have a simple table defined as follows: 

CREATE TABLE `item_image` (
  `entityId` char(26) NOT NULL,
  `itemId` char(26) NOT NULL,
  `imageOrder` tinyint DEFAULT '0',
  PRIMARY KEY (`entityId`),
  KEY `fk_itemImage_itemId` (`itemId`),
  CONSTRAINT `fk_itemImage_itemId` FOREIGN KEY (`itemId`) REFERENCES `item` (`entityId`)
);

I find that the below select query is returning duplicate rows, and in some cases is not returning records matching the criteria.

Query
select entityId, itemId, imageOrder from item_image where itemId in (
	'54ILO7TOL5DBNEHYBJ7ZUPD364', 'ADMAV2YD6ZAR5IFHB5D4DLMJDA', 'N3HLQJMSTRA4TI7ZWKX4VNI7JA', 'NBZRLM67UVCTHEME4MRFILGIDU', 
	'7O5EHQ5SK5GB3IA5PUUPCJIEL4', 'EM7CWAV75BAZTFTTVQXZWKULYI', 'IK6WRK3IZREFRO2LGRHEHS7A5M', 'HW5XWDHBY5HW7CWEYKX365UGQI', 
	'BD5BDG4ZEZFO7F4IB7KK43ZPKQ', 'KGKLWD2CFFCSFFDHXLZHGT25WA', 'MQPTICQZOBGFZMIYI636V2R3NU', 'BV3UU7N5VFGUVGY3RKUUIWPJNE', 
	'NJG4JWHUUVCXDA2P3V7BVGSZNU', 'JIUH65FAXNCMHBXURNKXYQABJM', 'JCSI7SV435G25LPW4IPRF7EITU', 'MBBY2OW6N5ATDDKDRRMDLT4PNM', 
	'MUISD7TN3RCD3B4HVNYVKE6D4U', 'JZQTRFSC7RBL7GU67LHWMGKLFY', '6AKLVWT7EFGSJG2FXIW4QNTBUA', 'AHUJQC3QL5FI5PZMTK6BE6GC4M', 
	'M47TBT3MLBD73A4M4RARVJPNHM', '2HYHXJA7JJERDOFDOSOUA5ETE4', 'HXVGHY5DKRDUBO7PJHLYT22K6E', 'NDS6ADCAY5GFNNT65XMA4DPGAI')
order by itemId, imageOrder;

Result: 
FNPHL7URHFHJ3HODB4FMEIZK3E	2HYHXJA7JJERDOFDOSOUA5ETE4	1
FNPHL7URHFHJ3HODB4FMEIZK3E	2HYHXJA7JJERDOFDOSOUA5ETE4	1
FNPHL7URHFHJ3HODB4FMEIZK3E	2HYHXJA7JJERDOFDOSOUA5ETE4	1
FNPHL7URHFHJ3HODB4FMEIZK3E	2HYHXJA7JJERDOFDOSOUA5ETE4	1
OXENJVN2XFDWBNWUPLU52HTG6Y	2HYHXJA7JJERDOFDOSOUA5ETE4	2
6KSCTYVJU5A6THC7WNWHLQQ3LM	2HYHXJA7JJERDOFDOSOUA5ETE4	3
(omitted ...) 

When I simplify the IN clause to include just itemId 2HYHXJA7JJERDOFDOSOUA5ETE4, I see all item_image records are correctly returned. Similarly, when there are just a few itemIds in the IN clause it seems to be OK. 

Query:
select entityId, itemId, imageOrder from item_image where itemId in ('2HYHXJA7JJERDOFDOSOUA5ETE4') order by itemId, imageOrder;

Results:
P53WEY6RQFCZTJNZ4PP4VA4K2E	2HYHXJA7JJERDOFDOSOUA5ETE4	0
FNPHL7URHFHJ3HODB4FMEIZK3E	2HYHXJA7JJERDOFDOSOUA5ETE4	1
OXENJVN2XFDWBNWUPLU52HTG6Y	2HYHXJA7JJERDOFDOSOUA5ETE4	2
6KSCTYVJU5A6THC7WNWHLQQ3LM	2HYHXJA7JJERDOFDOSOUA5ETE4	3

How to repeat:
I will upload a DB dump to AWS S3 by tomorrow to reproduce these exact results.
[19 Sep 2023 10:55] MySQL Verification Team
Hi Mr. Sykes,

Thank you for your bug report.

We do have some remarks and questions.

We only accept test cases that are uploaded to our site. Go to "Files" tab and you will see how to upload your data, so that we can try to reproduce them.

You have not described why is it wrong to have duplicates. Your query is not written in a manner to avoid duplicates. You have to prove your point here on why are duplicates wrong.

Next, you have to prove your point that there are rows in the result set that are missing. Please, be very specific on which rows should be in and are not there.

Last, but not least, you should use ONLY our binary that can be downloaded from our dev.mysql.com site.

We are eagerly waiting on your full and comprehensive feedback.

For now, we are not able to repeat the behaviour.
[21 Sep 2023 2:20] Charles Sykes
Hello, 

Sorry for the state of my initial report - I was not aware of the file upload feature. I have attached a dataset for which I am seeing these results. To clarify the expected result of the provided query: 

select entityId, itemId, imageOrder from item_image where itemId in (
	'54ILO7TOL5DBNEHYBJ7ZUPD364', 'ADMAV2YD6ZAR5IFHB5D4DLMJDA', 'N3HLQJMSTRA4TI7ZWKX4VNI7JA', 'NBZRLM67UVCTHEME4MRFILGIDU', 
	'7O5EHQ5SK5GB3IA5PUUPCJIEL4', 'EM7CWAV75BAZTFTTVQXZWKULYI', 'IK6WRK3IZREFRO2LGRHEHS7A5M', 'HW5XWDHBY5HW7CWEYKX365UGQI', 
	'BD5BDG4ZEZFO7F4IB7KK43ZPKQ', 'KGKLWD2CFFCSFFDHXLZHGT25WA', 'MQPTICQZOBGFZMIYI636V2R3NU', 'BV3UU7N5VFGUVGY3RKUUIWPJNE', 
	'NJG4JWHUUVCXDA2P3V7BVGSZNU', 'JIUH65FAXNCMHBXURNKXYQABJM', 'JCSI7SV435G25LPW4IPRF7EITU', 'MBBY2OW6N5ATDDKDRRMDLT4PNM', 
	'MUISD7TN3RCD3B4HVNYVKE6D4U', 'JZQTRFSC7RBL7GU67LHWMGKLFY', '6AKLVWT7EFGSJG2FXIW4QNTBUA', 'AHUJQC3QL5FI5PZMTK6BE6GC4M', 
	'M47TBT3MLBD73A4M4RARVJPNHM', '2HYHXJA7JJERDOFDOSOUA5ETE4', 'HXVGHY5DKRDUBO7PJHLYT22K6E', 'NDS6ADCAY5GFNNT65XMA4DPGAI')
order by itemId, imageOrder;

Running this query on the provided dataset will show that the item_image record with entityId=FNPHL7URHFHJ3HODB4FMEIZK3E and itemId=2HYHXJA7JJERDOFDOSOUA5ETE4 is returned 4 times (3 duplicate records). Additionally, you will see that the item_image with entityId=P53WEY6RQFCZTJNZ4PP4VA4K2E (also belonging to itemId=2HYHXJA7JJERDOFDOSOUA5ETE4 with imageOrder 0) is not returned at all. 

Please let me know if I can provide further clarification.
[21 Sep 2023 9:36] MySQL Verification Team
Hi Mr. Sykes,

Sorry, but regarding duplicate rows, have you tried to use DISTINCT ???

How did it change your results ????

Also, there is one table here and you have uploaded three files.

Can you describe what are the contents of each of the uploaded files for ?????

We can also see that value of 2HYHXJA7JJERDOFDOSOUA5ETE4  is fully contained in the result set.

We are waiting on your feedback with explanations.