Bug #2053 distinct and left join not 100% stable?
Submitted: 9 Dec 2003 1:17 Modified: 11 Dec 2003 7:17
Reporter: Thomas Mayer Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.0.16 OS:Linux (Suse Linux 8.2)
Assigned to: Dean Ellis CPU Architecture:Any

[9 Dec 2003 1:17] Thomas Mayer
Description:
In very very seldom cases, MySQL seems either not do a correct LEFT JOIN with IS NULL parameter or it does not do a correct DISTINCT.

I have exactly the mentioned statements running several times a day with huge amount of data (some 100,000 rows/day) since about 1 year (using the first versions of mysql 4.0 and everything that followed).

It is now about the 4th or 5th time I get the error mentioned. And in my opinion it should not happen per definition. Usually, the statements run great, so the error is very very seldom.

I'm sorry, I could not reproduce it myself yet and gife you a good example.

How to repeat:
-- The table paketnummer_main looks like this
CREATE TABLE `paketnummer_main` (
  `paketnummer` bigint(11) unsigned NOT NULL default '0',
  `importdatum_max` date default NULL,
  PRIMARY KEY  (`paketnummer`)
) TYPE=InnoDB;

-- The log of my ImportScript was telling
03-12-09 06:45:14   SQL: 
create temporary table pm_neu
	select distinct t1.paketnummer
	from import_nobabscan t1 left join paketnummer_main t2 on (t1.paketnummer=t2.paketnummer)
	where t2.paketnummer is null
	and t1.paketnummer is not null;
=================    SQL ausgeführt    =================

========
03-12-09 06:45:29   SQL: 
-- Import neuer Paketnummern in paketnummer_main
insert into paketnummer_main (paketnummer)
	select paketnummer from pm_neu;
======================== SQL-FEHLER =====================
===== MySQL Error 1062: Duplicate entry '140' for key 1
=========== IMPORTABBRUCH und Programmende ==============

Suggested fix:
This should not happen per definition. Or am I wrong here?
[9 Dec 2003 10:29] Dean Ellis
It is possible if this proces is done by more than one connection, so that one connection retrieves the values to insert before another connection, and then they both attempt to insert the same values.  (Unless you are doing additional processing on the temporary table, you can skip that step and insert directly into paketnummer_main, incidentally)

If you are only doing this with a single connection, however, I have some questions:

1) Approximately how many rows are in paketnummer_main and import_nobabscan?
2) Approximately how many rows are inserted at one time?
3) Does this only occur after the server has been running for an extended period of time?
4) Once the error appears, does it keep happening until you restart mysqld or does it go away immediately?

Thank you
[10 Dec 2003 0:12] Thomas Mayer
I can be sure to only use one connection at a time for all writing access on the mentioned mysqld instance.

1) Approximately how many rows are in paketnummer_main and import_nobabscan?
paketnummer_main: 6654752 rows
import_nobabscan: 3200 rows
2) Approximately how many rows are inserted at one time?
In that case about 2800 rows. In other cases it is up to 2000000 working the same way.
3) Does this only occur after the server has been running for an extended period
of time?
Not sure. In my example it was running for 22 days until the error occured. But in earlier cases I really don't know.
4) Once the error appears, does it keep happening until you restart mysqld or
does it go away immediately?
The error did not occur once again. I did not restart mysqld and I didn't have a problem right now. Mysqld did not crash or something.
[11 Dec 2003 1:08] Thomas Mayer
This is not a bug!

The error occured because the field paketnummer in pm_neu and import_nobabscan is a string.

In paketnummer_main it is numeric.

Two distinct values (as string) can become one numeric value which leads to the error.

Thank you for your patience however.
[11 Dec 2003 7:17] Dean Ellis
Noted.

Thank you for the update.