Bug #73368 | NOT IN doesn't return expected result | ||
---|---|---|---|
Submitted: | 23 Jul 2014 20:49 | Modified: | 27 Oct 2014 17:36 |
Reporter: | Nikola Glavinceski | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.6.19 | OS: | Linux (Ubuntu 12.04) |
Assigned to: | CPU Architecture: | Any | |
Tags: | not exists, NOT IN |
[23 Jul 2014 20:49]
Nikola Glavinceski
[24 Jul 2014 13:58]
MySQL Verification Team
Hello, Your bug report looks very interesting. However, we need to have to get from you some additional data. We need to repeat the bug itself, so that we can verify it. So, .... First of all, please upload a dump of both tables. Upload them in .tar.gz format so that a file would be smaller. Next, please send us the results that differ between 5.6.13 and 5.6.19. Send only results of the query that differs between the two versions. Platform should not be the one causing this.
[24 Jul 2014 14:01]
MySQL Verification Team
Hello, Your bug report looks very interesting. However, we need to have to get from you some additional data. We need to repeat the bug itself, so that we can verify it. So, .... First of all, please upload a dump of both tables. Upload them in .tar.gz format so that a file would be smaller. Next, please send us the results that differ between 5.6.13 and 5.6.19. Send only results of the query that differs between the two versions. Platform should not be the one causing this. I guess that problematic query should be the one with NOT IN nested query. Please, also send the results, on both versions, for this query: SELECT COUNT(*) FROM `customers`, `emails_queue` WHERE `to_address` != email; Many thanks in advance.
[24 Jul 2014 14:30]
MySQL Verification Team
We also need dumps of the tables in order to check for the engine used and character sets.
[24 Jul 2014 14:41]
MySQL Verification Team
Please, can you try, with version 5.6.19, to run all of your queries with these two commands, each preceding a set of queries. set optimizer_switch='semijoin=off'; and set optimizer_switch='semijoin=on'; Run queries after each of those two SET commands. Let us know if you see any differences. Thank you in advance.
[24 Jul 2014 20:11]
Nikola Glavinceski
Hello, I am sorry but I can't provide dump of tables because they contains emails of our customers. But I was able to fix the issue by replacing "varchar(500)" with "varchar(255)" on `to_address` from the `emails_queue` table. Thanks, Nikola
[24 Jul 2014 20:25]
Nikola Glavinceski
Maybe is important to say that the charset of both tables is utf8. And the data of emails_queue.to_address are imported from customers.email. Thanks, Nikola
[24 Jul 2014 21:02]
Sveta Smirnova
Thank you for the feedback. Please provide output SHOW CREATE TABLE customers and SHOW CREATE TABLE emails_queue
[24 Jul 2014 21:35]
Nikola Glavinceski
db dump with dummy data
Attachment: test.tar.gz (application/x-gzip, text), 805.53 KiB.
[24 Jul 2014 21:38]
Nikola Glavinceski
I was able to reproduce the issue with dummy data. Please use the queries from above on MySql 5.6.19 and I believe you will see the issue.
[25 Jul 2014 18:28]
Sveta Smirnova
Thank you for the feedback. Verified as described. Bug stops showing after I delete following rows: delete from customers where id > 40000; Workaround: SELECT COUNT(*) FROM `customers` WHERE email NOT IN(SELECT `to_address` FROM `emails_queue` WHERE `to_address` = email);
[17 Sep 2014 10:06]
Guilhem Bichot
Posted by developer: other workaround: set optimizer_switch='materialization=off';
[27 Oct 2014 17:36]
Paul DuBois
Noted in 5.6.22, 5.7.6 changelogs. Certain queries for which subquery materialization or UNION DISTINCT was used together with a hash index on a temporary table could produce incorrect results or cause a server exit.
[10 Dec 2014 14:48]
Laurynas Biveinis
$ bzr log -r 6209 -n0 ------------------------------------------------------------ revno: 6209 [merge] committer: Guilhem Bichot <guilhem.bichot@oracle.com> branch nick: 5.6 timestamp: Fri 2014-10-17 11:43:53 +0200 message: merge with latest 5.6 ------------------------------------------------------------ revno: 6207.1.1 committer: Guilhem Bichot <guilhem.bichot@oracle.com> branch nick: 5.6 timestamp: Wed 2014-09-24 15:30:51 +0200 message: Fix for Bug#19297190 NOT IN DOESN'T RETURN EXPECTED RESULT We have a query with WHERE ... NOT IN (SELECT ...). The inner field is 1503 bytes (500 chars * 3, because it has utf8 charset). subquery_allows_materialization() considers subq-mat: per the fix for Bug 17566396: MATERIALIZATION IS NOT CHOSEN FOR LONG UTF8 VARCHAR FIELD, this field is small enough to _not_ be changed to a BLOB when stored into the tmp table (Item::is_blob_field() sees that 500 < CONVERT_IF_BIGGER_TO_BLOB), so subq-mat is chosen. Before that fix, the left operand of the inequality comparison would be the length in _bytes_ and subq-mat would not be chosen. In hash_sj_engine::setup(): create_tmp_table() uses MEMORY table with a regular unique index, without "unique constraint"; right after creation, setup() verifies that the table has no "unique constraint", otherwise subq-mat would fail - see comment in setup() and see how indexsubquery_engine::setup() does a regular index lookup of a key made with copy_ref_key(): to look up into "unique constraint" it would need to compute a checksum in copy_ref_key(). So far so good. At some point during materialization, MEMORY tmp table becomes full, we go into create_myisam_from_heap() then create_myisam_tmp_table(): if (keyinfo->key_length >= table->file->max_key_length() || keyinfo->user_defined_key_parts > table->file->max_key_parts() || share->uniques) { /* Can't create a key; Make a unique constraint instead of a key */ share->keys= 0; share->uniques= 1; using_unique_constraint=1; alas the first condition is 1503 > 1000 (engine is MyISAM) so we create "unique constraint" on the MyISAM table. Subq-mat is not ready for this, as we know, so we get a bad result: in fact, all look ups then return "not found", so "NOT IN" always returns TRUE. In 5.7, subq-mat has been made able to use "unique constraint" (see Bug 18227171 which was part of wl 6711); however this same testcase still crashes but for different reasons, a separate 5.7 patch will be made. For fixing 5.6: - we cannot simply undo the fix for Bug 17566396 as there was an associated customer - on the other hand we cannot modify the logic in create_myisam_tmp_table() to _not_ create a "unique constraint" in this case (it's 5.6, we must be conservative) - proposal: take a middle path: make subquery_allows_materialization() test the length (by extending is_blob_field()): if length is >=1000 bytes, then don't use subq-mat (because there is risk that MyISAM table is used and creates "unique constraint"). 1000 bytes is 333 utf8 characters. So we would have, for the limit of # of chars: - before the fix for Bug 17566396: 512/3 = 170 chars, - after it: 512 chars - after the proposed fix: 332 chars. The testcase in Bug 17566396 used 250 chars so still uses subq-mat, which is good.