Bug #30076 | Unexpected output in SELECT * FROM test ORDER BY foo DESC LIMIT 75, 15; | ||
---|---|---|---|
Submitted: | 26 Jul 2007 15:46 | Modified: | 10 Aug 2007 10:37 |
Reporter: | Fabio Sgamo | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server | Severity: | S4 (Feature request) |
Version: | all | OS: | Any |
Assigned to: | Igor Babaev | CPU Architecture: | Any |
Tags: | ANSI SQL T611, limit, order by, SELECT |
[26 Jul 2007 15:46]
Fabio Sgamo
[26 Jul 2007 15:50]
Fabio Sgamo
Schema and data used, tested selects and my comments
Attachment: bug_report.sql (text/plain), 16.41 KiB.
[27 Jul 2007 9:36]
Sveta Smirnova
Thank you for the report. Please provide output of SHOW VARIABLES LIKE 'char%' and SHOW VARIABLES LIKE 'collation%'
[27 Jul 2007 14:23]
Fabio Sgamo
Thank you for considering my submission. I guess you're interested in 5.0+ only, so I'll answer for this version only. Here you are: # Server version: 5.0.45 Source distribution mysql> SHOW VARIABLES LIKE 'char%'; +--------------------------+----------------------------------------+ | Variable_name | Value | +--------------------------+----------------------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/local/mysql/share/mysql/charsets/ | +--------------------------+----------------------------------------+ 8 rows in set (0.05 sec) mysql> SHOW VARIABLES LIKE 'collation%'; +----------------------+-------------------+ | Variable_name | Value | +----------------------+-------------------+ | collation_connection | latin1_swedish_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci | +----------------------+-------------------+ 3 rows in set (0.00 sec) In my first post I included schema and data in "old" style, since this is exactly how I recreated the "bug" in all 3 MySQL versions I have at hand. I don't actually know if it can be useful, but I think the following can't hurt: mysql> SHOW CREATE TABLE test; +-------+----------------------------------------------------------------------------+ | Table | Create Table | +-------+----------------------------------------------------------------------------+ | test | CREATE TABLE `test` ( `ID_test` varchar(10) character set latin1 collate latin1_bin NOT NULL default '', `foo` tinyint(1) unsigned NOT NULL default '0', PRIMARY KEY (`ID_test`), KEY `foo` (`foo`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 PACK_KEYS=1 | +-------+----------------------------------------------------------------------------+ 1 row in set (0.00 sec)
[30 Jul 2007 19:05]
Valeriy Kravchuk
Looks like you compiled from sources. In this case, please, send the entire configure command line used. I can not repeat the behaviour desribed with MySQL's binaries, hence the request.
[31 Jul 2007 8:39]
Fabio Sgamo
Configure command issued on "Compressed GNU TAR archive (tar.gz)" mysql-5.0.45.tar.gz: mysql-5.0.45# CFLAGS="-O3" CXX=gcc CXXFLAGS="-O3 -felide-constructors -fno-exceptions -fno-rtti -s" ./configure '--prefix=/usr/local/mysql' '--enable-assembler' '--with-mysqld-ldflags=-all-static' '--without-innodb' (I guess you're not much interested in old versions, but just know that "compiled form source" is common to all the version I tested: 4.0.15, 4.0.26, 5.0.45.) For last version 5.0.45, I show you active lines of my.cnf, to complete the repeatability of my installation: [client] port = 3306 socket = /tmp/mysql.sock [mysqld] port = 3306 socket = /tmp/mysql.sock skip-locking key_buffer = 16K max_allowed_packet = 1M table_cache = 4 sort_buffer_size = 64K read_buffer_size = 256K read_rnd_buffer_size = 256K net_buffer_length = 2K thread_stack = 64K server-id = 1 [mysqldump] quick max_allowed_packet = 16M [mysql] no-auto-rehash [isamchk] key_buffer = 8M sort_buffer_size = 8M [myisamchk] key_buffer = 8M sort_buffer_size = 8M [mysqlhotcopy] interactive-timeout
[31 Jul 2007 10:17]
Sveta Smirnova
Thank you for the report. Verified as described. Problems are: missed 0000000010 value, repeatable results of limit: SELECT * FROM test ORDER BY foo DESC LIMIT 75, 15; ID_test foo 000000001B 0 000000001G 0 000000001C 0 000000001v 0 000000001u 0 000000001r 0 000000001p 0 000000001l 0 000000001n 0 000000001o 0 000000001q 0 000000001t 0 000000001s 0 000000001D 0 SELECT * FROM test ORDER BY foo DESC LIMIT 80, 15; ID_test foo 000000001r 0 000000001p 0 000000001l 0 000000001n 0 000000001o 0 000000001q 0 000000001t 0 000000001s 0 000000001D 0
[31 Jul 2007 10:19]
Sveta Smirnova
test case
Attachment: bug30076.test (application/octet-stream, text), 5.60 KiB.
[31 Jul 2007 17:17]
Fabio Sgamo
I'm sorry you actually verified a MySQL bug, but I'm personally happy my report has been useful: I can tell my boss I worth the money he pays me ;-P Apart from the joke, I wonder which next step will be... "Verified" implies it will be investigated and patched? "Problems are: missed 0000000010 value" means the, if I compile MySQL from source AND use '0000000010' in varchar(10), a filesort will hide some records in "paged" queries? Thanks in advance for any kind of (kind and unkind ;-) answers!
[31 Jul 2007 18:41]
Sveta Smirnova
Yes, "Verified" means bug is valid and should be processed. My results are different from yours and I missed error in start. Actually doesn't matter how you compile mysql.
[1 Aug 2007 20:43]
Sveta Smirnova
I am sorry for mistake. Output in my previous comment is expected: table has 89 rows so output of limit 75,15 is last 14 rows and output of 80,15 is last 9 rows.. And I can not repeat error with 15,15/75,15 replacement. So bug is closed as "Can't repeat"
[2 Aug 2007 11:11]
Fabio Sgamo
Ehm, I'm sorry too... in my opinion, you made a real mistake in your last post, and maybe you didn't get the real point in what I reported "Unexpected" (and that's probably why I could not understand your comment "Problems are: missed 0000000010 value"). So, I try to point out that what is "Unexpected" is not the number of records in the query result (we both agree they respect the expected counts), but the actual records that make the output set: I think you should read once more my file attachment "bug_report.sql", that ends with the following lines: # In previuos outputs, on any server version, I never got the following records # They are first 14 inserted records with foo=0 # Just notice that ID_test 0000000011, 000000000X, 0000000002 with foo=1 have been outputted on query "LIMIT 0, 15" # EXPECTED and simulated output: +------------+-----+ | ID_test | foo | +------------+-----+ | 0000000010 | 0 | | 000000000Z | 0 | | 000000001d | 0 | | 000000001b | 0 | | 000000001e | 0 | | 000000000T | 0 | | 000000000P | 0 | | 0000000019 | 0 | | 000000000c | 0 | | 000000000O | 0 | | 000000000a | 0 | | 000000000k | 0 | | 000000001m | 0 | | 0000000007 | 0 | +------------+-----+ You should easily check it out, using for instance your preferred text editor's "search" tool on a file containing the partial results of all LIMITed queries: I cannot find the above 14 ID_test strings on the results of partial queries. On the other hand, if I search in the mentioned file for the actual ID_test return by the query with "LIMIT 75, 15", I find out that these records ARE ALREADY in sets returned by previuos partial restults! In my opinion, this is a "verifiable" bug :-( I made jokes on the fact I could be wrong in discovering a new bug, with the sole intent of causing the reader to have a smile on his/her face while reading my report... but I'm quite sure I showed UNEXPECTED results, that can be easily checked. The point could be if they could be repeated, but it looks like you had same UNEXPECTED and INCORRECT results I mentioned in my submission. I tested the same query on 3 different versions (2 very old, AND THE LATEST available at writing time, considered stable since 5.0.15, October 2005!), and it's not important which values the query shows, but the fact that it does not show the correct ones. I now have to leave: I'll add some more test case in the afternoon, but please reconsider the bug classification, and the fact the I got no e-mail on the last changes made to the bug status: I discovered just by chance that a "Verified" bug was put to "Can't repeat"... I think this is bad for the whole bug system, and I notice a lot of bugs closed 2 or more years ago with no more feedback from original reporting persons. After all, please remember that my English is not good, and that I like people with smiles on face ;-)
[2 Aug 2007 17:22]
Fabio Sgamo
I finally found time to make some tests, and -I hope- more useful and clear test cases... PLEASE note you should start from very beginning, and use my "How to repeat:" initial section of this bug, since I actually verified that the proposed test case works as expected, and in that difference probably lies the cause of the submitted bug. I used the old notation, to be able to make tests on all versions from 4.0 to 5.0 . I'd like to underline this again, both 4.0 and 5.0 shows to me the same bug (not tested on ".1" versions), even if with different output: since my query use "ORDER BY foo DESC", I make no restriction on the order of the records when foo=1 and foo=0. I just claim that foo=1 come before foo=0 AND, morover, that the sum of partial results of LIMITED queries, is equivalent to the hole set: that's to say they should be partitions of the result of the query issued with no LIMIT clause. So, I'll post 4 files: "NOT_bug30076_all.sql" and "NOT_bug30076_LIMIT.sql" that show that the proposed test case (with NO INDEXES at all and default engine and chars set) does the correct job. The following "bug30076_all.sql" and "bug30076_LIMIT.sql" are the real test cases I used to repeat the bug on 3 very different servers, one of which runs the latest 5.0.45 MySQL. The *_all.sql show the complete set of the query "SELECT * FROM test ORDER BY foo DESC", while the "*_LIMIT.sql" show the partial results of the "LIMIT offset, len" queries, with len=15, and offset IN (0, 15, 30, 45, 60, 75). All the resulting records are commented out, so it should be easy for every one to try those out and compare the hole set with the sum of the partial results. By the way, I already pointed out that the partial results show replications of 14 records, while they miss the following 14: 0000000010 000000000Z 000000001d 000000001b 000000001e 000000000T 000000000P 0000000019 000000000c 000000000O 000000000a 000000000k 000000001m 0000000007 Please use "match case" option to search for those ID_test in results of your query. IF you confirm this fact, you could probably verify even that the last 14 records of query "LIMIT 75, 15" are already present in previous lines. Thanks to anyone who will try to consider my submission again :-D
[2 Aug 2007 17:25]
Fabio Sgamo
Not a bug: all records returned with schema of 'bug30076.test'
Attachment: NOT_bug30076_all.sql (text/plain), 3.80 KiB.
[2 Aug 2007 17:26]
Fabio Sgamo
Not a bug: partial record sets returned with LIMIT and schema of 'bug30076.test'
Attachment: NOT_bug30076_LIMIT.sql (text/plain), 4.15 KiB.
[2 Aug 2007 17:26]
Fabio Sgamo
Not a bug: all records returned with schema of 'How to repeat:' section
Attachment: bug30076_all.sql (text/plain), 3.89 KiB.
[2 Aug 2007 17:28]
Fabio Sgamo
BUG: partial record sets returned with LIMIT and schema of 'How to repeat:' section
Attachment: bug30076_LIMIT.sql (text/plain), 4.24 KiB.
[7 Aug 2007 10:04]
Sveta Smirnova
Thank you for the feedback. Problem was wrong order if specify KEY foo (foo). CREATE TABLE test ( ID_test varchar(10) binary NOT NULL default '', foo tinyint(1) unsigned NOT NULL default '0', KEY foo (foo) ) TYPE=MyISAM; Workaround: SELECT * FROM test ignore key(foo) ORDER BY foo DESC LIMIT 0, 15;
[8 Aug 2007 22:45]
Igor Babaev
Hi, For the query SELECT * FROM test ORDER BY foo DESC MySQL employs the filesort algorithm (see http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html): mysql> EXPLAIN SELECT * FROM test ORDER BY foo DESC; +----+-------------+-------+------+---------------+------+---------+------+------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+----------------+ | 1 | SIMPLE | test | ALL | NULL | NULL | NULL | NULL | 89 | Using filesort | +----+-------------+-------+------+---------------+------+---------+------+------+----------------+ 1 row in set (0.00 sec) For the query SELECT * FROM test ORDER BY foo DESC LIMIT 0,15 MySQL uses index scan if there is an index for foo: mysql> EXPLAIN SELECT * FROM test ORDER BY foo DESC LIMIT 0,15; +----+-------------+-------+-------+---------------+------+---------+------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------+---------+------+------+-------+ | 1 | SIMPLE | test | index | NULL | foo | 1 | NULL | 89 | | +----+-------------+-------+-------+---------------+------+---------+------+------+-------+ 1 row in set (0.01 sec) In general index scan returns rows not in the same sequence as filesort does though both sequences are sorted by foo. If there is no index defined on foo the second query also uses filesort and the sequence of rows for the query with LIMIT is always a subquence of the the query without LIMIT. In future development we are going always to use a different algorithm for query with LIMIT (priority queue algorithm). Then even when there is no index usable for the required order the sequence of rows for the query with LIMIT will not be guaranteed to be a subsequence of rows for the query without LIMIT. This complies with the semantics of SQL absolutely. I mark the case as 'Not a bug'.
[9 Aug 2007 15:35]
Fabio Sgamo
Dear Ladies and Sirs at MySQL AB (everywhere you are ;-), I must confess I'm personally surprised, and professionally disappointed, about the way you handled my bug submission up to now. I'd like not to seem rude to you, so please just read on with open minded attitude, and consider that I'm not paying for your attention and time spent on considering my writings (I'm not an Enterprise Edition customer), but even consider that I do not get paid for my time and the POTENTIAL help I'm giving you in building a REALLY reliable and stable product (both Enterprise and Community). If I keep on with this, is just because I consider that MySQL product's quality is worth my time (since my job is greatly based on your RDBMS), and because I feel I can "pay back" MySQL AB for the Community Edition I'm using, with my knowledge and "on the road" experience. Stating clearly that I think everybody can make mistakes, but should learn on them, I invite you once more to pay better attention on what I'm considering UNEXPECTED. We'll think about considering it a MySQL bug or just a non compliance with the semantics of SQL, or even just a misunderstanding of mines, only AFTER we agree on the answer to the following question: what is a "LIMITed query" expected to return? PLEASE just put your attention only on WHAT, not on HOW, and on the words used in MY answer: I use the term "set", not "sequence", since a set is an UNORDERED collection of items, whose items are not equal to any other item of the same collection. Than I use the term "partition", that can be applied to sub-sets of a set, if their intersection is empty, AND if their union is equal to the whole set. (I try to expose math concept I'm not used to deal with in English, but I think the semantics of "set", "subset" and "partition" are inequivocable in any language). So, I call "whole set" the collection of records return by query "SELECT * FROM test ORDER BY foo DESC"; I than call subset0-15 the collection of records return by query "SELECT * FROM test ORDER BY foo DESC LIMIT 0, 15"; subset15-15 is the collection of records return by query "SELECT * FROM test ORDER BY foo DESC LIMIT 15, 15"; ... so on until we have "the problematic" subset75-15, that is the collection of records return by query "SELECT * FROM test ORDER BY foo DESC LIMIT 75, 15". I assume we all have no doubt on the fact that the table "test" is created and filled up with those 89 records showed in MY test cases. So, using these labels and terms, my answer is: if subset0-15, subset15-15, subset30-15, subset45-15, subset60-15 and subset75-15 are the results of each LIMITed query, they should be PARTITIONS of the WHOLE SET. I don't actually think anyone can disagree with my statement: I'll wait for an answer from some of you, but I just make the next move, that is comparing the semantic of my answer, with the records I get from the test case "bug30076_LIMIT.sql" I submitted on 2 Aug 19:28. I notice I miss some items of the "whole set", and I get some items twice: this is clearly not compliant with the semantic I think the LIMIT clause should respect to make using it of some meaning. I'll stop here, keeping for later a lots of other personal and "on the road" considerations, so that you can analyze the problem at an abstract level: I hope you will want to spread the question to the most of you. Feel free to "Display comment to everyone" or keep it "hidden from the public".
[9 Aug 2007 16:02]
Sergei Golubchik
Let me try to explain. To generate subset0-15, subset15-15, subset30-15, subset45-15, subset60-15, subset75-15, and the whole set you issue 7 queries. Concatenation of subsets will return you the whole set only if the order of rows will be the same in all queries. But you didn't fix it, the order of rows is undefined, and can be different in different queries. That's why subsets won't give you a whole set, when combined. To fix the row ordering you need your result set to be fully sorted, for example: SELECT * FROM test ORDER BY foo DESC, ID_test; Igor was right, saying that when the ordering is not specified, MySQL is free to return rows in arbitrary order, which practically means that the order will depend on the execution plan and optimizations. This is what SQL standard says too.
[10 Aug 2007 10:34]
Fabio Sgamo
OK, we finally agree: NOT A BUG, as I hoped to resolve the question in my very first line written 45 days ago :-D Actually I don't think "the bug is in my brain", but surely I lack a deep knowledge of ANSI SQL definitions and implementations: I will try to learn more! At this point, I ask you to add a note in http://dev.mysql.com/doc/refman/5.0/en/select.html, that is the main (if not only) place of the manual that deals with the usage of LIMIT in SELECT statement (no problem with usage of LIMIT in DELETE and UPDATE, since the semantic is different and can't be used with ORDER BY). The note should clearly point how "LIMIT offset, row_count" works and is optimized by "priority queue algorithm", if alone or in conjunction with ORDER BY, as Mr. Babaev and Mr. Golubchik explained to me. In particular a statement like the following would have saved me hours of testing and useless bug-hunting in my own code first, then in MySQL, and consequent waste of your time answering me: "When ordering is performed on a non-unique value, ties may occur and it's not guaranteed that MySQL [and any RDBMS] will fetch the rows in the same order every time". I think we all agree that MySQL main usage is in web based applications: "paginated" output is always one of the main constrain one has to deal with, and "LIMIT" is clearly the clean and natural answer. If you clearly show the right way to use MySQL for this problem, you make more than 90% of "web programmers" happy since the first time they hear about MySQL ;-)