Bug #1588 | "natural" sorting | ||
---|---|---|---|
Submitted: | 17 Oct 2003 9:47 | Modified: | 20 Feb 2007 10:46 |
Reporter: | Michael Winston | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server | Severity: | S4 (Feature request) |
Version: | any | OS: | MacOS (mac os x 10.2.8) |
Assigned to: | CPU Architecture: | Any |
[17 Oct 2003 9:47]
Michael Winston
[20 Feb 2007 10:46]
Sveta Smirnova
Thank you for the reasonable feature request.
[7 Sep 2017 13:50]
Berry van Mondfrans
This does seem like a very reasonable request to implement.
[29 Sep 2019 12:07]
Dean Trower
Not sure how or why a UDF would help actually. The "right" way to implement a nat-sort is a custom *comparison* function, which compares its two inputs normally until it hits the start of a number. Then it switches to parsing the number in each of the strings, and comparing them numerically. Then if they're equal, back to lexicographical parsing... etc. The point is that "parsing a number" can be as complex and handle arbitrarily many edge cases as you like, doing it that way. But AFAIK, the comparison function MySQL uses is hidden deep inside the code for "ORDER BY", and isn't exposed to the user, via UDF functions or otherwise. For MySQL to do nat-sorting the "right" way, you'd need to implement some entirely new syntax, such as: SELECT myString FROM myTable ORDER BY myString NAT_ASC; ...where NAT_ASC or NAT_DESC trigger the nat-sort comparison method, deep inside whatever sort code MySQL uses. I don't know how feasible this is. There's a hack-ish alternative, which is to build a sort key that's just a regular string, but where you attempt to encode the numbers in it so that regular lexicographical sorting does (more-or-less) what you'd like the proper numeric comparison to do. For example, if you know you've only got plain unsigned integers (sequences of digits 0-9) and they're all going to be <10 digits long, left-padding them with '0's out to width 10 will make them sort in numerically correct order (even though you're really still sorting lexicographically). You can do this using a UDF of course, but a regular stored function can do it just fine. I've posted one here: https://stackoverflow.com/a/58154535/999120 It's still a hack, of course: It's not a perfect replacement for a built-in nat-sort by any means. But if a true nat-sort requires too much work to implement, even implementing a "hack" sort-key function like this as a native function would provide a significant advantage: We could then use it in computed columns, which isn't possible for stored or UDF functions (at the moment anyway).
[30 Sep 2019 3:19]
Dean Trower
Actually now I think better of it, the proper way to signal to MySQL that you want to use a nat-sort is via specifying a collation or collation-like property, rather than the NAT_ASC / NAT_DESC syntax I mentioned above. So in that case the syntax might be something like: SELECT myString FROM myTable ORDER BY myString COLLATE natsort(...); or SELECT myString FROM myTable ORDER BY myString COLLATE utf8mb4_unicode_ci natsort(...) ASC; Where the natsort(...) bit tells MySQL to use a natsort when it encounters numbers in the string (and the regular collation elsewhere), and the parameters (...) affect such various nat-sorting options as exist (e.g. should numbers sort in reverse order, what exactly qualifies as a number, should negative numbers sort differently to positive, what symbols are recognized for decimal and thousands, should scientific notation be parsed, how do leading zeros affect sort, what to do with version strings like '8.0.17' etc, etc.)
[17 Nov 2023 12:15]
Hamish Allan
+1 for this feature request.