Bug #1588 "natural" sorting
Submitted: 17 Oct 2003 9:47 Modified: 20 Feb 2007 10:46
Reporter: Michael Winston Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server Severity:S4 (Feature request)
Version:any OS:MacOS (mac os x 10.2.8)
Assigned to: CPU Architecture:Any

[17 Oct 2003 9:47] Michael Winston
Description:
Boy, it sure would be nice to have a natural sorting function built-in for those of us too incapable 
of writing (or even compiling) our own UDFs. So instead of
chap 1: intro
chap 10: advanced stuff
chap 2: basic stuff
we could get
chap 1: intro
chap 2: basic stuff
chap 10: advanced stuff

I did find a function ("unprefix" located at http://empyrean.lib.ndsu.nodak.edu/~nem/mysql/udf/) 
that strips off articles (a, an, the).  I would be cool to have this built-in.

These two things together would form a very powerful, useful "library"-like sorting machanism 
that is rarely found.

Thanks,
Michael

How to repeat:
Feature request.  Not a bug.
[20 Feb 2007 10:46] Sveta Smirnova
Thank you for the reasonable feature request.
[7 Sep 2017 13:50] Berry van Mondfrans
This does seem like a very reasonable request to implement.
[29 Sep 2019 12:07] Dean Trower
Not sure how or why a UDF would help actually.

The "right" way to implement a nat-sort is a custom *comparison* function, which compares its two inputs normally until it hits the start of a number.  Then it switches to parsing the number in each of the strings, and comparing them numerically.  Then if they're equal, back to lexicographical parsing... etc.

The point is that "parsing a number" can be as complex and handle arbitrarily many edge cases as you like, doing it that way.

But AFAIK, the comparison function MySQL uses is hidden deep inside the code for "ORDER BY", and isn't exposed to the user, via UDF functions or otherwise.

For MySQL to do nat-sorting the "right" way, you'd need to implement some entirely new syntax, such as:

    SELECT myString FROM myTable ORDER BY myString NAT_ASC;

...where NAT_ASC or NAT_DESC trigger the nat-sort comparison method, deep inside whatever sort code MySQL uses.

I don't know how feasible this is.

There's a hack-ish alternative, which is to build a sort key that's just a regular string, but where you attempt to encode the numbers in it so that regular lexicographical sorting does (more-or-less) what you'd like the proper numeric comparison to do.
For example, if you know you've only got plain unsigned integers (sequences of digits 0-9) and they're all going to be <10 digits long, left-padding them with '0's out to width 10 will make them sort in numerically correct order (even though you're really still sorting lexicographically).

You can do this using a UDF of course, but a regular stored function can do it just fine.  I've posted one here:  https://stackoverflow.com/a/58154535/999120

It's still a hack, of course:  It's not a perfect replacement for a built-in nat-sort by any means.

But if a true nat-sort requires too much work to implement, even implementing a "hack" sort-key function like this as a native function would provide a significant advantage:  We could then use it in computed columns, which isn't possible for stored or UDF functions (at the moment anyway).
[30 Sep 2019 3:19] Dean Trower
Actually now I think better of it, the proper way to signal to MySQL that you want to use a nat-sort is via specifying a collation or collation-like property, rather than the NAT_ASC / NAT_DESC syntax I mentioned above.  So in that case the syntax might be something like:

    SELECT myString FROM myTable ORDER BY myString COLLATE natsort(...);

or

    SELECT myString FROM myTable ORDER BY myString COLLATE utf8mb4_unicode_ci natsort(...) ASC;
 
Where the natsort(...) bit tells MySQL to use a natsort when it encounters numbers in the string (and the regular collation elsewhere), and the parameters (...) affect such various nat-sorting options as exist (e.g. should numbers sort in reverse order, what exactly qualifies as a number, should negative numbers sort differently to positive, what symbols are recognized for decimal and thousands, should scientific notation be parsed, how do leading zeros affect sort, what to do with version strings like '8.0.17' etc, etc.)
[17 Nov 2023 12:15] Hamish Allan
+1 for this feature request.