sorting for humans

how would you go about implementing natural string sort in python?
seeknuance


natural sort order – it’s like encryption. everyone tries to roll their own, and it never works out well (if you didn’t follow the link above, jeff atwood said it best).

don’t reinvent the wheel unless you really have to. for the code in post linked above, i’d say, “don’t forget about capital letters”, and also, “length is not as important as you’d think”. imagine, for example, this [low-end,value,high-end] setup for the range:

['a','m15','z']

is m15 in that range or not?

(John, from seeknuance, observes in the comments that this and other cases are not relevant for his specific implementation; he can afford to stay much simpler, and more efficient, than my code below)

anyways, this is what i’d do:


import re

def in_natural_range(low, value, high):
    result = []
    for s in [low,value,high]:
        result.append( [int(c) if c.isdigit() else c.lower() for c in re.split('([0-9]+)', s)])
    return ''.join([str(i) for i in sorted(result)[1]]) == value.lower() #fixed based on comment below

print in_natural_range('a','b','c')
print in_natural_range('1','2','3')
print in_natural_range('a1','b','c1')
print in_natural_range('a1','a1','a1')
print in_natural_range('a1','a2','a3')
print in_natural_range('a1','a22','a3')
print in_natural_range('11','12','13')
print in_natural_range('11','123','13')

well, if i had to i’d do that. in practice, i’d just stick to python’s natsort library.


[More programming riddles]

3 thoughts on “sorting for humans

  1. Thank you for reading my blog.

    The case of [‘a’,’m15′,’z’] isn’t in my requirements. Each document id has >> from timeit import repeat
    >>> repeat(‘ferret_natural_range(“PP988”, “PP9888”, “PP9889”)’, ‘from xxx import ferret_natural_range’)
    [17.68455696105957, 17.829108953475952, 18.02817988395691]
    >>> repeat(‘in_natural_range(“PP988”, “PP9888”, “PP9889”)’, ‘from xxx import in_natural_range’)
    [0.7457630634307861, 0.7409119606018066, 0.7389099597930908]
    >>> from xxx import in_natural_range, ferret_natural_range
    >>> in_natural_range(“PP988”, “PP9888”, “PP9889”)
    True
    >>> ferret_natural_range(“PP988”, “PP9888”, “PP9889”)
    False

    Within the “PP” prefixes, 988 <= 9888 <= 9889.

    Cheers.

    • woot – thanks for catching that bug. i wasn’t excluding myself when i said *everyone* gets this wrong. fixed

      cool – was trying to figure out why you wrote it the way you did, and optimization was going to be my first guess.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s