sorting for humans

how would you go about implementing natural string sort in python?
— seeknuance

natural sort order – it’s like encryption. everyone tries to roll their own, and it never works out well (if you didn’t follow the link above, jeff atwood said it best).

don’t reinvent the wheel unless you really have to. for the code in post linked above, i’d say, “don’t forget about capital letters”, and also, “length is not as important as you’d think”. imagine, for example, this [low-end,value,high-end] setup for the range:

['a','m15','z']

is m15 in that range or not?

(John, from seeknuance, observes in the comments that this and other cases are not relevant for his specific implementation; he can afford to stay much simpler, and more efficient, than my code below)

anyways, this is what i’d do:


import re

def in_natural_range(low, value, high):
    result = []
    for s in [low,value,high]:
        result.append( [int(c) if c.isdigit() else c.lower() for c in re.split('([0-9]+)', s)])
    return ''.join([str(i) for i in sorted(result)[1]]) == value.lower() #fixed based on comment below

print in_natural_range('a','b','c')
print in_natural_range('1','2','3')
print in_natural_range('a1','b','c1')
print in_natural_range('a1','a1','a1')
print in_natural_range('a1','a2','a3')
print in_natural_range('a1','a22','a3')
print in_natural_range('11','12','13')
print in_natural_range('11','123','13')

well, if i had to i’d do that. in practice, i’d just stick to python’s natsort library.

[More programming riddles]

3 thoughts on “sorting for humans”

Thank you for reading my blog.

The case of [‘a’,’m15′,’z’] isn’t in my requirements. Each document id has >> from timeit import repeat
>>> repeat(‘ferret_natural_range(“PP988”, “PP9888”, “PP9889”)’, ‘from xxx import ferret_natural_range’)
[17.68455696105957, 17.829108953475952, 18.02817988395691]
>>> repeat(‘in_natural_range(“PP988”, “PP9888”, “PP9889”)’, ‘from xxx import in_natural_range’)
[0.7457630634307861, 0.7409119606018066, 0.7389099597930908]
>>> from xxx import in_natural_range, ferret_natural_range
>>> in_natural_range(“PP988”, “PP9888”, “PP9889”)
True
>>> ferret_natural_range(“PP988”, “PP9888”, “PP9889”)
False

Within the “PP” prefixes, 988 <= 9888 <= 9889.

Cheers.

Reply ↓

ferretfarmer on October 2, 2013 at 4:41 pm said:

woot – thanks for catching that bug. i wasn’t excluding myself when i said *everyone* gets this wrong. fixed

cool – was trying to figure out why you wrote it the way you did, and optimization was going to be my first guess.

Reply ↓

Boy, WordPress butchered my comment. Gah

Reply ↓

ferretfarmer

furor loquendi

sorting for humans

Aside

3 thoughts on “sorting for humans”

Leave a comment Cancel reply

Share this:

Related

3 thoughts on “sorting for humans”

Leave a comment Cancel reply