how would you go about implementing natural string sort in python?
— seeknuance
natural sort order – it’s like encryption. everyone tries to roll their own, and it never works out well (if you didn’t follow the link above, jeff atwood said it best).
don’t reinvent the wheel unless you really have to. for the code in post linked above, i’d say, “don’t forget about capital letters”, and also, “length is not as important as you’d think”. imagine, for example, this [low-end,value,high-end] setup for the range:
['a','m15','z']
is m15 in that range or not?
(John, from seeknuance, observes in the comments that this and other cases are not relevant for his specific implementation; he can afford to stay much simpler, and more efficient, than my code below)
anyways, this is what i’d do:
import re def in_natural_range(low, value, high): result = [] for s in [low,value,high]: result.append( [int(c) if c.isdigit() else c.lower() for c in re.split('([0-9]+)', s)]) return ''.join([str(i) for i in sorted(result)[1]]) == value.lower() #fixed based on comment below print in_natural_range('a','b','c') print in_natural_range('1','2','3') print in_natural_range('a1','b','c1') print in_natural_range('a1','a1','a1') print in_natural_range('a1','a2','a3') print in_natural_range('a1','a22','a3') print in_natural_range('11','12','13') print in_natural_range('11','123','13')
well, if i had to i’d do that. in practice, i’d just stick to python’s natsort library.
[More programming riddles]
Thank you for reading my blog.
The case of [‘a’,’m15′,’z’] isn’t in my requirements. Each document id has >> from timeit import repeat
>>> repeat(‘ferret_natural_range(“PP988”, “PP9888”, “PP9889”)’, ‘from xxx import ferret_natural_range’)
[17.68455696105957, 17.829108953475952, 18.02817988395691]
>>> repeat(‘in_natural_range(“PP988”, “PP9888”, “PP9889”)’, ‘from xxx import in_natural_range’)
[0.7457630634307861, 0.7409119606018066, 0.7389099597930908]
>>> from xxx import in_natural_range, ferret_natural_range
>>> in_natural_range(“PP988”, “PP9888”, “PP9889”)
True
>>> ferret_natural_range(“PP988”, “PP9888”, “PP9889”)
False
Within the “PP” prefixes, 988 <= 9888 <= 9889.
Cheers.
woot – thanks for catching that bug. i wasn’t excluding myself when i said *everyone* gets this wrong. fixed
cool – was trying to figure out why you wrote it the way you did, and optimization was going to be my first guess.
Boy, WordPress butchered my comment. Gah