Categories
python sorting

Is there a built in function for string natural sort?

380

I have a list of strings for which I would like to perform a natural alphabetical sort.

For instance, the following list is naturally sorted (what I want):

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

And here’s the “sorted” version of the above list (what I get using sorted()):

['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

I’m looking for a sort function which behaves like the first one.

1

335

There is a third party library for this on PyPI called natsort (full disclosure, I am the package’s author). For your case, you can do either of the following:

>>> from natsort import natsorted, ns
>>> x = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
>>> natsorted(x, key=lambda y: y.lower())
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsorted(x, alg=ns.IGNORECASE)  # or alg=ns.IC
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

You should note that natsort uses a general algorithm so it should work for just about any input that you throw at it. If you want more details on why you might choose a library to do this rather than rolling your own function, check out the natsort documentation’s How It Works page, in particular the Special Cases Everywhere! section.


If you need a sorting key instead of a sorting function, use either of the below formulas.

>>> from natsort import natsort_keygen, ns
>>> l1 = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> l2 = l1[:]
>>> natsort_key1 = natsort_keygen(key=lambda y: y.lower())
>>> l1.sort(key=natsort_key1)
>>> l1
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsort_key2 = natsort_keygen(alg=ns.IGNORECASE)
>>> l2.sort(key=natsort_key2)
>>> l2
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

Update November 2020

Given that a popular request/question is “how to sort like Windows Explorer?” (or whatever is your operating system’s file system browser), as of natsort version 7.1.0 there is a function called os_sorted to do exactly this. On Windows, it will sort in the same order as Windows Explorer, and on other operating systems it should sort like whatever is the local file system browser.

>>> from natsort import os_sorted
>>> os_sorted(list_of_paths)
# your paths sorted like your file system browser

For those needing a sort key, you can use os_sort_keygen (or os_sort_key if you just need the defaults).

Caveat – Please read the API documentation for this function before you use to understand the limitations and how to get best results.

5

  • 6

    I also think it’s quite interesting that natsort also sorts correct when the number is not at the end: like it’s often the case for filenames. Feel free to include the following example: pastebin.com/9cwCLdEK

    Jul 17, 2014 at 18:51

  • 4

    Natsort is a great library, should be added to python standard library! 🙂

    Oct 24, 2019 at 5:03

  • natsort also ‘naturally’ handles the case of multiple separate numbers in the strings. Great stuff!

    – FlorianH

    Apr 13, 2020 at 16:17

  • 1

    @SethMMorton Windows sorts files in this order ['!2020', '.2020', '2020']. Natsort returns ['2020', '!2020', '.2020']. Can it sort like Windows?

    – F. Vosnim

    Jul 1, 2020 at 18:41

  • 1

    Thank you for os_sorted, that’s worked for me to match the ordering in Finder on my Mac.

    Dec 27, 2021 at 18:23


239

Try this:

import re

def natural_sort(l): 
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(l, key=alphanum_key)

Output:

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

Code adapted from here: Sorting for Humans : Natural Sort Order.

9

  • 2

    why do you use return sorted(l, key) instead of l.sort(key)? Is it for any performance gain or just to be more pythonic?

    – jperelli

    Aug 15, 2012 at 16:33

  • 14

    @jperelli I think the ladder would change the original list in the caller. But most likely the caller wants another shallow copy of the list.

    – huggie

    Aug 30, 2012 at 5:00

  • 3

    Just for the record, this can’t handle all inputs: the str/int splits must line up, otherwise you’ll create comparisons like [“foo”,0] < [0,”foo”] for the input [“foo0″,”0foo”], which raises a TypeError.

    – user19087

    Jan 11, 2017 at 0:25


  • 5

    @user19087: In fact it does work, because re.split('([0-9]+)', '0foo') returns ['', '0', 'foo']. Because of that, strings will always be on even indexes and integers on odd indexes in the array.

    Oct 30, 2017 at 9:13


  • For anyone wondering about performance, this is notably slower than python’s native sort. i.e. 25 -50x slower. And if you want to always sort [elm1, elm2, Elm2, elm2] as [elm1, Elm2, elm2, elm2] reliably (caps before lower), then you can simply call natural_sort(sorted(lst)). More inefficient, but very easy to get a repeatable sort. Compile the regex for a ~50% speedup. as seen in Claudiu’s answer.

    Aug 26, 2019 at 21:22


125

Here’s a much more pythonic version of Mark Byer’s answer:

import re

def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
    return [int(text) if text.isdigit() else text.lower()
            for text in _nsre.split(s)]

Now this function can be used as a key in any function that uses it, like list.sort, sorted, max, etc.

As a lambda:

lambda s: [int(t) if t.isdigit() else t.lower() for t in re.split('(\d+)', s)]

Fully reproducible demo code:

import re
natsort = lambda s: [int(t) if t.isdigit() else t.lower() for t in re.split('(\d+)', s)]
L = ["a1", "a10", "a11", "a2", "a22", "a3"]   
print(sorted(L, key=natsort))  
# ['a1', 'a2', 'a3', 'a10', 'a11', 'a22'] 

8

  • 10

    re module compiles and caches regexes automagically, so there is no need to precompile

    – wim

    Jan 22, 2014 at 17:17

  • 2

    @wim: it caches the last X usages, so it’s technically possible to use X+5 regexes and then do a natural sort over and over, at which point this wouldn’t be cached. but probably negligible in the long run

    – Claudiu

    Jan 22, 2014 at 17:51

  • I did not do it, but perhaps the reason was that it can’t handle tuples, like a regular python sort.

    May 8, 2015 at 18:51

  • 3

    The X usages mentioned by @Claudiu seem to be 100 on Python 2.7 and 512 on Python 3.4. And also note that when the limit is reached the cache is completely cleared (so it’s not only the oldest one that is thrown out).

    – Zitrax

    Nov 23, 2015 at 12:45

  • 3

    This doesn’t work when the list elements are Path objects. You can modify your function to make it work though, just replace the last _nsre.split(s) with _nsre.split(str(s)). Same for the lambda, replace s with str(s) at the end of the expression.

    – cbrnr

    Oct 23, 2019 at 13:57