Categories
flatten list multidimensional-array python

How do I make a flat list out of a list of lists?

4736

I want to flatten this list of lists:

[[1, 2, 3], [4, 5, 6], [7], [8, 9]]

into:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

3

6620

Given a list of lists l,

flat_list = [item for sublist in l for item in sublist]

which means:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

is faster than the shortcuts posted so far. (l is the list to flatten.)

Here is the corresponding function:

def flatten(l):
    return [item for sublist in l for item in sublist]

As evidence, you can use the timeit module in the standard library:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the shortcuts based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists — as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

8

  • 606

    I tried a test with the same data, using itertools.chain.from_iterable : $ python -mtimeit -s'from itertools import chain; l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'list(chain.from_iterable(l))'. It runs a bit more than twice as fast as the nested list comprehension that’s the fastest of the alternatives shown here.

    – intuited

    Oct 15, 2010 at 1:21


  • 340

    I found the syntax hard to understand until I realized you can think of it exactly like nested for loops. for sublist in l: for item in sublist: yield item

    Jul 27, 2011 at 16:43

  • 237

    [leaf for tree in forest for leaf in tree] might be easier to comprehend and apply.

    – John Mee

    Aug 29, 2013 at 1:38


  • 25

    @RobCrowell Same here. To me the list comprehension one doesn’t read right, something feels off about it – I always seem to get it wrong and end up googling. To me this reads right [leaf for leaf in tree for tree in forest]. I wish this is how it was. I am sure I am missing something about the grammar here, and I would appreciate if anyone could point that out.

    Jul 12, 2021 at 17:19


  • 40

    I kept looking here every time I wanted to flatten a list, but this gif is what drove it home: i.stack.imgur.com/0GoV5.gif

    – Gilthans

    Aug 11, 2021 at 12:04

2089

You can use itertools.chain():

>>> import itertools
>>> list2d = [[1,2,3], [4,5,6], [7], [8,9]]
>>> merged = list(itertools.chain(*list2d))

Or you can use itertools.chain.from_iterable() which doesn’t require unpacking the list with the * operator:

>>> import itertools
>>> list2d = [[1,2,3], [4,5,6], [7], [8,9]]
>>> merged = list(itertools.chain.from_iterable(list2d))

This approach is arguably more readable than [item for sublist in l for item in sublist] and appears to be faster too:

$ python3 -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99;import itertools' 'list(itertools.chain.from_iterable(l))'
20000 loops, best of 5: 10.8 usec per loop
$ python3 -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 5: 21.7 usec per loop
$ python3 -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 5: 258 usec per loop
$ python3 -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99;from functools import reduce' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 5: 292 usec per loop
$ python3 --version
Python 3.7.5rc1

4

  • 25

    The * is the tricky thing that makes chain less straightforward than the list comprehension. You have to know that chain only joins together the iterables passed as parameters, and the * causes the top-level list to be expanded into parameters, so chain joins together all those iterables, but doesn’t descend further. I think this makes the comprehension more readable than the use of chain in this case.

    Sep 3, 2014 at 14:13


  • 124

    @TimDierks: I’m not sure “this requires you to understand Python syntax” is an argument against using a given technique in Python. Sure, complex usage could confuse, but the “splat” operator is generally useful in many circumstances, and this isn’t using it in a particularly obscure way; rejecting all language features that aren’t necessarily obvious to beginning users means you’re tying one hand behind your back. May as well throw out list comprehensions too while you’re at it; users from other backgrounds would find a for loop that repeatedly appends more obvious.

    Nov 12, 2015 at 20:26


  • 3

    * creates an intermediary tuple.! from_iterable fetch the nested lists directly from the top list.

    – nadapez

    Oct 21, 2021 at 3:35

  • 2

    To make this more readable, you can make a simple function: def flatten_list(deep_list: list[list[object]]): return list(chain.from_iterable(deep_list)). The type hinting improves the clarity of what’s going on (modern IDEs would interpret this as returning a list[object] type).

    Oct 25, 2021 at 14:34


1232

Note from the author: This is very inefficient. But fun, because monoids are awesome.

>>> xss = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> sum(xss, [])
[1, 2, 3, 4, 5, 6, 7, 8, 9]

sum sums the elements of the iterable xss, and uses the second argument as the initial value [] for the sum. (The default initial value is 0, which is not a list.)

Because you are summing nested lists, you actually get [1,3]+[2,4] as a result of sum([[1,3],[2,4]],[]), which is equal to [1,3,2,4].

Note that only works on lists of lists. For lists of lists of lists, you’ll need another solution.

5

  • 142

    that’s pretty neat and clever but I wouldn’t use it because it’s confusing to read.

    – andrewrk

    Jun 15, 2010 at 18:55

  • 102

    This is a Shlemiel the painter’s algorithm joelonsoftware.com/articles/fog0000000319.html — unnecessarily inefficient as well as unnecessarily ugly.

    Apr 25, 2012 at 18:24

  • 58

    The append operation on lists forms a Monoid, which is one of the most convenient abstractions for thinking of a + operation in a general sense (not limited to numbers only). So this answer deserves a +1 from me for (correct) treatment of lists as a monoid. The performance is concerning though…

    – ulidtko

    Dec 3, 2014 at 10:35

  • 18

    this is a very inefficient way because of the quadratic aspect of the sum.

    Jul 31, 2017 at 18:04

  • 10

    This article explains the maths of the inefficiency mathieularose.com/how-not-to-flatten-a-list-of-lists-in-python

    – ds4940

    Jan 4, 2018 at 16:46