Tags: list, python, string, string-concatenation

How slow is Python’s string concatenation vs. str.join?


As a result of the comments on my answer in this thread, I wanted to know the speed difference between the += operator and ''.join().

So what is the speed comparison between the two?


  • What are you testing? Two strings? Two million strings?

    Jun 16, 2010 at 17:03

  • Yeah, I just keep forgetting the syntax of timeit, and time.time() - start is so much easier 😛

    Jun 16, 2010 at 17:29

  • This question is similar and has better answers: stackoverflow.com/questions/1349311/…

    – Frank

    Aug 26, 2020 at 15:00
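For anyone who, like the commenter above, keeps forgetting the timeit syntax: here is a minimal sketch of timing a function call with timeit.repeat instead of subtracting time.time() values by hand. The helper time_call, the two builder functions, and the number/repeat values are hypothetical choices for illustration, not anything from the thread.

```python
import timeit

def build_with_plus(n):
    # Accumulate n copies of "x" with repeated +=
    s = ""
    for _ in range(n):
        s += "x"
    return s

def build_with_join(n):
    # Collect the pieces in a list, then join once at the end
    parts = []
    for _ in range(n):
        parts.append("x")
    return "".join(parts)

def time_call(fn, arg, number=100, repeat=5):
    # timeit.repeat accepts a zero-argument callable; each entry in the
    # returned list is the total time for `number` calls. The minimum is
    # usually the least noisy estimate.
    times = timeit.repeat(lambda: fn(arg), number=number, repeat=repeat)
    return min(times)

if __name__ == "__main__":
    for fn in (build_with_plus, build_with_join):
        print(fn.__name__, "%.4f s" % time_call(fn, 10000, number=20))
```

Taking the minimum of several repeats, rather than a single delta, makes one-off interruptions from the OS much less likely to skew the comparison.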


From: Efficient String Concatenation

Method 1:

def method1(loop_count):
    # Build the result with repeated += concatenation
    out_str = ""
    for num in range(loop_count):
        out_str += str(num)
    return out_str

Method 4:

def method4(loop_count):
    # Collect the pieces in a list, then join them once at the end
    str_list = []
    for num in range(loop_count):
        str_list.append(str(num))
    return ''.join(str_list)

Now I realise they are not strictly representative, and the 4th method appends to a list before iterating through and joining each item, but it’s a fair indication.

String join is significantly faster than concatenation.

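Why? Strings are immutable and can’t be changed in place. To alter one, a new representation needs to be created (a concatenation of the two), so each += copies everything accumulated so far, and a loop of n concatenations can take O(n²) time in the worst case.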
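To make the "new string each time" point concrete, here is a small sketch in two parts: a check that += on a str leaves an existing reference untouched (so a new object must have been built), and a back-of-the-envelope model of how many characters are written if every += copies the whole accumulated string. The function names are hypothetical, and the model is a simplification: as the comments below point out, CPython can sometimes grow the string in place instead.

```python
def plus_equals_makes_new_object():
    s = "abc"
    alias = s          # a second reference to the same str object
    s += "d"           # str is immutable, so this cannot modify `alias`
    return alias, s, s is alias

def chars_copied_naive(n, piece_len=1):
    # Worst-case model: step i writes i * piece_len characters into a fresh
    # string, so the total is piece_len * (1 + 2 + ... + n) = piece_len * n * (n + 1) / 2
    return sum(i * piece_len for i in range(1, n + 1))

def chars_copied_join(n, piece_len=1):
    # In CPython, ''.join() sums the piece lengths first, allocates once,
    # and copies each piece exactly once
    return n * piece_len

print(plus_equals_makes_new_object())                     # ('abc', 'abcd', False)
print(chars_copied_naive(1000), chars_copied_join(1000))  # 500500 1000
```

For 1,000 one-character pieces the naive model writes 500,500 characters versus 1,000 for a single join, which is the quadratic-versus-linear gap in miniature.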



  • Well, I was going to answer this myself (hence the tag), but it looks like you beat me to the punch! +1, especially for the useful link!

    Jun 16, 2010 at 17:12

  • @Wayne: The useful link is copied from the question that you’ve linked to!

    Jun 16, 2010 at 17:17

  • -1. There is no fixed ratio for the speed difference between string.join and + concatenation, because they have completely different growth rates (big-O complexity). As the number of strings to concatenate grows, string.join wins by a wider and wider margin over string concatenation.

    – Lie Ryan

    Jun 16, 2010 at 18:14

  • @nate c: Method 1 is now just a shade slower than method 6 (using Python 2.6), but that’s only in CPython. I believe that in Jython it hasn’t been optimised like this, so ''.join(list) remains considerably faster; see the first point in “Programming Recommendations” in PEP 8.

    Nov 22, 2010 at 5:11

  • From PEP 8: “For example, do not rely on CPython’s efficient implementation of in-place string concatenation for statements in the form a+=b or a=a+b. Those statements run more slowly in Jython. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.”

    – Neil G

    Jul 2, 2011 at 8:41
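The PEP 8 point about not relying on CPython’s in-place concatenation can be seen even within CPython. A minimal sketch, assuming CPython: the in-place += shortcut only applies when the accumulated string has no other references, so keeping a second reference (the hypothetical alias variable below) forces a full copy on every step and the loop becomes quadratic, which is the growth-rate difference described in the comments above. All function names and sizes are illustrative.

```python
import timeit

def plus_equals(n):
    s = ""
    for _ in range(n):
        s += "x"            # CPython may be able to grow `s` in place here
    return s

def plus_equals_with_alias(n):
    s = ""
    for _ in range(n):
        alias = s           # hypothetical extra reference to the current string...
        s += "x"            # ...so CPython cannot resize in place and must copy
    return s

def join_parts(n):
    parts = []
    for _ in range(n):
        parts.append("x")
    return "".join(parts)   # one final copy of all the pieces

def sweep(sizes=(5_000, 10_000, 20_000, 40_000), repeat=3):
    # For each n, the best-of-`repeat` time of a single call to each builder
    results = {}
    for n in sizes:
        results[n] = tuple(
            min(timeit.repeat(lambda: fn(n), number=1, repeat=repeat))
            for fn in (plus_equals, plus_equals_with_alias, join_parts)
        )
    return results

if __name__ == "__main__":
    print("n", "+=", "+= (aliased)", "join", sep="\t")
    for n, times in sweep().items():
        print(n, *("%.5f" % t for t in times), sep="\t")
```

If the reasoning above holds, doubling n should roughly double the += and join columns on CPython but roughly quadruple the aliased column. On implementations without this optimization (PEP 8 mentions Jython), plain += would be expected to behave like the aliased version. Absolute numbers will differ by machine and version.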



My original code was wrong. It appears that + concatenation is usually faster for short strings, especially with newer versions of Python on newer hardware.

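The times are as follows (first column: listcat, i.e. ''.join(); second column: strcat, i.e. +=; this matches the order in the print call in the code):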

Iterations: 1,000,000       

Python 3.3 on Windows 7, Core i7

String of len:   1 took:     0.5710     0.2880 seconds
String of len:   4 took:     0.9480     0.5830 seconds
String of len:   6 took:     1.2770     0.8130 seconds
String of len:  12 took:     2.0610     1.5930 seconds
String of len:  80 took:    10.5140    37.8590 seconds
String of len: 222 took:    27.3400   134.7440 seconds
String of len: 443 took:    52.9640   170.6440 seconds

Python 2.7 on Windows 7, Core i7

String of len:   1 took:     0.7190     0.4960 seconds
String of len:   4 took:     1.0660     0.6920 seconds
String of len:   6 took:     1.3300     0.8560 seconds
String of len:  12 took:     1.9980     1.5330 seconds
String of len:  80 took:     9.0520    25.7190 seconds
String of len: 222 took:    23.1620    71.3620 seconds
String of len: 443 took:    44.3620   117.1510 seconds

On Linux Mint, Python 2.7, some slower processor

String of len:   1 took:     1.8840     1.2990 seconds
String of len:   4 took:     2.8394     1.9663 seconds
String of len:   6 took:     3.5177     2.4162 seconds
String of len:  12 took:     5.5456     4.1695 seconds
String of len:  80 took:    27.8813    19.2180 seconds
String of len: 222 took:    69.5679    55.7790 seconds
String of len: 443 took:   135.6101   153.8212 seconds

And here is the code:

from __future__ import print_function
import time

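def strcat(string):
    # Build the result one character at a time with repeated +=
    newstr = ""
    for char in string:
        newstr += char
    return newstr

def listcat(string):
    # Collect the characters in a list, then join them once at the end
    chars = []
    for char in string:
        chars.append(char)
    return ''.join(chars)

def test(fn, times, *args):
    # Call fn(*args) `times` times and return the elapsed wall-clock time,
    # formatted as a right-aligned 10-character string
    start = time.time()
    for _ in range(times):
        fn(*args)
    return "{:>10.4f}".format(time.time() - start)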

def testall():
    strings = ['a', 'long', 'longer', 'a bit longer', 
               '''adjkrsn widn fskejwoskemwkoskdfisdfasdfjiz  oijewf sdkjjka dsf sdk siasjk dfwijs''',
               '''this is a really long string that's so long
               it had to be triple quoted  and contains lots of
               superflous characters for kicks and gigles
               @!#(*_#)(*$(*!#@&)(*E\xc4\x32\xff\x92\x23\xDF\xDFk^%#$!)%#^(*#''',
              '''I needed another long string but this one won't have any new lines or crazy characters in it, I'm just going to type normal characters that I would usually write blah blah blah blah this is some more text hey cool what's crazy is that it looks that the str += is really close to the O(n^2) worst case performance, but it looks more like the other method increases in a perhaps linear scale? I don't know but I think this is enough text I hope.''']

    for string in strings:
        print("String of len:", len(string), "took:", test(listcat, 1000000, string), test(strcat, 1000000, string), "seconds")

testall()
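Some of the comments on this answer trip over which column is which, since the order (listcat first, then strcat) is only visible in the print call. Here is a sketch of the same comparison that labels every timing by function name and takes the best of several timeit.repeat runs instead of a single time.time() delta. It copies strcat and listcat so the block runs on its own (Python 3); compare and the smaller iteration counts are hypothetical choices made so it finishes quickly.

```python
import timeit

def strcat(string):
    newstr = ""
    for char in string:
        newstr += char      # repeated concatenation
    return newstr

def listcat(string):
    chars = []
    for char in string:
        chars.append(char)  # collect, then join once
    return "".join(chars)

def compare(string, number=10_000, repeat=3):
    # Map each function's name to its best total time for `number` calls
    return {
        fn.__name__: min(timeit.repeat(lambda: fn(string), number=number, repeat=repeat))
        for fn in (strcat, listcat)
    }

if __name__ == "__main__":
    for s in ("a", "long", "a bit longer", "x" * 80, "y" * 443):
        timings = compare(s)
        print("len %3d: " % len(s)
              + ", ".join("%s %.4f s" % (name, t) for name, t in sorted(timings.items())))
```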


  • Your test is wrong. Your strcat returns string * len(string), whereas your listcat always returns just string. How can you compare them? Test with newstr += char or with chars.append(string). This actually proves the point of @bwawok that + is faster than list append.

    Oct 9, 2013 at 9:35

  • Good catch, that should be newstr += char. Whoops. Fixed and updated.

    Oct 9, 2013 at 22:31

  • On my Windows 10 machine (desktop Haswell i5, Python 2.7.10 and 3.5.2), the results are the opposite: strcat is slightly faster. See pastebin.com/sVVuExBa

    – Dan M.

    Aug 28, 2016 at 14:11


  • @DanM.: Did you mean listcat is slightly faster? That’s what the Pastebin shows.

    Dec 6, 2016 at 16:24

  • @ShadowRanger: The values in the second column are lower; isn’t that column supposed to represent strcat (according to the print call)?

    – Dan M.

    Dec 6, 2016 at 18:07


The existing answers are very well written and well researched, but here’s another answer for the Python 3.6 era, now that we have literal string interpolation (a.k.a. f-strings):

>>> import timeit
>>> timeit.timeit('f\'{"a"}{"b"}{"c"}\'', number=1000000)
0.14618930302094668
>>> timeit.timeit('"".join(["a", "b", "c"])', number=1000000)
0.23334730707574636
>>> timeit.timeit('a = "a"; a += "b"; a += "c"', number=1000000)
0.14985873899422586

Test performed using CPython 3.6.5 on a 2012 Retina MacBook Pro with an Intel Core i7 at 2.3 GHz.

This is by no means a formal benchmark, but it looks like using f-strings is roughly as performant as using += concatenation; any improved metrics or suggestions are, of course, welcome.
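One caveat on the f-string timing: f'{"a"}{"b"}{"c"}' only interpolates literals, which is not how f-strings are usually used. A slightly more realistic sketch puts the three pieces in variables created in timeit’s setup, so every approach has to look up names at run time. SETUP, STATEMENTS, run, and the number of runs are hypothetical choices; like the numbers above, this is an informal comparison, not a formal benchmark.

```python
import timeit

# Variables shared by every statement; setup runs once, outside the timed loop
SETUP = 'a, b, c = "a", "b", "c"'

# Each statement builds the same 3-character string from those variables
STATEMENTS = {
    "f-string": 'f"{a}{b}{c}"',
    "join": '"".join([a, b, c])',
    "+=": 's = a; s += b; s += c',
}

def run(number=1_000_000):
    # Total seconds for `number` executions of each statement
    return {name: timeit.timeit(stmt, setup=SETUP, number=number)
            for name, stmt in STATEMENTS.items()}

if __name__ == "__main__":
    for name, seconds in run().items():
        print("%-8s %.4f s" % (name, seconds))
```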


  • Please see this answer to a similar question: stackoverflow.com/a/1350289/1202214. += should NOT be used; its performance gain is an illusion.

    Nov 6, 2018 at 18:45

  • @AndreasBergström: Nice find. Re-running the informal benchmark on the same machine using a = "a"; a = a + "b"; a = a + "c" gives a slightly slower result of 0.1739.

    – Jules

    Nov 7, 2018 at 1:06

  • This is not a fair benchmark. You are not creating the list in a loop, which is a significant performance optimization that does not apply in the general case. Check Dominic’s answer for how a fair benchmark should look.

    Mar 1, 2021 at 10:42