Does Python have a string ‘contains’ substring method?

3588

I’m looking for a string.contains or string.indexof method in Python.

I want to do:

if not somestring.contains("blah"):
   continue

    7875

    Use the in operator:

    if "blah" not in somestring: 
        continue
    

    • 372

      Under the hood, Python will use __contains__(self, item), __iter__(self), and __getitem__(self, key) in that order to determine whether an item lies in a given container. Implement at least one of those methods to make in available to your custom type.

      Aug 17, 2018 at 7:02
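The lookup order described in the comment above can be sketched with three small container types. The class names here are hypothetical and exist only for illustration; each one implements exactly one of the three methods so that `in` has to fall back accordingly.

```python
class WithContains:
    """Defines __contains__, so `in` calls it directly."""

    def __init__(self, items):
        self._items = list(items)

    def __contains__(self, item):
        return item in self._items


class WithIter:
    """No __contains__; `in` falls back to iterating via __iter__."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)


class WithGetItem:
    """Neither __contains__ nor __iter__; `in` tries indices 0, 1, 2, ...
    until __getitem__ raises IndexError."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]


for cls in (WithContains, WithIter, WithGetItem):
    box = cls(["a", "b"])
    print(cls.__name__, "a" in box, "z" in box)
```

All three types support `in` and `not in`; only the mechanism behind the test differs.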

    • 62

      Just make sure that somestring won’t be None. Otherwise you get a TypeError: argument of type 'NoneType' is not iterable

      Oct 10, 2018 at 22:44
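One way to guard against the None case mentioned above is to check for it before the membership test. This is a minimal sketch; the helper name `contains_blah` is made up for the example.

```python
def contains_blah(somestring):
    # `"blah" in None` raises TypeError, so rule out None first.
    # The `and` short-circuits, so the `in` test never sees None.
    return somestring is not None and "blah" in somestring


print(contains_blah(None))
print(contains_blah("blah blah"))
print(contains_blah("something else"))
```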

    • 14

      For strings, does the Python in operator use the Rabin-Karp algorithm?

      – Sam Chats

      Dec 18, 2018 at 20:23

    • 10

      @SamChats see stackoverflow.com/questions/18139660/… for the implementation details (in CPython; afaik the language specification does not mandate any particular algorithm here).

      Feb 28, 2019 at 15:34

    • 4

      @Kaz It should be ugly, since you’re thinking at the wrong abstraction level. On the other hand, '.so' in filepath.suffixes is quite beautiful and explicitly saying what you really want to do.

      – Veky

      Jul 20, 2019 at 20:28

    862

    If it’s just a substring search you can use string.find("substring").

    You do have to be a little careful with find, index, and in though, as they are substring searches. In other words, this:

    s = "This be a string"
    if s.find("is") == -1:
        print("No 'is' here!")
    else:
        print("Found 'is' in the string.")
    

    It would print Found 'is' in the string. Similarly, if "is" in s: would evaluate to True. This may or may not be what you want.

    • 94

      +1 for highlighting the gotchas involved in substring searches. The obvious solution is if ' is ' in s: which will return False, as is (probably) expected.

      Aug 9, 2010 at 3:22


    • 128

      @aaronasterling Obvious it may be, but not entirely correct. What if you have punctuation or it’s at the start or end? What about capitalisation? Better would be a case-insensitive regex search for \bis\b (word boundaries).

      – Bob

      Nov 8, 2012 at 0:07
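The word-boundary search suggested in the comment above might look like the following sketch. The helper name `contains_word` is hypothetical, and the pattern assumes the word begins and ends with a letter, digit, or underscore, since `\b` only matches next to such characters.

```python
import re


def contains_word(text, word):
    # re.escape keeps any regex metacharacters in `word` literal;
    # \b on each side requires a word boundary, so "is" will not
    # match inside "This".
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(contains_word("This be a string", "is"))
print(contains_word("Is this it?", "is"))
print(contains_word("What it is.", "is"))
```

Unlike `' is ' in s`, this handles a match at the start or end of the string, adjacent punctuation, and differences in capitalisation.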

    • 2

      Why would this not be what the OP wants?

      Feb 18 at 3:55

    • 1

      @uh_big_mike_boi The problem with substring searches is that, in this example, you’re looking for the word is inside “This be a string.” That will evaluate to True because of the is in This. This is bad for programs that search for words, like swear filters (for example, a dumb word check for “ass” would also catch “grass”).

      Jun 19 at 18:44

    452

    Does Python have a string contains substring method?

    99% of use cases will be covered using the keyword, in, which returns True or False:

    'substring' in any_string
    

    For the use case of getting the index, use str.find (which returns -1 on failure, and has optional positional arguments):

    start = 0
    stop = len(any_string)
    any_string.find('substring', start, stop)
    

    or str.index (like find but raises ValueError on failure):

    start = 100 
    end = 1000
    any_string.index('substring', start, end)
    

    Explanation

    Use the in comparison operator because

    1. the language intends its usage, and
    2. other Python programmers will expect you to use it.

    >>> 'foo' in '**foo**'
    True
    

    The opposite (complement), which the original question asked for, is not in:

    >>> 'foo' not in '**foo**' # returns False
    False
    

    This is semantically the same as not 'foo' in '**foo**' but it’s much more readable and explicitly provided for in the language as a readability improvement.

    Avoid using __contains__

    The __contains__ method implements the behavior for in. This example,

    str.__contains__('**foo**', 'foo')
    

    returns True. You could also call this function from the instance of the superstring:

    '**foo**'.__contains__('foo')
    

    But don’t. Methods that start and end with double underscores are special methods that Python calls for you, so calling them directly is considered semantically non-public usage. The only reason to use this is when implementing or extending the in and not in functionality (e.g. if subclassing str):

    class NoisyString(str):
        def __contains__(self, other):
            print(f'testing if "{other}" in "{self}"')
            return super().__contains__(other)
    
    ns = NoisyString('a string with a substring inside')
    

    and now:

    >>> 'substring' in ns
    testing if "substring" in "a string with a substring inside"
    True
    

    Don’t use find and index to test for “contains”

    Don’t use the following string methods to test for “contains”:

    >>> '**foo**'.index('foo')
    2
    >>> '**foo**'.find('foo')
    2
    
    >>> '**oo**'.find('foo')
    -1
    >>> '**oo**'.index('foo')
    
    Traceback (most recent call last):
      File "<pyshell#40>", line 1, in <module>
        '**oo**'.index('foo')
    ValueError: substring not found
    

    Other languages may have no methods to directly test for substrings, and so you would have to use these types of methods, but with Python, it is much more efficient to use the in comparison operator.

    Also, these are not drop-in replacements for in. You may have to handle the exception or -1 cases, and if they return 0 (because they found the substring at the beginning) the boolean interpretation is False instead of True.

    If you really mean not any_string.startswith(substring) then say it.
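The truthiness pitfall described above can be seen in a short sketch. The string and variable names are chosen only for illustration.

```python
s = "foobar"

hit_at_start = s.find("foo")  # match at index 0
miss = s.find("xyz")          # no match, so find returns -1

# Index 0 is falsy and -1 is truthy, so using the result of find
# directly as a condition gives the wrong answer in both cases.
print(bool(hit_at_start))
print(bool(miss))

# `in` answers the membership question directly.
print("foo" in s, "xyz" in s)

# If the real question is "does it not start with this?", ask that.
print(not s.startswith("foo"))
```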

    Performance comparisons

    We can compare various ways of accomplishing the same goal.

    import timeit
    
    def in_(s, other):
        return other in s
    
    def contains(s, other):
        return s.__contains__(other)
    
    def find(s, other):
        return s.find(other) != -1
    
    def index(s, other):
        try:
            s.index(other)
        except ValueError:
            return False
        else:
            return True
    
    
    
    perf_dict = {
    'in:True': min(timeit.repeat(lambda: in_('superstring', 'str'))),
    'in:False': min(timeit.repeat(lambda: in_('superstring', 'not'))),
    '__contains__:True': min(timeit.repeat(lambda: contains('superstring', 'str'))),
    '__contains__:False': min(timeit.repeat(lambda: contains('superstring', 'not'))),
    'find:True': min(timeit.repeat(lambda: find('superstring', 'str'))),
    'find:False': min(timeit.repeat(lambda: find('superstring', 'not'))),
    'index:True': min(timeit.repeat(lambda: index('superstring', 'str'))),
    'index:False': min(timeit.repeat(lambda: index('superstring', 'not'))),
    }
    

    And now we see that using in is much faster than the others.
    Less time to do an equivalent operation is better:

    >>> perf_dict
    {'in:True': 0.16450627865128808,
     'in:False': 0.1609668098178645,
     '__contains__:True': 0.24355481654697542,
     '__contains__:False': 0.24382793854783813,
     'find:True': 0.3067379407923454,
     'find:False': 0.29860888058124146,
     'index:True': 0.29647137792585454,
     'index:False': 0.5502287584545229}
    

    How can in be faster than __contains__ if in uses __contains__?

    This is a fine follow-on question.

    Let’s disassemble functions with the methods of interest:

    >>> from dis import dis
    >>> dis(lambda: 'a' in 'b')
      1           0 LOAD_CONST               1 ('a')
                  2 LOAD_CONST               2 ('b')
                  4 COMPARE_OP               6 (in)
                  6 RETURN_VALUE
    >>> dis(lambda: 'b'.__contains__('a'))
      1           0 LOAD_CONST               1 ('b')
                  2 LOAD_METHOD              0 (__contains__)
                  4 LOAD_CONST               2 ('a')
                  6 CALL_METHOD              1
                  8 RETURN_VALUE
    

    so we see that the .__contains__ method has to be separately looked up and then called from the Python virtual machine – this should adequately explain the difference.
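The same comparison can be made programmatically, which avoids relying on the exact listing above. This is a rough sketch: the function names are made up for the example, and opcode names differ between CPython versions (for instance, newer releases use a dedicated CONTAINS_OP rather than COMPARE_OP), so the printed lists will vary. The point that should hold is that only the explicit method call produces a call instruction.

```python
import dis


def with_in(s, sub):
    return sub in s


def with_dunder(s, sub):
    return s.__contains__(sub)


def opnames(func):
    # Collect just the opcode names from the function's bytecode.
    return [ins.opname for ins in dis.get_instructions(func)]


in_ops = opnames(with_in)
dunder_ops = opnames(with_dunder)

print(in_ops)
print(dunder_ops)
```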

    • 10

      Why should one avoid str.index and str.find? How else would you suggest someone find the index of a substring instead of just whether it exists or not? (or did you mean avoid using them in place of contains – so don’t use s.find(ss) != -1 instead of ss in s?)

      Jun 10, 2015 at 3:35

    • 4

      Precisely so, although the intent behind the use of those methods may be better addressed by elegant use of the re module. I have not yet found a use for str.index or str.find in any code I have written.

      Jun 10, 2015 at 3:39
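As a rough illustration of the re-based alternative mentioned in the comment above, a match object carries the position of the match. The helper name `first_index` is hypothetical, and returning None on a miss is a choice made for this sketch, not something the thread prescribes.

```python
import re


def first_index(text, substring):
    # re.escape makes the substring match literally, like str.find.
    match = re.search(re.escape(substring), text)
    # A miss yields None here, rather than -1 or a ValueError.
    return match.start() if match else None


print(first_index("**foo**", "foo"))
print(first_index("**oo**", "foo"))
```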

    • Please extend your answer to advise against using str.count as well (string.count(something) != 0). shudder

      – cs95

      Jun 5, 2019 at 3:05


    • 2

      This is an excellent answer to a universal need in Python. Thanks for providing some detailed explanations !

      Aug 29, 2020 at 14:12

    • 1

      @burningfennec I addressed your follow-on question at the end of the answer above.

      May 30, 2021 at 22:35