
How do I check if a string contains a specific word?

2659

Consider:

$a="How are you?";

if ($a contains 'are')
    echo 'true';

Given the code above, what is the correct way to write the statement if ($a contains 'are')?


    7818

    Now with PHP 8 you can do this using str_contains:

    if (str_contains('How are you', 'are')) { 
        echo 'true';
    }
    

    RFC

    Before PHP 8

    You can use the strpos() function, which finds the position of the first occurrence of one string inside another:

    $a="How are you?";
    
    if (strpos($a, 'are') !== false) {
        echo 'true';
    }
    

    Note that the use of !== false is deliberate (neither != false nor === true will return the desired result); strpos() returns either the offset at which the needle string begins in the haystack string, or the boolean false if the needle isn’t found. Since 0 is a valid offset and 0 is “falsey”, we can’t use simpler constructs like !strpos($a, 'are').
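The pitfall described above shows up as soon as the needle sits at the very start of the haystack. The following is a minimal sketch with made-up sample values (the variable names are illustrative, not from the original answer):

```php
<?php
// Hypothetical example values: the needle sits at offset 0 of the haystack.
$haystack = 'are you there?';
$needle   = 'are';

$pos = strpos($haystack, $needle); // int 0: found at the very start

// Loose checks misreport the match because 0 is falsy.
$looseCheck  = ($pos != false);  // false, since 0 == false under loose comparison
$negateCheck = !$pos;            // true, which reads as "not found" and is wrong

// Strict comparison distinguishes offset 0 from boolean false.
$strictCheck = ($pos !== false); // true: the needle is present

var_dump($pos, $looseCheck, $negateCheck, $strictCheck);
```

Only the strict comparison reports the match correctly here; the other two forms silently treat a match at offset 0 as a miss.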

    23

    • 179

      @DTest – well yes of course it will return true because the string contains ‘are’. If you are looking specifically for the word ARE then you would need to do more checks like, for example, check if there is a character or a space before the A and after the E.

      – jsherk

      Nov 14, 2012 at 21:35

    • 45

      Very good comments above! I never use != or ==; !== and === are the best option (in my opinion), all aspects considered (speed, accuracy, etc.).

      – Melsi

      Dec 15, 2012 at 12:28

    • 11

      @jsherk Why not regexes, then? Something like " are ".

      Jan 6, 2013 at 15:48

    • 8

      As for not catching 'care' and such things, it is better to check for (strpos(' ' . strtolower($a) . ' ', ' are ') !== false)

      – Wouter

      Sep 23, 2013 at 14:26

    • 28

      I tend to avoid this issue by always using strpos($a, 'are') > -1 to test for true. From a debugging perspective, I find my brain wastes fewer clock cycles determining if the line is written correctly when I don’t have to count contiguous equals signs.

      – equazcion

      May 6, 2014 at 6:01

    729

    You could use regular expressions, which are better suited to word matching than strpos, as mentioned by other users. A strpos check for are will also return true for strings such as fare, care, stare, etc. These unintended matches can be avoided in a regular expression by using word boundaries.

    A simple match for are could look something like this:

    $a="How are you?";
    
    if (preg_match('/\bare\b/', $a)) {
        echo 'true';
    }
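To make the difference concrete, here is a small sketch that runs both checks over a few sample strings. The helper names hasSubstring and hasWord are hypothetical and exist only for this illustration:

```php
<?php
// Hypothetical helper names, used only for this illustration.
function hasSubstring(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) !== false;
}

function hasWord(string $haystack, string $word): bool
{
    // preg_quote() escapes regex metacharacters; \b anchors both ends to word boundaries.
    return preg_match('/\b' . preg_quote($word, '/') . '\b/', $haystack) === 1;
}

foreach (['How are you?', 'Take care', 'We stare'] as $sample) {
    printf(
        "%-14s substring: %-5s word: %s\n",
        $sample,
        var_export(hasSubstring($sample, 'are'), true),
        var_export(hasWord($sample, 'are'), true)
    );
}
```

All three samples contain the substring are, but only the first contains it as a standalone word.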
    

    On the performance side, strpos is about three times faster. When I did one million compares at once, it took preg_match 1.5 seconds to finish and for strpos it took 0.5 seconds.

    Edit:
    In order to search any part of the string, not just word by word, I would recommend using a regular expression like

    $a="How are you?";
    $search="are y";
    if(preg_match("/{$search}/i", $a)) {
        echo 'true';
    }
    

    The i at the end of the regular expression makes it case-insensitive; if you do not want that, leave it out.

    This can be quite problematic in some cases, because the $search string isn’t sanitized in any way. If $search comes from user input, it may contain regex metacharacters that change the meaning of the pattern or make it invalid. Escaping it with preg_quote() before building the pattern avoids this.
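One way to handle an unsanitized search string is to pass it through preg_quote() before building the pattern. The sketch below assumes the search text should always be matched literally; containsLiteral is a hypothetical helper name:

```php
<?php
// Hypothetical helper: case-insensitive literal search built on preg_match().
function containsLiteral(string $haystack, string $search): bool
{
    // Escape metacharacters and the "/" delimiter so $search is matched literally.
    $pattern = '/' . preg_quote($search, '/') . '/i';
    return preg_match($pattern, $haystack) === 1;
}

var_dump(containsLiteral('How are you?', 'ARE Y')); // bool(true)
var_dump(containsLiteral('How are you?', 'you?'));  // bool(true): the "?" is matched literally
var_dump(containsLiteral('How are you', 'you?'));   // bool(false): there is no literal "?" to match
```

Without the escaping, the last call would succeed, because an unescaped ? makes the preceding u optional.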

    Also, here’s a great tool for testing regular expressions and seeing explanations of them: Regex101

    To combine both sets of functionality into a single multi-purpose function (including with selectable case sensitivity), you could use something like this:

    function FindString($needle, $haystack, $i, $word)
    {   // $i should be "" or "i" for case insensitive
        // Escape regex metacharacters so $needle is matched literally.
        $needle = preg_quote($needle, '/');
        if (strtoupper($word) == "W")
        {   // if $word is "W" then word search instead of string in string search.
            if (preg_match("/\b{$needle}\b/{$i}", $haystack))
            {
                return true;
            }
        }
        else
        {
            if (preg_match("/{$needle}/{$i}", $haystack))
            {
                return true;
            }
        }
        return false;
        // Put quotes around true and false above to return them as strings instead of as bools/ints.
    }
    

    One more thing to keep in mind is that \b will not work reliably in languages other than English.

    The explanation for this and the solution is taken from here:

    \b represents the beginning or end of a word (Word Boundary). This
    regex would match apple in an apple pie, but wouldn’t match apple in
    pineapple, applecarts or bakeapples.

    How about “café”? How can we extract the word “café” in regex?
    Actually, \bcafé\b wouldn’t work. Why? Because “café” contains
    non-ASCII character: é. \b can’t be simply used with Unicode such as
    समुद्र, 감사, месяц and 😉 .

    When you want to extract Unicode characters, you should directly
    define characters which represent word boundaries.

    The answer: (?<=[\s,.:;"']|^)UNICODE_WORD(?=[\s,.:;"']|$)

    So in order to use the answer in PHP, you can use this function:

    function contains($str, $word) {
        // Works in Hebrew and any other unicode characters
        // Thanks https://medium.com/@shiba1014/regex-word-boundaries-with-unicode-207794f6e7ed
        // Thanks https://www.phpliveregex.com/
        // Return an explicit bool in both cases instead of falling through with null.
        return preg_match('/(?<=[\s,.:;"\']|^)' . $word . '(?=[\s,.:;"\']|$)/', $str) === 1;
    }
    

    And if you want to search for an array of words, you can use this:

    function arrayContainsWord($str, array $arr)
    {
        foreach ($arr as $word) {
            // Works in Hebrew and any other unicode characters
            // Thanks https://medium.com/@shiba1014/regex-word-boundaries-with-unicode-207794f6e7ed
            // Thanks https://www.phpliveregex.com/
            if (preg_match('/(?<=[\s,.:;"\']|^)' . $word . '(?=[\s,.:;"\']|$)/', $str)) return true;
        }
        return false;
    }
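As an alternative to looping, the words can be folded into a single pattern using alternation. This is only a sketch: containsAnyWord is a hypothetical name, the boundary characters are the same ones used above, and preg_quote() is added on the assumption that the words are plain text rather than regex fragments:

```php
<?php
// Hypothetical alternative to the loop: one pattern with all words as alternatives.
function containsAnyWord(string $str, array $words): bool
{
    if ($words === []) {
        return false; // an empty alternation would match at any boundary
    }
    // Escape each word, then join them with "|" inside a non-capturing group.
    $alternatives = implode('|', array_map(
        function ($word) {
            return preg_quote($word, '/');
        },
        $words
    ));
    $pattern = '/(?<=[\s,.:;"\']|^)(?:' . $alternatives . ')(?=[\s,.:;"\']|$)/';
    return preg_match($pattern, $str) === 1;
}

var_dump(containsAnyWord('un café, merci', ['thé', 'café']));
var_dump(containsAnyWord('pineapple pie', ['apple']));
```

Note that the boundary list does not include characters such as ? or !, so a word directly followed by one of those is not matched unless the character class is extended.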
    

    As of PHP 8.0.0 you can now use str_contains

    <?php
        if (str_contains('abc', '')) {
            echo "Checking the existence of the empty string will always return true";
        }
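For code that still has to run on PHP 7, the same behavior can be approximated with strpos(). The function name str_contains_compat below is hypothetical, chosen so it does not clash with the built-in on PHP 8; treat it as a sketch, not the official implementation:

```php
<?php
// Hypothetical name, chosen to avoid clashing with the built-in on PHP 8.
function str_contains_compat(string $haystack, string $needle): bool
{
    // Mirror str_contains(): an empty needle is always considered found.
    // The explicit check also avoids the "Empty needle" warning that strpos() raises before PHP 8.
    return $needle === '' || strpos($haystack, $needle) !== false;
}

var_dump(str_contains_compat('abc', ''));            // bool(true)
var_dump(str_contains_compat('How are you', 'are')); // bool(true)
var_dump(str_contains_compat('How are you', 'Are')); // bool(false): the check is case-sensitive
```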
    

    16

    • 11

      @Alexander.Plutov second of all you’re giving me a -1 and not the question ? cmon it takes 2 seconds to google the answer google.com/…

      – Breezer

      Dec 6, 2010 at 14:03

    • 65

      +1 It’s a horrible way to search for a simple string, but many visitors to SO are looking for any way to search for any of their own substrings, and it is helpful that the suggestion has been brought up. Even the OP might have oversimplified – let him know of his alternatives.

      – SamGoody

      Nov 9, 2011 at 9:53

    • 77

      Technically, the question asks how to find words, not a substring. This actually helped me, as I can use this with regex word boundaries. Alternatives are always useful.

      – user764357

      Aug 20, 2013 at 5:57

    • 16

      +1 for the answer and -1 to the @plutov.by comment, because strpos is just a single check, whereas with a regexp you can check many words at the same time, e.g. preg_match(/are|you|not/)

      – albanx

      Nov 5, 2014 at 17:05

    • 7

      Regular Expressions should be the last resort method. Their use in trivial tasks should be discouraged. I insist on this from the height of many years of digging bad code.

      – yentsun

      Feb 18, 2015 at 14:38


    290

    Here is a little utility function that is useful in situations like this:

    // returns true if $needle is a substring of $haystack
    function contains($needle, $haystack)
    {
        return strpos($haystack, $needle) !== false;
    }
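A wrapper like this is also a convenient place to add variants. As a sketch, here is a case-insensitive counterpart that keeps the same argument order; containsIgnoreCase is a hypothetical name and not part of the original answer:

```php
<?php
// Hypothetical companion to contains(): same argument order, but case-insensitive.
function containsIgnoreCase($needle, $haystack)
{
    // stripos() is the case-insensitive counterpart of strpos().
    return stripos($haystack, $needle) !== false;
}

var_dump(containsIgnoreCase('ARE', 'How are you?')); // bool(true)
var_dump(containsIgnoreCase('How', 'How are you?')); // bool(true): a match at offset 0 still counts
var_dump(containsIgnoreCase('xyz', 'How are you?')); // bool(false)
```

For text with non-ASCII letters, mb_stripos() from the mbstring extension may be a better fit, assuming that extension is available.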
    

    12

    • 75

      @RobinvanBaalen Actually, it can improve code readability. Also, downvotes are supposed to be for (very) bad answers, not for “neutral” ones.

      – Xaqq

      Jul 9, 2013 at 8:56

    • 39

      @RobinvanBaalen functions are nearly by definition for readability (to communicate the idea of what you’re doing). Compare which is more readable: if ($email->contains("@") && $email->endsWith(".com")) { ... or if (strpos($email, "@") !== false && substr($email, -strlen(".com")) == ".com") { ...

      – Brandin

      Jul 25, 2013 at 12:12

    • 3

      @RobinvanBaalen in the end rules are meant to be broken. Otherwise people wouldn’t come up with newer inventive ways of doing things 🙂 . Plus have to admit I have trouble wrapping the mind around stuff like on martinfowler.com. Guess the right thing to do is to try things out yourself and find out what approaches are the most convenient.

      – James P.

      Aug 22, 2013 at 1:43

    • 6

      Another opinion: Having a utility function which you can easily wrap can help debugging. It also makes the case louder for good optimizers that eliminate such overhead in production services. So all opinions have valid points. 😉

      – Tino

      Feb 20, 2014 at 21:09


    • 20

      Of course this is useful. You should encourage this. What happens if in PHP 100 there is a new and faster way to find string locations? Do you want to change every place where you call strpos? Or do you want to change only the contents of the contains function?

      – Cosmin

      Jun 17, 2015 at 9:44