
How do I match any character across multiple lines in a regular expression?


For example, this regex

(.*)<FooBar>

will match:

abcde<FooBar>

But how do I get it to match across multiple lines?

abcde
fghij<FooBar>

  • To clarify: I was originally using Eclipse to do a find-and-replace across multiple files. What I discovered from the answers below is that my problem was the tool and not the regex pattern.

    – andyuk

    Oct 2, 2008 at 15:45


It depends on the language, but there should be a modifier that you can add to the regex pattern. In PHP it is:

/(.*)<FooBar>/s

The s at the end causes the dot to match all characters including newlines.
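The same switch exists outside PHP. As a minimal sketch, Python's re.S (alias re.DOTALL) plays the role of PHP's s modifier here; the sample string is made up to mirror the question. Without the flag the search still succeeds, but (.*) only covers the last line; with the flag it starts at the very beginning:

```python
import re

text = "abcde\nfghij<FooBar>"

# Default behavior: . does not match "\n", so (.*) can only cover "fghij".
plain = re.search(r"(.*)<FooBar>", text)
print(repr(plain.group(1)))  # 'fghij'

# re.S (DOTALL): . also matches "\n", so (.*) spans both lines.
dotall = re.search(r"(.*)<FooBar>", text, flags=re.S)
print(repr(dotall.group(1)))  # 'abcde\nfghij'
```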

  • And what if I wanted just a newline and not all characters?

    – Grace

    Apr 11, 2011 at 12:02

  • @Grace: use \n to match a newline.

    Apr 11, 2011 at 21:05

  • The s flag is (now?) invalid, at least in Chrome/V8. Instead, use the [\s\S] character class (match space and non-space) instead of the period matcher: /([\s\S]*)<FooBar>/. See other answers for more info.

    – Allen

    May 9, 2013 at 15:37


  • @Allen – JavaScript doesn’t support the s modifier. Instead, use [^]* for the same effect.

    Jul 12, 2015 at 22:26

  • In Ruby, use the m modifier.

    Jul 15, 2015 at 22:57
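To match only the line break, as the first comment asks, rather than letting . match everything, the newline can be spelled out with \n while . keeps its default behavior. A minimal Python sketch; the sample string is invented and the pattern assumes exactly one line break before <FooBar>:

```python
import re

text = "abcde\nfghij<FooBar>"

# No DOTALL flag: each (.*) stays within one line, and \n matches
# exactly the single line break between them.
m = re.search(r"(.*)\n(.*)<FooBar>", text)
print(m.groups())  # ('abcde', 'fghij')
```

This only works when the number of line breaks is known in advance; for an arbitrary number of lines, the dot-matches-all flag is the simpler choice.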

The question is, can the . pattern match any character? The answer varies from engine to engine. The main difference is whether the pattern is used by a POSIX or non-POSIX regex library.

A special note about : they are not considered regular expressions, but . matches any character there, the same as POSIX-based engines.

Another note on MATLAB and Octave: the . matches any character by default (demo): str = "abcde\n fghij<Foobar>"; expression = '(.*)<Foobar>'; [tokens,matches] = regexp(str,expression,'tokens','match'); (tokens contains an abcde\n fghij item).

Also, in all of Boost’s regex grammars the dot matches line breaks by default. Boost’s ECMAScript grammar allows you to turn this off with regex_constants::no_mod_s (or the match_not_dot_newline match flag) (source).

As for Oracle (it is POSIX-based), use the n option (demo): select regexp_substr('abcde' || chr(10) ||' fghij<Foobar>', '(.*)<Foobar>', 1, 1, 'n', 1) as results from dual

POSIX-based engines:

A mere . already matches line breaks, so there is no need to use any modifiers; see Bash (demo).

The (demo), (demo), and R's TRE engine (the base R default when perl=TRUE is not set; for base R with perl=TRUE or for stringr/stringi patterns, use the (?s) inline modifier) (demo) also treat . the same way.

However, most POSIX-based tools process input line by line. Hence, . does not match line breaks simply because they are never part of the text being matched. Here are some examples of how to override this:

  • sed – There are multiple workarounds. The most precise, but not very safe, is sed 'H;1h;$!d;x; s/\(.*\)<Foobar>/\1/' (H;1h;$!d;x; slurps the file into memory). If you only need to work with whole lines, sed '/start_pattern/,/end_pattern/d' file (deletes the block, including the start and end lines) or sed '/start_pattern/,/end_pattern/{//!d;}' file (deletes only the lines between them) can be considered.
  • perl -0pe 's/(.*)<FooBar>/$1/gs' <<< "$str" (-0 sets the input record separator to the NUL character, which for ordinary text slurps the whole input into memory; -p prints each record after applying the script given by -e). Note that -000pe instead activates ‘paragraph mode’, where Perl reads the input one paragraph at a time, using runs of consecutive newlines (\n\n) as the record separator.
  • grep -Poz '(?si)abc\K.*?(?=<Foobar>)' file. Here, -P enables PCRE syntax, -o prints only the matched text, -z makes grep read NUL-separated records (which effectively slurps a text file), (?s) enables the DOTALL mode for the . pattern, (?i) enables case-insensitive mode, \K omits the text matched so far, *? is a lazy quantifier, and (?=<Foobar>) matches the location before <Foobar>.
  • pcregrep -Mi "(?si)abc\K.*?(?=<Foobar>)" file (-M enables multiline matching, so a match can span lines). Note that pcregrep is a good solution for macOS grep users.
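Why slurping matters can be sketched in Python (an illustration only; the two-line input is made up, and because Python's re has no \K, a capturing group stands in for it). Even with (?s) set, a pattern applied to one line at a time never sees the line break, while the same pattern applied to the whole input does:

```python
import io
import re

# Hypothetical two-line input standing in for a file.
data = "xabc\nde fghij<Foobar>\n"

# (?si): . matches newlines and matching is case-insensitive;
# .*? is lazy and (?=<Foobar>) stops right before the delimiter.
pattern = re.compile(r"(?si)abc(.*?)(?=<Foobar>)")

# Line-by-line processing: no single line holds both "abc" and
# "<Foobar>", so nothing is found even though (?s) is on.
per_line = [m.group(1) for line in io.StringIO(data) for m in pattern.finditer(line)]
print(per_line)  # []

# Slurping: read the whole input first, then the match can span lines.
whole = pattern.search(io.StringIO(data).read())
print(repr(whole.group(1)))  # '\nde fghij'
```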


Non-POSIX-based engines:

  • PHP – Use the s (PCRE_DOTALL) modifier: preg_match('~(.*)<Foobar>~s', $s, $m) (demo)

  • C# – Use the RegexOptions.Singleline flag (demo):
    var result = Regex.Match(s, @"(.*)<Foobar>", RegexOptions.Singleline).Groups[1].Value;
    or, equivalently, the inline (?s) option:
    var result = Regex.Match(s, @"(?s)(.*)<Foobar>").Groups[1].Value;

  • PowerShell – Use the (?s) inline option: $s = "abcde`nfghij<FooBar>"; $s -match "(?s)(.*)<Foobar>"; $matches[1]

  • Perl – Use the s modifier (or the (?s) inline version at the start) (demo): /(.*)<FooBar>/s

  • Python – Use the re.DOTALL (or re.S) flag or the (?s) inline modifier (demo): m = re.search(r"(.*)<FooBar>", s, flags=re.S) (and then if m: print(m.group(1)))

  • Java – Use the Pattern.DOTALL modifier (or the inline (?s) flag) (demo): Pattern.compile("(.*)<FooBar>", Pattern.DOTALL)

  • Kotlin – Use RegexOption.DOT_MATCHES_ALL: "(.*)<FooBar>".toRegex(RegexOption.DOT_MATCHES_ALL)

  • – Use (?s) in-pattern modifier (demo): regex = /(?s)(.*)<FooBar>/

  • Scala – Use the (?s) modifier (demo): "(?s)(.*)<Foobar>".r.findAllIn("abcde\n fghij<Foobar>").matchData foreach { m => println(m.group(1)) }

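  • JavaScript – Use [^] or the workarounds [\d\D] / [\w\W] / [\s\S] (demo): s.match(/([\s\S]*)<FooBar>/)[1]. Since ECMAScript 2018, the s (dotAll) flag is also supported: s.match(/(.*)<FooBar>/s)[1]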

  • C++ (std::regex) – Use [\s\S] or one of the other JavaScript workarounds (demo): regex rex(R"(([\s\S]*)<FooBar>)");

  • VBA / VBScript – Use the same approach as in JavaScript, ([\s\S]*)<Foobar>. (NOTE: The MultiLine property of the RegExp object is sometimes mistakenly thought to be the option that lets . match across line breaks; in fact, it only changes the behavior of ^ and $ to match at the start/end of lines rather than of the whole string, the same as in JavaScript regex.)

  • Ruby – Use the /m (MULTILINE) modifier (demo): s[/(.*)<Foobar>/m, 1]

  • R (base R PCRE regexps) – Use (?s): regmatches(x, regexec("(?s)(.*)<FooBar>", x, perl=TRUE))[[1]][2] (demo)

  • R (stringr/stringi) – These regex functions are powered by the ICU regex engine; also use (?s): stringr::str_match(x, "(?s)(.*)<FooBar>")[,2] (demo)

  • Go – Use the inline modifier (?s) at the start (demo): re := regexp.MustCompile(`(?s)(.*)<FooBar>`)

  • Swift – Use dotMatchesLineSeparators or (easier) put the (?s) inline modifier in the pattern: let rx = "(?s)(.*)<Foobar>"

  • Objective-C – The same as Swift. (?s) is the easiest, but here is how the option can be used: NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionDotMatchesLineSeparators error:&regexError];

  • , – Use the (?s) modifier (demo): "(?s)(.*)<Foobar>" (in Google Spreadsheets, =REGEXEXTRACT(A2,"(?s)(.*)<Foobar>"))
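To make the Python entry above concrete, here is a minimal sketch (the sample string is made up) showing that the long flag name, its one-letter alias and the inline modifier are interchangeable, and why the result of search() is checked before calling group():

```python
import re

s = "abcde\nfghij<FooBar>"

# Three equivalent ways to enable DOTALL in Python's re module:
# the long flag name, its one-letter alias, and the inline (?s) modifier.
patterns = [
    re.compile(r"(.*)<FooBar>", re.DOTALL),
    re.compile(r"(.*)<FooBar>", re.S),
    re.compile(r"(?s)(.*)<FooBar>"),
]

for rx in patterns:
    m = rx.search(s)
    if m:  # search() returns None when there is no match
        print(repr(m.group(1)))  # 'abcde\nfghij' each time
```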

NOTES ON (?s):

In most non-POSIX engines, the (?s) inline modifier (or embedded flag option) can be used to make . match line breaks.

If placed at the start of the pattern, (?s) changes the behavior of every . in the pattern. If (?s) is placed somewhere after the beginning, only the .s to the right of it are affected, unless the pattern is passed to Python’s re. In older versions of Python’s re, a (?s) anywhere in the pattern affected every . in it; such mid-pattern global flags have been deprecated since Python 3.6 and raise an error since Python 3.11, so put (?s) at the very start there. The (?s) effect is stopped using (?-s). A modified group can be used to affect only a specified range of a regex pattern (e.g., Delim1(?s:.*?)\nDelim2.* will make the first .*? match across newlines, while the second .* will only match the rest of the line).
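The modified-group example can be checked with a short Python sketch (Python 3.6+ supports scoped (?s:...) groups; the sample text is made up). The lazy .*? inside the group crosses line breaks, while the unscoped .* at the end stops at the next newline:

```python
import re

text = "Delim1 first\nsecond\nDelim2 rest of line\nnext line"

# (?s:...) turns DOTALL on only inside the group:
# the lazy .*? may cross newlines, the trailing .* may not.
m = re.search(r"Delim1(?s:.*?)\nDelim2.*", text)
print(repr(m.group(0)))  # 'Delim1 first\nsecond\nDelim2 rest of line'
```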

POSIX note:

In non-POSIX regex engines, to match any character, [\s\S] / [\d\D] / [\w\W] constructs can be used.

In POSIX, [\s\S] does not match any character (as it does in JavaScript or any non-POSIX engine), because regex escape sequences are not supported inside bracket expressions. [\s\S] is parsed as a bracket expression that matches a single character: \, s or S.
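Python's re is a non-POSIX engine, so it can only illustrate the first half of this note: a minimal sketch (sample string made up) in which each of the three classes matches any single character, line breaks included, without any flag:

```python
import re

s = "abcde\nfghij<FooBar>"

# In a non-POSIX engine such as Python's re, each of these classes
# matches any single character, line breaks included, with no flags.
for cls in (r"[\s\S]", r"[\d\D]", r"[\w\W]"):
    m = re.search(rf"({cls}*)<FooBar>", s)
    print(cls, repr(m.group(1)))  # group is 'abcde\nfghij' for all three
```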


  • You should link to this excellent overview from your profile page or something (+1).

    – Jan

    Oct 15, 2017 at 20:15

  • You may want to add this to the Boost item: in the regex_constants namespace, flag_type_’s: perl = ECMAScript = JavaScript = JScript = ::boost::regbase::normal = 0, which defaults to Perl. Programmers will set a base flag definition #define MOD regex_constants::perl | boost::regex::no_mod_s | boost::regex::no_mod_m for their regex flags to reflect that. And the arbiter is always the inline modifiers, where (?-sm)(?s).* resets.

    – user557597

    Apr 26, 2018 at 21:30


  • Can you also add Bash, please?

    Dec 19, 2018 at 2:12

  • @PasupathiRajamanickam Bash uses a POSIX regex engine, so . matches any char there (including line breaks). See this online Bash demo.

    Dec 19, 2018 at 7:33


  • You are a legend.

    Apr 27, 2020 at 6:13