
Why do regex constructors need to be double escaped?

65

In the regex below, \s denotes a whitespace character. I imagine the regex parser goes through the string, sees \, and knows that the next character is special.

But this is not the case as double escapes are required.

Why is this?

var res = new RegExp('(\\s|^)' + foo).test(moo);

Is there a concrete example of how a single escape could be mis-interpreted as something else?

3

  • 1

    Remember, it’s not that JavaScript or the RegExp constructor needs clarification; it’s the compiler (or parser) reading the string literal.

    – GJK

    Jul 25, 2013 at 16:04

  • 3

    To add to the already-correct answers: note that if you write a RegExp literal in JavaScript, you don’t need to escape the backslash, as you might expect: /(\s|^)/

    – Dan Tao

    Jul 25, 2013 at 16:05

  • Related: stackoverflow.com/a/37329801/1225328.

    – sp00m

    Jul 30, 2018 at 9:00

55

You are constructing the regular expression by passing a string to the RegExp constructor.

\ is an escape character in string literals.

The \ is consumed by the string literal parsing…

const foo = "foo";
// The single backslash is dropped while the string literal is parsed.
const string = '(\s|^)' + foo;
console.log(string); // logs "(s|^)foo"

… so the data you pass to the RegExp compiler is a plain s and not \s.

You need to escape the \ to express the \ as data instead of being an escape character itself.
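As a rough sketch of the difference, the snippet below builds the same pattern with one and with two backslashes and compares the results. The value of moo and the names single, double, backspace and boundary are made up for this illustration. The \b case is arguably the clearest misinterpretation: in a string literal, \b is a backspace character, not a word boundary.

```javascript
// Hypothetical inputs, chosen only for this illustration.
const foo = "foo";
const moo = "a foo";

// Single backslash: the string literal drops it, so the pattern is (s|^)foo.
const single = new RegExp('(\s|^)' + foo);
// Double backslash: the string holds a real backslash, so the pattern is (\s|^)foo.
const double = new RegExp('(\\s|^)' + foo);

console.log(single.source);    // (s|^)foo
console.log(double.source);    // (\s|^)foo
console.log(single.test(moo)); // false
console.log(double.test(moo)); // true

// '\b' in a string literal is a backspace (U+0008), so this pattern looks
// for a backspace followed by "foo" instead of a word boundary.
const backspace = new RegExp('\bfoo');
const boundary = new RegExp('\\bfoo');

console.log(backspace.test('foo')); // false
console.log(boundary.test('foo'));  // true
```

In both cases no error is thrown; the single-escaped version simply compiles to a different pattern.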

1

  • 2

    That applies to both regular string literals and template literals.

    Sep 27, 2018 at 15:28
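A small sketch of the template literal case, for what it is worth. The names cooked, raw and fromRaw are made up here, and String.raw is not mentioned in the thread; it is shown only as one possible way to keep the backslash without doubling it.

```javascript
// An ordinary template literal processes escapes just like a quoted string.
const cooked = `(\s|^)`;
// String.raw returns the text as typed, so the backslash survives.
const raw = String.raw`(\s|^)`;

console.log(cooked); // (s|^)
console.log(raw);    // (\s|^)

// The raw string can be handed to the constructor without double escaping.
const fromRaw = new RegExp(raw + "foo");
console.log(fromRaw.test("a foo")); // true
```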

23

Inside the code where you’re creating a string, the backslash is a JavaScript escape character first, which means that escape sequences like \t, \n, \", etc. are translated into the characters they stand for (tab, newline, quote, etc.), and those characters become part of the string. A double backslash represents a single backslash in the actual string itself, so if you want a backslash in the string, you have to escape it first.

So when you generate a string by saying var someString = '(\\s|^)', what you’re really doing is creating an actual string with the value (\s|^).
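One way to check this is to inspect the string directly. The following is only a sketch that reuses the someString name from the sentence above; it also compares the result with the equivalent regex literal.

```javascript
const someString = '(\\s|^)';

console.log(someString);        // (\s|^)
console.log(someString.length); // 6, the two typed backslashes became one
console.log(someString[1]);     // \

// The constructor now receives the same pattern text as the literal /(\s|^)/.
console.log(new RegExp(someString).source === /(\s|^)/.source); // true
```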

    11

    The RegExp constructor needs a string whose contents are \s, which in JavaScript can be produced using the literal "\\s".

    Here’s an example to illustrate why "\s" is not enough:

    alert("One backslash:          \s\nDouble backslashes: \\s");

    Note how an extra \ before \s changes the output.