javascript regex

Is there a RegExp.escape function in JavaScript?

551

I just want to create a regular expression out of any possible string.

var usersString = "Hello?!*`~World()[]";
var expression = new RegExp(RegExp.escape(usersString))
var matches = "Hello".match(expression);

Is there a built-in method for that? If not, what do people use? Ruby has Regexp.escape. I don’t feel like I should need to write my own; there has got to be something standard out there.

3

729

The function linked in another answer is insufficient. It fails to escape ^ or $ (start and end of string), or -, which is used for ranges inside a character class.

Use this function:

function escapeRegex(string) {
    return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

While it may seem unnecessary at first glance, escaping - (as well as ^) makes the function suitable for escaping characters to be inserted into a character class as well as the body of the regex.

Escaping / makes the function suitable for escaping characters to be used in a JavaScript regex literal for later evaluation.

As there is no downside to escaping either of them, it makes sense to escape both and cover the wider set of use cases.
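As a rough sketch of the two uses described above, the snippet below runs the function on the sample string from the question, once as the body of a pattern and once inside a character class. The variable names `userInput`, `literal`, and `anyOf` are illustrative only and are not part of the original answer.

```javascript
// Same function as above, repeated so the snippet runs on its own.
function escapeRegex(string) {
    return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Sample input from the question; it contains several regex metacharacters.
var userInput = "Hello?!*`~World()[]";

// Use 1: the escaped string as the whole body of a regex.
var literal = new RegExp(escapeRegex(userInput));
console.log(literal.test("say Hello?!*`~World()[] now")); // true
console.log(literal.test("Hello"));                       // false

// Use 2: the escaped string inside a character class.
// Without escaping, "a-z" would be read as a range here.
var anyOf = new RegExp('^[' + escapeRegex("a-z^") + ']+$');
console.log(anyOf.test("a-z^")); // true
console.log(anyOf.test("b"));    // false
```

The second case is where escaping - and ^ pays off: the class ends up containing exactly the four characters a, -, z, and ^.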

And yes, it is a disappointing failing that this is not part of standard JavaScript.

31

  • 20

    actually, we don’t need to escape / at all

    – thorn0

    Feb 14, 2013 at 20:53

  • 37

    @Paul: Perl quotemeta (\Q), Python re.escape, PHP preg_quote, Ruby Regexp.quote

    – bobince

    Oct 3, 2013 at 10:24

  • 17

    If you are going to use this function in a loop, it’s probably best to make the RegExp object its own variable, var e = /[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g;, and then have your function do return s.replace(e, '\\$&');. This way you only instantiate the RegExp once.

    – styfle

    Oct 17, 2013 at 21:14

  • 31

    bobince cares not for eslint’s opinion

    – bobince

    Sep 15, 2017 at 22:57

  • 7

    But maybe you want to escape characters to put them inside a character range. IMO better to harmlessly overescape than to underescape and cause problems in niche cases. FWIW personally I’d rather see the characters explicitly here; we’re not playing code golf.

    – bobince

    Oct 12, 2017 at 20:54

171

For anyone using Lodash, since v3.0.0 a _.escapeRegExp function is built-in:

_.escapeRegExp('[lodash](https://lodash.com/)');
// → '\[lodash\]\(https:\/\/lodash\.com\/\)'

And, in the event that you don’t want to require the full Lodash library, you may require just that function!

6

  • 8

    there’s even an npm package of just this! npmjs.com/package/lodash.escaperegexp

    Nov 1, 2015 at 7:34

  • 1

    This imports loads of code that really doesn’t need to be there for such a simple thing. Use bobince’s answer… works for me and it’s so many fewer bytes to load than the lodash version!

    – Rob Evans

    Aug 31, 2017 at 13:20

  • 11

    @RobEvans my answer starts with “For anyone using lodash”, and I even mention that you can require only the escapeRegExp function.

    Aug 31, 2017 at 13:24

  • 2

    @gustavohenke Sorry I should have been slightly more clear, I included the module linked to in your “just that function” and that is what I was commenting on. If you take a look it’s quite a lot of code for what should effectively be a single function with a single regexp in it. Agree if you are already using lodash then it makes sense to use it, but otherwise use the other answer. Sorry for the unclear comment.

    – Rob Evans

    Aug 31, 2017 at 18:03

  • 2

    @maddob I cannot see that \x3 you mentioned: my escaped strings are looking good, just what I expect

    May 31, 2018 at 15:10

53

Most of the expressions here solve single specific use cases.

That’s okay, but I prefer an “always works” approach.

function regExpEscape(literal_string) {
    return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}

This will “fully escape” a literal string for any of the following uses in regular expressions:

  • Insertion in a regular expression. E.g. new RegExp(regExpEscape(str))
  • Insertion in a character class. E.g. new RegExp('[' + regExpEscape(str) + ']')
  • Insertion in an integer count specifier. E.g. new RegExp('x{1,' + regExpEscape(str) + '}')
  • Execution in non-JavaScript regular expression engines.
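The first three uses in the list above can be sketched roughly as follows. The sample strings and the variable names `body`, `cls`, and `count` are made up for illustration, and the patterns are compiled without the u flag.

```javascript
// Same function as above, repeated so the snippet runs on its own.
function regExpEscape(literal_string) {
    return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}

// 1. As the body of a regular expression.
var body = new RegExp('^' + regExpEscape('a+b (c)') + '$');
console.log(body.test('a+b (c)')); // true
console.log(body.test('aab c'));   // false

// 2. Inside a character class: "-" is escaped, so "0-9" is three
//    literal characters rather than a digit range.
var cls = new RegExp('^[' + regExpEscape('0-9') + ']+$');
console.log(cls.test('9-0')); // true
console.log(cls.test('5'));   // false

// 3. Inside a count specifier, with a plain integer string.
var count = new RegExp('^x{1,' + regExpEscape('3') + '}$');
console.log(count.test('xxx'));  // true
console.log(count.test('xxxx')); // false
```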

Special Characters Covered:

  • -: Creates a character range in a character class.
  • [ / ]: Starts / ends a character class.
  • { / }: Starts / ends a numeration specifier.
  • ( / ): Starts / ends a group.
  • * / + / ?: Specifies repetition type.
  • .: Matches any character.
  • \: Escapes characters, and starts entities.
  • ^: Specifies start of matching zone, and negates matching in a character class.
  • $: Specifies end of matching zone.
  • |: Specifies alternation.
  • #: Specifies comment in free spacing mode.
  • \s: Ignored in free spacing mode.
  • ,: Separates values in numeration specifier.
  • /: Starts or ends expression.
  • :: Completes special group types, and part of Perl-style character classes.
  • !: Negates zero-width group.
  • < / =: Part of zero-width group specifications.

Notes:

  • / is not strictly necessary in any flavor of regular expression. However, it protects in case someone (shudder) does eval("/" + pattern + "/");.
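  • , ensures that if the string is meant to be an integer in the numerical specifier, a value containing a comma cannot silently form a different valid quantifier. Stricter engines will raise a compile error instead; JavaScript without the u flag may treat the malformed braces as literal text.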
  • # and \s do not need to be escaped in JavaScript, but do in many other flavors. They are escaped here in case the regular expression will later be passed to another program.

If you also need to future-proof the regular expression against potential additions to the JavaScript regex engine capabilities, I recommend using the more paranoid:

function regExpEscapeFuture(literal_string) {
    return literal_string.replace(/[^A-Za-z0-9_]/g, '\\$&');
}

This function escapes every character except those explicitly guaranteed not to be used for syntax in future regular expression flavors.
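A small sketch of what the paranoid version produces, using a made-up sample string. One assumption worth stating loudly: the result is compiled without the u flag. In Unicode mode JavaScript only accepts backslash escapes for syntax characters, so the extra escapes this function adds are rejected there, as the last lines show.

```javascript
// Same function as above, repeated so the snippet runs on its own.
function regExpEscapeFuture(literal_string) {
    return literal_string.replace(/[^A-Za-z0-9_]/g, '\\$&');
}

// Hypothetical sample: everything except letters, digits and "_" is escaped.
var escaped = regExpEscapeFuture('a.b*c_1 @~');
console.log(escaped); // a\.b\*c_1\ \@\~

// Compiled without flags, the pattern matches the original text literally.
var re = new RegExp('^' + escaped + '$');
console.log(re.test('a.b*c_1 @~')); // true
console.log(re.test('aXb*c_1 @~')); // false

// Caveat: with the u flag, escapes such as "\@" are a syntax error.
var threw = false;
try {
    new RegExp(escaped, 'u');
} catch (e) {
    threw = e instanceof SyntaxError;
}
console.log(threw); // true
```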


For the truly sanitation-keen, consider this edge case:

var s="";
new RegExp('(choice1|choice2|' + regExpEscape(s) + ')');

This should compile fine in JavaScript, but will not in some other flavors. If intending to pass to another flavor, the null case of s === '' should be independently checked, like so:

var s="";
new RegExp('(choice1|choice2' + (s ? '|' + regExpEscape(s) : '') + ')');
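To make the difference concrete, here is a minimal sketch comparing the two constructions in JavaScript. The helper name `buildChoices` and the added ^ and $ anchors are assumptions for the demo, not part of the original snippets. Besides the portability concern, it shows that in JavaScript the unguarded pattern ends up with an empty branch that matches the empty string.

```javascript
// Same function as above, repeated so the snippet runs on its own.
function regExpEscape(literal_string) {
    return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}

// Hypothetical helper: only adds the third branch when s is non-empty.
function buildChoices(s) {
    return new RegExp('^(choice1|choice2' + (s ? '|' + regExpEscape(s) : '') + ')$');
}

// Unguarded: an empty s leaves a trailing "|" and an empty alternative.
var naive = new RegExp('^(choice1|choice2|' + regExpEscape('') + ')$');
console.log(naive.source);   // ^(choice1|choice2|)$
console.log(naive.test('')); // true

// Guarded: the empty branch is never emitted.
var guarded = buildChoices('');
console.log(guarded.source);   // ^(choice1|choice2)$
console.log(guarded.test('')); // false

// A non-empty s is still escaped and added as a literal branch.
console.log(buildChoices('a|b').test('a|b')); // true
console.log(buildChoices('a|b').test('a'));   // false
```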

8

  • 1

    The / doesn’t need to be escaped in the [...] character class.

    Jul 4, 2017 at 11:32

  • 1

    Most of these don’t need to be escaped. “Creates a character range in a character class” – you are never in a character class inside of the string. “Specifies comment in free spacing mode, Ignored in free spacing mode” – not supported in javascript. “Separates values in numeration specifier” – you are never in a numeration specifier inside of the string. Also you can’t write arbitrary text inside of a numeration specifier. “Starts or ends expression” – no need to escape. Eval is not a case, as it would require much more escaping. [will be continued in the next comment]

    – Qwertiy

    Sep 22, 2017 at 14:01

  • “Completes special group types, and part of Perl-style character classes” – seems not available in javascript. “Negates zero-width group, Part of zero-width group specifications” – you never have groups inside of the string.

    – Qwertiy

    Sep 22, 2017 at 14:01

  • @Qwertiy The reason for these extra escapes is to eliminate edge cases which could cause problems in certain use cases. For instance, the user of this function may want to insert the escaped regex string into another regex as part of a group, or even for use in another language besides Javascript. The function does not make assumptions like “I will never be part of a character class”, because it’s meant to be general. For a more YAGNI approach, see any of the other answers here.

    Sep 22, 2017 at 20:14

  • Very good. Why is _ not escaped though? What ensures it probably won’t become regex syntax later?

    – madprops

    Oct 29, 2017 at 11:43