Categories
excel regex vba

How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

712

How can I use regular expressions in Excel and take advantage of Excel’s powerful grid-like setup for data manipulation?

  • In-cell function to return a matched pattern or replaced value in a string.
  • Sub to loop through a column of data and extract matches to adjacent cells.
  • What setup is necessary?
  • What are Excel’s special characters for Regular expressions?

I understand Regex is not ideal for many situations (To use or not to use regular expressions?) since excel can use Left, Mid, Right, Instr type commands for similar manipulations.

3

1096

Regular expressions are used for Pattern Matching.

To use in Excel follow these steps:

Step 1: Add VBA reference to “Microsoft VBScript Regular Expressions 5.5”

  • Select “Developer” tab (I don’t have this tab what do I do?)
  • Select “Visual Basic” icon from ‘Code’ ribbon section
  • In “Microsoft Visual Basic for Applications” window select “Tools” from the top menu.
  • Select “References”
  • Check the box next to “Microsoft VBScript Regular Expressions 5.5” to include in your workbook.
  • Click “OK”

Step 2: Define your pattern

Basic definitions:

- Range.

  • E.g. a-z matches an lower case letters from a to z
  • E.g. 0-5 matches any number from 0 to 5

[] Match exactly one of the objects inside these brackets.

  • E.g. [a] matches the letter a
  • E.g. [abc] matches a single letter which can be a, b or c
  • E.g. [a-z] matches any single lower case letter of the alphabet.

() Groups different matches for return purposes. See examples below.

{} Multiplier for repeated copies of pattern defined before it.

  • E.g. [a]{2} matches two consecutive lower case letter a: aa
  • E.g. [a]{1,3} matches at least one and up to three lower case letter a, aa, aaa

+ Match at least one, or more, of the pattern defined before it.

  • E.g. a+ will match consecutive a’s a, aa, aaa, and so on

? Match zero or one of the pattern defined before it.

  • E.g. Pattern may or may not be present but can only be matched one time.
  • E.g. [a-z]? matches empty string or any single lower case letter.

* Match zero or more of the pattern defined before it.

  • E.g. Wildcard for pattern that may or may not be present.
  • E.g. [a-z]* matches empty string or string of lower case letters.

. Matches any character except newline \n

  • E.g. a. Matches a two character string starting with a and ending with anything except \n

| OR operator

  • E.g. a|b means either a or b can be matched.
  • E.g. red|white|orange matches exactly one of the colors.

^ NOT operator

  • E.g. [^0-9] character can not contain a number
  • E.g. [^aA] character can not be lower case a or upper case A

\ Escapes special character that follows (overrides above behavior)

  • E.g. \., \\, \(, \?, \$, \^

Anchoring Patterns:

^ Match must occur at start of string

  • E.g. ^a First character must be lower case letter a
  • E.g. ^[0-9] First character must be a number.

$ Match must occur at end of string

  • E.g. a$ Last character must be lower case letter a

Precedence table:

Order  Name                Representation
1      Parentheses         ( )
2      Multipliers         ? + * {m,n} {m, n}?
3      Sequence & Anchors  abc ^ $
4      Alternation         |

Predefined Character Abbreviations:

abr    same as       meaning
\d     [0-9]         Any single digit
\D     [^0-9]        Any single character that's not a digit
\w     [a-zA-Z0-9_]  Any word character
\W     [^a-zA-Z0-9_] Any non-word character
\s     [ \r\t\n\f]   Any space character
\S     [^ \r\t\n\f]  Any non-space character
\n     [\n]          New line

Example 1: Run as macro

The following example macro looks at the value in cell A1 to see if the first 1 or 2 characters are digits. If so, they are removed and the rest of the string is displayed. If not, then a box appears telling you that no match is found. Cell A1 values of 12abc will return abc, value of 1abc will return abc, value of abc123 will return “Not Matched” because the digits were not at the start of the string.

Private Sub simpleRegex()
    Dim strPattern As String: strPattern = "^[0-9]{1,2}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range
    
    Set Myrange = ActiveSheet.Range("A1")
    
    If strPattern <> "" Then
        strInput = Myrange.Value
        
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
        
        If regEx.Test(strInput) Then
            MsgBox (regEx.Replace(strInput, strReplace))
        Else
            MsgBox ("Not matched")
        End If
    End If
End Sub

Example 2: Run as an in-cell function

This example is the same as example 1 but is setup to run as an in-cell function. To use, change the code to this:

Function simpleCellRegex(Myrange As Range) As String
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim strReplace As String
    Dim strOutput As String
    
    
    strPattern = "^[0-9]{1,3}"
    
    If strPattern <> "" Then
        strInput = Myrange.Value
        strReplace = ""
        
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
        
        If regEx.test(strInput) Then
            simpleCellRegex = regEx.Replace(strInput, strReplace)
        Else
            simpleCellRegex = "Not matched"
        End If
    End If
End Function

Place your strings (“12abc”) in cell A1. Enter this formula =simpleCellRegex(A1) in cell B1 and the result will be “abc”.

results image


Example 3: Loop Through Range

This example is the same as example 1 but loops through a range of cells.

Private Sub simpleRegex()
    Dim strPattern As String: strPattern = "^[0-9]{1,2}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range
    
    Set Myrange = ActiveSheet.Range("A1:A5")
    
    For Each cell In Myrange
        If strPattern <> "" Then
            strInput = cell.Value
            
            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With
            
            If regEx.Test(strInput) Then
                MsgBox (regEx.Replace(strInput, strReplace))
            Else
                MsgBox ("Not matched")
            End If
        End If
    Next
End Sub

Example 4: Splitting apart different patterns

This example loops through a range (A1, A2 & A3) and looks for a string starting with three digits followed by a single alpha character and then 4 numeric digits. The output splits apart the pattern matches into adjacent cells by using the (). $1 represents the first pattern matched within the first set of ().

Private Sub splitUpRegexPattern()
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim Myrange As Range
    
    Set Myrange = ActiveSheet.Range("A1:A3")
    
    For Each C In Myrange
        strPattern = "(^[0-9]{3})([a-zA-Z])([0-9]{4})"
        
        If strPattern <> "" Then
            strInput = C.Value
            
            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With
            
            If regEx.test(strInput) Then
                C.Offset(0, 1) = regEx.Replace(strInput, "$1")
                C.Offset(0, 2) = regEx.Replace(strInput, "$2")
                C.Offset(0, 3) = regEx.Replace(strInput, "$3")
            Else
                C.Offset(0, 1) = "(Not matched)"
            End If
        End If
    Next
End Sub

Results:

results image


Additional Pattern Examples

String   Regex Pattern                  Explanation
a1aaa    [a-zA-Z][0-9][a-zA-Z]{3}       Single alpha, single digit, three alpha characters
a1aaa    [a-zA-Z]?[0-9][a-zA-Z]{3}      May or may not have preceding alpha character
a1aaa    [a-zA-Z][0-9][a-zA-Z]{0,3}     Single alpha, single digit, 0 to 3 alpha characters
a1aaa    [a-zA-Z][0-9][a-zA-Z]*         Single alpha, single digit, followed by any number of alpha characters

</i8>    \<\/[a-zA-Z][0-9]\>            Exact non-word character except any single alpha followed by any single digit

20

  • 32

    You should not forget to Set regEx = Nothing. You will get Out Of Memory exceptions, when that Sub is executed frequently enought.

    – Kiril

    Mar 13, 2015 at 10:28

  • 15

    Late binding line: Set regEx = CreateObject("VBScript.RegExp")

    – ZygD

    Dec 5, 2015 at 11:23

  • 2

    Okay, I’m pretty sure it’s because the code is in ThisWorkbook. Try moving the code to a separate Module.

    May 9, 2019 at 4:00


  • 4

    @PortlandRunner in the “project explorer” (?) this excel file lacked a “Modules” subfolder, although another file showed one. Right-clicked the file and chose ‘insert module’, then double-clicked “Module 1” and pasted the code. Saved. Back to workbook and keyed in the function again – it worked. Might be noteworthy in the answer, for the sake of the inexperienced like me? Thanks for the help.

    May 10, 2019 at 5:52

  • 2

    Unreal… simple indie tools like Notepad++ have a “regex” option in their Find and Replace… but in a world class tool like Excel, you have to be a programmer to do it , and in the most obscure and complicated way..

    – Ciabaros

    Sep 16, 2021 at 16:30

230

To make use of regular expressions directly in Excel formulas the following UDF (user defined function) can be of help. It more or less directly exposes regular expression functionality as an excel function.

How it works

It takes 2-3 parameters.

  1. A text to use the regular expression on.
  2. A regular expression.
  3. A format string specifying how the result should look. It can contain $0, $1, $2, and so on. $0 is the entire match, $1 and up correspond to the respective match groups in the regular expression. Defaults to $0.

Some examples

Extracting an email address:

=regex("Peter Gordon: [email protected], 47", "\[email protected]\w+\.\w+")
=regex("Peter Gordon: [email protected], 47", "\[email protected]\w+\.\w+", "$0")

Results in: [email protected]

Extracting several substrings:

=regex("Peter Gordon: [email protected], 47", "^(.+): (.+), (\d+)$", "E-Mail: $2, Name: $1")

Results in: E-Mail: [email protected], Name: Peter Gordon

To take apart a combined string in a single cell into its components in multiple cells:

=regex("Peter Gordon: [email protected], 47", "^(.+): (.+), (\d+)$", "$" & 1)
=regex("Peter Gordon: [email protected], 47", "^(.+): (.+), (\d+)$", "$" & 2)

Results in: Peter Gordon [email protected]

How to use

To use this UDF do the following (roughly based on this Microsoft page. They have some good additional info there!):

  1. In Excel in a Macro enabled file (‘.xlsm’) push ALT+F11 to open the Microsoft Visual Basic for Applications Editor.
  2. Add VBA reference to the Regular Expressions library (shamelessly copied from Portland Runners++ answer):
    1. Click on Tools -> References (please excuse the german screenshot)
      Tools -> References
    2. Find Microsoft VBScript Regular Expressions 5.5 in the list and tick the checkbox next to it.
    3. Click OK.
  3. Click on Insert Module. If you give your module a different name make sure the Module does not have the same name as the UDF below (e.g. naming the Module Regex and the function regex causes #NAME! errors).

    Second icon in the icon row -> Module

  4. In the big text window in the middle insert the following:

    Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
        Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
        Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
        Dim replaceNumber As Integer
    
        With inputRegexObj
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = matchPattern
        End With
        With outputRegexObj
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = "\$(\d+)"
        End With
        With outReplaceRegexObj
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
        End With
    
        Set inputMatches = inputRegexObj.Execute(strInput)
        If inputMatches.Count = 0 Then
            regex = False
        Else
            Set replaceMatches = outputRegexObj.Execute(outputPattern)
            For Each replaceMatch In replaceMatches
                replaceNumber = replaceMatch.SubMatches(0)
                outReplaceRegexObj.Pattern = "\$" & replaceNumber
    
                If replaceNumber = 0 Then
                    outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
                Else
                    If replaceNumber > inputMatches(0).SubMatches.Count Then
                        'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
                        regex = CVErr(xlErrValue)
                        Exit Function
                    Else
                        outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
                    End If
                End If
            Next
            regex = outputPattern
        End If
    End Function
    
  5. Save and close the Microsoft Visual Basic for Applications Editor window.

8

  • 7

    This answer combined with the steps here to create an Add-In, has been very helpful. Thank you. Make sure you don’t give your module and function the same name!

    Feb 24, 2015 at 19:03

  • 2

    Just reiterating the comment above from Chris Hunt. Don’t call your Module ‘Regex’ as well. Thought I was going mad for a while as the function wouldn’t work due to a #NAME error

    – Chris

    Sep 28, 2015 at 14:57

  • Well, I’m gone nuts as I tried everything (including changing modules/names) and still getting the #NAME error >_> i.imgur.com/UUQ6eCi.png

    – Enissay

    Aug 15, 2016 at 20:46

  • @Enissay: Try creating a minimal Function foo() As Variant \n foo="Hello World" \n End Function UDF to see if that works. If yes, work your way up to the full thing above, if no something basic is broken (macros disabled?).

    Aug 16, 2016 at 7:27

  • 2

    @Vijay: same at the github.com/malcolmp/excel-regular-expressions

    – Vadim

    Apr 28, 2020 at 18:30

90

Expanding on patszim‘s answer for those in a rush.

  1. Open Excel workbook.
  2. Alt+F11 to open VBA/Macros window.
  3. Add reference to regex under Tools then References
    ![Excel VBA Form add references
  4. and selecting Microsoft VBScript Regular Expression 5.5
    ![Excel VBA add regex reference
  5. Insert a new module (code needs to reside in the module otherwise it doesn’t work).
    ![Excel VBA insert code module
  6. In the newly inserted module,
    ![Excel VBA insert code into module
  7. add the following code:

    Function RegxFunc(strInput As String, regexPattern As String) As String
        Dim regEx As New RegExp
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .pattern = regexPattern
        End With
    
        If regEx.Test(strInput) Then
            Set matches = regEx.Execute(strInput)
            RegxFunc = matches(0).Value
        Else
            RegxFunc = "not matched"
        End If
    End Function
    
  8. The regex pattern is placed in one of the cells and absolute referencing is used on it.
    ![Excel regex function in-cell usage
    Function will be tied to workbook that its created in.
    If there’s a need for it to be used in different workbooks, store the function in Personal.XLSB

5

  • 3

    Thanks for mentioning it needs to be in Personal.xlsb to be available in all Excel documents you work on. Most (?) other answers don’t make that clear. Personal.XLSB would go in the folder (might need to create the folder) C:\Users\user name\AppData\Local\Microsoft\Excel\XLStart folder

    Jun 7, 2019 at 14:29

  • I chose this approach. However, there is a problem for me with Office 365. I noticed, if I open the xlsm file the other day, formulas with RegxFunc turn #NAME. Actually, to workaround this, I need to recreate the file. Any suggestions?

    – HoRn

    Mar 18, 2021 at 11:47


  • @HoRn #Name? You might want to try this so answer, stackoverflow.com/a/18841575/1699071. It states that the function name and module name were the same. The fix was to rename either the module name or the function name. Other posts on the same page also might help.

    – SAm

    Mar 18, 2021 at 15:38

  • 1

    I gave up on trying to get personal.xlsb working. Instead I put this function in my clipboard buffer’s permanent collection (arsclip) and will just create a new module whenever I need it. It’s laughable how difficult this is for a function that should, by 2021, be native to Excel. PS: Right in the middle of this, Stack asked me to pay for pasting. Y’all, it’s April 2. ^april\x20?0?1$’ fails today. Ya got me.

    – wistlo

    Apr 2, 2021 at 22:14


  • For some people from Non-English countries this may be interesting: You have to use a semicolon “;” instead of a comma “,” in RegxFunc(B5,$C$2)

    – devbf

    Apr 30, 2021 at 9:16