Template Builder REGEX (Regular Expressions)
Cleaning and filtering data can sometimes go beyond the basic find and replace or "equals", "greater than" or "less than" statement. In these instances, there are Regular Expressions or REGEX for short. The NinjaCat Template Builder includes support for the use of REGEX for the purpose of text manipulation to aid in data cleaning. This article is a brief resource with some general REGEX knowledge, tips for use within NinjaCat, and links to external resources to broaden your REGEX knowledge.
Resources
- A place to learn Regex from basics to Advanced functions, step-by-step explanations, and teaching examples RegexOne
- A place to test Regex, a testing environment to experiment with Regex Regex101
Regex Basics
- What it does
- Regex can look for dynamic variables, any sequence of characters that we specify, rather then having to match an exact string
- Regex operates left to right, character followed by another character
Flags | |
---|---|
g | Global search |
i | Ignore case, can be a capital or lower case |
M | Multi-line search, continues after carriage returns |
Characters | |
---|---|
// | Slashes are delimiters that indicate the beginning and end of your regular expression |
$ | End of the string |
\ | “Escape it” Using a backslash before a reserved character actually looks for the actual character rather than functioning as a reserved character |
. | Period means look for any character |
+ | When using + this looks at the previous character to say look for one or more of the previous character |
* | Asterisk or star, matches as many times as possible, matches 0 or more of the preceding character |
[ ] | Braces match anything within the braces - Example: [A-Z] matches any character A-Z - Example: [A-Z]+ matches any character A-Z any number of times - Example: [ENAM] matches any character E or N or A or M - Example: [ENAM]+ matches any character E or N or A or M any number of times |
^ | Start of String when used outside of Brackets - inverse when used in braces match anything that is not - Example: [^ENAM] matches any character thats that are not E or N or A or M |
\s | |
\S | |
{ } | |
( ) | |
$ | |
? |
- Characters
- //
- Slashes are delimiters that indicate the beginning and end of your regular expression
- $
- End of the string
- \
- “Escape it” Using a backslash before a reserved character actually looks for the actual character rather than functioning as a reserved character
- .
- Period means look for any character
- +
- When using + this looks at the previous character to say look for one or more of the previous character
- *
- Asterisk or star, matches as many times as possible, matches 0 or more of the preceding character
- [ ]
- Braces match anything within the braces
- Example: [A-Z] matches any character A-Z
- Example: [A-Z]+ matches any character A-Z any number of times
- Example: [ENAM] matches any character E or N or A or M
- Example: [ENAM]+ matches any character E or N or A or M any number of times
- ^
- Start of String when used outside of Brackets
- inverse when used in brackets match anything that is not
- Example: [^ENAM] matches any character thats that are not E or N or A or M
- \s
- Representation of a “space” you can use a “space” or the representation of a space to indicate a space
- Example: [^\s] matches anything that is not a space one time
- Example: [^\s]+ matches anything that is not a space until a space
- Representation of a “space” you can use a “space” or the representation of a space to indicate a space
- \S
- Represents any non-whitespace, excludes a space, tab, or new line
- { }
- Brackets used to specify number of matches you want for the preceding character or set
- Example: a{3} matches “a” 3 times
- Example: a{3,} matches “a” 3 or more times
- Example: a{3,6} matches “a” between 3 and 6 times
- Brackets used to specify number of matches you want for the preceding character or set
- ( )
- Parentheses, creates a capture group, match anything between these parentheses
- Using question mark, colon (?: ) will indicate capture what i want in the parentheses but don’t create a group
- $
- Used for returning a capture group
- Example: $1 will return whatever is between the first set of ( )
- Used for returning a capture group
- ?
- Question mark, look for 0 - 1 of the preceding character
- Braces match anything within the braces
- //
Regex Filter Examples
- Contains This but NOT That
^(?=.*contains this)(?!.*but does not contain that).* - Any string that contains apples, pears or peaches but NOT peas, onions or carrots:
^(?=.*(apples|pears|peaches).*)(?!.*(peas|onions|carrots).*).*$ - Contains This AND That
^(?=._contains this)(?=._and contains that).* - Contains This OR That
(this|that) - Contains This OR This OR This
(contains this|or this|or this) - NOT-Contains This OR That
^(?!.not contains this|._that).
^(?!.not contains this)(?!._and not contains that). - NOT This OR That
^(?!this|that).* - Include Special Characters that Regex would recognise as a command in a search string by using a backslash before the special character
- ex: source medium filter
- Google / cpc | facebook / paidsocial
- (google / cpc|facebook / paidsocial)
- Google / cpc | facebook / paidsocial
- ex: source medium filter
- ^New\s\w*$
- Starts with ‘New’, followed by a space, followed by any word, followed by nothing else. For example
- New Nissan
- New Kia
- Would EXCLUDE ‘New Kia Soul’
- Starts with ‘New’, followed by a space, followed by any word, followed by nothing else. For example
- ^New\s\w*\s.*
- Starts with ‘New’, followed by a space, followed by any word, followed by a space, followed by anything. Examples:
- New Kia Soul
- New Kia Honda
- New Kia Soul Is So Cool
- New Honda Pilot is what my husband drives
- Would EXCLUDE ‘New Kia’ or ‘New Honda’
- Starts with ‘New’, followed by a space, followed by any word, followed by a space, followed by anything. Examples:
Find and Replace (make sure to check mark Regex)
- Custom Dimension Rename Regex
- Take the entire string and replace it, when it contains This
- Find:
- ^.*(This).*$
- Replace:
- With whatever string you type
- Find:
- Custom Dimension Rename Regex
- Take the entire string and replace it, when it contains This OR That
- Find:
- ^.*(This|That).*$
- Replace:
- With whatever string you type
- Find:
- Take the entire string and replace it, when it contains This OR That
- Take the entire string and replace it, when it contains This
- Find Groups of text within a string separated by a dash and Replace the Name with just one Group of the String
- Find:
- ^(.*)-(.*)-(.*)-(.*)-(.*)-(.*)$
- Replace:
- $1 $2 $3
- Find:
- Find the first string of contiguous digits and replace with only the same string of digits
- Find:
- (\d+) matches contiguous strings of digits.
- Replace:
- $1 replaces any match with the result of the first capturing group.
- Find:
- The "Find" will match any character that is not a number, and the empty replace will cause all non-numbers to be replaced with nothingness
- Find:
- [^0-9]
- Replace:
- Find:
- Find the first group of numbers and replace with only the first group of numbers. That will match the whole thing, and capture the first group of numbers. Then replace the match with the capture.
Find:- .*?([0-9]+).*
- Replace:
- $1
- Remove one blank at the end of a string of character. Note this will only remove ONE blank at end - not all blanks.
- Find:
- ^(.*)\s+$
- Replace:
- $1
- In a string like, “Darwin Family Dental Care | FL” remove all characters through to and including the space after the “|” leaving just FL.
- Find:
- ^(.*)\s+$
- Replace:
- $1
- Find:
- Creative Use for Dimension Rename Filtering using Replacement (credit to Aidan)
- Use Case:
- Specifically in charts only one Dimension Can be selected
- Filters can be applied but the filter will be applied after the Replacement
- EXAMPLE:
- Campaign name looks like:
- “WR - Search - LocationName”
- Filter “Starts with”
- “WR - Search - “
- This returns all campaign names that start with “WR - Search - “
- “WR - Search - “
- Dimension Replacement:
- Find:
- WR - Search -
- Replace:
- “Blank”
- Find:
- Result:
- No data is returned because we are trying to filter on campaign names starting with “WR - Search - “ but the campaign names no longer start with “WR - Search - “ Because we re-named them trying to shorten them to just the “locationName”
- Solution
- We take a reverse approach by excluding and re-naming
- We use a Regex to find all results that dont match “WR - Search - “ and re-name all the non matching campaign names to “A”
- So we are keeping the campaign names we want to return intact in their original names
- We add a second replacement to remove “WR - Search - “
- Then we apply a “campaign name” filter
- Not Equal to “A”
- The result is campaign names showing just the Location name with the prefix “WR - Search - “removed and all non “WR - Search - “ campaigns filtered out
- Campaign name looks like:
- EXAMPLE:
- Use Case:
#HASHTAGS
-
Isolating #HASHTAGS within a Post string to combine and count them
-
-
pavement The lake is a long way from here.
-
maximum A song can make or ruin a person’s day if they let it get to them.
-
migration My Mum tries to be cool by saying that she likes all
-
Find: ^(#\w*)?.*$
-
Replace: $1w
-
Result:
- /#pavement
- /#maximum
- /#migration
-
-
-
Jays EXAMPLES:
- String = Dedicated Hosting - ModBroad
Find = ^(.*) - (.*)$
Replace = $1
Result = Dedicated Hosting
Bucketing/Grouping Numbers:
-
Example: Creating custom age group buckets to display in charts and tables. The client wants to bucket ages into custom groups. 22-31, 32-41, 42-51, 52-61, 62-71, 72-81, 112-121
-
Use the following regex in FIND
-
^2[2-9]|3[0-1]$
-
Explanation, start at the beginning of the string, the first digit must be a 2, and the second digit must include or between 2-9, OR look for the first digit 3 and the second digit including and between 0-1
-
Use the following in REPLACE
- 22-31
-
-
-
Add additional finds and renames for all numbered groups
-
Updated 3 days ago