Regex for Developers -- The Only Cheat Sheet You Will Ever Need
Regular expressions are one of those skills that every developer needs but nobody enjoys learning. You Google the pattern, copy it into your code, pray it works, and move on. Until it breaks on an edge case and you are back to square one.
This guide is different. Instead of just listing syntax, it teaches you how to read and write regex so you stop depending on Stack Overflow for every pattern.
The Basics -- Building Blocks
Every regex is built from a small set of building blocks. Learn these and you can decode any pattern.
Literal Characters
The simplest regex is just plain text. The pattern hello matches the exact string "hello" inside any text. Nothing special about it.
Character Classes
Square brackets match any single character from a set:
- •[abc] -- matches "a", "b", or "c"
- •[a-z] -- matches any lowercase letter
- •[A-Z] -- matches any uppercase letter
- •[0-9] -- matches any digit
- •[a-zA-Z0-9] -- matches any letter or digit
- •[^abc] -- matches any character EXCEPT "a", "b", or "c" (the caret negates the class)
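A quick way to see what each class matches is to run it through Python's `re.findall`, which returns every non-overlapping match. Here is a small sketch using a made-up sample string:

```python
import re

text = "Cat 42!"

# Each character class matches exactly one character per match.
lower = re.findall(r"[a-z]", text)        # ['a', 't']
upper = re.findall(r"[A-Z]", text)        # ['C']
digits = re.findall(r"[0-9]", text)       # ['4', '2']
alnum = re.findall(r"[a-zA-Z0-9]", text)  # ['C', 'a', 't', '4', '2']

# The caret negates the class: anything except 'a', 'b', or 'c'.
# Matching is case-sensitive, so the uppercase 'C' is NOT excluded.
not_abc = re.findall(r"[^abc]", text)     # ['C', 't', ' ', '4', '2', '!']

print(lower, upper, digits, alnum, not_abc)
```

Look at the last line: negation is case-sensitive, so "C" still matches [^abc].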
Shorthand Character Classes
These are shortcuts for common character classes:
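- •\d -- any digit (same as [0-9] in ASCII mode; Unicode-aware engines such as Python 3 also match digits from other scripts)
- •\D -- any non-digit
- •\w -- any word character (same as [a-zA-Z0-9_] in ASCII mode; Unicode-aware engines also match letters such as "é")
- •\W -- any non-word character
- •\s -- any whitespace (space, tab, newline, carriage return, form feed, and more)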
- •\S -- any non-whitespace
- •. -- any character except newline (the wildcard)
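The shorthands behave slightly differently from engine to engine. The sketch below uses Python, where `str` patterns are Unicode-aware by default, and runs each shorthand against an arbitrary sample string. It also shows the `re.ASCII` flag, which restricts `\d` to [0-9]. Note that `\w+` uses the `+` quantifier, which is covered next.

```python
import re

text = "id_7 is\tready.\n"

digits = re.findall(r"\d", text)      # ['7']
words = re.findall(r"\w+", text)      # ['id_7', 'is', 'ready']  (\w includes '_')
spaces = re.findall(r"\s", text)      # [' ', '\t', '\n']
non_newline = re.findall(r".", text)  # every character except the final '\n'

# Python 3 str patterns are Unicode-aware: \d also matches non-ASCII digits
# such as U+0663 (ARABIC-INDIC DIGIT THREE). re.ASCII restricts \d to [0-9].
arabic_three = "\u0663"
unicode_digit = re.findall(r"\d", arabic_three)          # ['\u0663']
ascii_digit = re.findall(r"\d", arabic_three, re.ASCII)  # []

print(digits, words, spaces, len(non_newline), unicode_digit, ascii_digit)
```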
Quantifiers
Quantifiers specify how many times a character or group should appear:
- •a* -- zero or more "a" characters
- •a+ -- one or more "a" characters
- •a? -- zero or one "a" (makes it optional)
- •a{3} -- exactly 3 "a" characters
- •a{2,5} -- between 2 and 5 "a" characters
- •a{2,} -- 2 or more "a" characters
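The sketch below checks each quantifier against whole strings using Python's `re.fullmatch`. The `colou?r` example is only an illustration. The last two lines cover something the list leaves out: quantifiers take as much as they can unless you add a trailing `?`.

```python
import re

def whole(pattern: str, s: str) -> bool:
    # re.fullmatch succeeds only if the pattern covers the entire string.
    return re.fullmatch(pattern, s) is not None

zero_ok = whole(r"a*", "")               # True: zero repetitions allowed
one_needed = whole(r"a+", "")            # False: needs at least one "a"
optional_u = whole(r"colou?r", "color")  # True: the "u" is optional
exactly_three = whole(r"a{3}", "aaaa")   # False: four is not exactly three
two_to_five = whole(r"a{2,5}", "aaaaa")  # True: five is within 2..5
open_ended = whole(r"a{2,}", "a" * 10)   # True: no upper bound

# Quantifiers are greedy by default; a trailing ? makes them lazy.
greedy = re.search(r"a{2,5}", "aaaaaaa").group()  # 'aaaaa'
lazy = re.search(r"a{2,5}?", "aaaaaaa").group()   # 'aa'
```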
Anchors
Anchors match positions, not characters:
- •^ -- start of string (or start of line in multiline mode)
- •$ -- end of string (or end of line in multiline mode)
- •\b -- word boundary (the position between a word character and a non-word character)
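Here is a sketch of how the anchors behave in Python, where "multiline mode" means the `re.MULTILINE` flag. The three-line sample text is made up. The code records the start position of each match so you can see which line it came from.

```python
import re

text = "cat\ncatalog\nbobcat"

def starts(pattern: str, flags: int = 0) -> list:
    # Start index of every match, so we can tell which line matched.
    return [m.start() for m in re.finditer(pattern, text, flags)]

start_of_string = starts(r"^cat")                # [0]: only the very start
end_of_string = starts(r"cat$")                  # [15]: the "cat" in "bobcat"
start_of_lines = starts(r"^cat", re.MULTILINE)   # [0, 4]: "cat" and "catalog"
end_of_lines = starts(r"cat$", re.MULTILINE)     # [0, 15]: "cat" and "bobcat"
whole_word = starts(r"\bcat\b")                  # [0]: only the standalone word
```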
Examples So Far
- •^\d{5}$ -- matches exactly 5 digits (like a US zip code: "90210")
- •\b\w+\b -- matches a whole word
- •^[A-Z] -- matches any string that starts with a capital letter
Groups and Alternation
Groups
Parentheses create groups. Groups let you apply quantifiers to multiple characters and capture matched text:
- •(abc)+ -- matches "abc", "abcabc", "abcabcabc", etc.
- •(\d{3})-(\d{4}) -- matches "555-1234" and captures "555" as group 1 and "1234" as group 2
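In Python, captured groups come back from the match object. The sketch below uses a made-up sample sentence. The last three lines show that a quantifier placed after a group repeats the whole group, not just its last character.

```python
import re

m = re.search(r"(\d{3})-(\d{4})", "Call 555-1234 today")
assert m is not None  # the sample text is known to contain a match

full_match = m.group(0)  # '555-1234'
groups = m.groups()      # ('555', '1234'): group 1 and group 2

# The + after the group repeats the whole "abc" unit.
repeats = re.fullmatch(r"(abc)+", "abcabcabc") is not None  # True
no_partial = re.fullmatch(r"(abc)+", "abcab") is not None   # False
# Without the group, + applies only to the "c".
just_c = re.fullmatch(r"abc+", "abccc") is not None         # True
```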
Non-Capturing Groups
If you need grouping but do not need to capture the match, use (?:...):
- •(?:abc)+ -- same as (abc)+ but does not create a capture group. Use this when you only need the grouping behavior.
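One practical reason to prefer (?:...) shows up in Python's `re.findall`. Its return value depends on how many capture groups the pattern has. Here is a sketch with a made-up string:

```python
import re

text = "abcabc xyz abc"

# A compiled pattern reports how many capture groups it has.
capturing_groups = re.compile(r"(abc)+").groups        # 1
non_capturing_groups = re.compile(r"(?:abc)+").groups  # 0

# With one capture group, findall returns the group's text, and a
# repeated group only keeps its LAST repetition.
with_capture = re.findall(r"(abc)+", text)       # ['abc', 'abc']
# With no capture groups, findall returns the whole match.
without_capture = re.findall(r"(?:abc)+", text)  # ['abcabc', 'abc']
```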
Alternation
The pipe character means "or":
- •cat|dog -- matches "cat" or "dog"
- •(Mon|Tues|Wednes|Thurs|Fri)day -- matches any weekday (the prefixes must be spelled out in full, because "Tue" + "day" is not "Tuesday")
- •https?:// -- matches "http://" or "https://" (the ? makes the "s" optional)
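This Python sketch checks two things worth verifying whenever you use alternation: that every alternative really produces the word you expect, and how far the | reaches. The sample strings are illustrative.

```python
import re

# The weekday stems are spelled out so every alternative ends up as a real day name.
weekday = re.compile(r"(?:Mon|Tues|Wednes|Thurs|Fri)day")
names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
weekdays = [n for n in names if weekday.fullmatch(n)]  # all except "Saturday"

# | has the lowest precedence: "^cat|dog$" means "^cat" OR "dog$".
loose = re.compile(r"^cat|dog$")
grouped = re.compile(r"^(?:cat|dog)$")
loose_hit = loose.search("hotdog") is not None      # True: "dog$" alone matches
grouped_hit = grouped.search("hotdog") is not None  # False: must be exactly cat or dog

scheme = re.compile(r"https?://")
schemes = [scheme.match(u) is not None for u in ("http://a", "https://a", "ftp://a")]
```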
Lookahead and Lookbehind
These are "zero-width assertions" -- they check what comes before or after your match without including it in the result. Not every engine supports them. Go's regexp package, for example, rejects all four.
Positive Lookahead: (?=...)
Matches only if followed by the specified pattern:
- •\d+(?= dollars) -- matches "100" in "100 dollars" but not "100" in "100 euros"
Negative Lookahead: (?!...)
Matches only if NOT followed by the specified pattern:
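- •\d+(?!\d| dollars) -- matches "100" in "100 euros" but nothing in "100 dollars". The \d inside the lookahead matters: plain \d+(?! dollars) backtracks and matches "10" in "100 dollars".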
Positive Lookbehind: (?<=...)
Matches only if preceded by the specified pattern:
- •(?<=\$)\d+ -- matches "50" in "$50" but not "50" in "50 euros"
Negative Lookbehind: (?<!...)
Matches only if NOT preceded by the specified pattern:
- •(?<![$\d])\d+ -- matches "50" in "50 euros" but nothing in "$50". Without the \d in the lookbehind, (?<!\$)\d+ still matches the "0" in "$50".
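Lookarounds are easy to get subtly wrong, because the engine backtracks into \d+ until the assertion passes. This Python sketch shows that failure mode and the usual fix: also exclude a neighboring digit. The sample strings are made up.

```python
import re

# Positive lookahead: digits followed by " dollars" (not consumed).
ahead = re.findall(r"\d+(?= dollars)", "100 dollars, 200 euros")  # ['100']

# Pitfall: with a bare negative lookahead, \d+ backtracks to a shorter run.
naive_ahead = re.findall(r"\d+(?! dollars)", "100 dollars")        # ['10']
# Also forbidding a following digit removes that escape route.
fixed_ahead = re.findall(r"\d+(?!\d| dollars)", "100 dollars")     # []
fixed_ahead_ok = re.findall(r"\d+(?!\d| dollars)", "100 euros")    # ['100']

# Positive lookbehind: digits preceded by "$".
behind = re.findall(r"(?<=\$)\d+", "$50 and 50 euros")             # ['50']

# Same pitfall in reverse: the match can start one digit later.
naive_behind = re.findall(r"(?<!\$)\d+", "$50")                     # ['0']
fixed_behind = re.findall(r"(?<![$\d])\d+", "$50")                  # []
fixed_behind_ok = re.findall(r"(?<![$\d])\d+", "50 euros")          # ['50']
```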
Copy-Paste Patterns for Common Tasks
These patterns cover the common cases. Test them against your own data before copying them into production code.
Email Validation (Practical)
> ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
This accepts the vast majority of real-world addresses. It checks for characters before the @, a domain name, and a TLD of at least 2 letters. It is not RFC 5322 compliant (a fully compliant pattern is impractically complex), but it works for everyday validation.
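This Python sketch applies the email pattern to a few made-up addresses. It uses `fullmatch` because in Python, `$` can also match just before a trailing newline. The last line shows that the pattern is permissive.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(s: str) -> bool:
    # fullmatch, not match: EMAIL.match("a@b.co\n") succeeds because
    # Python's $ also matches right before a final newline.
    return EMAIL.fullmatch(s) is not None

accepted = looks_like_email("jane.doe+news@example.co.uk")  # True
no_at = looks_like_email("jane.example.com")                # False
no_tld = looks_like_email("jane@localhost")                 # False
trailing_newline = looks_like_email("a@b.co\n")             # False
match_allows_newline = EMAIL.match("a@b.co\n") is not None  # True
# Permissive: consecutive dots in the local part still pass.
loose_local_part = looks_like_email("a..b@example.com")     # True
```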
URL Matching
> https?:\/\/[\w\-]+(\.[\w\-]+)+[\/\w\-._~:?#\[\]@!$&'()*+,;=]*
Matches both http and https URLs with paths, query strings, and fragments.
Phone Numbers (US Format)
> ^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$
Matches: (555) 123-4567, 555-123-4567, 5551234567, +1 555 123 4567, and similar variations.
Password Strength
At least 8 characters, with uppercase, lowercase, digit, and special character:
> ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
This uses four positive lookaheads to check each requirement independently, then matches the actual characters.
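This is a sketch of the password pattern in Python, with `.*` at the start of each lookahead. The pattern is split across adjacent raw string literals, which Python joins into one, so each requirement gets its own comment. The sample passwords are made up.

```python
import re

PASSWORD = re.compile(
    r"^(?=.*[a-z])"              # at least one lowercase letter somewhere
    r"(?=.*[A-Z])"               # at least one uppercase letter
    r"(?=.*\d)"                  # at least one digit
    r"(?=.*[@$!%*?&])"           # at least one special character from this set
    r"[A-Za-z\d@$!%*?&]{8,}$"    # 8+ characters, all from the allowed set
)

def is_strong(pw: str) -> bool:
    return PASSWORD.fullmatch(pw) is not None

ok = is_strong("Passw0rd!")         # True
no_upper = is_strong("passw0rd!")   # False: no uppercase letter
too_short = is_strong("Pa0!")       # False: fewer than 8 characters
bad_symbol = is_strong("Passw0rd#") # False: '#' is not in the allowed set
```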
IPv4 Address
> ^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$
Validates proper ranges (0-255 for each octet): 192.168.1.1 matches, but 999.999.999.999 does not. It does accept zero-padded octets such as 010.001.1.1.
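This sketch compares the regex with Python's standard-library `ipaddress` module, which is often the simpler choice when you need a definitive answer. The test addresses are illustrative.

```python
import ipaddress
import re

IPV4 = re.compile(r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$")

def regex_ipv4(s: str) -> bool:
    return IPV4.fullmatch(s) is not None

def stdlib_ipv4(s: str) -> bool:
    # IPv4Address raises a ValueError subclass for invalid input.
    try:
        ipaddress.IPv4Address(s)
    except ValueError:
        return False
    return True

candidates = ["192.168.1.1", "999.999.999.999", "256.0.0.1", "1.2.3"]
results = {c: (regex_ipv4(c), stdlib_ipv4(c)) for c in candidates}

# The regex also accepts zero-padded octets.
padded = regex_ipv4("010.001.1.1")  # True
```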
Date (YYYY-MM-DD)
> ^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
Matches ISO date format with valid month (01-12) and day (01-31) ranges. Does not validate February 30th -- use a date library for that.
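Following the advice to use a date library for calendar rules, this Python sketch uses the regex as a cheap shape check. It then lets `datetime.date.fromisoformat` reject impossible dates. The sample dates are made up.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def valid_iso_date(s: str) -> bool:
    # Step 1: shape check with the regex.
    if ISO_DATE.fullmatch(s) is None:
        return False
    # Step 2: calendar check; fromisoformat raises ValueError for 2023-02-30.
    try:
        date.fromisoformat(s)
    except ValueError:
        return False
    return True

leap_day = valid_iso_date("2024-02-29")       # True: 2024 is a leap year
not_leap = valid_iso_date("2023-02-29")       # False: rejected by the library
regex_only = ISO_DATE.fullmatch("2023-02-30") is not None  # True: the regex alone accepts it
bad_month = valid_iso_date("2023-13-01")      # False: rejected by the regex
```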
HTML Tags
> <([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>(.*?)<\/\1>
Matches opening and closing HTML tags and captures the content between them. The \1 is a backreference that matches whatever the first group captured (the tag name).
Warning: Do not use regex to parse HTML in production. Use a proper HTML parser. This pattern is useful for quick searches and one-off scripts.
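This is a sketch of the tag pattern in Python. On a made-up snippet, it shows the \1 backreference doing its job. It also shows one reason the warning above exists: nested tags with the same name are matched incorrectly.

```python
import re

TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>(.*?)<\/\1>")

html = '<b>bold</b> and <a href="/x">link</a> and <i>oops</b>'
# findall with two groups returns (tag, content) tuples.
# <i>oops</b> is skipped because \1 requires a matching </i>.
pairs = TAG.findall(html)  # [('b', 'bold'), ('a', 'link')]

# Nested tags with the same name: the lazy (.*?) stops at the first </b>.
nested = TAG.findall("<b><b>x</b></b>")  # [('b', '<b>x')]
```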
Hex Color Code
> ^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$
Matches both 3-digit and 6-digit hex colors: #fff, #FF0000, #a3b2c1.
Slug (URL-Friendly String)
> ^[a-z0-9]+(?:-[a-z0-9]+)*$
Matches valid URL slugs like "my-blog-post" or "hello123". No uppercase, no consecutive hyphens, no leading/trailing hyphens.
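The hex color and slug patterns are easy to sanity-check in bulk. This Python sketch runs both against a handful of made-up samples. The helper name `passing` is just for illustration.

```python
import re

HEX_COLOR = re.compile(r"^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$")
SLUG = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")

def passing(pattern, samples):
    # Keep only the samples that the pattern matches in full.
    return [s for s in samples if pattern.fullmatch(s)]

colors = passing(HEX_COLOR, ["#fff", "#FF0000", "#a3b2c1", "#ffff", "fff", "#ggg"])
slugs = passing(SLUG, ["my-blog-post", "hello123", "My-Post",
                       "double--hyphen", "-leading", "trailing-"])
```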
Regex in Different Languages
JavaScript
> const pattern = /^\d{3}-\d{4}$/;
> pattern.test("555-1234"); // true
>
> const match = "price: $42.99".match(/\$(\d+\.\d{2})/);
> // match[1] is "42.99"
>
> "hello world".replace(/world/, "regex"); // "hello regex"
> "a-b-c".split(/-/); // ["a", "b", "c"]
Python
> import re
>
> re.match(r"^\d{3}-\d{4}$", "555-1234") # Match object
> re.search(r"\$(\d+\.\d{2})", "price: $42.99").group(1) # "42.99"
> re.sub(r"world", "regex", "hello world") # "hello regex"
> re.split(r"-", "a-b-c") # ["a", "b", "c"]
> re.findall(r"\d+", "age 25, height 180") # ["25", "180"]
Go
> import "regexp"
>
> matched, _ := regexp.MatchString(`^[a-z]+$`, "hello") // true
>
> re := regexp.MustCompile(`(\d+)`)
> result := re.FindString("age 25") // "25"
> all := re.FindAllString("25 and 30", -1) // ["25", "30"]; -1 means no limit
Flags
Flags modify how the regex engine behaves:
- •g (global) -- find all matches, not just the first one (a JavaScript flag; Python has no g and uses re.findall or re.finditer instead)
- •i (case-insensitive) -- "hello" matches "Hello", "HELLO", etc.
- •m (multiline) -- ^ and $ match start/end of each line, not the whole string
- •s (dotall) -- makes . match newlines too
- •u (unicode) -- enables full Unicode support
In JavaScript: /pattern/gi
In Python: re.compile(pattern, re.IGNORECASE | re.MULTILINE)
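This sketch shows the Python equivalents side by side on a made-up two-line string.

```python
import re

text = "Hello\nhello world"

# No "g" flag in Python: findall already returns every match.
case_sensitive = re.findall(r"hello", text)                   # ['hello']
case_insensitive = re.findall(r"hello", text, re.IGNORECASE)  # ['Hello', 'hello']

# MULTILINE: ^ also matches right after each newline.
first_line_only = re.findall(r"^\w+", text)                # ['Hello']
every_line = re.findall(r"^\w+", text, re.MULTILINE)       # ['Hello', 'hello']

# DOTALL: . also matches the newline.
without_dotall = re.search(r"Hello.hello", text)            # None
with_dotall = re.search(r"Hello.hello", text, re.DOTALL)    # a match

# Flags combine with |, and a compiled pattern keeps them.
pattern = re.compile(r"^hello", re.IGNORECASE | re.MULTILINE)
combined = pattern.findall(text)                            # ['Hello', 'hello']
```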
Reading Complex Regex
When you encounter a scary regex, break it down left to right:
Take this pattern: ^(?=.*[A-Z])(?=.*\d)[A-Za-z\d]{8,}$
1. ^ -- start of string
2. (?=.*[A-Z]) -- lookahead: somewhere in the string, there must be an uppercase letter
3. (?=.*\d) -- lookahead: somewhere in the string, there must be a digit
4. [A-Za-z\d]{8,} -- 8 or more letters or digits
5. $ -- end of string
Reading: "The string must contain at least one uppercase letter and one digit, and must be at least 8 characters long, consisting only of letters and digits."
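One way to keep that breakdown inside the code is Python's `re.VERBOSE` flag. In verbose mode, unescaped whitespace in the pattern is ignored and `#` starts a comment. The sketch below writes the same pattern that way. The sample strings are made up.

```python
import re

STRONG = re.compile(
    r"""
    ^                # start of string
    (?=.*[A-Z])      # lookahead: an uppercase letter somewhere
    (?=.*\d)         # lookahead: a digit somewhere
    [A-Za-z\d]{8,}   # 8 or more letters or digits
    $                # end of string
    """,
    re.VERBOSE,
)

def is_ok(s: str) -> bool:
    return STRONG.fullmatch(s) is not None

good = is_ok("abcdefG1")      # True
no_upper = is_ok("abcdefgh")  # False: no uppercase letter
short = is_ok("Abc12")        # False: fewer than 8 characters
symbol = is_ok("Abcdefg1!")   # False: '!' is not a letter or digit
```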
Tips for Reading Regex
- •Read left to right, one piece at a time
- •Identify the anchors first (^ and $) to understand what the pattern is matching against
- •Find the groups and understand what each one captures
- •Look for lookaheads/lookbehinds -- they add conditions without consuming characters
- •Test with real examples using a tool like regex101.com
Performance Pitfalls
Catastrophic Backtracking
Some patterns cause the regex engine to try an exponential number of combinations. This pattern is dangerous:
> (a+)+b
On the input "aaaaaaaaaaaaaac", the engine tries every possible way to split the "a" characters between the inner and outer groups before concluding there is no match. This can freeze your application.
Prevention:
- •Avoid nested quantifiers like (a+)+
- •Use atomic groups or possessive quantifiers when available
- •Set a timeout on regex execution in production code
- •Test your patterns with long inputs before deploying
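Here is a sketch of the first prevention tip in Python. (a+)+b and a+b accept exactly the same strings, so the nested version can simply be flattened. The sketch only runs the dangerous pattern on short inputs, since a long one could hang.

```python
import re

nested = re.compile(r"(a+)+b")  # nested quantifiers: exponential work on failure
flat = re.compile(r"a+b")       # same set of matching strings, no nesting

# On short inputs the two patterns agree, so the rewrite is safe.
samples = ["ab", "aaab", "b", "aaac", "", "xaab"]
same_results = all(
    (nested.search(s) is None) == (flat.search(s) is None) for s in samples
)

# The flat pattern fails quickly even on a long non-matching input.
# (Do not try this with `nested`: it may run for a very long time.)
long_input = "a" * 100_000 + "c"
flat_on_long = flat.fullmatch(long_input)  # None
```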
Use Specific Patterns
- •Use \d instead of . when you expect digits
- •Use [^"]* instead of .*? when matching quoted strings
- •Use anchors (^ and $) whenever the match should cover the full string
Specific patterns are faster because the engine has fewer choices to make.
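This Python sketch compares three ways to pull out quoted text from made-up samples. Greedy .* overshoots. Lazy .*? and [^"]* agree on a single line, but only [^"]* crosses a newline without needing extra flags.

```python
import re

text = 'say "hi" and "bye"'

greedy = re.findall(r'"(.*)"', text)      # ['hi" and "bye']: spans too much
lazy = re.findall(r'"(.*?)"', text)       # ['hi', 'bye']
negated = re.findall(r'"([^"]*)"', text)  # ['hi', 'bye']

# [^"] matches any non-quote character, including a newline; . does not.
multiline = '"a\nb"'
lazy_multiline = re.findall(r'"(.*?)"', multiline)       # []
negated_multiline = re.findall(r'"([^"]*)"', multiline)  # ['a\nb']
```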
Practical Debugging Workflow
1. Start with the simplest version of your pattern that matches one example
2. Add complexity one piece at a time, testing after each addition
3. Test edge cases: empty strings, very long strings, strings with special characters
4. Use regex101.com -- it shows you step-by-step how the engine processes your pattern
5. Add comments to complex regex using the verbose flag (re.VERBOSE, also written re.X, in Python). JavaScript has no such flag, so build long patterns from separate, commented variables instead
Quick Reference Card
| Pattern | Meaning |
|---|---|
| . | Any character except newline |
| \d \w \s | Digit, word char, whitespace |
| \D \W \S | NOT digit, word char, whitespace |
| [abc] | Any of a, b, or c |
| [^abc] | NOT a, b, or c |
| a* a+ a? | 0+, 1+, 0 or 1 |
| a{3} a{2,5} | Exactly 3, between 2 and 5 |
| ^ $ \b | Start, end, word boundary |
| (...) | Capture group |
| (?:...) | Non-capturing group |
| a\|b | a or b |
| (?=...) | Positive lookahead |
| (?!...) | Negative lookahead |
| (?<=...) | Positive lookbehind |
| (?<!...) | Negative lookbehind |
Regex is a tool, not a language to master completely. Know the basics, keep a cheat sheet handy, and use a testing tool for complex patterns. That is all you need.