Regular expressions (RegEx) are a formal language to define simple patterns. It is commonly used to find interesting parts within a larger body of text or to validate data. It is typically fast and sometimes a clean solution. However, in some cases, an attacker can craft input to a vulnerable regular expression which causes the time to check the input for the regex to be way longer than expected.
Why you should care
I’ve only found one example of a successful ReDoS “attack”. In 2016, 34 minutes of outage of StackOverflow was caused by ReDoS (source).
Wikipedia and OWASP don’t mention a single successful attack. I guess the reason for that is that RegEx is not used that often on the server-side π€·βοΈ There are a lot of parsing tools for Python, but I only vaguely remember using pyparsing once.
In many cases when you might want to use a regex, doing it completely correctly is just overkill. For example, email regexes. Instead of jumping in that rabbit hole, you can just do some super basic checks, e.g. if the length is at least 3 and if there is exactly one @ symbol in there. Then you just send a confirmation email which verifies that the user actually has access to this email address.
For URLs, you can typically just ping them. For HTML, you want to use an HTML parser. For many other cases, using split goes a long way.
What I want to say: It might be that you don’t need to worry as you might not use RegEx in most of your code. It’s interesting nevertheless π
An Introduction to PCRE
Perl Compatible Regular Expressions (PCRE) are the most common style of regular expressions. They are a way to specify types of strings. They are way more specific than just wildcard pattern matching.
For example, the following pattern matches first a single uppercase letter and then arbitrary many lowercase letters:
[A-Z][a-z]*
The characters within the square brackets [...] define the character class. In this case, all characters from A to Z . The * is a quantifier and means “at least zero, but up to arbitrary many”.
RegEx Engines
You’ve seen in the previous section how regular expressions are written. Next you need to get a grasp of how it’s checked if a regular expression matches a given text. That’s done by a RegEx engine.
The RegEx engine of Python uses recursive backtracking, which can have exponential asymptotic execution time (see Optimizing Unicode Regular Expression Evaluation with Previews). There are engines like pyre2 which use the Thompson NFA algorithm. The only reason I see for using the backtracking algorithm are backreferences, e.g. when you want to check if a word is duplicated. you can recognize those by \1 or similar in the regex.
The idea of recursive backtracking is to match the string step-by-step. It keeps a list of possibilities that could match and eliminates them step-by-step.
How does a ReDoS Attack work?
Think about the regular expression (a|a)+b as represented here:
Non-deterministic finite automaton (NFA) representation of the regex “(a|a)+b”. Created via regexper.com
Although both sides of the first group are identical — just “a” — the recusive backtracking algorithm will still keep track of both possible matches. Meaning with every “a” in the input, it doubles the possible paths to match. Meaning if you have 26 times a, there will be 2²βΆ = 67,108,864 paths to evaluate:
import re
pattern = re.compile("**(a|a)+b**")
evil_input = "aaaaaaaaaaaaaaaaaaaaaaaaaa"
if pattern.match(evil_input):
print("π£π§¨π₯")
Executing this takes almost 7 seconds on my machine — impressively fast considering the number of possible options, but extremely slow if you remember that you can see at a single glance that this string cannot match.
If you have a vulnerable regular expression — also called “evil regex” — and an input that causes the exponential matching time, the machine will take quite some CPU resources to evaluate the expression on the input. In the worst case, your service will become unavailable to the other users of that machine. The regular expression denial of service attack (ReDOS) was successful.
How can I prevent ReDos attacks?
If you don’t use regular expressions and don’t provide your users the option to use them, you’re save.
If you use a regular expression engine which does not use backtracking, you should also be save.
Or if your regular expressions are not “evil” you also don’t have this issue. However, recognizing evil regular expressions is hard: How can I recognize an evil regex? Because computers do exactly what you tell them to do, even if it's not what you meant or is totally unreasonable. If…stackoverflow.com is this regex vulnerable to REDOS attacks Testing on regex101.com shows that there are no combinations of inputs that create runaway checks - but your regex is…stackoverflow.com
What’s next?
In this series about application security (AppSec) we already explained some of the techniques of the attackers π and also techniques of the defenders π:
- Part 1: SQL Injections ππ
- Part 2: Don’t leak Secrets π
- Part 3: Cross-Site Scripting (XSS) ππ
- Part 4: Password Hashing π
- Part 5: ZIP Bombs π
- Part 6: CAPTCHA π
- Part 7: Email Spoofing π
- Part 8: Software Composition Analysis (SCA) π
- Part 9: XXE attacks ππ
- Part 10: Effective Access Control π
- Part 11: DOS via a Billion Laughs π
- Part 12: Full Disk Encryption π
- Part 13: Insecure Deserialization π
- Part 14: Docker Security π
- Part 15: Credential Stuffing ππ
- Part 16: Multi-Factor Authentication (MFA/2FA) π
- Part 17: ReDoS π
The following articles are about to come:
- Part 18: Secure Messaging π
- Part 19: Cryptojacking π
- Part 20: Backups π
- Part 21: Cryptotrojans π
- Part 22: Single-Sign-On π
- Part 23: Clipboard Hijacking π
- Part 24: Certificates π
- Part 25: Race Condition Attacks in Blockchains π
- Part 26: Mobile Device Management (MDM) π
- Part 27: Server-Side Request Forgery (SSRF) π
- Part 28: Network Separation π
- Part 29: Social Engineering (including Phising) π
- Part 30: Virtual Private Networks (VPNs) π
- Part 31: CSRF π
Let me know if you are interested in more articles around AppSec / InfoSec!
I love writing about software development and technology π€© Don’t miss updates: **Get my free email newsletter** π§ or sign up for Medium βοΈ if you haven’t done it yet — both encourage me to write more π€