Currently S5852 only reports on nested repetitions (that are unbounded and non-possessive), but catastrophic backtracking can also occur when an unbounded repetition contains multiple paths that can match the same input (e.g. overlapping alternatives or an option followed by another overlapping option).
Further in cases where a match can never fail after matching a repetition, no backtracking will occur, which isn't currently not taken into account, leading to false positives.
To fix these issues we should extend the existing implementation as follows:
When checking a repetition, we shouldn't just check whether it contains another repetition, but also any node with multiple outgoing "backtrackable" transitions (i.e. any disjunction or non-possessive (but possibly bounded) repetition¹)), such that any of the subautomata targeted by any of the transitions intersects with any of the others (which we can check using the helper method for intersections).
After we determined that a repetition is problematic, we should then check whether its continuation can either match the empty string (but not if we know that the regex is used for full matches (e.g. as an argument to matches) or is .*. If that is the case, we shouldn't raise the issue.
¹ In some cases the node with multiple transitions will be the branch node associated with the repetition rather than the repetition itself.
- depends upon
-
SONARJAVA-3549 Add support for automata-based analyses for regular expressions
-
- Closed
-
-
SONARJAVA-3551 Implement helper to find whether state in regex automaton is reachable without consuming input
-
- Closed
-
-
SONARJAVA-3564 Implement intersects and supersetOf helper for regex automata
-
- Closed
-
- relates to
-
MMF-2182 Help Java developers writing regexp running fast, with the correct amount of resources and really doing what developers intended
-
- Resolved
-
-
RSPEC-5852 Using slow regular expressions is security-sensitive
- Active