Uploaded image for project: 'Rules Repository'
  1. Rules Repository
  2. RSPEC-5998

Regular expressions should not overflow the stack

    XMLWordPrintable

    Details

    • Type: Bug Detection
    • Status: Active
    • Resolution: Unresolved
    • Labels:
    • Message:
      Refactor this repetition that can lead to a stack overflow for large inputs.
    • Highlighting:
      Hide

      The offending repetition

      Show
      The offending repetition
    • List of parameters:
      Hide
      • key: maxStackConsumptionFactor
      • default: TBD
      • description: An indicator approximately proportional to how quickly the stack grows relative to the input size. An issue will be reported if the value for a regex exceeds the maximum set here. Setting this to 0 will cause an issue to be reported for all regular expressions with non-constant stack consumption.
      Show
      key: maxStackConsumptionFactor default: TBD description: An indicator approximately proportional to how quickly the stack grows relative to the input size. An issue will be reported if the value for a regex exceeds the maximum set here. Setting this to 0 will cause an issue to be reported for all regular expressions with non-constant stack consumption.
    • Default Severity:
      Major
    • Impact:
      Low
    • Likelihood:
      High
    • Default Quality Profiles:
      Sonar way
    • Covered Languages:
      Java
    • Remediation Function:
      Constant/Issue
    • Constant Cost:
      20min
    • Analysis Scope:
      Main Sources

      Description

      The Java regex engine uses recursive method calls to implement backtracking. Therefore when a repetition inside a regular expression contains multiple paths (i.e. the body of the repetition contains an alternation (|), an optional element or another repetition), trying to match the regular expression can cause a stack overflow on large inputs. This does not happen when using a possessive quantifier (such as *+ instead of *) or when using a character class inside a repetition (e.g. [ab]* instead of (a|b)*).

      The size of the input required to overflow the stack depends on various factors, including of course the stack size of the JVM. One thing that significantly increases the size of the input that can be processed is if each iteration of the repetition goes through a chain of multiple constant characters because such consecutive characters will be matched by the regex engine without invoking any recursion.

      For example, on a JVM with a stack size of 1MB, the regex (?:a|b)* will overflow the stack after matching around 6000 characters (actual numbers may differ between JVM versions and even across multiple runs on the same JVM) whereas (?:abc|def)* can handle around 15000 characters.

      Since often times stack growth can't easily be avoided, this rule will only report issues on regular expressions if they can cause a stack overflow on realistically sized inputs. You can adjust the maxStackConsumptionFactor parameter to adjust this.

      Noncompliant Code Example

      Pattern.compile("(a|b)*"); // Noncompliant
      Pattern.compile("(.|\n)*"); // Noncompliant
      Pattern.compile("(ab?)*"); // Noncompliant
      

      Compliant Solution

      Pattern.compile("[ab]*"); // Character classes don't cause recursion the way that '|' does
      Pattern.compile("(?s).*"); // Enabling the (?s) flag makes '.' match line breaks, so '|\n' isn't necessary
      Pattern.compile("(ab?)*+"); // Possessive quantifiers don't cause recursion because they disable backtracking
      

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              Unassigned Unassigned
              Reporter:
              sebastian.hungerecker Sebastian Hungerecker
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Dates

                Created:
                Updated: