Uploaded image for project: 'SonarJava'
  1. SonarJava
  2. SONARJAVA-3725

Support lookahead in RegexTreeHelper.intersects and supersetOf methods



    • Type: False Negative
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Rules
    • Labels:


      Currently RegexTreeHelper.intersects and .supersetOf don't support lookahead assertions appearing within the sub-expressions being compared. This can lead to false negatives in the linked rules.


      Pattern.compile("\\w(?=\\w)|\\d(?=\\d)"); // FN, should raise S5855 "Remove or rework this redundant alternative"
      Pattern.compile(".*foo(?!bar).*foo"); // FN should raise S5852 "Make sure the regex used here, which is vulnerable to quadratic runtime due to backtracking, cannot lead to denial of service."

      The algorithm that can be used to support positive and negative lookahead was already described in SONARJAVA-3564, but cut out of the original implementation for time reasons.

      The algorithm described in the linked ticket is wrong. To see why, consider the following (silly) example: (?=[ab])[bc] is equivalent to [ac] and does not intersect with the regex 'b', yet the algorithm would assume it does because both [ab] and [bc] do intersect with 'b'.

      The proper algorithm to determine whether '(?=α)β' intersects with a regex γ would be to first calculate the intersection automaton of α and β and then check whether that automaton intersects γ. To calculate the intersection automaton, one could use the same logic as for checking intersection - except that instead of just returning true when two constructs intersect, one would need to build up an automaton.

      Note that the number of states in the intersection automaton (as well as the negated automaton that'd have to be built for negative lookahead) is potentially exponential, so we should make sure to bail out when the generated automaton becomes too large.

      Since that's more effort than the initially imagined algorithm, we should wait to see whether the lack of support for lookaheads poses a problem in practice.


          Issue Links



              Unassigned Unassigned
              sebastian.hungerecker Sebastian Hungerecker
              0 Vote for this issue
              1 Start watching this issue