Uploaded image for project: 'SonarJava'
  1. SonarJava
  2. SONARJAVA-3564

Implement intersects and supersetOf helper for regex automata

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.10
    • Component/s: Rules
    • Labels:

      Description

      Since multiple rules will need to calculate whether two (sub)automata intersect or whether one is a subset or superset of the other, we should provide common helper methods for this. Since sometimes we need to check this for the prefix of an automaton, this option should also be supported.

      type Sub = {startState, endState, allowPrefix}
      
      Sub.isAtEnd = startState == endState
      Sub.$stateMethod = {startState.$stateMethod, endState, allowPrefix}
      
      /**
        * If both sub-automata have allowPrefix set to true, this method will check whether auto1 intersects
        * the prefix of auto2 or auto2 intersects the prefix of auto1. This is different than checking whether
        * the prefix of auto1 intersects the prefix of auto2 (which would always be true because both prefix
        * always contain the empty string).
        * defaultAnswer will be returned in case of unsupported features or the state limit is exceeded.
        * It should be whichever answer does not lead to an issue being reported to avoid false positives.
        */
      boolean intersects(Sub auto1, Sub auto2, boolean defaultAnswer) {
        start1, start2 = (auto1, auto2).startState
        if cache size above limit:
          return defaultAnswer
        if start1 and/or start2 is lookbehind:
          // lookbehind not supported
          return defaultAnswer
        if start1 and/or start2 is a backreference:
          // We could support backreferences by having a stack of subautomata, onto which we push the referenced group,
          // but for now we'll simply bail here
          return defaultAnswer
        if start1 and/or start2 is repetion with finite max > 1:
          // Properly supporting fixed-max loops would require unrolling the automaton, potentially making it huge
          // Technically the case where min > 1 is unsupported for the same reason, but in that case we treat it
          // as if min were 1, which should hopefully not produce a lot of FPs
          return defaultAnswer
      
        if both auto and start2 have already been visited together:
          return the cached result for (start1, start2) or defaultAnswer if there is none because we're currently in the process of calculating it
        if both automata are at the end:
          return true
        if auto1.isAtEnd:
          return auto2.allowPrefix
        if auto2.isAtEnd:
          return auto1.allowPrefix
        if start1 is negation:
          return !supersetOf(auto1.successor, auto2, !defaultAnswer)
        if start2 is negation:
          return !subsetOf(auto1, auto2.successor, !defaultAnswer)
        if start1 is lookahead:
          lookaheadAuto = {start1.element, start1.endOfLookaheadState, auto1.allowPrefix}
          continuationAuto = {auto2.startState, auto2.endState, true}
          return intersects(lookaheadAuto, continuationAuto, defaultAnswer) && intersects(auto1.continuation, auto2, defaultAnswer)
        if start2 is lookahead:
          lookaheadAuto = {start2.element, start2.endOfLookaheadState, auto2.allowPrefix}
          continuationAuto = {auto1.startState, auto1.endState, true}
          return intersects(continuationAuto, lookaheadAuto, defaultAnswer) && intersects(auto1, auto2.continuation, defaultAnswer)
        if start1 has epsilon edge:
          return start1.successors.any(succ -> intersects(auto1 with {startState = succ}, auto2, defaultAnswer)
        if start2 has epsilon edge:
          return start2.successors.any(succ -> intersects(auto1, auto2 with {startState = succ}, defaultAnswer)
        assert both start1 and start2 have character edge
        return charactersIntersect(successor1, successor2) && start1.successors.any(succ1 ->
          start2.successors.any(succ2 ->
            intersects(auto1 with succ1, auto2 with succ2, defaultAnswer)
          )
        )
      }
      
      subsetOf(auto1, auto2, defaultAnswer) = supersetOf(auto2, auto1, defaultAnswer)
      
      /**
        * Here auto2.allowPrefix means that if supersetOf(auto1, auto2), then for every string matched by auto2, auto1 can match a prefix of it
        * auto1.allowPrefix means that if supersetOf(auto1, auto2), then for every string matched by auto2, auto1 can match a continuation of it
        * If both are set, it means either one can be the case.
        */
      boolean supersetOf(auto1, auto2, defaultAnswer) {
        start1, start2 = (auto1, auto2).startState
        if cache size above limit:
          return defaultAnswer
        if start1 and/or start2 is lookbehind:
          // lookbehind not supported
          return defaultAnswer
        if start1 and/or start2 is a backreference:
          // We could support backreferences by having a stack of subautomata, onto which we push the referenced group,
          // but for now we'll simply bail here
          return defaultAnswer
        if start1 and/or start2 is repetion with finite max > 1:
          // Properly supporting fixed-max loops would require unrolling the automaton, potentially making it huge
          // Technically the case where min > 1 is unsupported for the same reason, but in that case we treat it
          // as if min were 1, which should hopefully not produce a lot of FPs
          return defaultAnswer
      
        if both auto and start2 have already been visited together:
          return the cached result for (start1, start2) or defaultAnswer if there is none because we're currently in the process of calculating it
        if both automata are at the end:
          return true
        if auto1.isAtEnd:
          return auto2.allowPrefix
        if auto2.isAtEnd:
          return auto1.allowPrefix
        if start1 is negation:
          return !intersects(auto1.successor, auto2, !defaultAnswer)
        if start2 is negation:
          return !intersects(auto1, auto2.successor, !defaultAnswer)
        if start1 is lookahead:
          lookaheadAuto = {start1.element, start1.endOfLookaheadState, auto1.allowPrefix}
          continuationAuto = {auto2.startState, auto2.endState, true}
          return supersetOf(lookaheadAuto, continuationAuto, defaultAnswer) && supersetOf(auto1.continuation, auto2, defaultAnswer)
        if start2 is lookahead:
          lookaheadAuto = {start2.element, start2.endOfLookaheadState, auto2.allowPrefix}
          continuationAuto = {auto1.startState, auto1.endState, true}
          return supersetOf(continuationAuto, lookaheadAuto, defaultAnswer) && supersetOf(auto1, auto2.continuation, defaultAnswer)
        if start1 has epsilon edge:
          return start1.successors.any(succ -> supersetOf(auto1 with {startState = succ}, auto2, defaultAnswer)
        if start2 has epsilon edge:
          return start2.successors.all(succ -> supersetOf(auto1, auto2 with {startState = succ}, defaultAnswer)
        assert both start1 and start2 have character edge
        return characterIsSupersetOf(successor1, successor2) && start2.successors.all(succ2 ->
          start1.successors.any(succ1 ->
            supersetOf(auto1 with succ1, auto2 with succ2, defaultAnswer)
          )
        )
      }
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                alban.auzeill Alban Auzeill
                Reporter:
                sebastian.hungerecker Sebastian Hungerecker
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Due:
                  Created:
                  Updated:
                  Resolved: