Since multiple rules will need to calculate whether two (sub)automata intersect or whether one is a subset or superset of the other, we should provide common helper methods for this. Since sometimes we need to check this for the prefix of an automaton, this option should also be supported.
type Sub = {startState, endState, allowPrefix} Sub.isAtEnd = startState == endState Sub.$stateMethod = {startState.$stateMethod, endState, allowPrefix}
/** * If both subautomata have allowPrefix set to true, this method will check whether auto1 intersects * the prefix of auto2 or auto2 intersects the prefix of auto1. This is different than checking whether * the prefix of auto1 intersects the prefix of auto2 (which would always be true because both prefix * always contain the empty string). * defaultAnswer will be returned in case of unsupported features or the state limit is exceeded. * It should be whichever answer does not lead to an issue being reported to avoid false positives. */ boolean intersects(Sub auto1, Sub auto2, boolean defaultAnswer) { start1, start2 = (auto1, auto2).startState if cache size above limit: return defaultAnswer if start1 and/or start2 is lookbehind: // lookbehind not supported return defaultAnswer if start1 and/or start2 is a backreference: // We could support backreferences by having a stack of subautomata, onto which we push the referenced group, // but for now we'll simply bail here return defaultAnswer if start1 and/or start2 is repetion with finite max > 1: // Properly supporting fixedmax loops would require unrolling the automaton, potentially making it huge // Technically the case where min > 1 is unsupported for the same reason, but in that case we treat it // as if min were 1, which should hopefully not produce a lot of FPs return defaultAnswer if both auto and start2 have already been visited together: return the cached result for (start1, start2) or defaultAnswer if there is none because we're currently in the process of calculating it if both automata are at the end: return true if auto1.isAtEnd: return auto2.allowPrefix if auto2.isAtEnd: return auto1.allowPrefix if start1 is negation: return !supersetOf(auto1.successor, auto2, !defaultAnswer) if start2 is negation: return !subsetOf(auto1, auto2.successor, !defaultAnswer) if start1 is lookahead: lookaheadAuto = {start1.element, start1.endOfLookaheadState, auto1.allowPrefix} continuationAuto = {auto2.startState, auto2.endState, true} return intersects(lookaheadAuto, continuationAuto, defaultAnswer) && intersects(auto1.continuation, auto2, defaultAnswer) if start2 is lookahead: lookaheadAuto = {start2.element, start2.endOfLookaheadState, auto2.allowPrefix} continuationAuto = {auto1.startState, auto1.endState, true} return intersects(continuationAuto, lookaheadAuto, defaultAnswer) && intersects(auto1, auto2.continuation, defaultAnswer) if start1 has epsilon edge: return start1.successors.any(succ > intersects(auto1 with {startState = succ}, auto2, defaultAnswer) if start2 has epsilon edge: return start2.successors.any(succ > intersects(auto1, auto2 with {startState = succ}, defaultAnswer) assert both start1 and start2 have character edge return charactersIntersect(successor1, successor2) && start1.successors.any(succ1 > start2.successors.any(succ2 > intersects(auto1 with succ1, auto2 with succ2, defaultAnswer) ) ) }
subsetOf(auto1, auto2, defaultAnswer) = supersetOf(auto2, auto1, defaultAnswer) /** * Here auto2.allowPrefix means that if supersetOf(auto1, auto2), then for every string matched by auto2, auto1 can match a prefix of it * auto1.allowPrefix means that if supersetOf(auto1, auto2), then for every string matched by auto2, auto1 can match a continuation of it * If both are set, it means either one can be the case. */ boolean supersetOf(auto1, auto2, defaultAnswer) { start1, start2 = (auto1, auto2).startState if cache size above limit: return defaultAnswer if start1 and/or start2 is lookbehind: // lookbehind not supported return defaultAnswer if start1 and/or start2 is a backreference: // We could support backreferences by having a stack of subautomata, onto which we push the referenced group, // but for now we'll simply bail here return defaultAnswer if start1 and/or start2 is repetion with finite max > 1: // Properly supporting fixedmax loops would require unrolling the automaton, potentially making it huge // Technically the case where min > 1 is unsupported for the same reason, but in that case we treat it // as if min were 1, which should hopefully not produce a lot of FPs return defaultAnswer if both auto and start2 have already been visited together: return the cached result for (start1, start2) or defaultAnswer if there is none because we're currently in the process of calculating it if both automata are at the end: return true if auto1.isAtEnd: return auto2.allowPrefix if auto2.isAtEnd: return auto1.allowPrefix if start1 is negation: return !intersects(auto1.successor, auto2, !defaultAnswer) if start2 is negation: return !intersects(auto1, auto2.successor, !defaultAnswer) if start1 is lookahead: lookaheadAuto = {start1.element, start1.endOfLookaheadState, auto1.allowPrefix} continuationAuto = {auto2.startState, auto2.endState, true} return supersetOf(lookaheadAuto, continuationAuto, defaultAnswer) && supersetOf(auto1.continuation, auto2, defaultAnswer) if start2 is lookahead: lookaheadAuto = {start2.element, start2.endOfLookaheadState, auto2.allowPrefix} continuationAuto = {auto1.startState, auto1.endState, true} return supersetOf(continuationAuto, lookaheadAuto, defaultAnswer) && supersetOf(auto1, auto2.continuation, defaultAnswer) if start1 has epsilon edge: return start1.successors.any(succ > supersetOf(auto1 with {startState = succ}, auto2, defaultAnswer) if start2 has epsilon edge: return start2.successors.all(succ > supersetOf(auto1, auto2 with {startState = succ}, defaultAnswer) assert both start1 and start2 have character edge return characterIsSupersetOf(successor1, successor2) && start2.successors.all(succ2 > start1.successors.any(succ1 > supersetOf(auto1 with succ1, auto2 with succ2, defaultAnswer) ) ) }
 contributes to

MMF2182 Help Java developers writing regexp running fast, with the correct amount of resources and really doing what developers intended
 Resolved
 depends upon

SONARJAVA3549 Add support for automatabased analyses for regular expressions
 Closed
 is depended upon by

SONARJAVA3550 Rule S5994: Regex patterns following a possessive quantifier should not always fail
 Closed

SONARJAVA3560 Rule S6002: Regex lookahead assertions should not be contradictory
 Closed

SONARJAVA3566 Rule S5855: Regex alternatives should not be redundant
 Closed

SONARJAVA3568 S5852 should use automata to increase its accuracy
 In Progress