The current line-level approach generates many false positives.
The idea is to apply the same principle as for COBOL: work on blocks of lines instead of individual lines, because a single line contains about 4 tokens at most, which is too little to score reliably and leads to false positives. A threshold is then set on the block probability, normalized with respect to the block length.
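
The block-level scoring described above can be sketched as follows. This is only an illustration under several assumptions that are not stated in the note: per-token log-probabilities are already available for each line, "block length" is taken to mean the number of tokens in the block (so the normalized probability is the geometric mean of the token probabilities), and a block is flagged when that value is at or above the threshold. The names `iter_blocks`, `normalized_block_probability`, `flag_blocks`, `block_size` and `threshold` are hypothetical, and the default values are placeholders rather than tuned settings.

```python
import math
from typing import Iterator, List, Sequence, Tuple

# One line = a sequence of per-token log-probabilities (assumed input format).
Line = Sequence[float]


def iter_blocks(lines: Sequence[Line], block_size: int) -> Iterator[Tuple[int, Sequence[Line]]]:
    """Yield (start_line_index, block) for consecutive, non-overlapping blocks."""
    for start in range(0, len(lines), block_size):
        yield start, lines[start:start + block_size]


def normalized_block_probability(block: Sequence[Line]) -> float:
    """Length-normalized probability of a block.

    Assumption: normalization divides the summed log-probability by the
    number of tokens, i.e. the geometric mean of the token probabilities.
    """
    token_logprobs = [lp for line in block for lp in line]
    if not token_logprobs:
        return 0.0  # empty block: nothing to score, never flagged
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def flag_blocks(
    lines: Sequence[Line],
    block_size: int = 10,    # hypothetical default
    threshold: float = 0.5,  # hypothetical default
) -> List[Tuple[int, float]]:
    """Return (start_line_index, score) for blocks whose score reaches the threshold."""
    flagged = []
    for start, block in iter_blocks(lines, block_size):
        score = normalized_block_probability(block)
        # Assumption: a high probability means a detection.
        # Flip the comparison if the score has the opposite meaning.
        if score >= threshold:
            flagged.append((start, score))
    return flagged


# Usage example with made-up log-probabilities (4 short lines, blocks of 2 lines).
example_lines = [
    [math.log(0.9)] * 4,
    [math.log(0.9)] * 3,
    [math.log(0.1)] * 4,
    [math.log(0.2)] * 2,
]
flagged = flag_blocks(example_lines, block_size=2, threshold=0.5)
```

Normalizing by the token count keeps scores comparable between blocks of different sizes, so a single threshold can be used even when the last block of a file is shorter than the others.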