In the case of COBOL or RPG, depending on the format of the files that are analyzed, SonarQube can incorrectly consider some issues as new.
The same applies to lines of code: with the support for project measures on new code when SCM info is missing, SonarQube can wrongly treat some old lines as new or updated.
Indeed, when new lines are added to a file, the content of the margin may change for some old lines even though the corresponding code has not changed. This is the case, for example, when the margin contains line numbers. In such a case, SonarQube cannot tell that the code on this line is the same as before.
Our COBOL and RPG analyzers can already ignore margins in order to properly parse the file. A fixed format can be defined with a project parameter; in the case of COBOL, the format can also change within the file itself.
SonarQube should also ignore those margins, at least in the case of a file using a fixed format, to correctly detect which lines and issues are new.
As a first step, only the COBOL analyzer will benefit from these improvements.
- The SonarQube API should let analyzers specify the significant part of each line in a file. The API will look like the one for CPD tokens:
- If the API is not called for a file -> every line is considered significant in its entirety
- Basic rules for the API:
- addRange can be called only with a range that spans a single line
- addRange cannot be called twice for the same line
- if addRange is not called for a given line, the line is considered as NOT significant (or the significant part is empty)
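The rules above can be sketched as a small, self-contained Java class. This is an illustration only: `SignificantCodeSketch`, its `significantPart` helper and the offset conventions (1-based lines, 0-based offsets, exclusive end) are assumptions and not the actual SonarQube API; only the name `addRange` comes from the description above. The sketch also treats "no range registered" as "API not called for the file".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the proposed API; only the name addRange comes from the spec. */
final class SignificantCodeSketch {

  // line number (1-based) -> {startOffset, endOffset}, end exclusive (assumed convention)
  private final Map<Integer, int[]> rangesByLine = new HashMap<>();

  /** Registers the significant part of one line, enforcing the basic rules. */
  SignificantCodeSketch addRange(int startLine, int startOffset, int endLine, int endOffset) {
    if (startLine != endLine) {
      throw new IllegalArgumentException("A range must span a single line");
    }
    if (startOffset < 0 || startOffset > endOffset) {
      throw new IllegalArgumentException("Invalid offsets");
    }
    if (rangesByLine.containsKey(startLine)) {
      throw new IllegalStateException("A range is already defined for line " + startLine);
    }
    rangesByLine.put(startLine, new int[] {startOffset, endOffset});
    return this;
  }

  /** Returns the part of the raw line that counts as significant code. */
  String significantPart(int line, String rawLine) {
    if (rangesByLine.isEmpty()) {
      return rawLine; // API never called for this file: the entire line is significant
    }
    int[] range = rangesByLine.get(line);
    if (range == null) {
      return ""; // no range for this line: the significant part is empty
    }
    // Clamp to the line length so a too-long range cannot throw.
    int end = Math.min(range[1], rawLine.length());
    int start = Math.min(range[0], end);
    return rawLine.substring(start, end);
  }
}
```

For a COBOL line such as `000100 MOVE A TO B.`, calling `addRange(1, 7, 1, 19)` in this sketch leaves `MOVE A TO B.` as the significant part, while any other line of the same file, having no registered range, contributes an empty significant part.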
- The compute engine keeps computing the raw file and the corresponding hash to be able to detect whether the file has changed. This means that if only the margins change, the file is still considered modified.
-> The position of the issue in the line still refers to the raw line that is passed to SQ, and the display in SQ keeps working the same way.
- Based on the info that is optionally provided by the analyzer, the scanner sends, for each line, the range of significant code to use for issue tracking and new line detection (i.e. without considering the margins).
-> New line detection and issue matching mechanisms still rely on the line hash, and that hash changes only when the code is actually updated.
- SonarQube stores a flag in DB, in the FILE_SOURCES table, to indicate whether the line hashes refer to a line with a significant code range. It can be called 'SIGNIFICANT_CODE'.
- If a file doesn't have this flag (or the flag is set to false) but a significant code range is included in the report from the scanner, SonarQube
- calculates line hashes in the old way, not taking into account the significant code range but using the whole line for all steps in the compute engine (file move detection, issue tracking, no scm diff).
- calculates and persists line hashes that take the significant code range into account, and sets the flag to true in the DB.
- On first project analysis after upgrade:
- All line hashes are updated in the DB, even for files that have not been modified.
- If a file hasn't been modified, as the file hash is still the same, lines are not considered as new and issues remain unchanged.
- If a file has been modified but the margins have not changed, since we compare hashes for the whole line, new/updated lines and issues are correctly identified.
- Limitation: If a file has been modified and the margins have changed, we'll have the same drawback as in the past for the first analysis of the project: lines for which the margin has changed, as well as issues on those lines, are considered new.
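
To illustrate why hashing only the significant range keeps line hashes stable, and how the 'SIGNIFICANT_CODE' flag could select the hash used during the transition, here is a minimal Java sketch. All class and method names are hypothetical, MD5 is an arbitrary choice for the example, and the behaviour when the flag is set but the report carries no range is not specified above, so the sketch simply falls back to the whole line in that case.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch; not the actual compute engine code. */
final class LineHashSketch {

  /** Hex-encoded digest of a piece of text (MD5 is an assumption made for this example). */
  static String hash(String text) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      StringBuilder hex = new StringBuilder();
      for (byte b : md.digest(text.getBytes(StandardCharsets.UTF_8))) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Hash of the significant part only: [startOffset, endOffset), clamped to the line length. */
  static String significantHash(String rawLine, int startOffset, int endOffset) {
    int end = Math.min(endOffset, rawLine.length());
    int start = Math.min(startOffset, end);
    return hash(rawLine.substring(start, end));
  }

  /**
   * Hash used for file move detection, issue tracking and new line detection.
   * While the stored flag is false, the previously persisted hashes cover whole
   * lines, so the comparison must use whole-line hashes as well.
   */
  static String hashForTracking(String rawLine, int startOffset, int endOffset,
                                boolean reportHasRange, boolean storedFlag) {
    return (reportHasRange && storedFlag)
        ? significantHash(rawLine, startOffset, endOffset)
        : hash(rawLine);
  }

  /** Hash persisted at the end of the analysis; the flag would be set to reportHasRange. */
  static String hashToPersist(String rawLine, int startOffset, int endOffset,
                              boolean reportHasRange) {
    return reportHasRange
        ? significantHash(rawLine, startOffset, endOffset)
        : hash(rawLine);
  }
}
```

With this sketch, renumbering the margin of `000100 MOVE A TO B.` to `000110 MOVE A TO B.` changes the whole-line hash but leaves `significantHash(line, 7, 19)` unchanged, which is the property issue tracking relies on once the flag is set. On the first analysis after upgrade, `hashForTracking` still returns the whole-line hash, which matches the limitation described in the last bullet.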