MMF-808

Project measures must not be misleading when SCM info is missing



    • Type: MMF
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Labels:



      When SCM info is missing, the user wonders why the project home page is missing 'Coverage on New Code' and 'Duplication on New Code'.
      More annoyingly, the values of several project measures and of the Quality Gate are misleading:

      • In the Measures page, the metrics which require new/updated lines to be detected are always set to 0:
        • Lines/Conditions.. on New Code
        • Duplicated Blocks/Lines on New Code
        • Lines on New Code
        • Technical Debt Ratio on New Code
      • As a consequence, a developer can introduce a large amount of technical debt during the leak period, yet 'Maintainability Rating on New Code' will always be 'A' and the Quality Gate can always be green.


      We could of course clarify in the UI that SCM information is missing, but that is not enough: we must also find a way to compute the missing measures and ratings even when SCM information is not available.

      Without SCM info, we are basically missing the creation date for each line in order to know whether a piece of code is new or not.
      A solution can be to have our own mechanism to find, or approximate if needed, the code creation date.

      When SCM info is missing on a file
      The Compute Engine will use a differential algorithm to track the code that has been added/updated/removed:

      • First analysis of the project: All source files and corresponding lines of code are considered as new.
      • Subsequent analyses:
        • For a new file: All lines of code are considered as new.
        • For an updated file: We do a diff to find the code that has been added/modified.


      • When the analysis finds that a piece of code is new/updated in a file:
        • If we didn't have prior blame info for the file: we use the analysis date as the last modification time of changed lines. We will then use this date to identify which lines have been added/updated since the start of the leak period.
        • If we previously had blame info for the file: we also update the last modification time of changed lines with the analysis date, overriding the SCM date. But we drop the author and commit id data so as not to keep incorrect information about the changes.
      • When no change is detected on a file, there is nothing to do. We keep the blame info we possibly have from a previous analysis.
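
      The steps above can be sketched as follows. This is only an illustration, not the Compute Engine's implementation: the LineInfo structure and the update_line_info function are hypothetical names, Python's difflib stands in for whatever diff algorithm is actually used, and the sketch assumes that unchanged lines of a changed file keep their previous info.

```python
from dataclasses import dataclass
from datetime import datetime
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class LineInfo:
    """Per-line change data (hypothetical structure, not SonarQube's model)."""
    date: Optional[datetime]         # last modification time, None if unknown
    author: Optional[str] = None     # only known when it came from SCM blame
    revision: Optional[str] = None   # commit id, only known from SCM blame


def update_line_info(
    old_lines: Optional[List[str]],
    new_lines: List[str],
    old_info: Optional[List[LineInfo]],
    analysis_date: datetime,
) -> List[LineInfo]:
    """Return line info for new_lines when no SCM data is available."""
    if old_lines is None or old_info is None:
        # First analysis of the project, or a new file: every line is new.
        return [LineInfo(date=analysis_date) for _ in new_lines]

    result: List[LineInfo] = []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged lines keep the info from the previous analysis.
            result.extend(old_info[i1:i2])
        elif tag in ("replace", "insert"):
            # Added or modified lines get the analysis date; author and
            # commit id are dropped because they would no longer be correct.
            result.extend(LineInfo(date=analysis_date) for _ in range(j1, j2))
        # "delete": removed lines contribute nothing to the new file.
    return result
```

      With this sketch, a file whose second line is modified and which gains a fourth line ends up with the analysis date on lines 2 and 4 only, while lines 1 and 3 keep their earlier date, author and commit id.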

      When we have SCM info for a file
      We use the info we get from the SCM, even if we didn't have prior blame info for the file. Indeed, if the file has not changed but we now have SCM info, the analysis reports blame info.

      Fix the Leak period

      Currently, the data relative to the leak period is identified by comparing the date of issues/lines with the start date of the leak period: If the date of an issue or a source code line is exactly the same as the start date of the leak period, it is currently considered as part of the leak period.
      This is not correct in the case of an analysis that is the base for the leak period: it should not contribute to the leak. When SCM is present, this problem is not noticeable, since the SCM date is typically well before the date of the analysis that defines the start of the leak period. When no SCM is present, the generated dates correspond exactly to the analysis time that starts the leak period, and this problem becomes an issue. It should be fixed with this MMF.
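
      The fix amounts to making the date comparison strict. A minimal sketch, with a hypothetical function name and assuming both values are plain timestamps:

```python
from datetime import datetime


def in_leak_period(item_date: datetime, leak_start: datetime) -> bool:
    """True if an issue or line dated item_date counts as part of the leak.

    The comparison is strict: an item dated exactly at the start of the leak
    period belongs to the base analysis and must not contribute to the leak.
    """
    return item_date > leak_start
```

      A line stamped with the date of the analysis that starts the leak period is then excluded, while a line stamped by any later analysis is included.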


      When upgrading from a previous version of SonarQube
      The code that was already analyzed in the past should not be considered as new (we don't want to reset the leak).
      And, because we have no date for any of the lines previously analyzed, the number of lines that we consider as added/updated during the leak won't be correct. That will be the case until the leak period starts after the day of the upgrade.
      Nevertheless, we should start computing the missing metrics with the info we have from that point in time.

      • Benefits: Rather than waiting until we know exactly which lines were added/updated in the leak, we start displaying the metrics based on the lines changed from now on. That way, we prevent the 'Maintainability Rating on New Code' from always being considered as 'A'.
      • Drawback: During this period of time, a few added lines with, for example, poor coverage can be sufficient to turn the Quality Gate red.
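
      The drawback can be illustrated with a small calculation. The sketch below uses hypothetical names and assumes that lines analyzed before the upgrade simply carry no date and are therefore never counted as new:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class Line:
    date: Optional[datetime]  # None for lines analyzed before the upgrade
    covered: bool             # whether the line is covered by tests


def coverage_on_new_code(lines: Iterable[Line], leak_start: datetime) -> Optional[float]:
    """Coverage percentage over lines dated strictly after leak_start.

    Returns None when no line is known to be new, so that callers can tell
    "no new code detected" apart from "0% coverage".
    """
    new_lines = [
        line for line in lines
        if line.date is not None and line.date > leak_start
    ]
    if not new_lines:
        return None
    covered = sum(1 for line in new_lines if line.covered)
    return 100.0 * covered / len(new_lines)
```

      For a file with 1000 fully covered but undated legacy lines and 3 lines changed after the upgrade, of which only 1 is covered, this yields about 33.3%. Any Quality Gate condition requiring a higher coverage on new code would fail, even though the file as a whole is almost entirely covered.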

      Moved out of the scope

      When an analysis is done without collecting the SCM info (as soon as a file has no SCM info), we could display a warning in the issues page explaining that "Since SCM information is not available for some sources, corresponding issues can't be automatically assigned to their author".



              christophe.levis Christophe Levis