Uploaded image for project: 'SonarQube'
  1. SonarQube
  2. SONAR-12689

Separate storage of projects/views from their components and branches

    Details

    • Type: Task
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Database
    • Labels:
      None
    • Edition:
      Community
    • Production Notes:
      None

      Description

      WHY

      The PROJECTS table

      Table PROJECTS is arguably the oldest table of the SonarQube schema.

      Each row in this table has a UUID and an ID (for historical reasons). Both are a unique technical identifier of the row.

      Each row in this table has a SCOPE and a QUALIFIER which are used to distinguish which kind of data that row represents. We will go over them.

      Projects/Portfolios/Applications

      These have SCOPE=PRJ and then, depending on the qualifier:

      1. TRK: a Project
      2. VW: a Portfolio
      3. APP: an Application

      Components

      Depending on whether we are talking about a Project, a Portfolio or an Application, the name "Component" is referring to different things.

      Components are always arranged into a tree with a single root.

      So, Components are:

      • for a Project: the root of the tree, directories and files
      • for a Portfolio: the root of the tree, sub portfolios, references to a Project
      • for an Application: the root of the tree, references to a Project

      Currently, a Project/Portfolio/Application and the root of the tree of components of the main branch of that Project/Portfolio/Application are represented by the same row with SCOPE=PRJ and QUALIFIER=TRK/VW/APP. This implies that the root of the component tree and the Project/Portfolio/Application have the same UUID/ID.

      Note: Portfolios currently have no concept of branch but from all implementation aspects, it actually has a main branch (and no support for any other branch is implemented, yet)

      Branches

      Table PROJECT_BRANCHES holds the properties specific to a Project/Application branch. It has a UUID column but this UUID is unusual from a SQ database design point of view as it is not specific to the PROJECT_BRANCHES table and rather follows a specific rule:

      the UUID of a row in table PROJECT_BRANCHES must be the same as the UUID of row in table PROJECTS with SCOPE=PRJ and QUALIFIER=TRK/VW/APP.

      Because of this rule, since the root of the component tree and the Project/Portfolio/Application have the same UUID/ID, then this UUID can actually be the one of 3 different entities:

      1. a Project (qualifier=TRK), a Portfolio (qualifier=VW) or an Application (qualifier=APP)
      2. the main branch of that Project/Portfolio/Application
      3. the root of the tree of components of a branch of that Project/Portfolio/Application

      There are many places where that UUID/ID is stored across the database.

      Since the UUID/ID is the same for all 3 entities, it currently relies on naming and/or functional modeling and/or happy-inoffensive confusion and/or experience to know which entities is actually referred to.

      The problem

      The situation above works. It built up as SonarQube evolved over the years and it reached that state mainly because of small staking increments which allowed to provide new features without the cost of big redesign of the database.

      However, we have reached the point today where we want to implement new features such as MMF-1335 "Allow a different, existing branch to be designated the Main Branch".

      In MMF-1335, we are facing the problem that the Project and it's main branch are basically the same row in the database. Changing one with changing the other is impossible. We therefor have someone represent both entities with a different data in the database.

      Also, this situation is creating confusion, not helping new developer integration, ... In other words, this is technical debt which we have had for a long time but it getting costlier and costlier.

      The goal

      The point of this ticket is to separate the storage of the three entities:

      1. Project/View/Application
      2. branches
      3. component trees

      The target is to do so in the cleanest way possible and pay the accumulated design debt of the database instead of adding some more design debt as we've been doing in the past

      WHAT

      New table PROJECTS and renamed table COMPONENTS

      Projects/Portfolio/Applications will be stored in their own specific table.

      By the lack of a better name and because it's currently the practice to use the word "Projects" to designated Projects, Portfolio and Applications, that table will be named PROJECTS.

      Components are currently stored in a table called PROJECTS.

      This table has to be renamed to COMPONENTS:

      1. because the name is more consistent with the data it will hold from now on
      2. to free the table name PROJECTS for the new table

       can we find another name than PROJECTS for the new table. Reusing the name both creates a confusion and a risk to forget to change some SQL requests and have bugs

      In all text below, PROJECTS will refer to the new table which holds Projects/Portfolio/Applications 

      Branches

      Table PROJECT_BRANCHES currently holds the data specific to branches but is considered as a child table of the current table PROJECTS.

      This table should now become a child table of the new table PROJECTS.

      This distinction is important because it implies that rows can now be created in table PROJECT_BRANCHES without any row created in table COMPONENTS.

      Dropping the "branch UUID=component tree root UUID" rule

      Dropping this rule is a nice to have as far as we can tell from the know incoming features for SonarQube.

      To drop this rule,  what is the UUID of the root of components of each branch will have to be stored explicitly.

      Clarifying the references

      As mentioned before a given UUID stored in a table can indistinguishably be the one of the Project/Portfolio/Application, or the one of a Branch, or the one of the root of a Component Tree.

      It was ok because all three where UUIDs in the same table.

      Since this won't be the case anymore (it can be a UUID of table PROJECTS, COMPONENTS or PROJECT_BRANCHES), which one is actually refereed to must be clarified, at least in the Java code and documentation, at best by renaming columns.

      HOW

      Changes and impacts study

      This ticket is the conclusion of investigations on the implementation of MMF-1335

      During this investigation, a document was created to list the changes required to implement that MMF by creating a new table PROJECTS.

      This document holds very good tracks on:

      • the impacts of now storing Projects/Portfolios/Applications in a dedicated table (both DB and Java)
      • clarifying the references

      Note that dropping the "branch UUID=component tree root UUID" wasn't considered during this investigation.

      Remove usage of PROJECTS.ID

      In DB

      Tables USER_ROLES, GROUP_ROLES and PROPERTIES are referring to the table PROJECTS by its ID

      These tables are really child tables of the Project/Portfolio/Applications. Since the new table PROJECTS will not have an ID, these tables must be changed to use the UUID instead of the ID.

      The new column should be named PROJECT_UUID.

      In Web API

      Parameter projectId of WebService api/qualitygates/deselect must be dropped (there are deprecated since 6.1). 

      The same parameter exists for WebServices api/qualitygates/select and api/qualitygates/project_status but it's not been flagged as deprecated.

      According to implementation, name of the parameter is misleading and does not take an ID but a UUID.

      Nice to have: deprecate parameters projectId from select and project_status WebServices.

      Create new table PROJECTS

      New table PROJECTS must hold the properties specific to Projects:

      • UUID (from PROJECTS.UUID)
      • type: enum PRJ, VW, APP (from PROJECTS.QUALIFIER, with TRK renamed to PRJ)
      • key (from PROJECTS.KEE)
      • private flag (from PROJECTS.IS_PRIVATE)
      • tags (from PROJECTS.TAGS)
      • description (from PROJECTS.DESCRIPTION)
      • creation date (from PROJECTS.CREATED_AT)

      Nice-to-have: add a technical UPDATE_AT column (all tables should have technical updated_at column if they can be updated. key, tags and description can be updated)

       should the project-specific columns in old table PROJECTS be dropped?

       column PROJECTS.DESCRIPTION only holds a Project/Application/Portfolio description. WS api/components/* should be updated. Nice-to-have: column PROJECTS.DESCRIPTION could be dropped

      Rename referencing columns

      Renaming the columns holding the UUID of a Project/Application/Portfolio or a branch or a root of a component tree is not necessary but will greatly help future maintenance and make code clearer.

      Base names for columns and Java variables would be:

      • a project/Application/portfolio: PROJECT_UUID and projectUuid
      • a branch: BRANCH_UUID and branchUuid
      • the root of the component tree: ROOT_COMPONENT_UUID and rootComponentUuid

       Renaming the DB columns is a nice-to-have both because we can live without it as long as variables are correctly named and documented in Java code and because it will add extra DB migrations which cost as to be assessed.

      Clean-up SCOPE and QUALIFIER

      With projects/portfolio/application moving to their own table, it doesn't make sense anymore to store that information in table COMPONENTS through the SCOPE and QUALIFIER

      Here is a recap of the scopes and qualifiers in use.

      Name Scope Qualifier
      Root view PRJ VW
      Sub view A PRJ SVW
      Sub view A1 PRJ SVW
      Technical project FIL TRK
      Project PRJ TRK
      Directory DIR DIR
      File FIL FIL
      Unit test File FIL UTS
      Application PRJ APP

      Using only the QUALIFIER would almost be good except that we don't want to distinguish between TRK, VW and APP anymore AND we don't want to maintain the same information in two different locations.

      TRK, VW and APP should be replaced by ROOT and column SCOPE can be dropped. Qualifier TRK would then only represent a project link.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                duarte.meneses Duarte Meneses
                Reporter:
                sebastien.lesaint Sebastien Lesaint
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Due:
                  Created:
                  Updated: