SonarQube / SONAR-12928

Persistence of live measures is a performance hotspot on Oracle and SQLServer


    Details

    • Edition:
      Community
    • Production Notes:
      None

      Description

      Problem

      Customers running large instances on Oracle or SQLServer are complaining that background tasks take too long.

      Upon analysis, it appears that the step responsible for persisting live measures takes a significant share of that time (68 to 74% according to samples analyzed by Services).

      See this discuss thread for details on feedback from Services.

      This problem goes unnoticed at SonarSource as we are running all our instances (Next, Peach and SonarCloud) on PostgreSQL.

      Current state

      All live measures are persisted in an "upsert" approach. 

      Either the Database provides an UPSERT feature or SQ simulates that behavior.

      Simulation is implemented by first doing a regular UPDATE and then, if that SQL statement returns "0 row updated", doing a regular INSERT.

      Finally, it deletes all live measures which are now irrelevant.

      As of today, only PostgreSQL provides an UPSERT feature in all the versions SQ supports. Therefore, "upsert" is simulated on both Oracle and SQLServer.
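      The simulated "upsert" described above can be sketched as follows. This is a minimal in-memory model, with a Map standing in for the LIVE_MEASURES table; class, key and value names are illustrative, not the actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * In-memory sketch of the simulated "upsert" used on Oracle and SQLServer:
 * issue an UPDATE first and, only when it reports "0 rows updated", fall
 * back to an INSERT. The Map stands in for the LIVE_MEASURES table.
 */
public class SimulatedUpsert {
  private final Map<String, Double> liveMeasures = new HashMap<>();

  /** Mimics "UPDATE ... WHERE key = ?" and returns the number of rows updated. */
  private int update(String key, double value) {
    return liveMeasures.replace(key, value) != null ? 1 : 0;
  }

  /** One upsert costs one statement when the row exists, two when it doesn't. */
  public void upsert(String key, double value) {
    if (update(key, value) == 0) {
      liveMeasures.put(key, value); // the fallback INSERT
    }
  }

  public Double get(String key) {
    return liveMeasures.get(key);
  }
}
```

      Because the caller must inspect the row count of each UPDATE before deciding whether to INSERT, the statements cannot be grouped, which is exactly the limitation discussed below.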

      Limitations of the simulated "upsert"

      In the best-case scenario, if there are X live measures, X+1 SQL statements will be performed (X UPDATEs + 1 DELETE). In the worst-case scenario (e.g. a first analysis), 2X+1 SQL statements will be performed (X UPDATEs + X INSERTs + 1 DELETE).

      Because we are doing UPDATE statements and need to check how many rows were updated, the batch mode provided by database drivers can't be used.

      This implies that:

      1. a network round trip must be performed for each SQL statement
      2. DB can't optimize multiple statements put together in sequence

      Note that the number of live measures increases with the number of components (files and directories) in the project. The number of live measures per component is roughly the same for all of them.

      Put together, this means that the duration of this step increases with:

      1. the size of the project (big variation possible)
      2. the number of measures per component (small variation)
      3. the network latency
      4. the performance of each UPDATE and INSERT into table LIVE_MEASURES. This performance is affected by:
        1. the size of the LIVE_MEASURES table
        2. out of date table indices
        3. DB load which may come from many concurrent background tasks (per node and/or SQ DCE edition)
        4. possibly some DB tuning, hardware limitations, ...

      Possible solutions

      Reminder: goal of this step

      Live measures represent the current state of the measures of a branch/PR of a project. They can be updated by an analysis when a new report comes in, but also as users interact with the UI (e.g. resolving an issue).

      The goal of the step persisting live measures is to persist the latest version of the measures of the project computed by the analysis.

      The analysis measures are the single source of truth about the state of the project when they are persisted. All live measures at the end of the step must be the ones of the analysis, which implies that:

      • no other live measure is left in the DB
      • any existing live measure is overwritten to match the one of the analysis

      This is important as it guarantees there is a means for live measures of a given branch to be "eventually consistent".

      Options

      1. Upsert approach

      It's particularly efficient with PostgreSQL: 

      • Up to 100 measures are "upserted" with a single statement
      • batch mode can be used
      • there is no worst-case scenario (such as a first analysis when simulating upsert)
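      For reference, the PostgreSQL feature this relies on is the standard INSERT ... ON CONFLICT clause (available since PostgreSQL 9.5). The sketch below shows the shape of such a statement; the table and column names are illustrative assumptions, the real statements live in SQ's SQL mappers:

```java
/**
 * Illustrative shape of a PostgreSQL upsert for live measures.
 * ON CONFLICT requires a unique constraint on the conflict columns;
 * table/column names here are assumptions, not the actual schema.
 */
public class PgUpsertSql {
  public static final String UPSERT =
      "INSERT INTO live_measures (component_uuid, metric_id, value) VALUES (?, ?, ?) "
      + "ON CONFLICT (component_uuid, metric_id) DO UPDATE SET value = excluded.value";
}
```

      Because the whole decision happens inside the database, such statements can be grouped with multi-row VALUES and sent in batch mode, which is why there is no worst case.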

      By contrast, simulating upsert is quite inefficient (see the "Limitations of the simulated "upsert"" section above).

      SQ supports several DB vendors and several versions of each of them.

      Currently, as long as a single supported version of a given vendor does not provide an UPSERT feature, the feature won't be used at all for that vendor.

      This is technically challenging, but Upsert could be supported for specific versions of a vendor.

      Note: since there is no standard, it's also possible that the UPSERT feature may change from one version to another of the same DB.

      2. delete all + mass inserts (per component)

      Basic idea: for X live measures, do 1 DELETE + X INSERTs. The delete could be done once for the entire branch, or once per component.

      Pros:

      • no worst-case scenario
      • simple
      • can leverage batch mode

      Cons:

      • may be hard on storage management and "garbage" collection on the DB side
        • this argument does not apply to Postgres, as an update is already implemented as a delete + an insert
        • the situation with other DBs is unknown
      • execution time is still directly proportional to the number of live measures

       

      Results: batch mode performs very well, and this option brought significant improvements in execution time without much added complexity. There wasn't a significant difference in execution time between:

      a) Running batch mode across all components (letting auto-commit happen every 250 ops), with a single delete for the entire project

      b) Doing a delete per component and a commit after all insertions for that component

      So option b) is clearly preferable, as it creates a single safe transaction per component.
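      Variant b) can be sketched in memory as follows; a nested Map stands in for LIVE_MEASURES grouped by component, and the names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * In-memory sketch of option 2, variant b): per component, one DELETE of
 * all existing live measures, then batched INSERTs of the new ones, then
 * a commit. Guarantees no stale measure survives for the component.
 */
public class DeleteThenInsert {
  private final Map<String, Map<String, Double>> table = new HashMap<>();

  public void persistComponent(String componentUuid, Map<String, Double> measures) {
    table.remove(componentUuid);                       // 1 DELETE per component
    table.put(componentUuid, new HashMap<>(measures)); // X INSERTs, sent in batch mode
    // commit here: one safe transaction per component
  }

  public Map<String, Double> measuresOf(String componentUuid) {
    return table.getOrDefault(componentUuid, Map.of());
  }
}
```

      Note how a metric that existed in the previous analysis but not in the new one disappears automatically, which is exactly the "no other live measure is left in DB" guarantee the step must provide.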

      3. Query existing measures and insert or update accordingly

      The idea here is to work with batches of measures (managed by the step itself).

      For each full batch, first query what measures currently exist in the DB (for pairs of component/metric). Based on the result, update existing measures and insert non-existing ones.

      Results: the select is quite slow compared to deletes and inserts. Also, doing a select is not compatible with batch mode. Overall execution time was noticeably slower than option 2.
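      A minimal sketch of this select-then-write approach, again with a Map standing in for the table (names are illustrative):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * In-memory sketch of option 3: for a batch of incoming measures, first
 * "SELECT" which (component, metric) keys already exist, then UPDATE
 * those and INSERT the rest. Returns {updates, inserts} for illustration.
 */
public class SelectThenWrite {
  private final Map<String, Double> table = new HashMap<>(); // key = component + "/" + metric

  public int[] persistBatch(Map<String, Double> batch) {
    // the "SELECT": which incoming keys are already present in the table
    Set<String> existing = new HashSet<>(table.keySet());
    existing.retainAll(batch.keySet());

    int updates = 0;
    int inserts = 0;
    for (Map.Entry<String, Double> e : batch.entrySet()) {
      if (existing.contains(e.getKey())) updates++; else inserts++;
      table.put(e.getKey(), e.getValue()); // UPDATE or INSERT accordingly
    }
    return new int[] {updates, inserts};
  }
}
```

      The extra round trip for the initial select, which cannot itself be batched, is what made this option slower than option 2 in practice.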

      4. Multi-thread insertion of data

      Assuming that most time is spent waiting for I/O, using multiple threads allows the CE to do more useful work, even while some of the threads are just waiting for I/O.

      The improvement in execution time was very significant (roughly proportional to the number of threads). Also, the added code complexity is very manageable.

      However, this assumes that the DB is not overloaded. If the DB is overloaded, this approach will make things worse, since it will hit the DB with more data concurrently.
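      The multi-threaded variant can be sketched with a fixed thread pool; the thread count, the per-component partitioning and all names below are illustrative assumptions, not the actual CE code:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch of option 4: spread per-component persistence across a small
 * thread pool so the CE keeps doing useful work while other threads
 * wait on DB I/O. persistComponent() stands in for the DB work.
 */
public class ParallelPersist {
  static final AtomicInteger persisted = new AtomicInteger();

  static void persistComponent(String componentUuid) {
    // stand-in for: delete the component's live measures + batched inserts + commit
    persisted.incrementAndGet();
  }

  public static void persistAll(List<String> componentUuids, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    componentUuids.forEach(uuid -> pool.submit(() -> persistComponent(uuid)));
    pool.shutdown();
    try {
      pool.awaitTermination(1, TimeUnit.MINUTES);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

      The per-component transaction boundary of option 2b makes this parallelization safe: each thread commits its own component independently, so no cross-thread coordination is needed.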

      Conclusions

      Option 2 had great results and is easy to implement and maintain.

       

      Benchmark

      Execution time of the CE tasks in ms, comparing master and master + SONAR-12928 + SONAR-13093.

       

      Oracle 12 / moodle      1st analysis  2nd analysis  1st analysis  2nd analysis
                              master        master        w/ tickets    w/ tickets
      persist live measures                 1102123       181800        252107
      persist issues                        38849         21042         1673
      visitors                              68124         99599         86770
      duplication                           10160         8415          9484
      sources                               15625         141748        17749
      TOTAL                   **            1254151       495540        385681

      Postgresql / moodle     1st analysis  2nd analysis  1st analysis  2nd analysis
                              master        master        w/ tickets    w/ tickets
      persist live measures   46915         56331         44740         48725
      persist issues          53708         38155         6202          17
      visitors                32506         25449         23119         19814
      duplication             653           624           885           713
      sources                 37228         15300         36534         14385
      TOTAL                   188049        149829        128747        98087

      Future improvements

      • With Oracle there is a difference of ~70s when persisting live measures between the first and second analysis. This is the overhead of deleting measures before inserting them again, and it could be avoided if we find a way to do a real upsert with Oracle.
      • With Oracle JDBC drivers, every time we get a DB connection from the pool, the driver pings the DB to check whether the connection is still valid. During the visitors step, this is done tens of thousands of times and introduces a delay of ~30s. This can be avoided by sharing the DB connection during this step.

       

       


              People

              Assignee:
              duarte.meneses Duarte Meneses
              Reporter:
              sebastien.lesaint Sebastien Lesaint
              Votes:
              0
              Watchers:
              5
