Minimal Marketable Features / MMF-718

Remove webleader and webfollower constraint at start-up

    Details

    • Type: MMF
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Labels:

      Description

      Context

      Now that SonarQube can be configured as a cluster, we want our customers to be able to use that feature on-premise.
      As an ops at a company that wants to benefit from a cluster, I will have to deal with the following use cases:

      • Start a new SonarQube cluster
      • Add a node to the cluster
      • Upgrade SonarQube
      • Upgrade a plugin

      For these operations,

      • Starting a cluster should not require complex and error-prone operations, such as starting the nodes in a specific order to deal with technical constraints,
      • SonarQube should provide safeguards against accidentally upgrading the cluster or leaving it in an undefined/unstable state.

      With this MMF:

      • As much as possible, we don't want to provide “yet another” tool to start/upgrade the cluster,
      • We don't want to provide a way to centralize the configuration.

      Current Procedure

      For the time being, there are some technical constraints when setting up and updating a SonarQube cluster.
      Typically, the current procedure to install or update either a plugin or SonarQube itself is the following:

      1. Stop Web and CE nodes (if started),
      2. Stop ES nodes (if started),
      3. Update nodes with plugins and/or SonarQube version (the same on all nodes),
      4. Start the ES nodes,
      5. Start the webleader node,
      6. Wait for the webleader to be up (using /api/system/status, which does not answer continuously, for example while indexing in Elasticsearch),
      7. Check whether the status is "DB_MIGRATION_NEEDED"; if so, launch /api/system/migrate_db and wait for /api/system/status to report "UP",
      8. Start the webfollower and CE nodes.
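
      Steps 6 and 7 above can be scripted as a polling loop. The following is a minimal sketch, not SonarQube code: `get_status` and `trigger_migration` are hypothetical callables (the first returns the status string from /api/system/status, or None when the endpoint does not answer; the second calls /api/system/migrate_db), and the retry count and delay are arbitrary assumptions.

```python
import time
from typing import Callable, Optional


def wait_for_webleader(
    get_status: Callable[[], Optional[str]],
    trigger_migration: Callable[[], None],
    max_attempts: int = 60,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the webleader until it reports "UP" (steps 6 and 7).

    get_status: hypothetical helper returning the status string from
    /api/system/status, or None when the endpoint does not answer.
    trigger_migration: hypothetical helper calling /api/system/migrate_db.
    Returns True once the status is "UP", False if the attempts run out.
    """
    migration_requested = False
    for _ in range(max_attempts):
        status = get_status()
        if status == "UP":
            return True
        if status == "DB_MIGRATION_NEEDED" and not migration_requested:
            trigger_migration()  # launch the migration only once
            migration_requested = True
        sleep(delay_seconds)
    return False
```

      Injecting the two callables keeps the loop independent of any HTTP client, which also shows how much glue an end user has to write around the current procedure.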

      For SonarQube.com, everything is handled by Ansible to:

      • Ensure that the exact same set of plugins is installed on each Web and CE node,
      • Ensure that the exact same SonarQube version is installed on each node,
      • Ensure that ES and the webleader are up and running before starting any other node.

      This is fine for SonarQube.com. However, for on-premise clusters, the same approach can't be reused for the following reasons:

      • Customers would need to automate the same process, which is time-consuming and error-prone for an end user,
      • We cannot really share our Ansible role for SonarQube since it only covers Ansible and our way of handling servers in AWS. We cannot maintain a script for all other automation tools (Puppet, Chef, etc.).

      We need to simplify the way of starting/upgrading nodes for the on-premise customer before delivering this feature.

      Expected procedures

      As an on-premise ops, I would like to start any SonarQube node and let the cluster handle the initialization phase, so that I can start the nodes in any order, whatever the operation.

      When I start a node:

      • If the cluster is not initialized and down -> Setting up a new SonarQube cluster,
      • If the cluster is initialized and down -> Starting the cluster again, or upgrading SonarQube or a plugin,
      • If the cluster is initialized and up -> Adding a node to the cluster.
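
      The three cases above amount to a small decision function. This is an illustrative sketch only: the `Scenario` enum and `scenario_on_start` are hypothetical names, and the handling of the fourth combination (cluster up but not initialized), which the list does not cover, is an assumption.

```python
from enum import Enum


class Scenario(Enum):
    NEW_CLUSTER = "Setting up a new SonarQube cluster"
    RESTART_OR_UPGRADE = "Starting the cluster again, or upgrading SonarQube or a plugin"
    ADD_NODE = "Adding a node to the cluster"


def scenario_on_start(cluster_initialized: bool, cluster_up: bool) -> Scenario:
    """Map the cluster state seen by a starting node to one of the three cases."""
    if cluster_initialized:
        return Scenario.ADD_NODE if cluster_up else Scenario.RESTART_OR_UPGRADE
    if not cluster_up:
        return Scenario.NEW_CLUSTER
    # Assumption: a cluster that is up but not initialized is treated as an
    # inconsistent state and reported as an error.
    raise ValueError("cluster is up but not initialized")
```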

      Setting up a new SonarQube cluster (fresh install)

      For a new install with an empty database, as an on-premise ops,

      • I start all the nodes (Web, CE, ES),
        • A Web node detects that the initialization needs to be done, takes the leadership and performs the initialization phase,
        • The other Web and CE nodes detect that the initialization has started and are in stand-by during this phase,
      • When the initialization phase is successfully completed, the other nodes start without the need of initialization.

      Possible errors:

      • If a node is already active before the initialization is done, all nodes stop (unlikely to happen),
      • If initialization fails, all CE and Web nodes stop,
        There's a real danger in stopping a node because of a failure on another one:
        If initialization fails, the corresponding Web node stops in error. Other nodes are not impacted.
      • If initialization is successful but a node doesn't have the same configuration, this node stops.

      Starting the cluster again

      Once my cluster is set up, if I want to start it again:

      • I start all the nodes (Web, CE, ES),
      • If ES indexing needs to be finalized, a Web node takes the leadership and completes the initialization,
      • When the initialization phase is successfully completed, the other nodes start without the need of initialization.

      Upgrading SonarQube or a plugin

      If I want to upgrade the version of SonarQube, add a new plugin or upgrade the version of a plugin,

      • I stop all nodes (Web, CE, ES),
      • I upgrade all the nodes (SonarQube and/or a plugin),
      • I start all the nodes again (Web, CE, ES).

      The process is then similar to "Setting up a new SonarQube cluster": a Web node detects that the initialization needs to be done again (the DB needs to be updated), takes the leadership and performs the initialization phase.
      The error cases are also similar.

      Adding a node to the cluster

      If I want to extend my cluster with an additional node,

      • I don't need to stop the existing nodes. Once my node is configured, I simply start it,
      • The node detects that the cluster is initialized and up and, if the node has the same configuration as the cluster, it starts.

      Possible errors:

      • If the configuration of the new node is not the same as that of the cluster, the node immediately stops.

      Procedure for a single SonarQube instance

      The procedure for a single SonarQube instance is unchanged.

      What is the initialization phase?

      The initialization phase consists of:

      1. setting up the database if it is not initialized,
      2. upgrading the database if required,
      3. launching the initialization of plugins (rules, quality profiles),
      4. launching the reindexing of ES if needed.
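
      The conditional nature of these four steps can be sketched as a small planner. The function name, its boolean parameters and the step labels are hypothetical; how each condition is actually detected is outside the scope of this sketch.

```python
from typing import List


def plan_initialization(
    db_initialized: bool,
    db_upgrade_required: bool,
    es_reindex_needed: bool,
) -> List[str]:
    """Return the initialization steps to run, in the order listed above."""
    steps: List[str] = []
    if not db_initialized:
        steps.append("set up the database")
    if db_upgrade_required:
        steps.append("upgrade the database")
    # Plugin initialization (rules, quality profiles) is always part of the phase.
    steps.append("initialize plugins")
    if es_reindex_needed:
        steps.append("reindex ES")
    return steps
```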

      Proposed solution

      For on-premise clusters, we have to deal with the following issues:

      • When should the initialization phase start?
      • Which started node must initialize the back-end (DB and ES)?
      • What should happen when a node does not have the same configuration as the others?

      Let's take each kind of node and look at what to do.

      In the following paragraphs, the term "same configuration" means the same SonarQube version and the exact same plugins.
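
      As an illustration of this definition, here is a minimal sketch of the comparison. The `NodeConfiguration` class and the representation of plugins as a mapping from plugin key to version are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class NodeConfiguration:
    sonarqube_version: str
    # Assumption: plugins are represented as plugin key -> plugin version.
    plugins: Dict[str, str]


def same_configuration(a: NodeConfiguration, b: NodeConfiguration) -> bool:
    """Same SonarQube version and the exact same plugins (keys and versions)."""
    return a.sonarqube_version == b.sonarqube_version and a.plugins == b.plugins
```

      With this definition, a node with an extra plugin, a missing plugin, or a different plugin version does not have the same configuration as the cluster.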

      An ES node

      This is the simplest node: an ES node always starts.
      There is no check here, although it may be important to detect the version of ES.

      A CE node

      A CE node must check whether a Web node with the same plugins has been started, and wait until this condition is true.
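
      This wait can be sketched as a polling loop. Everything here is hypothetical: `started_web_node_plugins` stands for whatever mechanism exposes the plugins of the started Web nodes, and the optional `max_polls` bound is an assumption added so the loop can be tested, whereas the requirement above is to wait until the condition is true.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def wait_for_matching_web_node(
    ce_plugins: Dict[str, str],
    started_web_node_plugins: Callable[[], Iterable[Dict[str, str]]],
    poll_seconds: float = 1.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block until a started Web node has exactly the same plugins as this CE node.

    Returns True when a matching Web node is found, or False if max_polls
    (an optional bound, None meaning wait forever) is exhausted.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if any(plugins == ce_plugins for plugins in started_web_node_plugins()):
            return True
        polls += 1
        sleep(poll_seconds)
    return False
```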

      A Web node

      A Web node is the node that will perform back-end initialization.

      Technical requirements

      To achieve this startup, we need:

      • a global lock, so that only one Web node is elected,
      • a way for all stand-by nodes (Web and CE) to be notified when the initialization fails or succeeds,
      • a place to store the current versions of the installed plugins and of SonarQube.

      A good way of achieving this would be to use a distributed cache such as https://ignite.apache.org/ or https://hazelcast.com/
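
      To make the three requirements concrete, here is a single-process sketch in which Python's `threading` primitives stand in for the distributed cache. It does not use the Ignite or Hazelcast APIs; `ClusterState` and `start_web_node` are hypothetical names, and threads play the role of nodes.

```python
import threading
from typing import Callable, Dict, Optional


class ClusterState:
    """In-process stand-in for the state a distributed cache would share."""

    def __init__(self) -> None:
        self._lock = threading.Lock()      # global lock for the election
        self._leader: Optional[str] = None
        self._done = threading.Event()     # notification for stand-by nodes
        self.succeeded: Optional[bool] = None
        self.configuration: Optional[Dict[str, str]] = None  # stored versions

    def try_become_leader(self, node_id: str) -> bool:
        with self._lock:
            if self._leader is None:
                self._leader = node_id
                return True
            return False

    def publish_result(self, succeeded: bool,
                       configuration: Optional[Dict[str, str]] = None) -> None:
        self.succeeded = succeeded
        self.configuration = configuration
        self._done.set()

    def wait_for_result(self, timeout: Optional[float] = None) -> Optional[bool]:
        return self.succeeded if self._done.wait(timeout) else None


def start_web_node(node_id: str, state: ClusterState,
                   initialize: Callable[[], None],
                   configuration: Dict[str, str]) -> str:
    """Return "started" or "stopped" for one Web node."""
    if state.try_become_leader(node_id):
        try:
            initialize()
        except Exception:
            state.publish_result(False)    # stand-by nodes will stop too
            return "stopped"
        state.publish_result(True, configuration)
        return "started"
    # Stand-by: wait for the elected node, then compare configurations.
    if state.wait_for_result() and state.configuration == configuration:
        return "started"
    return "stopped"
```

      In this sketch the first node to take the lock runs the initialization, and the others block on the event until the result is published, so the start order does not matter.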

      Behaviour

      • If there is a crash/error during the initialization phase, all the Web and CE nodes are stopped (ES nodes are not stopped). There is no auto-recovery,
      • If there is a difference of configuration for a specific node when it tries to join a running cluster, it stops.

              People

              • Assignee:
                christophe.levis Christophe Levis
                Reporter:
                eric.hartmann Eric Hartmann