As an ops, when I want to operate a SonarQube cluster, what is the procedure to set up such a cluster and then safely start and stop it without any fear of breaking anything? (Monitoring the cluster is covered by another MMF.)
Thanks to MMF-867, even in the worst case where all nodes are hard-stopped, we can rely on a recovery mechanism to guarantee the consistency of the data stored in the database and in Elasticsearch.
Each node of a SQ cluster must declare the following properties (mandatory unless marked as optional):
- sonar.cluster.enabled=true to activate cluster mode.
- Optionally, sonar.cluster.name=xxxx to prevent mixing the nodes of several clusters installed on the same infrastructure. When a node tries to join a cluster with a different name, the node should stop with a meaningful error message.
- sonar.cluster.hosts=ip1:port1,ip2:port2,ip3:port3,ip4:port4,ip5:port5 to list all the nodes of the SQ cluster.
- Optionally, sonar.cluster.node.host=ip2 and sonar.cluster.node.port=port2 to specify the IP address and port used by the node being configured.
- Optionally, sonar.cluster.node.name=Mickey to specify the name of the node.
- sonar.cluster.node.type=application or sonar.cluster.node.type=search to specify the type of the node. A SQ cluster is made of two types of nodes: Application nodes and Search nodes. An Application node runs both a Web Server and a Compute Engine, whereas a Search node runs only an Elasticsearch node.
- sonar.cluster.search.hosts=ip_search2:port_search2,ip_search3:port_search3,ip_search4:port_search4 to specify how to reach the Search nodes. The IP address and port used by each Search node are set on that node with the existing non-cluster properties, for example sonar.search.host=ip_search2 and sonar.search.port=port_search2.
- sonar.auth.jwtBase64Hs256Secret is mandatory on Application nodes and must have the same value on all of them, so that web sessions are shared across the cluster.
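The rules above can be checked before a node is started. The following is a minimal Python sketch, not SonarQube code: parse_properties, validate_node and check_same_cluster are hypothetical helpers, the parser only handles simple key=value lines (not the full Java properties syntax), and the sample addresses, port numbers and secret are placeholders. It also assumes that an unset sonar.cluster.name compares as an empty string, which may not match SonarQube's actual default.

```python
# Hypothetical pre-start checks for one node's cluster settings (not SonarQube code).


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines; blank lines and #/! comments are skipped."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without "="
            props[key.strip()] = value.strip()
    return props


def validate_node(props: dict[str, str]) -> list[str]:
    """Return the problems found in a node's cluster settings (empty list if none)."""
    errors = []
    if props.get("sonar.cluster.enabled") != "true":
        errors.append("sonar.cluster.enabled must be true")
    if not props.get("sonar.cluster.hosts"):
        errors.append("sonar.cluster.hosts must list all the cluster nodes")
    node_type = props.get("sonar.cluster.node.type")
    if node_type not in ("application", "search"):
        errors.append("sonar.cluster.node.type must be application or search")
    if not props.get("sonar.cluster.search.hosts"):
        errors.append("sonar.cluster.search.hosts must list the Search nodes")
    if node_type == "application" and not props.get("sonar.auth.jwtBase64Hs256Secret"):
        errors.append("sonar.auth.jwtBase64Hs256Secret is mandatory on Application nodes")
    return errors


def check_same_cluster(local: dict[str, str], remote: dict[str, str]) -> None:
    """Raise if the optional cluster names differ; the caller then stops the node."""
    # Assumption: an unset sonar.cluster.name compares as "".
    local_name = local.get("sonar.cluster.name", "")
    remote_name = remote.get("sonar.cluster.name", "")
    if local_name != remote_name:
        raise ValueError(
            f"cannot join cluster '{remote_name}': this node is configured "
            f"for cluster '{local_name}'"
        )


# Example: an Application node (addresses, ports and secret are placeholders).
sample = """
sonar.cluster.enabled=true
sonar.cluster.name=prod
sonar.cluster.hosts=192.0.2.1:9100,192.0.2.2:9100,192.0.2.3:9100,192.0.2.4:9100,192.0.2.5:9100
sonar.cluster.node.host=192.0.2.1
sonar.cluster.node.port=9100
sonar.cluster.node.name=Mickey
sonar.cluster.node.type=application
sonar.cluster.search.hosts=192.0.2.3:9200,192.0.2.4:9200,192.0.2.5:9200
sonar.auth.jwtBase64Hs256Secret=cGxhY2Vob2xkZXI=
"""
print(validate_node(parse_properties(sample)))  # []
```

Removing the JWT secret from this sample would make validate_node report it as missing, while the same settings with sonar.cluster.node.type=search would pass without it.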
Once the nodes are configured, the ops team is responsible for starting and stopping the SQ cluster in the following order:
- Start all the Search nodes first (a minimum of 3 nodes).
- Then start the Application nodes (a minimum of 2 nodes).
- To stop the cluster, stop the Application nodes first.
- Then stop the Search nodes.
- The following bug must be fixed as part of this MMF: web leader nomination is currently broken in a SQ cluster when all the Application nodes are stopped and new ones are started without first stopping and restarting all the Search nodes.
- When an Application node is stopped, it already waits up to 1 minute for pending analysis report computations to complete. However, during this period it must also stop starting new computations, and this behavior is missing.
- There is no safeguard preventing SQ nodes that run different SQ versions or different SQ plugins from joining the same SQ cluster.
- There is no safeguard preventing an ops from performing a risky action, such as stopping the Search nodes before the Application nodes.
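The start and stop order, together with the missing safeguard against stopping Search nodes first, can be sketched in Python as follows. Node, start_sequence, stop_sequence and guard_stop are hypothetical names; the sketch only computes and checks an order, it does not start or stop any process. guard_stop assumes a full-cluster shutdown, so as written it would also block stopping a single Search node for maintenance while Application nodes run.

```python
from dataclasses import dataclass

MIN_SEARCH_NODES = 3       # minimum number of Search nodes from the start procedure
MIN_APPLICATION_NODES = 2  # minimum number of Application nodes from the start procedure


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # "search" or "application", as in sonar.cluster.node.type


def start_sequence(nodes: list[Node]) -> list[Node]:
    """Order for starting: all Search nodes, then all Application nodes."""
    search = [n for n in nodes if n.kind == "search"]
    apps = [n for n in nodes if n.kind == "application"]
    if len(search) < MIN_SEARCH_NODES:
        raise ValueError(f"need at least {MIN_SEARCH_NODES} Search nodes, got {len(search)}")
    if len(apps) < MIN_APPLICATION_NODES:
        raise ValueError(f"need at least {MIN_APPLICATION_NODES} Application nodes, got {len(apps)}")
    return search + apps


def stop_sequence(nodes: list[Node]) -> list[Node]:
    """Order for stopping: all Application nodes, then all Search nodes."""
    return [n for n in nodes if n.kind == "application"] + [n for n in nodes if n.kind == "search"]


def guard_stop(node: Node, running: list[Node]) -> None:
    """Hypothetical safeguard: refuse to stop a Search node while an Application node runs."""
    if node.kind == "search" and any(n.kind == "application" for n in running):
        raise RuntimeError(f"stop all Application nodes before Search node {node.name}")


cluster = [Node("app1", "application"), Node("s1", "search"), Node("app2", "application"),
           Node("s2", "search"), Node("s3", "search")]
print([n.name for n in start_sequence(cluster)])  # ['s1', 's2', 's3', 'app1', 'app2']
print([n.name for n in stop_sequence(cluster)])   # ['app1', 'app2', 's1', 's2', 's3']
```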
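A minimal Python sketch of the missing Compute Engine shutdown behavior: once shutdown begins, no new task is taken, and in-progress tasks get up to 60 seconds to finish. ComputeEngineWorker is a hypothetical stand-in, not SonarQube's Compute Engine. The stopping flag and the in-progress counter are guarded by the same threading.Condition, so a task cannot start between the moment the flag is set and the moment shutdown starts waiting.

```python
import queue
import threading
from typing import Callable


class ComputeEngineWorker:
    """Hypothetical stand-in for one Compute Engine worker (not SonarQube code)."""

    def __init__(self, tasks: "queue.Queue[str]", process: Callable[[str], None]):
        self._tasks = tasks
        self._process = process
        self._cond = threading.Condition()
        self._stopping = False
        self._in_progress = 0

    def run_once(self) -> bool:
        """Take and process one pending task; refuse once shutdown has begun."""
        with self._cond:
            if self._stopping:
                return False  # no new computation may start while stopping
            try:
                task = self._tasks.get_nowait()
            except queue.Empty:
                return False
            self._in_progress += 1
        try:
            self._process(task)  # e.g. compute one analysis report
        finally:
            with self._cond:
                self._in_progress -= 1
                self._cond.notify_all()
        return True

    def shutdown(self, timeout: float = 60.0) -> bool:
        """Stop accepting tasks, then wait up to `timeout` seconds for running ones.

        Returns True if every in-progress task finished within the timeout.
        """
        with self._cond:
            self._stopping = True
            return self._cond.wait_for(lambda: self._in_progress == 0, timeout)
```

In this sketch, tasks still queued when shutdown begins are left in the queue rather than started; the sketch does not model which node picks them up later.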
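One possible shape for the missing version/plugin safeguard is for a node to compare its SQ version and installed plugins with those of the current members before joining. The Python sketch below assumes each node can report that information; NodeInfo and check_can_join are hypothetical, and the version strings and plugin keys used in the example are made up.

```python
from dataclasses import dataclass, field


@dataclass
class NodeInfo:
    name: str
    sq_version: str
    plugins: dict[str, str] = field(default_factory=dict)  # plugin key -> plugin version


def check_can_join(joining: NodeInfo, members: list[NodeInfo]) -> None:
    """Raise ValueError if the joining node differs from any current member."""
    for member in members:
        if member.sq_version != joining.sq_version:
            raise ValueError(
                f"{joining.name} runs SQ {joining.sq_version} but "
                f"{member.name} runs SQ {member.sq_version}"
            )
        keys = member.plugins.keys() | joining.plugins.keys()
        differing = sorted(k for k in keys if member.plugins.get(k) != joining.plugins.get(k))
        if differing:
            raise ValueError(f"{joining.name} and {member.name} differ on plugins: {differing}")


# Made-up example: app2 matches app1 and may join.
app1 = NodeInfo("app1", "6.6", {"java": "4.15", "js": "3.2"})
check_can_join(NodeInfo("app2", "6.6", {"java": "4.15", "js": "3.2"}), [app1])
```

A missing plugin, an extra plugin and a different plugin version are all reported the same way, since each shows up as a key whose version differs between the two nodes.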