SONAR-13365 we want to perform regular migration tests from LTS to a specific version. The goal of this ticket is to make that task as easy as possible for the team in terms of booking the resources, setup, configuration, analysis, log collection, and monitoring. The scope here is to build the first baby step toward an automated mechanism, so it's fine to handle only one database vendor+version, and only one dataset (from 7.9).
We want to be able to deploy&test any SonarQube version that was pushed to repox by the QA, by specifying the build number.
A snapshot of PEACH (internal environment used by the Languages team) while still running on the latest LTS will be used to initiate the environment.
The data must be sanitized.
- SQ non-DCE
- binaries are pulled exclusively from a CI built of sonar-enterprise master or a branch (in case of early tests)
- we probably want SQ to run on a dedicated host to avoid interferences (eg. CPU, IOs, network consumed by other program)
- Postgres DB
- must be initialized with schema in previous LTS format and data
- Developer want to control the configuration of SQ in sonar.properties
- eg: enable SQL logs
- How ever SQ configuration should contain the appropriate credentials to the DB
- doing it manually is too much error prone
- Developer want to control the startup of the SQ
- Eg.: run a command on the SQ host via SSH is fine
- SQ is started and will stop expecting DB migration to be triggered with a WS call
- Developer must be able to call that WS
- Calling that WS from the SQ host via SSH is fine
- Once DB migrations are running, developer wants to be able to track progress
- If things are running too slow already, developer want to be able to stop SQ
- progress is reported in SQ logs (web.log)
- SSH access is fine
- If SQ startup finishes successfully, developer wants to:
- See the logs
- Go to SQ’s UI and do calls to the WEB API to check stuff are working as expected
- SSH access is fine
- If SQ startup fails, developer wants to:
- See the logs to know what went wrong
- Troobleshoot (see below)
- SSH access to logs is fine
The main bad thing expected to happen is things are running too slow or a failure to occur. So developers will want to be able to find out where things are slow or why they failed.
Developer will want to identify bottlenecks and to do that may want to:
- Look at SQ logs
- Look at SQ host metrics (CPU, memory, …)
- Look at SQ JVM metrics (could run out of heap)
- Enable GC logs can do the trick
- (lower priority) Install a JVM agent such as Glowroot
- exposes a WEB UI and developer would need to access the UI (make it public or allow ssh-tunnel access to it)
- Look at database metrics
- CPU, memory, free disk
- Ideally get info from the DB on “top consuming” / “slow” / … queries
- Be able to run queries against the database, run explain plan on SQL queries, …
- This requires access to the DB, ideally remotely with a client such as DBeaver
- (lowest priority) Look at network traffic between SQ and the database
To investigate failures, developer may want to:
- Do all as if investigating slowness (see above)
- Download heap dump in case of OOM
- Look at data in database
- Ideally with a client such as DBeaver
Whether it is because of a failure or to try and speed up things, developer will do new attempts which may involve:
- Change SQ configuration (sonar.properties)
- Use new SQ binaries
- Restart from scratch (ie. restore initial DB) or resume from where previous attempt failed
- (last on the list) Run with a debugger attached
When investigating a slow/failing DB migration in a series of DB migrations, developer will want to speed up test rounds by not running all the DB migrations before that one.
This can be achieved by taking a snapshot of the database and data up to the failing/slow DB migration and between each attempt to restore that snapshot rather than the original one.
When testing migrations to an intermediary version of SQ, the whole environment and data can be dropped.
When testing migrations to the new LTS, however, we will want to keep a snapshot of the DB (schema and data) for the DB migration tests of the next LTS.
Binaries and data used for the tests are partially sensitive, yet the environment should be accessible only to SonarSourcers. It would be ok if only SQ team members had access.
Dataset must be sanitized:
- remove webhooks
- remove personal and sensitive information
As migration tests won’t occur on a very high basis (2 to 3 times a year), good documentation must be provided for:
- features and their limits
- credentials required and where to find them
In order to solve this Ticket the following technology stack is chosen:
- AWS CloudFormation
The following preparations are expected on the Host system:
- aws cli installed and configured
- ansible installed
- POSIX compliant shell
- ssh key present in aws
The provided Template will create the underlying infrastructure and some security rules:
- Create VPC
- Create EC2 Instance
- SSH-Key ID has to be defined by the user
- Default to 5xl.large
- Create RDS from snapshot
- Default to db.m5.4xlarge
- Enable RDS Insights
- Create Security Groups
- Allow ssh from user defined CIDR
- Allow access to SQ Web-UI from anywhere
- Allow access to Database from EC2 CIDR
- Deny everything else
- Outputs the FQDN to connect to EC2
- Outputs the FQDN to connect to Database from EC2
After the Infrastructure is present (can take some time), ansible will provision everything
The provided Ansible setup will include a playbook and a role to set up sonarqube EE on an EC2 instance and provision the database connection against a restored snapshot of RDS. It will try to get all the necessary information from aws.
The Role will:
- Download an SonarQube Zip-File from Artifactory
- Prepare the host environment
- Install tooling
- Create User
- Sonarqube user
- Ssh user
- shared by all developers
- sudoer root
- Create Pathes
- Install Dependencies
- Openjdk (latest 11)
- Prepare Kernel Parameters for Elasticsearch
- Extract SonarQube to a specific path
- Prepare sonar.properties
- Connection to Database
- Enable GC logs
- Dump on OOM to log directory
- (nice-to-have) Open debugging port (non blocking)
After the Provisioning is complete (RDS on base snapshot + SQ up and running expecting WS call to start DB migrations) an initial snapshot of the infrastructure is created as a first safepoint to fall back to.
Every snapshot after this has to be triggered by the user. Documentation should provide a nice walkthrough on how to do it.
The Infrastructure will be setup on demand via a bash script with the following steps:
- Apply cloud formation template
- Feed ansible inventory with information from aws
- Provision EC2 instance using ansible
After that the User is expected to connect to the EC2 Instance via ssh and start with the Performance Testing.
The majority of the required metrics can be found in the AWS Console and/or in the specific log files of sonarqube.
A manual connection from the user to the database directly is possible via tunneling through the EC2 Instance => update documentation about tunneling
Update Ansible configuration and run it => update documentation
Manual operation with AWS console => update documentation
(nice-to-have) automate things with script and Ansible
The Cloudformation stack can be destroyed via a similar script as the one that will be used for the creation of the infrastructure
In order to secure the access to the database and the ec2 instance where sonarqube will be running, an (in aws) existing ssh keypair has to be used in order to access the ec2 instance via ssh.
On top of this regulation a security group will be defined to manage access only from a limited ip address (default value to offices IPs).
This security group can be modified afterwards in the aws console or via a changeset.
Username/Password login will be disabled.