A failure can happen while web services are being called. Depending on the nature of the problem and when it happens, different mechanisms have to be applied to guarantee that the data modified by those web services remains consistent, both within the Database and between Elasticsearch and the Database.
With this MMF, different patterns will be implemented in the users, rules and qualityprofiles web services to guarantee the consistency of Elasticsearch with the Database.
A problem happens after the Database has been updated but before the changes are fully indexed into Elasticsearch
The application has to properly re-index the data later on so that the new values become visible to the client. However, systematically using asynchronous indexing is not applicable for the time being: the UI doesn't support eventual consistency and needs the web services to return an answer only once the information is properly indexed.
As a consequence, the application first has to try to index the data synchronously and then, if that fails, apply background recovery mechanisms to guarantee that the index will sooner or later be updated:
- Level 1: Retry
The Elasticsearch client keeps retrying to index the data for up to 5 seconds. The request is blocked until indexing has succeeded or until 5 seconds have passed.
- Level 2: Recovery
While the application updates the Database, it also uses the same transaction to store the corresponding operation into a recovery log.
The web server runs a background recovery process that periodically tries to replay the operations that are still listed in the log after a while (5 minutes). In a cluster, each web server runs a recovery job and the log is shared between all the servers.
If the recovery process runs into problems when replaying many of the operations, the recovery can be postponed to the next run to avoid putting further pressure on the system.
If a critical failure persists, SonarQube needs to be stopped. When SonarQube is restarted, the recovery process automatically replays the operations from the log.
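The two levels above can be sketched as follows. This is only an illustration under stated assumptions, not the actual SonarQube implementation: the `users` and `recovery_log` tables, the function names, the use of SQLite and the `index_fn` callback standing in for the Elasticsearch client are all hypothetical. The sketch also assumes that a log entry is removed once its operation has been indexed, and that the recovery job stops at the first failure and leaves the rest for the next run.

```python
import sqlite3
import time

RETRY_WINDOW_SECONDS = 5.0        # Level 1: retry window given in the text
RECOVERY_MIN_AGE_SECONDS = 300.0  # Level 2: the 5-minute delay given in the text


def create_schema(db):
    # Hypothetical tables; the real schema is not described in the text.
    db.execute("CREATE TABLE users (login TEXT PRIMARY KEY, name TEXT)")
    db.execute(
        "CREATE TABLE recovery_log (id INTEGER PRIMARY KEY, login TEXT, created_at REAL)"
    )


def index_with_retry(index_fn, doc, window=RETRY_WINDOW_SECONDS, pause=0.1):
    """Level 1: block until index_fn succeeds or the retry window has elapsed."""
    deadline = time.monotonic() + window
    while True:
        try:
            index_fn(doc)
            return True
        except ConnectionError:
            if time.monotonic() + pause > deadline:
                return False
            time.sleep(pause)


def update_user(db, index_fn, login, name, window=RETRY_WINDOW_SECONDS):
    # The data change and its recovery-log entry are committed in the same transaction.
    with db:
        db.execute("INSERT OR REPLACE INTO users (login, name) VALUES (?, ?)", (login, name))
        db.execute(
            "INSERT INTO recovery_log (login, created_at) VALUES (?, ?)",
            (login, time.time()),
        )
    # Synchronous indexing; on success the log entry is no longer needed (assumption).
    if index_with_retry(index_fn, {"login": login, "name": name}, window):
        with db:
            db.execute("DELETE FROM recovery_log WHERE login = ?", (login,))


def recover(db, index_fn, min_age=RECOVERY_MIN_AGE_SECONDS, now=None):
    """Level 2: replay log entries older than min_age; return how many were replayed."""
    now = time.time() if now is None else now
    rows = db.execute(
        "SELECT l.id, u.login, u.name FROM recovery_log l "
        "JOIN users u ON u.login = l.login WHERE l.created_at <= ?",
        (now - min_age,),
    ).fetchall()
    replayed = 0
    for log_id, login, name in rows:
        try:
            index_fn({"login": login, "name": name})
        except ConnectionError:
            break  # simplification: postpone the remaining entries to the next run
        with db:
            db.execute("DELETE FROM recovery_log WHERE id = ?", (log_id,))
        replayed += 1
    return replayed
```

Because the log entry is written in the same transaction as the data change, an operation cannot be committed to the Database without also being recoverable, even if the process stops right after the commit.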
A problem happens during full indexation
Elasticsearch data may need to be fully deleted, for example for an upgrade or if the physical data is corrupted. In that case, when SonarQube starts, the server fully restores the Elasticsearch indexes from the Database.
But this indexing may also be interrupted. To solve this problem, the application records in the Database which index is currently being restored. When SonarQube starts again, the web leader not only restores the indexes that are empty, but also drops and feeds again the indexes for which a restoration was ongoing.
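A minimal sketch of this startup decision, assuming a hypothetical `index_restore_in_progress` table and hypothetical `drop_fn`, `feed_fn` and `is_empty` callbacks standing in for the Elasticsearch operations (none of these names come from SonarQube itself):

```python
import sqlite3


def create_restore_table(db):
    # Hypothetical table recording which indexes are currently being restored.
    db.execute("CREATE TABLE index_restore_in_progress (index_name TEXT PRIMARY KEY)")


def restore_index(db, name, drop_fn, feed_fn):
    # Record the restoration before touching the index, so that a crash leaves a trace.
    with db:
        db.execute(
            "INSERT OR IGNORE INTO index_restore_in_progress (index_name) VALUES (?)",
            (name,),
        )
    drop_fn(name)
    feed_fn(name)
    # Clear the marker only once the index has been fully fed.
    with db:
        db.execute("DELETE FROM index_restore_in_progress WHERE index_name = ?", (name,))


def indexes_to_rebuild(db, all_indexes, is_empty):
    """At startup: indexes that are empty or whose restoration was interrupted."""
    interrupted = {
        row[0] for row in db.execute("SELECT index_name FROM index_restore_in_progress")
    }
    return [name for name in all_indexes if name in interrupted or is_empty(name)]
```

If `feed_fn` raises or the process stops midway, the marker stays in the table, so the next startup selects that index again even though it is no longer empty.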
A problem happens before or while the data is pushed to the Database
The application relies on a transaction to guarantee that the changes are either properly saved in the Database or not applied at all:
- For a small set of changes, updates are persisted into the Database within a transaction.
- For a larger set of changes, new values are first stored into a temporary column and then copied, within a transaction, to the final column.
If the transaction fails, the request returns an error and the client knows that the changes are not persisted. The request can then be retried later on.
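The temporary-column pattern for larger change sets could look like the sketch below. It is an assumption-laden illustration: the `users` table, the `name_tmp` column, the `CHECK` constraint (added here only so that a failure can be provoked) and the function names are hypothetical, and SQLite stands in for the real Database.

```python
import sqlite3


def create_users_table(db):
    # Hypothetical schema: "name" is the final column, "name_tmp" the temporary one.
    db.execute(
        "CREATE TABLE users ("
        "login TEXT PRIMARY KEY, "
        "name TEXT NOT NULL CHECK (length(name) > 0), "
        "name_tmp TEXT)"
    )


def apply_bulk_rename(db, new_names):
    """Apply many renames (login -> new name) so that all of them land, or none."""
    # Step 1: stage the new values in the temporary column, one small commit each.
    # The final column is untouched, so readers still see the old values.
    for login, name in new_names.items():
        with db:
            db.execute("UPDATE users SET name_tmp = ? WHERE login = ?", (name, login))
    # Step 2: copy the staged values to the final column within a single transaction.
    # If anything fails here, the transaction is rolled back and "name" is unchanged.
    with db:
        db.execute("UPDATE users SET name = name_tmp WHERE name_tmp IS NOT NULL")
        db.execute("UPDATE users SET name_tmp = NULL")
```

Staging keeps the final transaction short: the expensive part happens outside it, and the copy either commits as a whole or raises, in which case the caller can report the error and the request can be retried.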