If a Compute Engine worker crashes or is force-stopped while a task is ongoing, the task status remains "in progress". This typically happens when the server crashes or is rebooted.
When SonarQube is not running in cluster mode, this is generally not a problem: when SonarQube restarts, the task is flagged as failed or restarted, and the next analysis of the project is handled correctly.
But when SonarQube is running in cluster mode and a node crashes while executing a task, the task remains "in progress" until all the cluster nodes are stopped and restarted, which prevents any further analysis from being computed for that project.
Also, even outside cluster mode, in a few cases (e.g. a short DB disconnection) a worker can fail to complete a task and mark it as done. SonarQube remains operational, but the task is wrongly flagged as "in progress", preventing any other task from being handled for that project.
The mechanism should be reentrant and should reset the status of stale "in progress" tasks, so that an ops engineer doesn't need to restart SonarQube for those tasks to go back to pending and be picked up by another Compute Engine.
- Each task will be associated with the unique id of a worker, so that we know which worker is supposed to handle it.
Worker ids must change when a Compute Engine restarts, so that stale tasks are not mistakenly considered as still handled by the new workers.
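A minimal sketch of this first point, with hypothetical class names (`CeWorker`, `CeTask` are illustrations, not the actual SonarQube classes): each worker draws a random UUID at construction time, so a restarted process can never reuse the id recorded on a stale task.

```java
import java.util.UUID;

// Hypothetical sketch: each Compute Engine worker gets a fresh random
// UUID at startup, so a restarted process never reuses the id that
// stale "in progress" tasks still reference.
final class CeWorker {
  private final String uuid = UUID.randomUUID().toString();

  String getUuid() {
    return uuid;
  }
}

// A task keeps the uuid of the worker that picked it up.
final class CeTask {
  private final String taskUuid;
  private String workerUuid; // null while the task is pending

  CeTask(String taskUuid) {
    this.taskUuid = taskUuid;
  }

  String getTaskUuid() {
    return taskUuid;
  }

  void assignTo(CeWorker worker) {
    this.workerUuid = worker.getUuid();
  }

  String getWorkerUuid() {
    return workerUuid;
  }
}
```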
- To know at any point in time which workers are down, workers will regularly share their status.
The solution will rely on Hazelcast in cluster mode and on in-memory information otherwise.
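The status-sharing point could look like the sketch below (hypothetical names again): each worker periodically writes a heartbeat timestamp into a shared map, and a worker is considered down when its last heartbeat is older than a timeout. In cluster mode that map would be a Hazelcast distributed map visible to all nodes; out of cluster mode a plain in-memory map is enough.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of "workers regularly share their status".
// In cluster mode the map would be backed by Hazelcast so that every
// node sees every worker's heartbeat; a ConcurrentHashMap stands in
// for the non-cluster, in-memory case.
final class WorkerLiveness {
  private final Map<String, Long> lastHeartbeatMs = new ConcurrentHashMap<>();
  private final long timeoutMs;

  WorkerLiveness(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  // Called periodically by every worker (e.g. every few seconds).
  void heartbeat(String workerUuid, long nowMs) {
    lastHeartbeatMs.put(workerUuid, nowMs);
  }

  // A worker is considered down if it never reported a heartbeat or
  // its last heartbeat is older than the timeout.
  boolean isAlive(String workerUuid, long nowMs) {
    Long last = lastHeartbeatMs.get(workerUuid);
    return last != null && nowMs - last <= timeoutMs;
  }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real implementation would read the system clock (or Hazelcast's cluster time) instead.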
- Periodically (e.g. every 10 minutes), SonarQube will look for "in progress" tasks that have no associated worker or whose worker is down, and put each of those stale tasks back to pending.
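The periodic cleanup pass above could be sketched as follows (illustrative names, not the real SonarQube API). Note how the reentrancy requirement is satisfied: only tasks that are both "in progress" and orphaned are touched, so running the pass twice, or concurrently on several nodes, resets each stale task at most once.

```java
import java.util.List;

// Status of a Compute Engine task, reduced to the two states that
// matter for this sketch.
enum Status { PENDING, IN_PROGRESS }

final class Task {
  Status status = Status.PENDING;
  String workerUuid; // null until a worker picks the task up
}

final class StaleTaskCleaner {
  // 'aliveWorkerUuids' would come from the shared worker-status map
  // (Hazelcast in cluster mode, in-memory otherwise).
  // Returns the number of tasks put back to pending.
  static int resetStaleTasks(List<Task> tasks, List<String> aliveWorkerUuids) {
    int reset = 0;
    for (Task t : tasks) {
      boolean orphaned = t.workerUuid == null || !aliveWorkerUuids.contains(t.workerUuid);
      if (t.status == Status.IN_PROGRESS && orphaned) {
        t.status = Status.PENDING;
        t.workerUuid = null;
        reset++;
      }
    }
    return reset;
  }
}
```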
- Nice-to-have: in cluster mode, this check can also be triggered when the cluster is notified that a node has just left.