Currently, the major problem for users who rely on AutoScan is that they get no feedback at all from the service until a report is being processed by the CE. This is quite late, and many things can happen before then (payload ignored because the project is not bound, scan failed, etc.).
We want users to at least know that their project is currently being processed by AutoScan, or that the processing failed.
As a developer:
- When I push the .sonarcloud.properties file for the first time, I expect the Web UI to show me that:
- it is used and SonarCloud (AutoScan) has started to process the analysis
- Then, every time I push on a branch or PR which has the file, I expect the Web UI to show me a couple of seconds later that SonarCloud has started to process the analysis of that new branch (which did not exist in SonarCloud before)
- E.g.: I push a new PR, I expect it to appear in SonarCloud just a couple of seconds later, even if technically the Core has not received any report yet, so that we can display its "analysis status"
- As soon as AutoScan has successfully started processing a "push", I expect SonarCloud to show me that it's in progress until:
- it's successful (whole analysis is done)
- it failed
- SonarCloud shows a generic failure message. This is a first baby step, and the failure message will get smarter with MMF-1766
- A part of what will be done in this iteration might be "compatible" with scanners run in CI mode => to keep in mind even if this is not the primary focus
If a status update is lost (network issue, analysis killed, etc.), we should avoid the system being stuck in an "in progress" state forever for a branch/PR. We will handle this once the system is deployed, if it turns out to be a real issue worth addressing.
The status messages will be displayed in the alert banner of a project, at the same place as the background task messages that we currently show while a task is processed or failed.
For the first analysis of a branch or PR we will also display a nice page with some explanation of what the analysis does and some links to the doc. The banner will also be visible on this page, so that users are not lost once they no longer see this page but only the banner.
We won't stack banners. If there are multiple things going on for the same branch:
- Always display the task that is the most advanced in the process
- Failure messages are only displayed if there is nothing in progress
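The two rules above can be sketched as a small selection function. This is only an illustration: the step names in STEP_ORDER, the Task fields, and the choice to show the most recent failure when several exist are assumptions, not part of the spec.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical ordering of in-progress steps, from least to most advanced.
STEP_ORDER = {"received": 0, "cloning": 1, "scanning": 2, "processing_report": 3}


@dataclass(frozen=True)
class Task:
    status: str        # "in-progress" | "completed" | "failed"
    step: str = ""     # hypothetical step name, only meaningful while in progress
    instant: int = 0   # epoch seconds of the last status update (assumption)


def pick_banner(tasks: Sequence[Task]) -> Optional[Task]:
    """Return the single task the banner should describe, or None for no banner."""
    in_progress = [t for t in tasks if t.status == "in-progress"]
    if in_progress:
        # Rule 1: the most advanced in-progress task wins (unknown steps rank lowest).
        return max(in_progress, key=lambda t: STEP_ORDER.get(t.step, -1))
    failed = [t for t in tasks if t.status == "failed"]
    if failed:
        # Rule 2: failures are shown only when nothing is in progress.
        # Showing the most recent failure is an assumption.
        return max(failed, key=lambda t: t.instant)
    return None  # nothing running and nothing failed: no banner
```

Returning None covers the successful case, where no banner is shown.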
When the analysis is successful and every step is done, the banner disappears and the result is displayed automatically, without the need to refresh. We don't show a success banner and we don't send emails either (unlike in the screenshot).
To be able to display the analysis progress of a new PR that is analyzed for the first time, we will create a new empty PR on SC as soon as AutoScan has started to process the PR. After a page refresh, this empty PR will be displayed in:
- The branches dropdown with an "in progress" status. This will only be displayed for the first analysis (if there are multiple PRs in this case, all the PRs with the "in progress" status should be updated automatically once their first analysis is done, showing the QG badge)
- The branches management page. Same here: we should automatically show the QG badge once the first analysis is done. To better display those "pending" PRs we will drop the table headers and show "Analysis in progress" instead of the QG, last analysis date and actions. We also expect the status to be automatically updated once the first analysis is done.
- It can be opened and will display an empty page that is the same as the first analysis page of the project
- Explanation of what happens once done
- Links are different from the project empty page: they point to the PR and branches doc and to the QG for PRs doc
Same as for new PR, but the empty page will show the same links as the empty page for the default branch.
Do nothing in the background task page for now. Why? The background task page deserves an MMF and a dedicated sprint in itself. A few improvement ideas that came up during brainstorming:
- Display status of tasks to all members of the org (without sensitive data)
- Display status of branches and PR (group task by branches and PR)
- Display duration of the in progress analysis in the status message.
- Decorate GitHub PR checks to display the in progress state of the analysis. This won't be done in this MMF.
- Detailed status message, including steps showing the progress of the analysis process instead of one fixed message. This improvement should be part of MMF-1766.
We introduce the concept of an Analysis ID, to track analyses through all the steps in their lifecycles, starting from the reception of a push event to a GitHub repository, ending in successful completion or a failure.
We also introduce a generic topic for events, with two types of events for now:
- Analysis status events: for example cloning the repo, running the scanner, computing measures, successful completion or errors.
- Emitted from AutoScan and Core domains
- Consumed mostly by a new Analysis Status Domain, that becomes the main authority on the status of analyses
- Core will listen on events marking the beginning of an analysis, in order to provision pull requests when necessary
Notes about consuming the generic topic of events:
- Following the existing pattern, use appropriate queues to consume the topic
- Configure AWS components appropriately to avoid processing messages unnecessarily
- Make sure that the workers processing their queue tolerate misconfiguration, and do not crash when they receive messages that were routed to them by mistake.
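As an illustration of the last point, a worker can decode each message defensively and skip anything it does not recognize instead of raising. This is a minimal sketch: the "type" discriminator field and its "analysis_status" value are hypothetical, since the spec does not define how the two event types are told apart.

```python
import json
import logging
from typing import Any, Callable, Dict, Iterable, Optional

logger = logging.getLogger("analysis_status_worker")

# Hypothetical discriminator; the real event schema may distinguish types differently.
EVENT_TYPE_FIELD = "type"
ANALYSIS_STATUS_TYPE = "analysis_status"


def parse_event(body: Any) -> Optional[Dict[str, Any]]:
    """Return the decoded event if it is meant for this worker, else None."""
    try:
        event = json.loads(body)
    except (TypeError, ValueError):
        logger.warning("Skipping message that is not valid JSON")
        return None
    if not isinstance(event, dict) or event.get(EVENT_TYPE_FIELD) != ANALYSIS_STATUS_TYPE:
        logger.warning("Skipping message that was routed to this queue by mistake")
        return None
    return event


def process_batch(bodies: Iterable[Any], handle: Callable[[Dict[str, Any]], None]) -> int:
    """Handle every recognized event in the batch and return how many were handled."""
    handled = 0
    for body in bodies:
        event = parse_event(body)
        if event is None:
            continue  # tolerate misrouted or malformed messages without crashing
        handle(event)
        handled += 1
    return handled
```

Skipped messages are logged rather than retried, so a routing misconfiguration shows up in the logs without blocking the queue.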
In projects analyzed by AutoScan, the Analysis ID is created by AutoScan. It is included in all analysis status events as the cuuid, and propagated through all lifecycle steps explicitly:
- The GitHub Filter generates a UUID and includes it in the message sent to the Worker Queue
- The Worker includes it as a -Dsonar.analysisId parameter for the scanner
- The Scanner adds the analysis ID to the protobuf report. If the parameter sonar.analysisId is not set then a UUID is generated.
- The Compute Engine reads it from the scanner report (when available) and uses it as Snapshot ID
- In the context of the Compute Engine, the Snapshot ID is effectively the Analysis ID
- Perform simple validation on the received Analysis ID
In projects not analyzed by AutoScan, in the short term, the scanner report will not contain an Analysis ID. In this case the Compute Engine generates a Snapshot ID as usual, and uses that as the Analysis ID when emitting events.
In the future, the Analysis ID can be created by the scanner engine.
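The Compute Engine side of this flow could look roughly like the sketch below. It assumes the Analysis ID is a canonical hyphenated UUID string, and it assumes that an ID that is present but malformed is rejected rather than silently replaced; the spec only asks for "simple validation" and does not say what happens on failure. The function names are hypothetical.

```python
import uuid
from typing import Optional


def is_valid_analysis_id(value: Optional[str]) -> bool:
    """Simple validation: accept only a canonical, hyphenated UUID string (assumption)."""
    if not isinstance(value, str):
        return False
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def resolve_analysis_id(from_report: Optional[str]) -> str:
    """Pick the ID to use as Snapshot ID and as Analysis ID in emitted events."""
    if from_report is None:
        # Non-AutoScan analysis: no ID in the scanner report, generate one as usual.
        return str(uuid.uuid4())
    if not is_valid_analysis_id(from_report):
        # Assumption: a malformed ID is an error, to avoid breaking event correlation.
        raise ValueError("invalid analysis id in scanner report")
    return from_report
```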
The Core subscribes to lifecycle events having specific fields identifying the start of a pull request analysis. If the pull request does not yet exist, then create it.
No need to provision branches, because AutoScan doesn't support branches yet.
The WS api/project_pull_requests/list needs to be updated to return those empty PR. They should contain at least the following fields:
A new Analysis Status Domain will be created, with main components:
- Analysis Status Events will be associated with analyses (analysis ids)
- Components (AutoScan, Core) post analysis status events to the generic Events Topic
- The Analysis Status Worker consumes the events through a FIFO queue, and stores them in a convenient format
- The Analysis Status Service provides the status for project branches and pull requests
- Branches must be supported for compatibility with non-AutoScan analyses
- cuuid (required): correlation id (propagated from upstream)
- status (required): "in-progress" | "completed" | "failed"
- reason: the error code when status: "failed". It is less than 50 characters, for example "LICENSE_ERROR", "CLONE_ERROR", "SCANNER_ERROR", "UNKNOWN_ERROR".
- analysis_id (required): the analysis id, a globally unique id
- project_uuid (required)
- branch_name or pull_request_key
- Either one or the other must be present, not both at the same time
- date (required): date with timezone
- origin: "autoscan" | "core" | CI system name
- origin_id: CE task id | CI system job id
- revision: the commit sha1 in case of Git, for example
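The constraints on the fields above can be checked with a small validator. This is a sketch under two assumptions: the event is a JSON-like dictionary, and date is an ISO-8601 string with an explicit UTC offset. The function name and error messages are illustrative.

```python
from datetime import datetime
from typing import Any, Dict, List

REQUIRED_FIELDS = ("cuuid", "status", "analysis_id", "project_uuid", "date")
STATUSES = ("in-progress", "completed", "failed")
MAX_REASON_LENGTH = 50  # the reason must be shorter than this


def validate_event(event: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the event; an empty list means it is valid."""
    errors = []
    for name in REQUIRED_FIELDS:
        if not event.get(name):
            errors.append(f"missing required field: {name}")
    if event.get("status") and event["status"] not in STATUSES:
        errors.append("unknown status")
    # Exactly one of branch_name / pull_request_key must be present.
    if bool(event.get("branch_name")) == bool(event.get("pull_request_key")):
        errors.append("exactly one of branch_name or pull_request_key is required")
    reason = event.get("reason")
    if reason is not None and len(str(reason)) >= MAX_REASON_LENGTH:
        errors.append("reason must be shorter than 50 characters")
    if event.get("date"):
        try:
            if datetime.fromisoformat(event["date"]).tzinfo is None:
                errors.append("date must include a timezone")
        except (TypeError, ValueError):
            errors.append("date is not a valid ISO-8601 timestamp")
    return errors
```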
Note: minor transformations will be applied to the payload before storage:
- status renamed to analysis_status, because "status" is a reserved word in DynamoDB
- instant will be computed from date as an integer timestamp
- ttl will be computed as now + 6 months
- Composite fields for indexes will be added
- The fields listed above will be copied, and nothing else will be stored
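A possible shape for this transformation is sketched below. Several details are assumptions: date is taken to be an ISO-8601 string, instant is expressed in epoch seconds, "6 months" is approximated as 182 days, and the composite attribute name project_target and its format are invented for illustration. The ttl value is in epoch seconds, which is what DynamoDB's TTL feature expects.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

# Fields copied as-is; "status" is handled separately because it is renamed.
COPIED_FIELDS = (
    "cuuid", "reason", "analysis_id", "project_uuid", "branch_name",
    "pull_request_key", "date", "origin", "origin_id", "revision",
)
TTL = timedelta(days=182)  # assumption: "6 months" approximated as 182 days


def to_item(event: Dict[str, Any], now: Optional[datetime] = None) -> Dict[str, Any]:
    """Build the item to store from a validated analysis status event."""
    now = now or datetime.now(timezone.utc)
    # Copy only the known fields; anything else in the payload is dropped.
    item = {name: event[name] for name in COPIED_FIELDS if event.get(name) is not None}
    item["analysis_status"] = event["status"]  # "status" is a DynamoDB reserved word
    item["instant"] = int(datetime.fromisoformat(event["date"]).timestamp())
    item["ttl"] = int((now + TTL).timestamp())
    # Hypothetical composite field for an index on project plus branch or PR.
    if event.get("branch_name"):
        target = "branch#" + str(event["branch_name"])
    else:
        target = "pr#" + str(event["pull_request_key"])
    item["project_target"] = f"{event['project_uuid']}#{target}"
    return item
```

Passing now explicitly keeps the function deterministic, which makes the ttl computation easy to test.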
Analysis status events will be created by various components at various points:
- AutoScan GitHub Filter:
- As soon as a project_uuid is found: in-progress
- On any error: failed
- AutoScan Worker:
- Right before cloning: in-progress
- On any error: failed
- Compute Engine:
- On report processing started: in-progress
- On report processing completed: completed
- On any error: failed
The worker processes the incoming messages in the FIFO queue, and simply stores them in a DynamoDB table without any additional processing.
A web service used by the frontend to get the status of a project branch or pull request.
- Must answer the needs of the frontend:
- What is the status of the currently viewed branch or pull request: in progress, failed, or successful
- Should the current view be refreshed (a previously in-progress analysis has completed with success)
- Secured: delegated through the Core, no direct access from the frontend
- A shared secret is required between this service and the Core
- Ideally, access should also be restricted at the network level (but we don't know how to do that yet)
The Core will expose a new web service to the frontend:
- Endpoint: /api/analysis_statuses/list
- project: the project key
- branch: (optional) the name of the branch, for example master
- pullRequest: (optional) the pull request key, for example 123
- Either branch or pullRequest must be present, and not both at the same time
- The response will contain all analyses "in-progress", or if there are none, then the most recent analysis
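The parameter rules of this endpoint can be expressed as a short check. This sketch is written in Python for illustration only, the function name is hypothetical, and it assumes the request parameters arrive as a plain dictionary.

```python
from typing import Dict, Tuple


def parse_status_request(params: Dict[str, str]) -> Tuple[str, str, str]:
    """Return (project, kind, value), where kind is "branch" or "pullRequest"."""
    project = params.get("project")
    if not project:
        raise ValueError("project is required")
    branch = params.get("branch")
    pull_request = params.get("pullRequest")
    # Exactly one of the two must be provided.
    if bool(branch) == bool(pull_request):
        raise ValueError("exactly one of branch or pullRequest is required")
    if branch:
        return project, "branch", branch
    return project, "pullRequest", pull_request
```

For example, a request with project=my_project and pullRequest=123 yields ("my_project", "pullRequest", "123"), while a request with both or neither of branch and pullRequest is rejected.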
The Core in turn will request status from the new Status Service.
- Endpoint: TBD with Ops; perhaps a dedicated sub-domain name, passed in sonar.properties of the Core
- project_uuid: the project UUID
- branch_name: (optional) the name of the branch, for example master
- pull_request_key: (optional) the pull request key, for example 123
- Either branch_name or pull_request_key must be present, and not both at the same time
- The response will contain all analyses "in-progress", or if there are none in progress, then the most recent analysis (completed or failed)
- If there are no analyses, then respond with 404
The status of projects will be stored in a DynamoDB table.
Example data items:
- When a new event is received: put item for given analysis_id (replacing if exists)
- Get status for a branch or pull request: select items matching project_uuid and branch_name or pull_request_key where status is "in-progress"
- Remove expired items from the list
- If there are no items in-progress, then find the last item
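The read path described above can be sketched as a pure function over the items already fetched for one branch or pull request. The assumptions here: items use the stored attribute names analysis_status and instant, instant is in epoch seconds, an in-progress item expires after 80 minutes, and the "last item" is the most recent completed or failed one.

```python
from typing import Any, Dict, List

EXPIRY_SECONDS = 80 * 60  # in-progress items older than this are considered expired


def select_statuses(items: List[Dict[str, Any]], now: int) -> List[Dict[str, Any]]:
    """Return all live in-progress items, else the most recent finished one, else []."""
    in_progress = [
        item for item in items
        if item["analysis_status"] == "in-progress"
        and now - item["instant"] <= EXPIRY_SECONDS  # drop expired items
    ]
    if in_progress:
        return sorted(in_progress, key=lambda item: item["instant"])
    finished = [item for item in items if item["analysis_status"] != "in-progress"]
    if finished:
        return [max(finished, key=lambda item: item["instant"])]
    return []  # nothing to report; the service would answer 404 in this case
```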
Not needed, because it would be overkill. A lost final status is at worst a minor inconvenience and can be fixed by rerunning the analysis of the branch or pull request.
Special considerations to keep compatibility with non-AutoScan analyses:
- The Core must emit analysis status events as described above.
- This makes it possible to switch the frontend to use the new analysis status service, and not worry whether the analysis came from AutoScan or somewhere else.
- The analysis status service must support branches.
The new analysis status service will not be aware of previous analyses that ended in error. The frontend will handle this case, by falling back to the old method (ask Compute Engine) when the analysis status service returns no status.
The lambdas and services should be monitored and trigger alarms as usual in other components of AutoScan.
- Error rates for each micro service / lambda
- transient AWS errors, external calls, bad payloads, our own bugs, …
- Latency for each micro service
- especially important are the calls to the status service to get status, which will be called by the Core at high frequency
- Number of expired analyses (still in progress for more than 80 minutes)
- DynamoDB per table and index metrics: data size, bandwidth
- Error rate per reason of status events that are in error, indicating non-functional analysis errors
A.k.a. what can go wrong if the different components cannot talk to each other, in an otherwise correct configuration.
Scenario: the Core does not receive the provision PR request from Events Core Q, for whatever reason
- The PR will not be visible on the UI until the scanner report reaches the CE
- The failure to provision the PR is ignored, AutoScan will carry on
- The CE will create the PR, and its status, after receiving the scanner report from AutoScan
Scenario: PR/branch status is not stored correctly, for whatever reason
- If "in-progress" was missed, the UI will not show the in-progress banner for the PR/branch
- If "completed" was missed, the UI will not dismiss the in-progress banner for the PR/branch for a maximum of 80 minutes, after which the in-progress state is considered expired, and it will be logged
- If "failed" was missed, the UI will not replace the in-progress banner with the failed banner for the PR/branch. The in-progress banner will get cleaned up the same way as in the case of a missed "completed".
Scenario: the status service returns 5xx, or 4xx (excluding 404), for whatever reason
- The UI pops up an error message and stops calling the service
- The status service is designed for high availability, so such behavior should be extremely rare, or else it's an indicator of a real problem.
The lambdas in the new Analysis Status Domain look simple enough, which makes them good candidates for implementation in Python.