When ops or services have to investigate on a potential performance issue, they can face some pains in profiling the performance on Compute Engine i.e. to understand where the processing time is spent and why some tasks take longer than others.
The main pains:
In a nutshell , it's not clear from the logs why an analysis takes longer than another one.
- We are missing the information about the type of branch analyzed
- There's no indication about the size of the analyzed report
- and also more generally about the context in which this report is processed (project size, number of issues)
- There's no simple way to spot or exclude DB or ES latency - no way to simply see where the time is spent and to check if this time is normal in regards of the analysis. The alternative is to activate TRACE logs and to manually compute the number of requests and the time spent in requests.
- In cluster mode (DCE), it can be hard to identify from the logs if / why a node and a specific process has a problem.
There's no indication whether the logs are coming from a node or another and whether the logs are linked to the actual process that takes CPU resources.
The Compute Engine should provide more information in the logs to help understanding why and where the computing time is spent:
- the nature of the analysis
- the report which is processed
- and the processing itself
We expect these information to be available both in 6.7.5 LTS and 7.3 versions.
Out of the scope, we can think, for 7.x, about providing for each analysis the full context of the analysis (in addition to the scanner context).
- CE logs should clearly show whether the processed report corresponds to a long-lived branch, a short-lived branch or a PR.
If this type is not known when the CE starts computing a task, the information should be dumped in an additional INFO log.
- CE logs should display contextual information info about:
- the report in general
- report size (uncompressed)
- number of folders
- number of files
- number of lines of code
- number of issues
- and, more specifically about the notification step:
- the number of channels
- the number of subscribers
- the number of issues that were updated
Ex: | subscribers=580 issues=73
- the report in general
- Some CE logs are useless at DEBUG level and should either moved to the TRACE level or muted:
and other logs from o.a.f.u.ContentHandlerFactoryRegistry, o.a.f.r.RendererFactory, o.a.f.r.XMLHandlerRegistry, o.a.f.r.ImageHandlerRegistry, o.a.x.i.l.s.ImageImplRegistry
- Related: Some Web logs also generate noise and should be muted
and other logs from c.h.i.p.o.PartitionStateOperation, c.h.r.i.o.ReplicationOperation, c.h.i.p.InternalPartitionService, c.h.r.i.o.RequestMapDataOperation, c.h.r.i.o.SyncReplicatedMapDataOperation, c.h.i.c.i.ClusterHeartbeatManager, c.h.i.c.ClusterService
- CE steps and corresponding timing info are currently written in DEBUG level. It would helpful, starting in 7.3, to constantly get this info in production to get a better idea of what is currently happening for a task in progress.
With SonarQube 7.3, CE steps logs should be moved at INFO level.
- CE logs should display detailed timing information info about the steps:
- the split between the time spent in DB requests and ES requests (the time spent in pure CE processing is the difference with the total duration)
- the number fo requests
Ex: | time=170ms db_time=150ms db_req=27 es_time=10ms es_req=3
- In cluster mode, CE logs should display the node name.