  1. SonarQube
  2. SONAR-12187

Make SQ restart recover from ES indices made read-only due to low disk watermark reached

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.8
    • Component/s: ElasticSearch
    • Labels:
      None
    • Edition:
      Community
    • Production Notes:
      None

      Description

      WHY

      Elasticsearch has a built-in mechanism, called Disk-based Shard Allocation, that prevents it from filling the disk with index data.

      The impact of this feature differs slightly between the DataCenter edition and the other editions, because ES runs on several nodes only in the former.

      When ES runs on a single node, the impact is that ES will make all indices read-only when the 95% (default value) disk usage watermark is reached.

      When ES runs on multiple nodes, the impact is smoother:

      • 85% watermark reached on some ES node: no more shards will be allocated to this node
      • 90% watermark reached on some ES node: ES will try to move shard(s) away from this node
      • 95% watermark reached on some ES node: ES will now make read-only any index whose shards cannot be moved away from this node
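
      The three thresholds can be read as a simple classification of a node's disk usage. The sketch below is only an illustration of the default values listed above; the function name is hypothetical, and the stage names mirror the ES settings `cluster.routing.allocation.disk.watermark.low`, `.high` and `.flood_stage`.

```python
# Default disk-usage watermarks (percent of disk used), as listed above.
LOW, HIGH, FLOOD_STAGE = 85.0, 90.0, 95.0


def watermark_stage(used_percent: float) -> str:
    """Return the highest default watermark reached for a node's disk usage."""
    if used_percent >= FLOOD_STAGE:
        return "flood_stage"  # indices with shards on this node become read-only
    if used_percent >= HIGH:
        return "high"         # ES tries to move shards away from this node
    if used_percent >= LOW:
        return "low"          # no new shards are allocated to this node
    return "ok"
```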

      Freeing disk space is not enough to recover from indices being read-only, and restarting SQ (and therefore ES) won't be enough either. Each index must be made read-write again with a command such as:

      PUT /twitter/_settings
      {
        "index.blocks.read_only_allow_delete": null
      } 
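
      The same call can be issued for every index at once by targeting `_all` instead of a single index name. The following is a minimal sketch using only the Python standard library; the base URL is a placeholder and the helper names are hypothetical, so adapt both to the actual ES host and port.

```python
import json
import urllib.request


def build_unblock_request(base_url: str, index: str = "_all") -> urllib.request.Request:
    """Build the PUT request that clears the read-only block on `index`."""
    # Setting the value to null removes the block from the index settings.
    body = json.dumps({"index.blocks.read_only_allow_delete": None}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def unblock(base_url: str, index: str = "_all") -> int:
    """Send the request and return the HTTP status code (needs a reachable ES)."""
    with urllib.request.urlopen(build_unblock_request(base_url, index)) as resp:
        return resp.status
```

      For example, `unblock("http://localhost:9200")` would target all indices, assuming ES listens on that (placeholder) address.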

      Because of this, users who got into that situation had no other option than deleting their indices to recover, see:

      WHAT

      Two immediate actions should be taken:

      1. users should be provided with a means to recover from their indices being read-only without having to delete and rebuild them
        • rebuilding the indices can be very costly
        • it's too strong a punishment for what could be just a transient or accidental lack of disk
      2. documentation should be updated to inform users

      We are currently using the ES default free disk watermark setting values (85%, 90% and 95%). While there is no reason to disable this feature (ES offers this option), these values probably do not make sense in some (many?) situations. E.g.:

      • SQ ES indices barely take 1 GB on disk. They have been made read-only on my personal computer because I've reached the 85% watermark of a 70 GB disk, but I still have 9 GB left!
      • I have a huge enterprise machine with 1 TB of disk; ES indices will be made read-only while I still have 150 GB of free disk!
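
      To put numbers on percentage thresholds, the free space remaining when a given watermark is reached can be computed directly. This is a small arithmetic sketch, not SQ code; the function name is hypothetical and 1 TB is taken as 1000 GB for simplicity.

```python
def free_space_at_watermark(disk_gb: float, used_percent: float) -> float:
    """Free space (in GB) left on a disk when `used_percent` of it is used."""
    return disk_gb * (100 - used_percent) / 100


# Free space left at each default watermark for the two disk sizes above.
for disk_gb in (70, 1000):
    for pct in (85, 90, 95):
        print(f"{disk_gb} GB disk, {pct}% used: "
              f"{free_space_at_watermark(disk_gb, pct)} GB free")
```

      On a 1000 GB disk the 85% watermark leaves 150 GB free, which is far more than indices of about 1 GB will ever need.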

      Shall we offer the possibility to configure the watermark thresholds (they can be percentages or byte values)? Should SQ override them based on some algorithm?
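
      For reference, ES itself accepts these thresholds through `PUT /_cluster/settings`, either as percentages of used disk or as byte values of free disk, and the two forms cannot be mixed. The sketch below only builds such a request body; the helper name and the example values are illustrative assumptions, and nothing here says how (or whether) SQ would expose them.

```python
import json


def watermark_settings(low: str, high: str, flood_stage: str,
                       persistent: bool = True) -> dict:
    """Build a body for PUT /_cluster/settings overriding the three watermarks."""
    values = (low, high, flood_stage)
    # ES does not accept a mix of percentages and byte values for these settings.
    if len({v.strip().endswith("%") for v in values}) != 1:
        raise ValueError("use either percentages or byte values, not both")
    prefix = "cluster.routing.allocation.disk.watermark"
    scope = "persistent" if persistent else "transient"
    return {
        scope: {
            f"{prefix}.low": low,
            f"{prefix}.high": high,
            f"{prefix}.flood_stage": flood_stage,
        }
    }


# Byte values express minimum FREE space, so "low" is the largest value.
print(json.dumps(watermark_settings("3gb", "2gb", "1gb"), indent=2))
```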

      HOW

      Recover from read-only indices

      At SQ startup, reset any index that is read-only.

      If there is still no free disk, ES will either put the indices back into read-only or reject the reset. This should be confirmed.
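
      One way to decide which indices need a reset is to inspect their settings first. The sketch below assumes the parsed JSON response of `GET /_all/_settings` has the shape `{index: {"settings": {"index": {"blocks": {...}}}}}` with string values; that shape, the function name and the index names in the sample are assumptions for illustration, not taken from the SQ code base.

```python
def read_only_indices(settings_response: dict) -> list:
    """Return the names of indices carrying the read_only_allow_delete block."""
    names = []
    for name, body in settings_response.items():
        blocks = body.get("settings", {}).get("index", {}).get("blocks", {})
        # ES reports setting values as strings, e.g. "true".
        if str(blocks.get("read_only_allow_delete", "false")).lower() == "true":
            names.append(name)
    return sorted(names)


# Hypothetical sample response with one blocked and one normal index.
sample = {
    "issues": {"settings": {"index": {"blocks": {"read_only_allow_delete": "true"}}}},
    "components": {"settings": {"index": {"number_of_shards": "5"}}},
}
print(read_only_indices(sample))
```

      Each returned index could then be reset with the `_settings` call shown in the WHY section.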

      Update documentation

      A requirement for 15% free disk was [added recently|https://github.com/SonarSource/sonar-enterprise/commit/4b3e712a29fe14a48d6646372783b9e12071163d].

      Documentation should be completed to:

      • mention how to recover from read-only indices?
      • mention the behavior for the DataCenter edition?
      • mention how to tune Disk-based Shard Allocation in SQ?

            People

            Assignee:
            sebastien.lesaint Sebastien Lesaint
            Reporter:
            sebastien.lesaint Sebastien Lesaint