SonarQube / SONAR-6632

Rule fails to index if its description is larger than 32 KB

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.1
    • Fix Version/s: 5.6
    • Component/s: ElasticSearch, Rules
    • Labels:

      Description

      The DB column is large enough, but the Elasticsearch index rejects any single term whose UTF-8 encoding exceeds 32766 bytes (roughly 32 KB; the limit is in bytes, not characters). The error occurs in version 5.1.x but not in 5.0.

      java.lang.IllegalStateException: Errors while indexing stack: failure in bulk execution:
      [0]: index [rules], type [rule], id [css:validate-property-value], message [IllegalArgumentException[Document contains at least one immense term in field="htmlDesc" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[60, 112, 62, 66, 114, 111, 119, 115, 101, 114, 115, 32, 105, 103, 110, 111, 114, 101, 32, 105, 110, 118, 97, 108, 105, 100, 32, 112, 114, 111]...', original message: bytes can be at most 32766 in length; got 32781]; nested: MaxBytesLengthExceededException[bytes can be at most 32766 in length; got 32781]; ]
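
The byte prefix printed in the message can be decoded to see where the immense term starts. The sketch below is only an illustration: the class `ImmenseTermPrefix` and its `decode` helper are hypothetical names, not SonarQube or Lucene code, and the byte values are copied from the message above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of SonarQube or Lucene): decodes the
// "prefix of the first immense term" that Lucene prints as a list of byte values.
class ImmenseTermPrefix {

  // Byte values copied verbatim from the error message.
  static final int[] PREFIX = {
    60, 112, 62, 66, 114, 111, 119, 115, 101, 114,
    115, 32, 105, 103, 110, 111, 114, 101, 32, 105,
    110, 118, 97, 108, 105, 100, 32, 112, 114, 111
  };

  // Interprets the values as UTF-8 encoded bytes and returns the text.
  static String decode(int[] values) {
    byte[] bytes = new byte[values.length];
    for (int i = 0; i < values.length; i++) {
      bytes[i] = (byte) values[i];
    }
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(decode(PREFIX)); // prints: <p>Browsers ignore invalid pro
  }
}
```

The decoded prefix begins with the opening `<p>` tag and contains spaces, which suggests the HTML description reached the index as one unsplit term rather than as individual words.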
      

      The explanation is the Lucene upgrade: Lucene no longer silently ignores terms that are too long and rejects the document instead (see https://issues.apache.org/jira/browse/LUCENE-5472). The root cause is probably an index analyzer that tokenizes the description badly, so that the whole description ends up as a single term.
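
A minimal sketch of the size check involved, assuming only the limit quoted in the error message (32766 bytes). The class `TermLengthCheck` and its methods are hypothetical names for illustration; this is not the fix that was applied in SonarQube.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration: checks whether a string would fit in a single
// indexed term, using the byte limit quoted in the error message.
class TermLengthCheck {

  // "bytes can be at most 32766 in length" (taken from the error message).
  static final int MAX_TERM_BYTES = 32766;

  // Length of the string once encoded as UTF-8, which is what the limit applies to.
  static int utf8Length(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  static boolean fitsInOneTerm(String s) {
    return utf8Length(s) <= MAX_TERM_BYTES;
  }

  // Builds a string made of `count` copies of `c`.
  static String repeat(char c, int count) {
    char[] chars = new char[count];
    Arrays.fill(chars, c);
    return new String(chars);
  }

  public static void main(String[] args) {
    // Exactly at the limit: accepted.
    System.out.println(fitsInOneTerm(repeat('a', 32766)));      // prints: true
    // Same size as the failing description (32781 bytes): rejected.
    System.out.println(fitsInOneTerm(repeat('a', 32781)));      // prints: false
    // Only 16384 characters, but each takes 2 bytes in UTF-8: rejected.
    System.out.println(fitsInOneTerm(repeat('\u00e9', 16384))); // prints: false
  }
}
```

The last case shows why counting characters is not enough: a description well under 32k characters can still exceed the limit once non-ASCII text is encoded.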

            People

            Assignee:
            simon.brandhof Simon Brandhof (Inactive)
            Reporter:
            simon.brandhof Simon Brandhof (Inactive)