dictionary
Ad-hoc dictionary:* components are used to filter labels. Such dictionaries are typically used for per-query filtering of junk labels or to narrow down the set of labels to a specific subset.
The following dictionary:* stage types are available for use in analysis request JSONs:
Stage type | Description |
---|---|
dictionary:all | Includes entries from all dictionaries declared in the project descriptor. |
dictionary:glob | Filters labels matching wildcard expressions (for example: * eclipse). |
dictionary:project | Uses a referenced dictionary declared in the project descriptor. |
dictionary:queryTerms | Excludes terms extracted from a query (if possible). |
dictionary:regex | Filters labels matching any of the provided regular expressions. |
dictionary:reference | References a dictionary:* component defined in the request or in the project's default components. |
dictionary:all
Includes entries from all project-level dictionaries defined in the dictionaries section of the project descriptor.
```json
{
  "type": "dictionary:all"
}
```
dictionary:glob
A glob dictionary allows filtering labels using word-based wildcard matching.
```json
{
  "type": "dictionary:glob",
  "entries": []
}
```
The primary use case of the glob matcher is case-insensitive matching of entire phrases, as well as "begins with…", "ends with…" or "contains…" rules. Glob matcher entries are fast to parse and very fast to apply.
In the request below, we request the top aggregated labels from documents matching the electric field query, but filter out any label containing the word electric as well as the exact label state:
entries
An array of strings, each representing a single glob matching rule.
See the project descriptor reference for syntax specification and examples of glob matching rules.
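As an illustration, the two filtering rules described for the electric field request could be expressed with a glob dictionary like the one below. This is a sketch, not the original request: it assumes that * acts as a wildcard matching zero or more words (consistent with the * eclipse example listed earlier), so * electric * matches any label containing the word electric, while an entry without wildcards, such as state, matches only that exact label. Check the project descriptor reference for the authoritative rule syntax.

```json
{
  "type": "dictionary:glob",
  "entries": [
    "* electric *",
    "state"
  ]
}
```

Because glob matching is case-insensitive, these entries would also catch variants such as Electric Field or State.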
dictionary:project
This dictionary is a reference to a dictionary declared at the project descriptor level.
```json
{
  "type": "dictionary:project",
  "dictionary": null
}
```
Project dictionaries are compiled once, so if their content does not change between requests, it makes sense to move them to the project level and reference them from the request.
dictionary
The identifier of the referenced dictionary at the project descriptor level.
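A minimal sketch of such a reference is shown below. The identifier hypothetical-stop-labels is made up for this example and stands for whatever name the dictionary was given in the dictionaries section of the project descriptor.

```json
{
  "type": "dictionary:project",
  "dictionary": "hypothetical-stop-labels"
}
```

The request only carries the identifier, so the compiled project-level dictionary can be reused across requests instead of being parsed again each time.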
dictionary:queryTerms
Excludes individual terms extracted from a query. For example, a string query cats OR dogs would construct a dictionary filtering the terms cat and dog.
```json
{
  "type": "dictionary:queryTerms",
  "query": null
}
```
This implementation works on a best-effort basis. It is not always possible to extract query terms from complex Lucene queries (or other query implementations). Also, the shape of extracted queries may depend on the query analyzer pipeline (for example, stemming options).
query
The query to extract excluded terms from.
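The fragment below sketches the cats OR dogs example from the start of this section. It assumes that the query property accepts a query:* component and that a query:string component with a query property is available; treat both as assumptions to verify against the query:* reference.

```json
{
  "type": "dictionary:queryTerms",
  "query": {
    "type": "query:string",
    "query": "cats OR dogs"
  }
}
```

Whether the resulting dictionary contains cat and dog or the unstemmed cats and dogs depends on the query analyzer pipeline, as noted above.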
dictionary:regex
This dictionary type excludes any labels that match one or more regular expressions. It offers a more expressive syntax but is more expensive to parse and apply.
```json
{
  "type": "dictionary:regex",
  "entries": []
}
```
Glob dictionaries are fast to parse and very fast to apply. Regular expressions are an order of magnitude slower and have to be applied to all label candidates, which may slow down processing significantly.
Each entry in the regular expression dictionary must be a valid Java Regular Expression pattern. If a label's string (as a whole) matches at least one of the patterns defined in the dictionary, it is marked as a positive match and filtered out.
entries
An array of strings, each containing a regular expression. Note that double quotes and backslashes are special characters and must be escaped appropriately.
See the project descriptor reference for examples of regular expression dictionary entries.
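The sketch below shows how the escaping rule plays out in practice. The two patterns are illustrative only: \d+ (written as "\\d+" in JSON) excludes labels consisting solely of digits, and (?i)the\s.* (written as "(?i)the\\s.*") excludes labels starting with the word the, in any letter case. Because a label must match a pattern as a whole, the trailing .* is needed to cover the rest of the label.

```json
{
  "type": "dictionary:regex",
  "entries": [
    "\\d+",
    "(?i)the\\s.*"
  ]
}
```

When a rule only needs whole-word prefix, suffix, or substring matching, an equivalent dictionary:glob entry is usually the cheaper choice.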
Consumers of dictionary:*
The following stages and components take dictionary:* as input:
Stage or component | Property |
---|---|
labelFilter:dictionary | exclude |
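As a sketch of how these pieces fit together, the labelFilter:dictionary component below combines a dictionary:all component with an ad-hoc glob dictionary. It assumes that the exclude property accepts an array of dictionary:* components, so a label is filtered out if any of the listed dictionaries matches it.

```json
{
  "type": "labelFilter:dictionary",
  "exclude": [
    {
      "type": "dictionary:all"
    },
    {
      "type": "dictionary:glob",
      "entries": ["* eclipse"]
    }
  ]
}
```

Keeping stable, reusable rules in project-level dictionaries and adding only per-request glob or regex entries here keeps the per-request parsing cost low.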