query
query:*
components specify lists of documents in a declarative way. Typically, you pass them to the
documents:byQuery
stage to execute document searches or to
documentContent
to highlight search terms in document content.
You can use the following query:*
components in your analysis requests:
-
query:all
-
Matches all documents in the index.
-
query:complement
-
Negates the query you provide.
-
query:composite
-
Composes a list of queries using the AND or OR operator.
-
query:filter
-
Narrows down the matches of the query you provide to the set of documents also matched by the filter query.
-
query:forDocumentFields
-
Invokes the
queryBuilder
on field values of one document you provide and returns the query.
-
query:forFieldValues
-
Matches documents containing any of the provided values in one or more content fields.
-
query:forLabels
-
Matches documents containing labels you provide.
-
query:fromDocuments
-
A query that matches the documents you provide.
-
query:fromQueryBuilder
-
Invokes the
queryBuilder
on a constant set of inputs and returns the query.
-
query:string
-
Parses text queries using the Lucene query parser of your choice.
-
query:reference
-
References a
query:*
component defined in the request or in the project's default components.
query:all
A query matching all documents in the index.
{
"type": "query:all"
}
query:complement
Negates the set of documents from the query
you provide.
{
"type": "query:complement",
"query": null
}
query
The query to negate. Any documents not matching this query will be returned.
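As a minimal sketch, the following component selects every document that does not match a text query. The nested query:string component and its query text are illustrative, and the queryParser property is omitted on the assumption that the project's default query parser applies.

```json
{
  "type": "query:complement",
  "query": {
    "type": "query:string",
    "query": "cats"
  }
}
```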
query:composite
Composes a list of queries using the AND
or OR
operators.
{
"type": "query:composite",
"operator": "OR",
"queries": []
}
Note that certain query component implementations (like
query:string
) may offer built-in Boolean operations that are more efficient. This component should be
used to combine documents from different query implementations.
operator
Declares the way documents from queries
are combined. The operator
property supports the following values:
OR
-
Produces the union of all unique documents from all queries.
AND
-
Produces the intersection of all documents from all queries. A document must appear in all queries to appear in the output.
queries
A list of query:* components to compose.
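The sketch below combines two different query implementations with the AND operator: a text query and a label-based query. The query text is illustrative, and the nested query:forLabels component simply reuses the automatic references shown in its own default configuration; whether those references resolve depends on the rest of your request.

```json
{
  "type": "query:composite",
  "operator": "AND",
  "queries": [
    {
      "type": "query:string",
      "query": "dogs"
    },
    {
      "type": "query:forLabels",
      "fields": {
        "type": "featureFields:reference",
        "auto": true
      },
      "labels": {
        "type": "labels:reference",
        "auto": true
      }
    }
  ]
}
```

A document appears in the output only if both nested queries match it.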
query:filter
Narrows down the matches of the query you provide to the set of documents also matched by the filter query.
{
"type": "query:filter",
"filter": null,
"query": null
}
The query:filter
component acts similarly to the query:composite
component with the
AND
operator. The subtle difference is that
filter
queries do not contribute to document scores.
Here is an example request using the query:filter
component and searching for occurrences of
cats and dogs, where the document score is only computed for the hits on dogs.
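The query component of such a request might look like the sketch below. It shows only the query:filter component, not a complete request, and it assumes that the default query parser accepts the single-word queries used here.

```json
{
  "type": "query:filter",
  "query": {
    "type": "query:string",
    "query": "dogs"
  },
  "filter": {
    "type": "query:string",
    "query": "cats"
  }
}
```

Only documents matching both queries are returned, but the hits on cats come from the filter and so do not influence the score.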
filter
Any query:*
component that acts as an AND
(conjunctive) clause but does not
contribute to scoring.
query
Any query:*
component reference which takes part in document scoring.
query:forDocumentFields
Invokes the queryBuilder
on field values of one
document you provide and returns the query.
{
"type": "query:forDocumentFields",
"documents": {
"type": "documents:reference",
"auto": true
},
"queryBuilder": {
"type": "queryBuilder:reference",
"auto": true
}
}
documents
The document for which to build the query.
The selector must return exactly one document.
queryBuilder
The query builder to use to build the query.
query:forFieldValues
Matches documents containing any of the provided values in one or more content fields.
{
"type": "query:forFieldValues",
"fields": {
"type": "contentFields:reference",
"auto": true
},
"values": []
}
The typical use case for this type of query is selecting large numbers (thousands) of documents based on their identifiers or some other unique field values. An equivalent Boolean string query will be less efficient.
fields
A reference to the contentFields:*
component providing the set of field names to scan for the
presence of values. At least one field is required.
values
An array of field values to match.
Note that a "field value" is actually the value stored in the inverted index. A field with an analyzer that
tokenizes strings into multiple values (or otherwise manipulates them) will result in index values that are
different from those passed on input. We recommend using this type of query for
literal
fields only.
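A minimal sketch of selecting documents by identifier follows. The values are made-up placeholders, and the fields reference is the automatic one from the default configuration above; it is assumed to resolve to a literal identifier field.

```json
{
  "type": "query:forFieldValues",
  "fields": {
    "type": "contentFields:reference",
    "auto": true
  },
  "values": ["doc-001", "doc-002", "doc-003"]
}
```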
query:forLabels
Matches documents containing labels from any
labels:*
component you provide.
{
"type": "query:forLabels",
"fields": {
"type": "featureFields:reference",
"auto": true
},
"labels": {
"type": "labels:reference",
"auto": true
},
"minOrMatches": 1,
"operator": "OR"
}
In this example request, we search for any documents that contain any existing labels present in an explicit snippet of text. Such a scenario can be useful for looking up documents that are similar to the provided text (a basic more-like-this functionality).
fields
An array of one or more feature fields.
labels
The source of labels.
minOrMatches
Sets the minimum number of labels that must match in a document for it to be included in the result. This
setting applies to
OR
-type queries only (disjunction queries).
operator
Declares the way labels should be composed:
OR
-
Produces documents matching any of the labels.
AND
-
Produces documents matching all the labels.
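To illustrate how operator and minOrMatches interact, the sketch below keeps the OR operator but requires at least two of the provided labels to occur in a document. The field and label references are the automatic ones from the default configuration above.

```json
{
  "type": "query:forLabels",
  "fields": {
    "type": "featureFields:reference",
    "auto": true
  },
  "labels": {
    "type": "labels:reference",
    "auto": true
  },
  "minOrMatches": 2,
  "operator": "OR"
}
```

With the AND operator, every label would have to match and minOrMatches would not apply.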
query:fromDocuments
Extracts the search query from the documents stage you provide.
{
"type": "query:fromDocuments",
"buildFromDocumentIds": false,
"documents": {
"type": "documents:reference",
"auto": true
}
}
If the input documents originate from a search query, such as
query:string
, this query becomes equal to that underlying search query. Otherwise, which is the case, for example, for
documents:byId
or
documents:embeddingNearestNeighbors
, this query becomes a synthetic query matching exactly the ids of the input documents.
query:fromDocuments
has two practical use cases:
-
Highlighting query occurrences in a union of document lists. The following request illustrates this use case:
While query highlighting in the above request could be implemented by referencing the corresponding queries both in the
documents:byQuery
and documentContent.queries
map, this is not possible in general. In particular, the documents:rwmd
stage generates a query that cannot be constructed in any other way; query:fromDocuments
is the only way to access that query for highlighting purposes. See the similar document retrieval tutorial for real-world examples.
-
Filtering by sampled document set. To improve the performance of certain requests, you can use the documents:sample stage to take a random sample of a set of documents and process only that sample rather than the whole set. For stages requiring a query on input, you can use the
query:fromDocuments
component to convert the random sample of documents into a query.
The following example uses
query:fromDocuments
to create a consistent sample of documents falling within two overlapping time periods.
In the components section, the request defines two queries that determine the boundaries of two overlapping time periods. The
sample
stage samples 10k documents covering the union of the time periods. Finally, the documents0
and documents1
stages select the sample of documents for the two time periods, using query:fromDocuments
in the query:filter.filter
property. Note that the request sets the buildFromDocumentIds
property to true
in both filter queries. This causes Lingo4G to build queries matching only the documents selected at the sampling stage rather than pass the original search query provided to the sample
stage.
Note that for overlapping time periods, sampling from each individual period leads to overrepresentation of certain periods. If this is undesirable, the above request avoids the problem by performing sampling only once for the union of all time periods.
buildFromDocumentIds
If true
, builds a query that matches the input documents by internal identifiers. Otherwise,
returns the original query used by the input documents stage.
documents
The document selector from which to extract the query.
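The filtering pattern described above can be sketched as a single component. This is an illustration under stated assumptions, not the original request: the stage name sample and the component name period0 are hypothetical, and the "use" property on the reference components is assumed to select a named stage or component.

```json
{
  "type": "query:filter",
  "query": {
    "type": "query:reference",
    "use": "period0"
  },
  "filter": {
    "type": "query:fromDocuments",
    "buildFromDocumentIds": true,
    "documents": {
      "type": "documents:reference",
      "use": "sample"
    }
  }
}
```

Because buildFromDocumentIds is true, the filter matches exactly the sampled documents rather than replaying the query that produced them.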
query:fromQueryBuilder
Invokes the queryBuilder
on a constant set of inputs and returns the query.
{
"type": "query:fromQueryBuilder",
"queryBuilder": {
"type": "queryBuilder:reference",
"auto": true
}
}
The primary use case of this component is combining multiple user-provided variable values into a single more complex query or reusing the same user input to build multiple queries.
queryBuilder
The query builder to call.
This component invokes the query builder with an empty set of inputs, so the query builder you provide must use
the input
property of
each queryBuilderVariable
to
query:string
Parses text queries using the Apache Lucene query parser of your choice.
{
"type": "query:string",
"query": "",
"queryParser": {
"type": "queryParser:project",
"queryParserKey": ""
}
}
Text queries provide a very powerful and flexible way of selecting a subset of documents matching the provided
criteria. The type of
queryParser
will determine how the query text is interpreted.
The query parsers chapter lists all available query parsers
and provides examples of their query syntax.
In the example request below, we search for all occurrences of the word
cat
, preceding the word dog
by no more than 15 word positions. Note the highlighted
fragment, which is the query:string
component inside the
documents:byQuery
stage.
query
The text query to pass to the query parser.
queryParser
The name of the query parser to use. If blank, the default query parser definition from the project descriptor is used.
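A minimal sketch of a text query follows. It leaves queryParserKey blank so that the project's default query parser is used, and it assumes that parser accepts the classic Lucene Boolean syntax; consult the query parsers chapter for the syntax your parser supports.

```json
{
  "type": "query:string",
  "query": "cat AND dog",
  "queryParser": {
    "type": "queryParser:project",
    "queryParserKey": ""
  }
}
```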
Consumers of query:*
The following stages and components take query:*
as
input: