query

query:* components specify lists of documents in a declarative way. Typically, you pass them to the documents:byQuery stage to execute document searches or to documentContent to highlight search terms in document content.

You can use the following query:* components in your analysis requests:

query:all

Matches all documents in the index.

query:complement

Negates the query you provide.

query:composite

Composes a list of queries using the AND or OR operator.

query:filter

Narrows down the matches of the query you provide to the set of documents also matched by the filter query.

query:forFieldValues

Matches documents containing any of the provided values in one or more content fields.

query:forLabels

Matches documents containing labels you provide.

query:fromDocuments

A query that matches the documents you provide.

query:string

Parses text queries using the Lucene query parser of your choice.

query:reference

References a query:* component defined in the request or in the project's default components.


query:all

A query matching all documents in the index.

{
  "type": "query:all"
}
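
As a minimal sketch, query:all can be passed to the documents:byQuery stage to select documents without any search criteria. The stage name and the limit value below are illustrative choices, not required settings.

```json
{
  "comment": "query:all sketch: the stage name and limit are illustrative",
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:all"
      },
      "limit": 10
    }
  }
}
```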

query:complement

Negates the set of documents from the query you provide.

{
  "type": "query:complement",
  "query": null
}

query

Type
query
Default
null
Required
yes

The query to negate. Any documents not matching this query will be returned.
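
The following sketch wraps a query:string component in query:complement, so the request should select documents that do not match the search term. The search term and the limit are illustrative values only.

```json
{
  "comment": "query:complement sketch: selects documents NOT matching the illustrative term 'cats'",
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:complement",
        "query": {
          "type": "query:string",
          "query": "cats"
        }
      },
      "limit": 10
    }
  }
}
```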

query:composite

Composes a list of queries using the AND or OR operators.

{
  "type": "query:composite",
  "operator": "OR",
  "queries": []
}

Note that certain query component implementations (like query:string) may offer built-in Boolean operations that are more efficient. This component should be used to combine documents from different query implementations.

operator

Type
string
Default
"OR"
Constraints
one of [OR, AND]
Required
no

Declares the way documents from queries are combined. The operator property supports the following values:

OR

Produces the union of all unique documents from all queries.

AND

Produces the intersection of all documents from all queries. A document must appear in all queries to appear in the output.

queries

Type
array of query
Default
[]
Required
no

A list of query:* components to compose.
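
As a sketch of combining different query implementations, the request below intersects a text query with a label-based query. The search term, the text snippet and the abstract$phrases feature field are taken from other examples on this page and are assumptions about the index; adjust them to your project.

```json
{
  "comment": "query:composite sketch: AND of a text query and a label query (field name and texts are assumptions)",
  "components": {
    "query": {
      "type": "query:composite",
      "operator": "AND",
      "queries": [
        {
          "type": "query:string",
          "query": "dogs"
        },
        {
          "type": "query:forLabels",
          "fields": {
            "type": "featureFields:simple",
            "fields": [
              "abstract$phrases"
            ]
          },
          "labels": {
            "type": "labels:fromText",
            "text": "yellow cats and blue dogs"
          }
        }
      ]
    }
  },
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:reference",
        "use": "query"
      },
      "limit": 10
    }
  }
}
```

With the AND operator, a document must match both the text query and the label query to be selected. Switching the operator to OR would produce the union instead.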

query:filter

Narrows down the matches of the query you provide to the set of documents also matched by the filter query.

{
  "type": "query:filter",
  "filter": null,
  "query": null
}

The query:filter component acts similarly to the query:composite component with the AND operator. The subtle difference is that filter queries do not contribute to document scores.

Here is an example request using the query:filter component and searching for occurrences of cats and dogs, where the document score is only computed for the hits on dogs.

{
  "comment": "query:filter component example (content and highlighting stages for demonstration)",
  "components": {
    "query": {
      "type": "query:filter",
      "query": {
        "type": "query:string",
        "query": "dogs"
      },
      "filter": {
        "type": "query:string",
        "query": "cats"
      }
    }
  },
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:reference",
        "use": "query"
      },
      "limit": 10
    },
    "content": {
      "type": "documentContent",
      "fields": {
        "type": "contentFields:grouped",
        "groups": [
          {
            "fields": [
              "title", "abstract"
            ],
            "config": {
              "maxValues": 3,
              "maxValueLength": 160,
              "highlighting": {
                "enabled": true,
                "startMarker": "⁌%s⁍",
                "endMarker": "⁌\\%s⁍"
              }
            }
          }
        ]
      },
      "queries": {
        "q1": {
          "type": "query:reference",
          "use": "query"
        }
      }
    }
  }
}

filter

Type
query
Default
null
Required
yes

Any query:* component that acts as an AND (conjunctive) clause but does not contribute to scoring.

query

Type
query
Default
null
Required
yes

Any query:* component that takes part in document scoring.

query:forFieldValues

Matches documents containing any of the provided values in one or more content fields.

{
  "type": "query:forFieldValues",
  "fields": {
    "type": "contentFields:reference",
    "auto": true
  },
  "values": []
}

The typical use case for this type of query is selecting large numbers (thousands) of documents based on their identifiers or some other unique field values. An equivalent Boolean string query will be less efficient.

fields

Type
contentFields
Default
{
  "type": "contentFields:reference",
  "auto": true
}
Required
no

A reference to the contentFields:* component providing the set of field names to scan for the presence of values. At least one field is required.

values

Type
array of string
Default
[]
Required
no

An array of field values to match.

Note that a "field value" is actually the value stored in the inverted index. A field with an analyzer that tokenizes strings into multiple values (or otherwise manipulates them) will result in index values that are different from those passed on input. We recommend using this type of query for literal fields only.
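
The sketch below selects documents by a list of literal field values. The identifiers are hypothetical placeholders. The fields property is omitted, so it falls back to the default contentFields:reference; this assumes the request or project defines a content fields component that resolves to a literal identifier field.

```json
{
  "comment": "query:forFieldValues sketch: the values are hypothetical ids; 'fields' is left at its default reference",
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:forFieldValues",
        "values": [
          "doc-0001",
          "doc-0002",
          "doc-0003"
        ]
      }
    }
  }
}
```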

query:forLabels

Matches documents containing labels from any labels:* component you provide.

{
  "type": "query:forLabels",
  "fields": {
    "type": "featureFields:reference",
    "auto": true
  },
  "labels": {
    "type": "labels:reference",
    "auto": true
  },
  "minOrMatches": 1,
  "operator": "OR"
}

In this example request, we search for any documents that contain any existing labels present in an explicit snippet of text. Such a scenario can be useful for looking up documents that are similar to the provided text (a basic more-like-this functionality).

{
  "comment": "query:forLabels component example (content and highlighting stages for demonstration)",
  "components": {
    "query": {
      "type": "query:forLabels",
      "fields":{
        "type": "featureFields:simple",
        "fields": [
          "abstract$phrases"
        ]
      },
      "labels": {
        "type": "labels:fromText",
        "text": "yellow cats and blue dogs"
      },
      "operator": "OR",
      "minOrMatches": 2
    }
  },
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:reference",
        "use": "query"
      },
      "limit": 10
    },
    "content": {
      "type": "documentContent",
      "fields": {
        "type": "contentFields:grouped",
        "groups": [
          {
            "fields": [
              "title", "abstract"
            ],
            "config": {
              "maxValues": 3,
              "maxValueLength": 160,
              "highlighting": {
                "enabled": true,
                "startMarker": "⁌%s⁍",
                "endMarker": "⁌\\%s⁍"
              }
            }
          }
        ]
      },
      "queries": {
        "q1": {
          "type": "query:reference",
          "use": "query"
        }
      }
    }
  }
}

fields

Type
featureFields
Default
{
  "type": "featureFields:reference",
  "auto": true
}
Required
no

An array of one or more feature fields.

labels

Type
labels
Default
{
  "type": "labels:reference",
  "auto": true
}
Required
no

The source of labels.

minOrMatches

Type
integer
Default
1
Constraints
value > 0
Required
no

Sets the minimum number of labels that must match in a document for it to be included in the result. This setting applies to OR-type queries only (disjunction queries).

operator

Type
string
Default
"OR"
Constraints
one of [OR, AND]
Required
no

Declares the way labels should be composed:

OR

Produces documents matching any of the labels.

AND

Produces documents matching all the labels.
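
For comparison with the OR-based example earlier in this section, the following sketch switches the operator to AND, so a document should be selected only if it contains every label produced from the text. The minOrMatches property is left out because it applies to OR-type queries only. The feature field name and the text snippet are reused from the earlier example and are assumptions about the index.

```json
{
  "comment": "query:forLabels sketch with the AND operator (field name and text are assumptions)",
  "components": {
    "query": {
      "type": "query:forLabels",
      "fields": {
        "type": "featureFields:simple",
        "fields": [
          "abstract$phrases"
        ]
      },
      "labels": {
        "type": "labels:fromText",
        "text": "yellow cats and blue dogs"
      },
      "operator": "AND"
    }
  },
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:reference",
        "use": "query"
      },
      "limit": 10
    }
  }
}
```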

query:fromDocuments

Extracts the search query from the documents stage you provide.

{
  "type": "query:fromDocuments",
  "buildFromDocumentIds": false,
  "documents": {
    "type": "documents:reference",
    "auto": true
  }
}

If the input documents originate from a search query, such as query:string, this query becomes equal to that underlying search query. Otherwise, for example when the documents come from documents:byId or documents:embeddingNearestNeighbors, this query becomes a synthetic query matching exactly the ids of the input documents.

query:fromDocuments has two practical use cases:

  • Highlighting query occurrences in a union of document lists. The following request illustrates this use case:

    {
      "name": "Using query:fromDocuments for query occurrence highlighting",
      "comment": "The primary use case for query:fromDocuments is highlighting query occurrences in a union of documents.",
      "stages": {
        "documents1": {
          "type": "documents:byQuery",
          "query": {
            "type": "query:string",
            "query": "photon"
          }
        },
        "documents2": {
          "type": "documents:byQuery",
          "query": {
            "type": "query:string",
            "query": "electron"
          }
        },
        "union": {
          "type": "documents:composite",
          "selectors": [
            {
              "type": "documents:reference",
              "use": "documents1"
            },
            {
              "type": "documents:reference",
              "use": "documents2"
            }
          ],
          "operator": "OR"
        },
        "content": {
          "type": "documentContent",
          "limit": 10,
          "documents": {
            "type": "documents:reference",
            "use": "union"
          },
          "queries": {
            "q1": {
              "type": "query:fromDocuments",
              "documents": {
                "type": "documents:reference",
                "use": "documents1"
              }
            },
            "q2": {
              "type": "query:fromDocuments",
              "documents": {
                "type": "documents:reference",
                "use": "documents2"
              }
            }
          }
        }
      }
    }

    Using query:fromDocuments to highlight query occurrences in a union of multiple lists of documents.

    While query highlighting in the above request could be implemented by referencing the corresponding queries both in the documents:byQuery and documentContent.queries map, this is not possible in general. In particular, the documents:rwmd stage generates a query that cannot be constructed in any other way; query:fromDocuments is the only way to access that query for highlighting purposes. See the similar document retrieval tutorial for real-world examples.

  • Filtering by sampled document set. To improve the performance of certain requests, you can use the documents:sample stage to take a random sample of a set of documents and process only that sample rather than the whole set. For stages requiring a query on input, you can use the query:fromDocuments component to convert the random sample of documents into a query.

    The following example uses query:fromDocuments to create a consistent sample of documents falling within two overlapping time periods.

    {
      "name": "Using query:fromDocuments with documents:sample.",
      "components": {
        "rangeQuery0": {
          "type": "query:string",
          "query": "created:[2015-01-01 TO 2017-01-01]"
        },
        "rangeQuery1": {
          "type": "query:string",
          "query": "created:[2016-01-01 TO 2018-01-01]"
        }
      },
      "stages": {
        "sample": {
          "type": "documents:sample",
          "limit": 10000,
          "query": {
            "type": "query:composite",
            "queries": [
              {
                "type": "query:reference",
                "use": "rangeQuery0"
              },
              {
                "type": "query:reference",
                "use": "rangeQuery1"
              }
            ],
            "operator": "OR"
          }
        },
        "documents0": {
          "type": "documents:byQuery",
          "query": {
            "type": "query:filter",
            "query": {
              "type": "query:reference",
              "use": "rangeQuery0"
            },
            "filter": {
              "type": "query:fromDocuments",
              "documents": {
                "type": "documents:reference",
                "use": "sample"
              },
              "buildFromDocumentIds": true
            }
          }
        },
        "documents1": {
          "type": "documents:byQuery",
          "query": {
            "type": "query:filter",
            "query": {
              "type": "query:reference",
              "use": "rangeQuery1"
            },
            "filter": {
              "type": "query:fromDocuments",
              "documents": {
                "type": "documents:reference",
                "use": "sample"
              },
              "buildFromDocumentIds": true
            }
          }
        }
      }
    }

    Using query:fromDocuments for random sampling of documents.

    In the components section, the request defines two queries that determine the boundaries of two overlapping time periods. The sample stage samples up to 10,000 documents covering the union of the time periods. Finally, the documents0 and documents1 stages select the sample of documents for the two time periods, using query:fromDocuments in the query:filter.filter property. Note that the request sets the buildFromDocumentIds property to true in both filter queries. This causes Lingo4G to build queries matching only the documents selected at the sampling stage rather than pass the original search query provided to the sample stage.

    Note that for overlapping time periods, sampling from each period individually leads to overrepresentation of certain periods. The above request avoids this problem by sampling only once, for the union of all time periods.

buildFromDocumentIds

Type
boolean
Default
false
Required
no

If true, builds a query that matches the input documents by internal identifiers. Otherwise, returns the original query used by the input documents stage.

documents

Type
documents
Default
{
  "type": "documents:reference",
  "auto": true
}
Required
no

The document selector from which to extract the query.

query:string

Parses text queries using the Apache Lucene query parser of your choice.

{
  "type": "query:string",
  "query": "",
  "queryParser": {
    "type": "queryParser:project",
    "queryParserKey": ""
  }
}

Text queries provide a powerful and flexible way of selecting a subset of documents matching the provided criteria. The type of queryParser determines how the query text is interpreted. The query parsers chapter lists all available query parsers and provides examples of their query syntax.

In the example request below, we search for all occurrences of the word cat preceding the word dog by no more than 15 word positions. The query:string component is defined in the components section and referenced from the documents:byQuery stage.

{
  "comment": "query:string component example (content and highlighting stages for demonstration)",
  "components": {
    "query": {
      "type": "query:string",
      "query": "fn:maxWidth(15 fn:ordered(cat dog))"
    }
  },
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:reference",
        "use": "query"
      },
      "limit": 10
    },
    "content": {
      "type": "documentContent",
      "fields": {
        "type": "contentFields:grouped",
        "groups": [
          {
            "fields": [
              "title", "abstract"
            ],
            "config": {
              "maxValues": 3,
              "maxValueLength": 160,
              "highlighting": {
                "enabled": true,
                "startMarker": "⁌%s⁍",
                "endMarker": "⁌\\%s⁍"
              }
            }
          }
        ]
      },
      "queries": {
        "q1": {
          "type": "query:reference",
          "use": "query"
        }
      }
    }
  }
}

query

Type
string
Default
<empty string>
Required
no

The text query to pass to the query parser.

queryParser

Type
queryParser
Default
{
  "type": "queryParser:project",
  "queryParserKey": ""
}
Required
no

The name of the query parser to use. If blank, the default query parser definition from the project descriptor is used.
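
The sketch below selects a specific project-defined query parser by key. The key "enhanced" is a hypothetical name; it assumes the project descriptor declares a query parser under that key. The query text also assumes that the chosen parser supports Boolean operators.

```json
{
  "comment": "query:string sketch: 'enhanced' is a hypothetical queryParserKey assumed to exist in the project descriptor",
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "cats AND dogs",
        "queryParser": {
          "type": "queryParser:project",
          "queryParserKey": "enhanced"
        }
      },
      "limit": 10
    }
  }
}
```

Leaving queryParserKey blank, as in the default shown above, falls back to the default query parser definition from the project descriptor.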

Consumers of query:*

The following stages and components take query:* as input:

Stage or component
  • Property

debug:explain
  • query
dictionary:queryTerms
  • query
documentContent
  • queries
documentPairs:duplicates
  • query
  • query
documents:byQuery
  • query
documents:embeddingNearestNeighbors
  • filterQuery
documents:sample
  • query
query:complement
  • query
query:composite
  • queries
query:filter
  • query
  • filter