documentContent
The documentContent stage retrieves the fields of documents you provide, optionally highlighting occurrences of a list of search queries.
This document contains the API reference only. For examples and strategies for formatting document fields, refer to the document content retrieval chapter.
documentContent
Returns the fields of documents provided by the documents reference.
{
"type": "documentContent",
"documents": {
"type": "documents:reference",
"auto": true
},
"fields": {
"type": "contentFields:reference",
"auto": true
},
"limit": "unlimited",
"mode": "STREAMING",
"queries": {},
"start": 0
}
documents
A mandatory reference to any documents:* component or stage, providing documents for which fields should be retrieved.
fields
This property controls which fields are returned for each document and how they are formatted.
The value of this property should contain or reference one of the contentFields:* components.
limit
An optional document limit (can be used to implement paging through large document sets).
The value must be an integer >= 0 or the string "unlimited".
mode
Document content retrieval mode.
The mode property supports the following values:
STREAMING
Causes Lingo4G to retrieve the document content on demand while serializing and sending the result JSON back to the caller. In this mode, Lingo4G can avoid allocating extra buffers for the document contents and can hide some of the content retrieval time within the network access latencies.
The streaming mode is suitable for typical document content retrieval scenarios, where you retrieve up to a thousand query-highlighted documents or any number of documents without query highlighting.
Note, however, the following caveats:
- In the STREAMING mode, the complexity of content retrieval, mostly the amount of work required to highlight query occurrences, affects the speed at which the Lingo4G REST API delivers the response JSON. Use the EAGER mode for peak response download speeds.
- In the STREAMING mode, Lingo4G cannot track the progress of document content retrieval or report the time spent on it in the response status. This is because the retrieval happens while the response is being sent back to the client. Use the EAGER mode to track the document retrieval progress.
EAGER
Causes Lingo4G to retrieve the contents of all documents when processing the analysis request. Use this mode to track the progress of document retrieval as part of the analysis request execution.
Note that in the EAGER mode, Lingo4G temporarily allocates a memory buffer that holds all the contents you requested to retrieve.
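The trade-offs between the two modes can be condensed into a small decision helper. The sketch below is only an illustration of the guidance above: the function name, its parameters, and the use of 1000 as a hard threshold are assumptions made for this example, not part of the Lingo4G API.

```python
def choose_mode(num_documents, highlight_queries,
                need_progress=False, need_peak_download_speed=False):
    """Suggest a value for the documentContent "mode" property.

    Hypothetical helper, not part of Lingo4G. It encodes the guidance from
    the mode descriptions; the 1000-document cut-off is a rule of thumb
    taken from "up to a thousand query-highlighted documents".
    """
    # Progress tracking and peak download speed are only available in EAGER.
    if need_progress or need_peak_download_speed:
        return "EAGER"
    # Highlighting many documents slows down a streamed response.
    if highlight_queries and num_documents > 1000:
        return "EAGER"
    # Otherwise avoid the temporary memory buffer that EAGER allocates.
    return "STREAMING"


mode_small = choose_mode(500, highlight_queries=True)
mode_plain = choose_mode(5000, highlight_queries=False)
mode_large = choose_mode(5000, highlight_queries=True)
mode_tracked = choose_mode(10, highlight_queries=False, need_progress=True)
```

Keep in mind that EAGER buffers all requested contents in memory, so for very large result sets the memory cost may outweigh the benefits.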
queries
An optional map of named query:* components, each of which should be highlighted in the returned document fields.
See the query highlighting section in the content retrieval tutorial for more information.
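As a rough illustration of the shape of the queries map, the sketch below builds a documentContent stage in Python with one entry per named query. The helper name is hypothetical, and the assumption that each entry is a query:string component with a query property is not confirmed by this page; check the query:* reference for the exact component types and properties.

```python
import json


def document_content_with_highlights(named_queries):
    """Build a documentContent stage that highlights the given queries.

    `named_queries` maps a name of your choice to a query string.
    ASSUMPTION: each entry is expressed as a query:string component with a
    "query" property; this is not confirmed by this reference page.
    All other properties repeat the defaults listed for the stage.
    """
    return {
        "type": "documentContent",
        "documents": {"type": "documents:reference", "auto": True},
        "fields": {"type": "contentFields:reference", "auto": True},
        "limit": "unlimited",
        "mode": "STREAMING",
        "queries": {
            name: {"type": "query:string", "query": text}
            for name, text in named_queries.items()
        },
        "start": 0,
    }


stage = document_content_with_highlights({
    "solar": "solar panel",
    "wind": "wind turbine",
})
stage_json = json.dumps(stage, indent=2)
```

The names used as map keys ("solar" and "wind" here) are arbitrary labels chosen for the example.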
start
An optional starting offset of the first document in documents (can be used to implement paging through large document sets).
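Paging with start and limit amounts to simple offset arithmetic. The following sketch generates one documentContent stage specification per page; the function name and the page size are assumptions made for the example, and how each stage is embedded in a complete analysis request is outside the scope of this sketch.

```python
def page_stages(page_size, page_count):
    """Yield one documentContent stage spec per page.

    Hypothetical helper, not part of Lingo4G. Page N (counting from 0)
    starts at offset N * page_size and returns at most page_size documents.
    The remaining properties repeat the defaults listed for the stage.
    """
    for page in range(page_count):
        yield {
            "type": "documentContent",
            "documents": {"type": "documents:reference", "auto": True},
            "fields": {"type": "contentFields:reference", "auto": True},
            "limit": page_size,
            "mode": "STREAMING",
            "queries": {},
            "start": page * page_size,
        }


pages = list(page_stages(page_size=50, page_count=3))
```

With a page size of 50, the three generated stages use start offsets 0, 50 and 100, each with limit set to 50.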
Consumers of documentContent:*
The following stages and components take documentContent:* as input:
Stage or component | Property |
---|---|
vectors:fromEmbeddingService | documentContent |