documentContent

The documentContent stage retrieves the fields of documents you provide, optionally highlighting occurrences of a list of search queries.

This page contains the API reference only. Refer to the document content retrieval chapter for examples and strategies for formatting document fields.

documentContent

Returns the fields of documents provided by the documents reference.

{
  "type": "documentContent",
  "documents": {
    "type": "documents:reference",
    "auto": true
  },
  "fields": {
    "type": "contentFields:reference",
    "auto": true
  },
  "limit": "unlimited",
  "mode": "STREAMING",
  "queries": {},
  "start": 0
}
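For context, the following sketch shows how the stage might sit inside a complete analysis request. It assumes that requests declare named stages in a top-level stages object and that a documents:byQuery stage with a query:string query is available as the document source; the stage names "documents" and "content" and the query text are hypothetical. The documentContent stage leaves documents and fields at their automatic-reference defaults and returns at most 10 documents.

{
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "photon"
      }
    },
    "content": {
      "type": "documentContent",
      "limit": 10
    }
  }
}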

documents

Type
documents
Default
{
  "type": "documents:reference",
  "auto": true
}
Required
no

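A reference to any documents:* component or stage that provides the documents whose fields should be retrieved. The stage always needs a document source: the property is optional only because its default value is an automatic reference ("auto": true).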
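When a request contains more than one documents:* stage, you may want to point at one of them explicitly instead of relying on the automatic reference. The fragment below is a sketch that assumes references name their target through a "use" property; the stage name "topDocuments" is hypothetical.

"documents": {
  "type": "documents:reference",
  "use": "topDocuments"
}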

fields

Type
contentFields
Default
{
  "type": "contentFields:reference",
  "auto": true
}
Required
no

This property controls which fields are returned for each document and how they are formatted.

The value of this property should contain or reference one of the contentFields:* components.
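As an illustration of providing the value inline rather than by reference, the sketch below assumes a contentFields:simple component that takes a map of field names; both the component's property layout and the field names "title" and "abstract" are assumptions, so check the contentFields:* reference for the exact options.

"fields": {
  "type": "contentFields:simple",
  "fields": {
    "title": {},
    "abstract": {}
  }
}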

limit

Type
limit
Default
unlimited
Required
no

An optional maximum number of documents to retrieve. Together with start, it can be used to page through large document sets.

The value must be an integer >= 0 or the string unlimited.
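For example, the following stage configuration retrieves the fields of at most the first 100 documents provided by the documents reference; omitting limit, or setting it to "unlimited", retrieves all of them.

{
  "type": "documentContent",
  "limit": 100
}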

mode

Type
string
Default
"STREAMING"
Constraints
one of [STREAMING, EAGER]
Required
no

Document content retrieval mode.

The mode property supports the following values:

STREAMING

Causes Lingo4G to retrieve the document content on demand while serializing and sending the result JSON back to the caller. In this mode, Lingo4G can avoid allocating extra buffers for the document contents and can overlap some of the content retrieval time with network latency.

The streaming mode is suitable for typical document content retrieval scenarios, where you retrieve up to about a thousand query-highlighted documents, or any number of documents without query highlighting.

Note, however, the following caveats:

  • In the STREAMING mode, the complexity of content retrieval, mostly the amount of work required to highlight query occurrences, affects the speed at which the Lingo4G REST API delivers the response JSON. Use the EAGER mode for peak response download speeds.

  • In the STREAMING mode, Lingo4G cannot track the progress of content retrieval or report the time spent on it in the response status, because retrieval happens while the response is being sent back to the client. Use the EAGER mode to track document retrieval progress.
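A typical fit for the streaming mode, following the guidance above, is exporting the fields of an arbitrarily large document set without query highlighting. A minimal sketch of such a stage:

{
  "type": "documentContent",
  "mode": "STREAMING",
  "limit": "unlimited",
  "queries": {}
}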

EAGER

Causes Lingo4G to retrieve the contents of all documents when processing the analysis request. Use this mode to track the progress of document retrieval as part of the analysis request execution.

Note that in the EAGER mode, Lingo4G temporarily allocates a memory buffer that holds all the content you requested.
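The following sketch switches to the eager mode so that retrieval progress and time appear in the response status. Capping the number of documents with limit keeps the temporary content buffer bounded; the value 1000 is only an illustrative choice.

{
  "type": "documentContent",
  "mode": "EAGER",
  "limit": 1000
}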

queries

Type
object of query
Default
{}
Required
no

An optional map of named query:* components, each of which should be highlighted in the returned document fields.

See the query highlighting section in the content retrieval tutorial for more information.
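A sketch of a queries map with a single named query follows. It assumes a query:string component that accepts the query text in a "query" property; the key "userQuery" and the query text are hypothetical. Each key names one query whose occurrences are highlighted in the returned fields.

"queries": {
  "userQuery": {
    "type": "query:string",
    "query": "quantum computing"
  }
}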

start

Type
integer
Default
0
Constraints
value >= 0
Required
no

An optional zero-based offset of the first document in documents to retrieve. Together with limit, it can be used to page through large document sets.
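For paging, set start to (page − 1) × pageSize and limit to pageSize. For example, with 20 documents per page, the third page starts at offset (3 − 1) × 20 = 40, so the stage below returns the documents at offsets 40 through 59:

{
  "type": "documentContent",
  "start": 40,
  "limit": 20
}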

Consumers of documentContent:*

The following stages and components take documentContent:* as input:

Stage or component              Property
vectors:fromEmbeddingService    documentContent