contentFields
A contentFields:* component specifies the list of fields whose content should be retrieved for display purposes.
It also configures aspects of the output, such as the maximum number of field values and
characters to return, whether to include label occurrence (highlight) markers, and which strings to use as those markers.
You can use the following content field definitions in your analysis requests:
Component | Description |
---|---|
contentFields:empty | Empty list of content fields. |
contentFields:grouped | Defines a list of content field groups, each group with a dedicated output configuration. |
contentFields:simple | Defines a list of content fields with output configuration for each field. |
contentFields:reference | References a contentFields:* component defined in the request or in the project's default components. |
contentFields:empty
An empty set of fields (no field content should be returned).
{
  "type": "contentFields:empty"
}
contentFields:grouped
Defines a list of content field groups, each group with a dedicated output configuration.
{
  "type": "contentFields:grouped",
  "groups": []
}
This component reduces configuration verbosity by letting you declare, once, a logical group of fields that share
the same configuration. For example, the following documentContent request asks for the full content of the title
field and for at most two values, each trimmed to at most 160 characters, from the abstract and author_name fields.
{
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "\"twin photon\" correlations"
      },
      "limit": 2
    },
    "documentContent": {
      "type": "documentContent",
      "fields": {
        "type": "contentFields:grouped",
        "groups": [
          {
            "fields": ["title"],
            "config": {
              "maxValues": "unlimited",
              "maxValueLength": "unlimited"
            }
          },
          {
            "fields": ["abstract", "author_name"],
            "config": {
              "maxValues": 2,
              "maxValueLength": 160
            }
          }
        ]
      }
    }
  },
  "output": {
    "stages": [
      "documentContent"
    ]
  }
}
An example
documentContent
stage using the contentFields:grouped
component.
groups
An array of objects declaring configuration for a set of fields. Each object must specify the
fields
property with an array of field names and a config
property of type
contentField
with the configuration that should apply to all the
fields.
See complete definition in this example.
contentFields:simple
Defines a list of content fields with an explicit output configuration for each field.
{
  "type": "contentFields:simple",
  "fields": {}
}
This component can be used to specify the field content retrieval configuration explicitly for each field. It is
more verbose compared to
contentFields:grouped
but it may be simpler for a small set of fields. For example, in this request we ask for the
title
and abstract
fields, without any limits on their length or value count.
{
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "\"twin photon\" correlations"
      },
      "limit": 2
    },
    "documentContent": {
      "type": "documentContent",
      "fields": {
        "type": "contentFields:simple",
        "fields": {
          "title": {
            "maxValues": "unlimited",
            "maxValueLength": "unlimited"
          },
          "abstract": {
            "maxValues": "unlimited",
            "maxValueLength": "unlimited"
          }
        }
      }
    }
  },
  "output": {
    "stages": [
      "documentContent"
    ]
  }
}
An example
documentContent
stage using the contentFields:simple
component.
fields
An object with content field names and their associated
contentField
specification as value.
See complete definition in this example.
contentField
Provides the specification of how content fields should be trimmed or modified before they are returned (typically by the document content retrieval stage).
{
  "highlighting": {
    "enabled": true,
    "endMarker": "⁌\\%s⁍",
    "startMarker": "⁌%s⁍"
  },
  "maxValueLength": 250,
  "maxValues": 3,
  "truncationMarker": "…",
  "valueCount": false
}
highlighting
Specifies whether the field content should be highlight-marked and what the highlight markers are.
This specification is relevant for query-in-context highlighting (queries defined in the
documentContent.queries
property). See the
document content retrieval tutorial
for a full example.
When highlighting is enabled,
maxValues
specifies the maximum number of text passages (snippets), while
maxValueLength
specifies each passage's length (window size). The algorithm will try to maximize the number of query matches in
the returned text fragments.
When the field value has no query-matching regions, the value follows normal processing (respecting
maxValueLength
and maxValues
).
Note that the highlighted text ranges can nest, overlap or both. To make downstream rendering easier, any overlapping highlights are closed and reopened to form a valid HTML-like nesting structure.
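As a hedged illustration of the passage settings described above, the following contentField fragment asks for at most two highlighted passages of roughly 200 characters each. The numbers are arbitrary, and the fragment assumes that omitted properties (such as the markers) fall back to their defaults and that the queries to highlight are declared in documentContent.queries elsewhere in the request.

```json
{
  "highlighting": {
    "enabled": true
  },
  "maxValues": 2,
  "maxValueLength": 200,
  "truncationMarker": "…"
}
```

For a field value without any query-matching regions, the same maxValues and maxValueLength limits would apply through normal processing.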
enabled
Enables or disables query highlight processing for the field.
endMarker
A string inserted after each query matching region. The string can contain a special substitution
sequence %s, which is replaced by the query name (as declared in
documentContent.queries). The default value is ⁌\%s⁍
and uses a pair of rarely used Unicode characters (0x204C and 0x204D).
See this API response for an example of what query highlights can look like.
startMarker
A string inserted before each query matching region. The string can contain a special substitution
sequence %s, which is replaced by the query name (as declared in
documentContent.queries). The default value is ⁌%s⁍
and uses a pair of rarely used Unicode characters (0x204C and 0x204D).
See this API response for an example of what query highlights can look like.
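A minimal sketch of custom markers, assuming the response is rendered as HTML and that the query names are also usable as CSS class names (both are assumptions made for this example, not requirements of the API):

```json
{
  "highlighting": {
    "enabled": true,
    "startMarker": "<mark class=\"%s\">",
    "endMarker": "</mark>"
  }
}
```

Because overlapping highlights are closed and reopened, markers of this kind should produce properly nested tags. Whether the surrounding field text is HTML-escaped is not covered here, so a consumer may need to handle escaping separately.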
maxValueLength
Truncates each returned value to at most
maxValueLength
characters. This option is useful to guard against fields that can be very long
(full text content of research papers, for example). This can make API responses smaller.
If highlighting is not enabled, the trailing end of each value
will be trimmed if it exceeds maxValueLength
characters. If a non-empty
truncationMarker
is declared, it will be appended at the end of each trimmed value.
If highlighting is enabled, then the value of
maxValueLength
serves as a hint suggesting the desired length of the best highlighted fragment. The
value can be trimmed anywhere (one or more substrings can be picked from each value). If a non-empty
truncationMarker
is declared, it will be inserted at all points where the value is sliced.
See this chapter for examples of requests and responses that make use of this option.
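The non-highlighting case can be sketched with the following contentField fragment. The limit of 160 characters and the marker string are arbitrary choices, and the fragment assumes that a highlighting object containing only the enabled property is accepted.

```json
{
  "highlighting": {
    "enabled": false
  },
  "maxValueLength": 160,
  "truncationMarker": " [...]"
}
```

With this configuration, a value longer than 160 characters would be cut at its trailing end and the marker appended to the trimmed value.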
maxValues
A single document field in Lingo4G can have multiple values. For example, a research paper can have multiple authors. This option can be used to limit the number of returned values if not all of them are needed for display purposes. This can make API responses smaller.
If the number of document values exceeds maxValues
and
truncationMarker
is not empty, then a synthetic trailing value containing that truncation marker is added.
See this chapter for examples of requests and responses that make use of this option.
truncationMarker
The string used to indicate places where the original document content has been truncated or otherwise spliced.
See also the maxValueLength and highlighting properties.
valueCount
If true, includes an additional property with the value count, even if the list of values is limited
by the maxValues setting. This option is helpful for determining the count of values in multi-valued fields
when their values are truncated by maxValues.
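As an illustrative sketch, the following contentFields:simple component limits the multi-valued author_name field (the field name is taken from the earlier example) to three values and also requests the value count. The specific limits are arbitrary.

```json
{
  "type": "contentFields:simple",
  "fields": {
    "author_name": {
      "maxValues": 3,
      "maxValueLength": "unlimited",
      "truncationMarker": "…",
      "valueCount": true
    }
  }
}
```

For a paper with more than three authors, this should return three names followed by a synthetic value containing the truncation marker, together with the count of values. The name of the response property carrying the count is not stated in this section.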
Consumers of contentFields:*
The following stages and components take contentFields:* as input:
Stage or component | Property |
---|---|
documentContent | fields |
documentOverlap | fields fields |
labelCollector:allFromContentFields | fields |
query:forFieldValues | fields |