contentFields

contentFields:* specifies a list of fields whose content to retrieve for display purposes. You can also configure various aspects of the output, such as the maximum number of field values and characters to return, whether to include label occurrence (highlight) markers, and which strings to use as those markers.

You can use the following content field definitions in your analysis requests:

contentFields:empty

Empty list of content fields.

contentFields:grouped

Defines a list of content field groups, each group with a dedicated output configuration.

contentFields:simple

Defines a list of content fields with output configuration for each field.


contentFields:reference

References a contentFields:* component defined in the request or in the project's default components.


contentFields:empty

An empty set of fields (no field content should be returned).

{
  "type": "contentFields:empty"
}

contentFields:grouped

Defines a list of content field groups, each group with a dedicated output configuration.

{
  "type": "contentFields:grouped",
  "groups": []
}

This component reduces configuration verbosity by letting you declare a configuration once for a logical group of fields that share it. For example, the documentContent stage in the following request retrieves the full content of the title field and, from the abstract and author_name fields, at most two values, each trimmed to at most 160 characters.

{
  "stages": {    
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "\"twin photon\" correlations"
      },
      "limit": 2
    },
    "documentContent": {
      "type": "documentContent",
      "fields": {
        "type": "contentFields:grouped",
        "groups": [
          {
            "fields": ["title"],
            "config": {
              "maxValues": "unlimited",
              "maxValueLength": "unlimited"
            }
          },
          {
            "fields": ["abstract", "author_name"],
            "config": {
              "maxValues": 2,
              "maxValueLength": 160
            }
          }
        ]
      }
    }
  },
  "output": {
    "stages": [
      "documentContent"
    ]
  }
}

An example documentContent stage using the contentFields:grouped component.

groups

Type
array of object
Default
[]
Required
no

An array of objects, each declaring the configuration for a set of fields. Each object must specify a fields property with an array of field names and a config property of type contentField with the configuration to apply to all of those fields.

See the complete definition in this example.
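To make the relationship between the two forms concrete, the following Python sketch expands a grouped specification into the per-field form accepted by contentFields:simple. The helper grouped_to_simple is hypothetical and not part of Lingo4G. It assumes the two forms are equivalent when each field receives its group's config, and it rejects fields listed in more than one group because the documentation does not say how such conflicts are resolved.

```python
import json


def grouped_to_simple(grouped: dict) -> dict:
    """Expand a contentFields:grouped spec into an equivalent contentFields:simple spec.

    Hypothetical helper: each field receives its own copy of its group's config.
    Fields listed in more than one group are rejected (conflict rules are not documented).
    """
    if grouped.get("type") != "contentFields:grouped":
        raise ValueError("expected a contentFields:grouped component")
    fields = {}
    for group in grouped.get("groups", []):
        for name in group["fields"]:
            if name in fields:
                raise ValueError(f"field {name!r} appears in more than one group")
            fields[name] = dict(group["config"])
    return {"type": "contentFields:simple", "fields": fields}


# The grouped specification from the documentContent example above.
grouped = {
    "type": "contentFields:grouped",
    "groups": [
        {"fields": ["title"],
         "config": {"maxValues": "unlimited", "maxValueLength": "unlimited"}},
        {"fields": ["abstract", "author_name"],
         "config": {"maxValues": 2, "maxValueLength": 160}},
    ],
}
print(json.dumps(grouped_to_simple(grouped), indent=2))
```

The output lists title, abstract and author_name separately, each with its own copy of the config. That is the verbosity the grouped form avoids.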

contentFields:simple

Defines a list of content fields with an explicit output configuration for each field.

{
  "type": "contentFields:simple",
  "fields": {}
}

This component specifies the field content retrieval configuration explicitly for each field. It is more verbose than contentFields:grouped, but may be simpler for a small number of fields. For example, the following request asks for the title and abstract fields without any limits on their length or value count.

{
  "stages": {    
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "\"twin photon\" correlations"
      },
      "limit": 2
    },
    "documentContent": {
      "type": "documentContent",
      "fields":{
        "type": "contentFields:simple",
        "fields": {
          "title": {
            "maxValues": "unlimited",
            "maxValueLength": "unlimited"
          },
          "abstract": {        
            "maxValues": "unlimited",
            "maxValueLength": "unlimited"
          }
        }
      }
    }
  },
  "output": {
    "stages": [
      "documentContent"
    ]
  }
}

An example documentContent stage using the contentFields:simple component.

fields

Type
object of contentField
Default
{}
Required
no

An object that maps content field names to their contentField specifications.

See the complete definition in this example.

contentField

Provides the specification of how content fields should be trimmed or modified before they are returned (typically by the document content retrieval stage).

{
  "highlighting": {
    "enabled": true,
    "endMarker": "⁌\\%s⁍",
    "startMarker": "⁌%s⁍"
  },
  "maxValueLength": 250,
  "maxValues": 3,
  "truncationMarker": "…",
  "valueCount": false
}
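The defaults listed for the properties below can be summarized as a small Python model. This is an illustrative sketch, not a Lingo4G client class. The class names are hypothetical, and the attribute names simply mirror the JSON property names. Serializing the default instance reproduces the JSON example above. Note that the JSON text "⁌\\%s⁍" encodes a single backslash, so the actual default end marker is ⁌\%s⁍.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Union

# A "limit" is either an integer or the string "unlimited" (as used in the examples).
Limit = Union[int, str]


@dataclass
class Highlighting:
    enabled: bool = True
    startMarker: str = "\u204C%s\u204D"   # ⁌%s⁍
    endMarker: str = "\u204C\\%s\u204D"   # ⁌\%s⁍ (a literal backslash before %s)


@dataclass
class ContentField:
    # Hypothetical model; attribute names mirror the JSON property names, hence camelCase.
    highlighting: Highlighting = field(default_factory=Highlighting)
    maxValueLength: Limit = 250
    maxValues: Limit = 3
    truncationMarker: str = "\u2026"      # …
    valueCount: bool = False


# Keep non-ASCII characters as-is, like the JSON example above.
print(json.dumps(asdict(ContentField()), ensure_ascii=False, indent=2))
```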

highlighting

Type
object
Default
{
  "enabled": true,
  "startMarker": "⁌%s⁍",
  "endMarker": "⁌\\%s⁍"
}
Required
no

Specifies whether the field content should be highlight-marked and what the highlight markers are.

This specification is relevant for query-in-context highlighting (queries defined in the documentContent.queries property). See the document content retrieval tutorial for a full example.

When highlighting is enabled, maxValues specifies the maximum number of text passages (snippets), while maxValueLength specifies each passage's length (window size). The algorithm will try to maximize the number of query matches in the returned text fragments.

When a field value has no query-matching regions, it is processed normally (respecting maxValueLength and maxValues).

Note that highlighted text ranges can nest, overlap, or both. To make downstream rendering easier, overlapping highlights are closed and reopened so that the markers form a valid, HTML-like nesting structure.
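The close-and-reopen rule described above can be illustrated with a small Python sketch. It is not Lingo4G's implementation. It assumes match ranges are given as half-open (begin, end, queryName) character offsets, and it uses hypothetical bracket markers [%s] and [/%s] for readability instead of the default markers. Whenever a range ends while ranges opened after it are still open, those inner ranges are closed first and reopened immediately afterwards.

```python
def render_highlights(text, spans, start_marker="[%s]", end_marker="[/%s]"):
    """Insert highlight markers so that overlapping ranges nest properly.

    spans: list of (begin, end, name) tuples with half-open character offsets.
    """
    def opening(name):
        return start_marker.replace("%s", name)

    def closing(name):
        return end_marker.replace("%s", name)

    boundaries = sorted({0, len(text)} | {p for b, e, _ in spans for p in (b, e)})
    stack = []  # currently open ranges, innermost last
    out = []
    prev = 0
    for pos in boundaries:
        out.append(text[prev:pos])
        prev = pos
        # Close every range ending here, plus everything opened above the outermost one.
        lowest = next((i for i, s in enumerate(stack) if s[1] == pos), None)
        if lowest is not None:
            above = stack[lowest:]
            for s in reversed(above):
                out.append(closing(s[2]))
            del stack[lowest:]
            # Reopen ranges that were closed only to keep the nesting valid.
            for s in above:
                if s[1] != pos:
                    out.append(opening(s[2]))
                    stack.append(s)
        # Open ranges starting here, longest first so longer ranges enclose shorter ones.
        for s in sorted((s for s in spans if s[0] == pos and s[1] > pos), key=lambda s: -s[1]):
            out.append(opening(s[2]))
            stack.append(s)
    out.append(text[prev:])
    return "".join(out)


print(render_highlights("abcdefghij", [(0, 6, "q1"), (3, 10, "q2")]))
# prints: [q1]abc[q2]def[/q2][/q1][q2]ghij[/q2]
```

In this example, q1 and q2 overlap on "def". The q2 highlight is split into two properly nested parts rather than producing crossing markers.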

enabled

Type
boolean
Default
true
Required
no

Enables or disables query highlight processing for the field.

endMarker

Type
string
Default
"⁌\\%s⁍"
Required
no

A string inserted after each query-matching region. The string can contain the special substitution sequence %s, which is replaced by the query name (as declared in documentContent.queries). The default value is ⁌\%s⁍ (written as "⁌\\%s⁍" in JSON) and uses a pair of rarely used Unicode characters, U+204C and U+204D.

See this API response for an example of what query highlights can look like.

startMarker

Type
string
Default
"⁌%s⁍"
Required
no

A string inserted before each query-matching region. The string can contain the special substitution sequence %s, which is replaced by the query name (as declared in documentContent.queries). The default value is ⁌%s⁍ and uses a pair of rarely used Unicode characters, U+204C and U+204D.

See this API response for an example of what query highlights can look like.
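The %s substitution shared by startMarker and endMarker can be sketched in Python as follows. The function mark_match, its half-open (begin, end) offsets and the query name "q1" are illustrative assumptions. Lingo4G inserts the markers itself when it builds the response. The sketch uses str.replace instead of the % operator, so any other percent signs in a custom marker are left untouched.

```python
DEFAULT_START_MARKER = "\u204C%s\u204D"  # ⁌%s⁍
DEFAULT_END_MARKER = "\u204C\\%s\u204D"  # ⁌\%s⁍


def mark_match(text, begin, end, query_name,
               start_marker=DEFAULT_START_MARKER, end_marker=DEFAULT_END_MARKER):
    """Wrap text[begin:end] in start/end markers, with %s replaced by the query name."""
    opening = start_marker.replace("%s", query_name)
    closing = end_marker.replace("%s", query_name)
    return text[:begin] + opening + text[begin:end] + closing + text[end:]


print(mark_match("twin photon correlations", 0, 11, "q1"))
# prints: ⁌q1⁍twin photon⁌\q1⁍ correlations
```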

maxValueLength

Type
limit
Default
250
Required
no

Truncates each returned value to at most maxValueLength characters. This option guards against fields that can be very long (for example, the full text of research papers) and can make API responses smaller.

If highlighting is not enabled, the trailing end of each value is trimmed if it exceeds maxValueLength characters. If a non-empty truncationMarker is declared, it is appended to the end of each trimmed value.
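For the non-highlighting case, the trimming rule amounts to the following Python sketch. The function name is hypothetical. The sketch makes two assumptions the documentation does not state: length is counted in Python characters (code points), and the marker is appended after trimming, so it does not count toward the limit.

```python
def truncate_value(value, max_value_length=250, truncation_marker="\u2026"):
    """Trim the trailing end of a value that is longer than max_value_length.

    Assumptions: length is counted in code points, and the marker is appended
    after trimming (it is not counted toward the limit).
    """
    if max_value_length == "unlimited" or len(value) <= max_value_length:
        return value
    # An empty marker simply appends nothing.
    return value[:max_value_length] + truncation_marker


print(truncate_value("Correlated twin photon pairs", 11))
# prints: Correlated …
```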

If highlighting is enabled, maxValueLength serves as a hint for the desired length of the best highlighted fragment. The value can be trimmed anywhere (one or more substrings can be picked from each value). If a non-empty truncationMarker is declared, it is inserted at every point where the value is sliced.
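A greatly simplified picture of highlight-aware trimming is sketched below in Python. It is not Lingo4G's algorithm. The real implementation may return several fragments (up to maxValues) and choose them differently. This sketch picks a single window of maxValueLength characters that fully contains the most query matches. It tries windows starting at 0 or at a match, and it adds the truncation marker on each side where the value was cut. The function name and the (begin, end) match ranges are assumptions.

```python
def best_fragment(value, matches, window=250, truncation_marker="\u2026"):
    """Pick one window of `window` characters containing the most query matches.

    Simplified illustration: a single fragment only; candidate windows start at 0
    or at a match begin; a match counts only if it lies entirely inside the window.
    matches: list of half-open (begin, end) character ranges.
    """
    if len(value) <= window:
        return value
    best_start, best_hits = 0, -1
    for candidate in sorted({0} | {b for b, _ in matches}):
        start = min(candidate, len(value) - window)  # keep the window inside the value
        hits = sum(1 for b, e in matches if b >= start and e <= start + window)
        if hits > best_hits:
            best_start, best_hits = start, hits
    end = best_start + window
    # Mark every point where the value was sliced.
    prefix = truncation_marker if best_start > 0 else ""
    suffix = truncation_marker if end < len(value) else ""
    return prefix + value[best_start:end] + suffix
```

With no matches, the sketch falls back to keeping the leading window followed by the marker. This matches the documented behavior for values without query-matching regions.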

See this chapter for examples of requests and responses that make use of this option.

maxValues

Type
limit
Default
3
Required
no

A single document field in Lingo4G can have multiple values. For example, a research paper can have multiple authors. This option can be used to limit the number of returned values if not all of them are needed for display purposes. This can make API responses smaller.

If the number of field values exceeds maxValues and truncationMarker is not empty, a synthetic trailing value containing the truncation marker is added.

See this chapter for examples of requests and responses that make use of this option.
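The value-limiting rule, including the synthetic trailing value, can be sketched in Python as follows. The function name is hypothetical, and the sketch only models the list of returned values, not the full response format.

```python
def limit_values(values, max_values=3, truncation_marker="\u2026"):
    """Keep at most max_values values; append a synthetic marker value if any were dropped."""
    if max_values == "unlimited" or len(values) <= max_values:
        return list(values)
    kept = list(values[:max_values])
    if truncation_marker:  # an empty marker means no synthetic trailing value
        kept.append(truncation_marker)
    return kept


authors = ["Author A", "Author B", "Author C", "Author D"]
print(limit_values(authors, 2))
# prints: ['Author A', 'Author B', '…']
```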

truncationMarker

Type
string
Default
"…"
Required
no

The string used to indicate places where the original document content has been truncated or otherwise spliced. See also the maxValueLength and highlighting properties.

valueCount

Type
boolean
Default
false
Required
no

If true, includes an additional property with the total number of values, even if the list of returned values is limited by the maxValues setting. This option helps determine the number of values in multi-valued fields when their values are truncated by maxValues.
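The following Python sketch shows the intent of valueCount: the reported count reflects all values, not only those kept by maxValues. The response keys "values" and "valueCount" used here are hypothetical, because the actual property name in Lingo4G responses is not given in this section. The truncation marker is omitted to keep the sketch short.

```python
def field_output(values, max_values=3, value_count=False):
    """Return a limited list of values, optionally with the total value count.

    Hypothetical response shape: the keys "values" and "valueCount" are assumptions.
    """
    shown = list(values) if max_values == "unlimited" else list(values[:max_values])
    result = {"values": shown}
    if value_count:
        # Count all original values, not just the ones kept by max_values.
        result["valueCount"] = len(values)
    return result


print(field_output(["A", "B", "C", "D", "E"], max_values=2, value_count=True))
# prints: {'values': ['A', 'B'], 'valueCount': 5}
```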

Consumers of contentFields:*

The following stages and components take contentFields:* as input (the consuming property is given in parentheses):

  • documentContent (fields)
  • documentOverlap (fields)
  • labelCollector:allFromContentFields (fields)
  • query:forFieldValues (fields)