contentFields

contentFields:* specifies a list of fields whose content should be retrieved for display purposes. You can also configure various aspects of the output, such as the maximum number of field values and characters to return, whether to include label occurrence (highlight) markers, and what strings those markers should be.

You can use the following content field definitions in your analysis requests:

contentFields:empty

Empty list of content fields.

contentFields:grouped

Defines a list of content field groups, each group with a dedicated output configuration.

contentFields:simple

Defines a list of content fields with output configuration for each field.


contentFields:reference

References a contentFields:* component defined in the request or in the project's default components.


contentFields:empty

An empty set of fields (no field content should be returned).

{
  "type": "contentFields:empty"
}

contentFields:grouped

Defines a list of content field groups, each group with a dedicated output configuration.

{
  "type": "contentFields:grouped",
  "groups": []
}

This component reduces configuration verbosity by letting logical groups of fields that share the same configuration be declared once. For example, this documentContent request asks for the full content of the title field and for at most two values, each trimmed to at most 160 characters, from the abstract and author_name fields.

{
  "stages": {    
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "\"twin photon\" correlations"
      },
      "limit": 2
    },
    "documentContent": {
      "type": "documentContent",
      "fields": {
        "type": "contentFields:grouped",
        "groups": [
          {
            "fields": ["title"],
            "config": {
              "maxValues": "unlimited",
              "maxValueLength": "unlimited"
            }
          },
          {
            "fields": ["abstract", "author_name"],
            "config": {
              "maxValues": 2,
              "maxValueLength": 160
            }
          }
        ]
      }
    }
  },
  "output": {
    "stages": [
      "documentContent"
    ]
  }
}

An example documentContent stage using the contentFields:grouped component.

groups

Type
array of object
Default
[]
Required
no

An array of objects, each declaring the configuration for a set of fields. Each object must specify a fields property with an array of field names and a config property of type contentField with the configuration that applies to all of those fields.

See complete definition in this example.

contentFields:simple

Defines a list of content fields with an explicit output configuration for each field.

{
  "type": "contentFields:simple",
  "fields": {}
}

This component specifies the field content retrieval configuration explicitly for each field. It is more verbose than contentFields:grouped but may be simpler for a small set of fields. For example, this request asks for the title and abstract fields, without any limits on their length or value count.

{
  "stages": {    
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "\"twin photon\" correlations"
      },
      "limit": 2
    },
    "documentContent": {
      "type": "documentContent",
      "fields":{
        "type": "contentFields:simple",
        "fields": {
          "title": {
            "maxValues": "unlimited",
            "maxValueLength": "unlimited"
          },
          "abstract": {        
            "maxValues": "unlimited",
            "maxValueLength": "unlimited"
          }
        }
      }
    }
  },
  "output": {
    "stages": [
      "documentContent"
    ]
  }
}

An example documentContent stage using the contentFields:simple component.

fields

Type
object of contentField
Default
{}
Required
no

An object that maps content field names to their contentField specifications.

See complete definition in this example.

contentField

Provides the specification of how content fields should be trimmed or modified before they are returned (typically by the document content retrieval stage).

{
  "highlighting": {
    "enabled": true,
    "endMarker": "⁌\\%s⁍",
    "startMarker": "⁌%s⁍"
  },
  "maxValueLength": 250,
  "maxValues": 3,
  "truncationMarker": "…",
  "valueCount": false
}

highlighting

Type
object
Default
{
  "enabled": true,
  "startMarker": "⁌%s⁍",
  "endMarker": "⁌\\%s⁍"
}
Required
no

Specifies whether the field content should be highlight-marked and what the highlight markers are.

This specification is relevant for query-in-context highlighting (queries defined in the documentContent.queries property). See the document content retrieval tutorial for a full example.

When highlighting is enabled, maxValues specifies the maximum number of text passages (snippets), while maxValueLength specifies each passage's length (window size). The algorithm tries to maximize the number of query matches in the returned text fragments.

When the field value has no query-matching regions, the value follows normal processing (respecting maxValueLength and maxValues).

Note that the highlighted text ranges can nest, overlap or both. To make downstream rendering easier, any overlapping highlights are closed and reopened to form a valid HTML-like nesting structure.
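
As an illustration of the properties described in this section, here is a minimal sketch of a contentFields:simple component that replaces the markers with HTML-like tags. The field name abstract, the limits and the marker strings are illustrative choices only. Highlights are produced only when queries are declared in documentContent.queries, which this fragment does not show.

```json
{
  "type": "contentFields:simple",
  "fields": {
    "abstract": {
      "maxValues": 3,
      "maxValueLength": 160,
      "highlighting": {
        "enabled": true,
        "startMarker": "<mark data-query=\"%s\">",
        "endMarker": "</mark>"
      }
    }
  }
}
```

With markers like these, the %s sequence in startMarker would carry the query name into the opening tag, while endMarker omits it because the closing tag does not need one.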

enabled

Type
boolean
Default
true
Required
no

Enables or disables query highlight processing for the field.

endMarker

Type
string
Default
"⁌\\%s⁍"
Required
no

A string inserted after each query matching region. The string can contain a special substitution sequence %s, which is replaced by the query name (as declared in documentContent.queries). The default value is ⁌\%s⁍ and uses a pair of rarely used Unicode characters (0x204C and 0x204D).

See this API response for an example of what query highlights can look like.

startMarker

Type
string
Default
"⁌%s⁍"
Required
no

A string inserted before each query matching region. The string can contain a special substitution sequence %s, which is replaced by the query name (as declared in documentContent.queries). The default value is ⁌%s⁍ and uses a pair of rarely used Unicode characters (0x204C and 0x204D).

See this API response for an example of what query highlights can look like.

maxValueLength

Type
limit
Default
250
Required
no

Truncates each returned value to at most maxValueLength characters. This option guards against fields that can be very long (the full text of research papers, for example) and can make API responses smaller.

If highlighting is not enabled, the trailing end of each value is trimmed if it exceeds maxValueLength characters. If a non-empty truncationMarker is declared, it is appended to each trimmed value.

If highlighting is enabled, maxValueLength serves as a hint for the desired length of the best highlighted fragment. The value can be trimmed anywhere (one or more substrings can be picked from each value). If a non-empty truncationMarker is declared, it is inserted at every point where the value is sliced.
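
The non-highlighted case can be sketched as the following contentFields:simple component. The field name abstract, the 80-character limit and the custom marker string are assumptions chosen for illustration.

```json
{
  "type": "contentFields:simple",
  "fields": {
    "abstract": {
      "highlighting": {
        "enabled": false
      },
      "maxValueLength": 80,
      "truncationMarker": " [...]"
    }
  }
}
```

Under the rules above, each abstract value longer than 80 characters would be cut at its trailing end and followed by the declared marker.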

See this chapter for examples of requests and responses that make use of this option.

maxValues

Type
limit
Default
3
Required
no

A single document field in Lingo4G can have multiple values. For example, a research paper can have multiple authors. This option can be used to limit the number of returned values if not all of them are needed for display purposes. This can make API responses smaller.

If the number of document values exceeds maxValues and truncationMarker is not empty, then a synthetic trailing value containing that truncation marker is added.

See this chapter for examples of requests and responses that make use of this option.

truncationMarker

Type
string
Default
"…"
Required
no

The string used to indicate places where the original document content has been truncated or otherwise spliced. See also the maxValueLength and highlighting properties.

valueCount

Type
boolean
Default
false
Required
no

If true, includes an additional property with the value count, even if the list of values is limited by the maxValues setting. This option helps determine the number of values in multi-valued fields when their values are truncated by maxValues.
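
For instance, a component along these lines would return at most two author names per document while still reporting how many values the field holds. This is a sketch only: the field name author_name is borrowed from the earlier grouped example, and the name of the extra count property in the response is not specified here.

```json
{
  "type": "contentFields:simple",
  "fields": {
    "author_name": {
      "maxValues": 2,
      "valueCount": true
    }
  }
}
```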

Consumers of contentFields:*

The following stages and components take contentFields:* as input:

Stage or component Property
documentContent
  • fields
documentOverlap
  • fields
  • fields
query:forFieldValues
  • fields