labelAggregator

labelAggregator:* components combine the labels collected from individual documents into a single list of labels. A typical use is presentation: displaying the dominant labels found in a set of documents.

You can use the following label aggregators in your analysis requests:

labelAggregator:topWeight

Returns the top-N highest-weighted labels.

labelAggregator:topWeight

Aggregates the labels returned by labelCollector for each input document and weights them according to outputWeightFormula.

{
  "type": "labelAggregator:topWeight",
  "labelCollector": {
    "type": "labelCollector:topFromFeatureFields",
    "fields": {
      "type": "featureFields:reference",
      "auto": true
    },
    "labelFilter": {
      "type": "labelFilter:reference",
      "auto": true
    },
    "labelListFilter": {
      "type": "labelListFilter:truncatedPhrases"
    },
    "minTf": 0,
    "minTfMass": 1,
    "tieResolution": "AUTO"
  },
  "maxLabelsPerDocument": 10,
  "maxRelativeDf": 1,
  "minAbsoluteDf": 1,
  "minRelativeDf": 0,
  "minWeight": 0,
  "outputWeightFormula": "TF",
  "threads": "auto",
  "tieResolution": "AUTO"
}

For example, the following request returns the top 10 labels that are at least two words long, taken from documents matching the phrase "electric field". Note that the label aggregator component is declared inside the labels:fromDocuments stage.

{
  "output": {
    "stages": [
      "labels"
    ]
  },
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "\"electric field\""
      },
      "limit": "unlimited"
    },
    "labels": {
      "type": "labels:fromDocuments",
      "maxLabels": {
        "type": "labelCount:fixed",
        "value": 10
      },
      "labelAggregator": {
        "type": "labelAggregator:topWeight",
        "labelCollector": {
          "type": "labelCollector:topFromFeatureFields",
          "labelFilter": {
            "type": "labelFilter:tokenCount",
            "minTokens": 2
          }
        }
      }
    }
  }
}

The request above produces the following output:

{
  "result" : {
    "labels" : {
      "labels" : [
        {
          "label" : "electric field",
          "weight" : 3564.0
        },
        {
          "label" : "magnetic field",
          "weight" : 902.0
        },
        {
          "label" : "black hole",
          "weight" : 149.0
        },
        {
          "label" : "quantum dot",
          "weight" : 146.0
        },
        {
          "label" : "domain wall",
          "weight" : 133.0
        },
        {
          "label" : "external electric field",
          "weight" : 128.0
        },
        {
          "label" : "thin films",
          "weight" : 125.0
        },
        {
          "label" : "pair production",
          "weight" : 123.0
        },
        {
          "label" : "electromagnetic field",
          "weight" : 118.0
        },
        {
          "label" : "ground state",
          "weight" : 113.0
        }
      ]
    }
  }
}
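
A client will usually want the labels as plain (label, weight) pairs. The sketch below shows one way to read them from a response of the shape shown above. The helper name extract_labels is hypothetical and not part of any API, and the embedded response is a shortened copy of the output above.

```python
import json

# Shortened copy of the response shown above (first three labels only).
response_text = """
{
  "result": {
    "labels": {
      "labels": [
        {"label": "electric field", "weight": 3564.0},
        {"label": "magnetic field", "weight": 902.0},
        {"label": "black hole", "weight": 149.0}
      ]
    }
  }
}
"""


def extract_labels(response):
    """Return (label, weight) pairs from the "labels" stage of a response.

    Hypothetical helper: it only assumes the JSON shape shown above.
    """
    # "labels" appears twice: first as the stage name chosen in the request,
    # then as the key holding the list of label entries.
    entries = response["result"]["labels"]["labels"]
    return [(entry["label"], entry["weight"]) for entry in entries]


pairs = extract_labels(json.loads(response_text))
for label, weight in pairs:
    print(f"{weight:8.1f}  {label}")
```

If the stage is declared under a different name in the request, the first "labels" key changes accordingly.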

labelCollector

Type
labelCollector
Default
{
  "type": "labelCollector:topFromFeatureFields",
  "labelFilter": {
    "type": "labelFilter:reference",
    "auto": true
  },
  "labelListFilter": {
    "type": "labelListFilter:truncatedPhrases"
  },
  "fields": {
    "type": "featureFields:reference",
    "auto": true
  },
  "minTf": 0,
  "minTfMass": 1,
  "tieResolution": "AUTO"
}
Required
no

Defines the source of labels for each document. You can configure the source fields from which labels are read, as well as the filters applied to those labels before any aggregation takes place.

maxLabelsPerDocument

Type
limit
Default
10
Required
no

Maximum number of labels taken for the aggregation from each document.

maxRelativeDf

Type
number
Default
1
Constraints
value >= 0 and value <= 1
Required
no

Maximum relative document frequency a label can have to be included in the aggregation (inclusive). The threshold is computed relative to the size of the document set (scope size). For example, with a value of 0.8, a label contained in more than 80% of the documents is skipped.

minAbsoluteDf

Type
integer
Default
1
Constraints
value >= 0
Required
no

Minimum absolute document frequency of each label to be included in aggregation (inclusive).

minRelativeDf

Type
number
Default
0
Constraints
value >= 0 and value <= 1
Required
no

Minimum relative document frequency a label must have to be included in the aggregation (inclusive). The threshold is computed relative to the size of the document set (scope size). For example, with a value of 0.3, a label must occur in at least 30% of the documents to be included.
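
The document frequency thresholds can be read as a single predicate over a label's document count. The following is a minimal sketch of that reading, not the actual implementation: the function name and parameters are hypothetical, and it assumes that minAbsoluteDf, minRelativeDf and maxRelativeDf must all be satisfied at once.

```python
def passes_df_thresholds(df, scope_size,
                         min_absolute_df=1,
                         min_relative_df=0.0,
                         max_relative_df=1.0):
    """Return True if a label found in `df` documents passes all thresholds.

    Hypothetical helper. Assumption: the three thresholds are combined
    with a logical AND, and every bound is inclusive.
    """
    if scope_size <= 0:
        return False
    relative_df = df / scope_size
    return (df >= min_absolute_df
            and min_relative_df <= relative_df <= max_relative_df)


# 1000 documents in scope, thresholds taken from the examples in the text.
scope_size = 1000
print(passes_df_thresholds(800, scope_size, max_relative_df=0.8))  # exactly 80%
print(passes_df_thresholds(801, scope_size, max_relative_df=0.8))  # more than 80%
print(passes_df_thresholds(299, scope_size, min_relative_df=0.3))  # below 30%
print(passes_df_thresholds(300, scope_size, min_relative_df=0.3))  # exactly 30%
```

Because the bounds are inclusive, a label found in exactly 80% of documents passes maxRelativeDf of 0.8, while one found in 80.1% does not.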

minWeight

Type
number
Default
0
Constraints
value >= 0
Required
no

Minimum weight of aggregated labels.

outputWeightFormula

Type
string
Default
"TF"
Constraints
one of [TF, DF]
Required
no

The strategy for computing the output weight of each label.

The outputWeightFormula property supports the following values:

TF

Weight is computed from aggregated term occurrence counts.

DF

Weight is computed from aggregated document occurrence counts.
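
The difference between the two formulas is easiest to see on a small input. The sketch below is an illustration under stated assumptions rather than the actual algorithm: the aggregate function is hypothetical, each document is modeled as a mapping from label to its occurrence count, TF is taken to be the sum of those counts, DF the number of documents containing the label, and the per-document cut-off keeps the most frequent labels.

```python
from collections import Counter


def aggregate(documents, formula="TF", max_labels_per_document=10):
    """Aggregate per-document label counts into one weighted label list.

    Hypothetical sketch. `documents` is a list of dicts mapping a label
    to the number of times it occurs in that document.
    """
    if formula not in ("TF", "DF"):
        raise ValueError(f"unsupported formula: {formula}")
    weights = Counter()
    for label_counts in documents:
        # Assumption: only the most frequent labels of each document are used.
        top = Counter(label_counts).most_common(max_labels_per_document)
        for label, count in top:
            # TF adds every occurrence, DF adds one per containing document.
            weights[label] += count if formula == "TF" else 1
    return weights.most_common()


docs = [
    {"electric field": 5, "magnetic field": 1},
    {"electric field": 1, "magnetic field": 1},
    {"magnetic field": 1},
]
print(aggregate(docs, "TF"))
print(aggregate(docs, "DF"))
```

With this input, TF ranks "electric field" first (6 occurrences against 3), while DF ranks "magnetic field" first (3 documents against 2). DF dampens the influence of a single document that repeats a label many times.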

threads

Type
threads
Default
auto
Required
no

The number of threads used for computing label aggregations.

tieResolution

Type
string
Default
"AUTO"
Constraints
one of [TRUNCATE, EXTEND, REDUCE, AUTO]
Required
no

The strategy for determining how many labels to return when the weights at the tail of the list are tied and the consuming component requests a fixed number of labels.

The tieResolution property supports the following values:

TRUNCATE

Truncate the output at the limit of labels set by the consumer component.

EXTEND

Extend the list of labels past the limit to include all labels with the same weight.

REDUCE

Reduce the list of labels so that all labels with non-tied weights are included.

AUTO

Behaves the same as REDUCE, unless the returned list of labels would be empty, in which case it behaves like EXTEND.
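
The four strategies differ only when a tie straddles the cut-off point. The sketch below shows one possible reading of the descriptions above. It is not the actual implementation: resolve_ties is a hypothetical function, and it works on a plain list of weights sorted in descending order instead of full label entries.

```python
def resolve_ties(weights, limit, strategy="AUTO"):
    """Cut a descending list of weights to about `limit` entries.

    Hypothetical sketch of the TRUNCATE, EXTEND, REDUCE and AUTO values.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if len(weights) <= limit or weights[limit - 1] != weights[limit]:
        # No tie straddles the cut-off, so every strategy agrees.
        return list(weights[:limit])

    tied = weights[limit - 1]
    truncated = list(weights[:limit])             # cut exactly at the limit
    extended = [w for w in weights if w >= tied]  # keep the whole tied group
    reduced = [w for w in weights if w > tied]    # drop the whole tied group

    if strategy == "TRUNCATE":
        return truncated
    if strategy == "EXTEND":
        return extended
    if strategy == "REDUCE":
        return reduced
    if strategy == "AUTO":
        # REDUCE, unless that would leave nothing to return.
        return reduced or extended
    raise ValueError(f"unsupported strategy: {strategy}")


weights = [5, 4, 3, 3, 2]
for strategy in ("TRUNCATE", "EXTEND", "REDUCE", "AUTO"):
    print(strategy, resolve_ties(weights, 3, strategy))

# All weights tied: REDUCE returns nothing, so AUTO falls back to EXTEND.
print("AUTO", resolve_ties([3, 3, 3], 2, "AUTO"))
```

In this reading, only TRUNCATE guarantees exactly the requested number of labels. EXTEND may return more and REDUCE fewer, which matters when the consumer lays out a fixed number of slots.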

Consumers of labelAggregator:*

The following stages and components take labelAggregator:* as input:

Stage or component: Property
labelClusters:documentClusterLabels
  • labelAggregator
labels:fromDocuments
  • labelAggregator