labelAggregator
labelAggregator:* components aggregate labels collected from multiple individual documents into a single list of labels. This can be used for presentation purposes to display a list of dominant labels present in the set of documents.
You can use the following label aggregators in your analysis requests:
- labelAggregator:topWeight - Returns the top-N highest-weighted labels.
labelAggregator:topWeight
Aggregates labels returned by the labelCollector for each input document using the outputWeightFormula.
{
  "type": "labelAggregator:topWeight",
  "labelCollector": {
    "type": "labelCollector:topFromFeatureFields",
    "failIfEmbeddingsNotAvailable": true,
    "fields": {
      "type": "featureFields:reference",
      "auto": true
    },
    "labelFilter": {
      "type": "labelFilter:reference",
      "auto": true
    },
    "labelListFilter": {
      "type": "labelListFilter:truncatedPhrases"
    },
    "labelWeighting": "EMBEDDING",
    "minWeight": 0,
    "minWeightMass": 1,
    "tieResolution": "AUTO"
  },
  "maxLabelsPerDocument": 10,
  "maxRelativeDf": 1,
  "minAbsoluteDf": 1,
  "minRelativeDf": 0,
  "minWeight": 0,
  "outputWeightFormula": "TF",
  "threads": "auto",
  "tieResolution": "AUTO"
}
For example, this request displays the top-10 labels that are two words or longer in documents matching the "electric field" phrase. Note that the label aggregator component is declared inside the labels:fromDocuments stage.
The request above produces the following output:
labelCollector
Defines the source of labels for each document. You can tune the source fields from which labels are read, as well as the set of filters applied to them, prior to any aggregation.
maxLabelsPerDocument
Maximum number of labels taken for the aggregation from each document.
maxRelativeDf
Maximum relative document frequency (DF) of each label to be included in the aggregation (inclusive). The threshold is computed relative to the size of the document set (scope size). For example, a relative DF of 0.8 means that a label contained in more than 80% of documents is skipped in the aggregation.
minAbsoluteDf
Minimum absolute document frequency of each label to be included in the aggregation (inclusive).
minRelativeDf
Minimum relative document frequency of each label to be included in the aggregation (inclusive). The threshold is computed relative to the size of the document set (scope size). For example, a relative DF of 0.3 means that at least 30% of documents must contain a given label for it to be included in the aggregation.
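The three document-frequency thresholds above can be illustrated with a minimal Python sketch. It is not the product's implementation: the function name `filter_by_df`, its parameter names, and the sample data are hypothetical, and it only assumes what the descriptions state, namely that all three bounds are inclusive and that relative values are fractions of the scope size.

```python
from collections import Counter


def filter_by_df(doc_labels, min_absolute_df=1, min_relative_df=0.0, max_relative_df=1.0):
    """Return {label: document frequency} for labels passing all three inclusive thresholds.

    doc_labels is a list with one list of labels per document (hypothetical input shape).
    """
    scope_size = len(doc_labels)
    if scope_size == 0:
        return {}

    # Document frequency: each document counts a label at most once.
    df = Counter()
    for labels in doc_labels:
        df.update(set(labels))

    kept = {}
    for label, count in df.items():
        relative = count / scope_size
        if count >= min_absolute_df and min_relative_df <= relative <= max_relative_df:
            kept[label] = count
    return kept


docs = [
    ["electric field", "magnetic field"],
    ["electric field", "charge density"],
    ["electric field", "magnetic field"],
    ["electric field"],
    ["charge density", "magnetic field"],
]

# "electric field" occurs in 4 of 5 documents (0.8), so a 0.8 ceiling keeps it
# (the bound is inclusive), while a 0.7 ceiling drops it.
print(filter_by_df(docs, min_relative_df=0.3, max_relative_df=0.8))
print(filter_by_df(docs, max_relative_df=0.7))
```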
minWeight
Minimum weight of aggregated labels.
outputWeightFormula
The strategy of computing the output weight for each label.
The outputWeightFormula property supports the following values:
- TF - Weight is computed from aggregated term occurrence counts.
- DF - Weight is computed from aggregated document occurrence counts.
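The difference between the two formulas can be sketched in Python as follows. This is an illustration only: `aggregate_weights` and the input shape (one dictionary of label occurrence counts per document) are hypothetical, and the sketch assumes raw, unnormalized counts, which the description above does not confirm.

```python
from collections import Counter


def aggregate_weights(doc_label_counts, formula="TF"):
    """Aggregate per-document label occurrence counts into one weight per label.

    TF sums the occurrence counts across documents;
    DF counts the documents in which the label occurs.
    """
    if formula not in ("TF", "DF"):
        raise ValueError(f"Unsupported formula: {formula}")

    weights = Counter()
    for counts in doc_label_counts:
        for label, occurrences in counts.items():
            weights[label] += occurrences if formula == "TF" else 1
    return dict(weights)


docs = [
    {"electric field": 5, "magnetic field": 1},
    {"magnetic field": 1},
    {"magnetic field": 2},
]

print(aggregate_weights(docs, "TF"))
print(aggregate_weights(docs, "DF"))
```

In this sample, a label that is repeated many times in a single document ranks first under TF, while a label spread across more documents ranks first under DF.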
threads
The number of threads used for computing label aggregations.
tieResolution
The strategy for determining the number of returned labels when labels at the tail of the list have equal weights and the consuming component requests a fixed number of labels.
The tieResolution property supports the following values:
- TRUNCATE - Truncate the output at the limit of labels set by the consumer component.
- EXTEND - Extend the list of labels past the limit to include all labels with the same weight.
- REDUCE - Reduce the list of labels so that all labels with non-tied weights are included.
- AUTO - Behaves the same as REDUCE, unless the returned list of labels would be empty, in which case it behaves like EXTEND.
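One possible reading of the four strategies is sketched below in Python. The function `resolve_ties` and its arguments are hypothetical, and the sketch assumes that a tie is only relevant when the label at the limit has the same weight as the first label past the limit; the actual behavior of the component may differ in details.

```python
def resolve_ties(ranked, limit, strategy="AUTO"):
    """Cut a weight-sorted list of (label, weight) pairs at `limit`, resolving tail ties.

    `ranked` must be sorted by descending weight.
    """
    if strategy not in ("TRUNCATE", "EXTEND", "REDUCE", "AUTO"):
        raise ValueError(f"Unsupported strategy: {strategy}")
    if limit <= 0:
        return []
    if len(ranked) <= limit:
        return list(ranked)

    boundary = ranked[limit - 1][1]
    # No tie across the cut point: every strategy returns the same list.
    if strategy == "TRUNCATE" or ranked[limit][1] != boundary:
        return list(ranked[:limit])

    # EXTEND keeps every label tied with the boundary weight.
    extended = [item for item in ranked if item[1] >= boundary]
    # REDUCE drops every label tied with the boundary weight.
    reduced = [item for item in ranked if item[1] > boundary]

    if strategy == "EXTEND":
        return extended
    if strategy == "REDUCE":
        return reduced
    # AUTO: like REDUCE, unless that would leave nothing to return.
    return reduced or extended


ranked = [("a", 5), ("b", 3), ("c", 3), ("d", 3), ("e", 1)]
for name in ("TRUNCATE", "EXTEND", "REDUCE", "AUTO"):
    print(name, [label for label, _ in resolve_ties(ranked, 2, name)])

# When every label is tied, REDUCE returns nothing and AUTO falls back to EXTEND.
all_tied = [("x", 2), ("y", 2), ("z", 2)]
print([label for label, _ in resolve_ties(all_tied, 2, "AUTO")])
```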
Consumers of labelAggregator:*
The following stages and components take labelAggregator:* as input:
Stage or component | Property |
---|---|
labelClusters:documentClusterLabels | labelAggregator |
labels:fromDocuments | labelAggregator |