labelCollector
labelCollector:* components extract labels from feature fields of a single document. Label collectors play a crucial role when fetching and aggregating labels from many documents or when computing similarities between documents.
You can use the following label collector components in your analysis requests:
labelCollector:topFromFeatureFields - Collects the document's most frequent labels based on the frequency thresholds of your choice.
labelCollector:reference - References a labelCollector:* component defined in the request or in the project's default components.
label​Collector:​top​From​Feature​Fields
Collects the document's most frequent labels from one or more feature fields, based on the frequency thresholds of your choice.
{
  "type": "labelCollector:topFromFeatureFields",
  "fields": {
    "type": "featureFields:reference",
    "auto": true
  },
  "labelFilter": {
    "type": "labelFilter:reference",
    "auto": true
  },
  "labelListFilter": {
    "type": "labelListFilter:truncatedPhrases"
  },
  "minTf": 0,
  "minTfMass": 1,
  "tieResolution": "AUTO"
}
This component works by summing up occurrences of all input labels from the provided fields. Then, labels that don't pass the labelFilter criteria or the minimum frequency thresholds are removed. An additional label list filter can be applied to the entire result, for example to eliminate truncated phrases.
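The steps described above can be illustrated with a small Python model. This is only a sketch, not the component's actual implementation: the function name collect_labels, its parameters, and the sample document are hypothetical. It assumes that both thresholds are inclusive and reads minTfMass literally as a per-label share of the total frequency that remains after label filtering.

```python
from collections import Counter


def collect_labels(field_values, label_filter=None, min_tf=0, min_tf_mass=0.0):
    """Toy model of label collection for one document (hypothetical helper).

    field_values: dict mapping a field name to the list of labels found in it.
    label_filter: optional predicate; labels for which it returns False are dropped.
    """
    # Step 1: sum label occurrences across all provided fields.
    counts = Counter()
    for labels in field_values.values():
        counts.update(labels)

    # Step 2: remove labels rejected by the label filter.
    if label_filter is not None:
        counts = Counter({lb: tf for lb, tf in counts.items() if label_filter(lb)})

    # Step 3: apply the frequency thresholds (assumed inclusive).
    # The relative frequency is computed against the total after filtering.
    total = sum(counts.values())
    kept = [
        (lb, tf)
        for lb, tf in counts.items()
        if tf >= min_tf and (total == 0 or tf / total >= min_tf_mass)
    ]

    # Most frequent labels first; ties are ordered alphabetically.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept


# Hypothetical document with labels already extracted per field.
doc = {
    "title": ["neural network", "training"],
    "body": ["neural network", "neural network", "training", "the", "gradient"],
}
top = collect_labels(doc, label_filter=lambda lb: lb != "the", min_tf=2, min_tf_mass=0.3)
print(top)
```

In this model, a label list filter would be one more function applied to the whole returned list, because it needs to see all labels of the document at once.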
In the example request shown below, we look for the top three most frequent labels in each document returned for the provided query.
The above request produces the following response.
fields
One or more fields from which labels are retrieved. The value of this property should contain or reference one of the featureFields:* components.
label​Filter
A labelFilter:* component that can be used to remove undesired labels.
label​List​Filter
A labelListFilter:* component that can be used to remove undesired labels, similar to the label filter. Label list filters have access to the entire set of labels of each document, so they can make better global choices. The incomplete phrase removal filter is an example of this.
min​Tf
Minimum label frequency (inclusive). Label frequency is aggregated across all selected fields.
min​Tf​Mass
Minimum relative term frequency of a label with respect to the total frequency of all labels (after filtering) retrieved from the document's fields.
The values of this parameter must be between 0 and 1.
tie​Resolution
The strategy for determining how many labels are returned when the consuming component requests a fixed number of labels and the labels at the tail of the sorted list have equal frequencies.
The tieResolution property supports the following values:
TRUNCATE - Truncate the output at the limit of labels set by the consumer component.
EXTEND - Extend the list of labels past the limit to include all labels with the same weight.
REDUCE - Reduce the list of labels so that all labels with non-tied weights are included.
AUTO - Behaves the same as REDUCE, unless the returned list of labels would be empty, in which case it behaves like EXTEND.
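The four strategies can be compared with a short Python sketch. It is an illustration under stated assumptions, not the component's code: the function resolve_ties and its arguments are hypothetical, the input is assumed to be sorted by descending weight, and REDUCE is interpreted as dropping the whole group of labels tied at the cut-off.

```python
def resolve_ties(weighted_labels, limit, strategy="AUTO"):
    """Apply a tie resolution strategy to (label, weight) pairs.

    weighted_labels must be sorted by descending weight (assumption).
    limit is the number of labels requested by the consumer.
    """
    if limit <= 0:
        return []
    if len(weighted_labels) <= limit:
        return list(weighted_labels)

    # Weight of the last label that fits within the limit.
    boundary = weighted_labels[limit - 1][1]

    # No tie at the cut-off: every strategy returns the same plain prefix.
    if weighted_labels[limit][1] != boundary:
        return list(weighted_labels[:limit])

    if strategy == "TRUNCATE":
        return list(weighted_labels[:limit])

    # EXTEND keeps the whole tied group, REDUCE drops the whole tied group.
    extended = [lw for lw in weighted_labels if lw[1] >= boundary]
    reduced = [lw for lw in weighted_labels if lw[1] > boundary]

    if strategy == "EXTEND":
        return extended
    if strategy == "REDUCE":
        return reduced
    if strategy == "AUTO":
        # Same as REDUCE unless that would leave nothing to return.
        return reduced or extended
    raise ValueError(f"Unknown strategy: {strategy}")


labels = [("a", 5), ("b", 3), ("c", 3), ("d", 3), ("e", 1)]
for name in ("TRUNCATE", "EXTEND", "REDUCE", "AUTO"):
    print(name, [lb for lb, _ in resolve_ties(labels, 2, name)])
```

With a limit of 2, the labels b, c and d are tied at the cut-off, so TRUNCATE returns a and b, EXTEND returns a through d, and both REDUCE and AUTO return only a. If every label had the same weight, REDUCE would return nothing, and AUTO would fall back to the EXTEND result.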
Consumers of labelCollector:*
The following stages and components take labelCollector:* as input: