labelClusters
The labelClusters:* stages produce clusters of labels. One typical use case of these stages is to generate label-based descriptions for clusters of documents.
You can use the following label clustering stages in your requests:
- labelClusters:documentClusterLabels
  Creates label clusters aligned with the document clusters you provide. Use this stage to generate label-based descriptions for clusters of documents.
- labelClusters:reference
  References the results of another labelClusters:* stage defined in the request.
The JSON output of the labelClusters stage has the following structure:
{
  "clusters": [
    {
      "clusters": [
        // sub-clusters (recursive structure)
      ],
      "labels": [
        {
          "label": "first-label",
          "weight": 44
        },
        ...
      ]
    },
    {
      ... second cluster
    },
    ... more clusters
  ]
}
The clusters property contains an array of clusters. Each cluster has an array of labels (the labels property) and a nested array named clusters with recursive sub-clusters (the array is empty when no sub-clusters are present). Each label inside labels has a display label and a weight.
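As a minimal sketch of how a client might consume this structure, the Python snippet below walks the recursive clusters array and collects the labels of each cluster along with its nesting depth. The function name walk_label_clusters and the sample response are hypothetical; only the clusters, labels, label and weight property names come from the structure described above.

```python
import json


def walk_label_clusters(clusters, depth=0):
    """Yield (depth, [(label, weight), ...]) for every cluster, parents first."""
    for cluster in clusters:
        labels = [(entry["label"], entry["weight"]) for entry in cluster.get("labels", [])]
        yield depth, labels
        # Sub-clusters use the same structure; the array is empty when there are none.
        yield from walk_label_clusters(cluster.get("clusters", []), depth + 1)


# Hypothetical response, made up for illustration only.
sample_response = json.loads("""
{
  "clusters": [
    {
      "clusters": [
        {"clusters": [], "labels": [{"label": "nested-label", "weight": 7}]}
      ],
      "labels": [
        {"label": "first-label", "weight": 44},
        {"label": "second-label", "weight": 12}
      ]
    },
    {"clusters": [], "labels": [{"label": "third-label", "weight": 30}]}
  ]
}
""")

for depth, labels in walk_label_clusters(sample_response["clusters"]):
    names = ", ".join(f"{label} ({weight})" for label, weight in labels)
    print("  " * depth + names)
```

The generator visits a parent cluster before its sub-clusters, so indenting by depth reproduces the hierarchy in the printed output.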
labelClusters:documentClusterLabels
Creates label clusters aligned with the document clusters you provide: each label cluster contains the labels that occur most frequently in the documents of the corresponding document cluster.
{
  "type": "labelClusters:documentClusterLabels",
  "clusters": {
    "type": "clusters:reference",
    "auto": true
  },
  "documents": {
    "type": "documents:reference",
    "auto": true
  },
  "labelAggregator": {
    "type": "labelAggregator:topWeight",
    "labelCollector": {
      "type": "labelCollector:topFromFeatureFields",
      "fields": {
        "type": "featureFields:reference",
        "auto": true
      },
      "labelFilter": {
        "type": "labelFilter:reference",
        "auto": true
      },
      "labelListFilter": {
        "type": "labelListFilter:truncatedPhrases"
      },
      "minTf": 0,
      "minTfMass": 1,
      "tieResolution": "AUTO"
    },
    "maxLabelsPerDocument": 10,
    "maxRelativeDf": 1,
    "minAbsoluteDf": 1,
    "minRelativeDf": 0,
    "minWeight": 0,
    "outputWeightFormula": "TF",
    "threads": "auto",
    "tieResolution": "AUTO"
  },
  "maxLabels": 3
}
In the example below, we request the top documents matching the query photon, compute their clusters and describe them with cluster labels.
Label clusters for clusters 1-3 are shown below:
clusters
The document clusters to create label clusters for.
documents
The source documents of the clusters referenced in clusters.
labelAggregator
The labelAggregator:* component used to filter and aggregate labels from each document cluster.
maxLabels
The maximum number of labels for each cluster, retrieved from labelAggregator.
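The relationship between clusters, documents and maxLabels can be illustrated with a simplified Python sketch. It is not the actual implementation: it assumes each document has already been reduced to a list of labels, counts in how many documents of a cluster each label occurs, and keeps the top max_labels entries. The filtering and weighting options of the labelAggregator component (for example minAbsoluteDf or outputWeightFormula) are ignored, and all names and data in the snippet are hypothetical.

```python
from collections import Counter


def describe_clusters(document_clusters, document_labels, max_labels=3):
    """For each document cluster, return up to max_labels (label, count) pairs.

    document_clusters: list of lists of document ids (hypothetical input shape).
    document_labels: dict mapping a document id to the labels found in it.
    """
    label_clusters = []
    for doc_ids in document_clusters:
        counts = Counter()
        for doc_id in doc_ids:
            # Count each label once per document (document frequency).
            counts.update(set(document_labels.get(doc_id, [])))
        # Sort by descending count, then alphabetically so ties are deterministic.
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        label_clusters.append(ranked[:max_labels])
    return label_clusters


# Made-up data for illustration only.
document_labels = {
    1: ["photon", "laser"],
    2: ["photon", "detector"],
    3: ["photon", "laser", "cavity"],
    4: ["galaxy", "redshift"],
    5: ["galaxy"],
}
document_clusters = [[1, 2, 3], [4, 5]]

for labels in describe_clusters(document_clusters, document_labels, max_labels=2):
    print(labels)
```

Each inner list corresponds to one document cluster, in the same order as the input, which mirrors the alignment between label clusters and document clusters described above.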