labelClusters
The labelClusters:* stages produce clusters of labels. One typical use case of these stages is to generate label-based descriptions for clusters of documents.
You can use the following label clustering stages in your requests:
- labelClusters:documentClusterLabels
  Creates label clusters aligned with the document clusters you provide. Use this stage to generate label-based descriptions for clusters of documents.
- labelClusters:reference
  References the results of another labelClusters:* stage defined in the request.
The JSON output of the labelClusters stage has the following structure:
{
  "clusters": [
    {
      "clusters": [
        // sub-clusters (recursive structure)
      ],
      "labels": [
        {
          "label": "first-label",
          "weight": 44
        },
        ...
      ]
    },
    {
      ... second cluster
    },
    ... more clusters
  ]
}
The clusters property contains an array of clusters. Each cluster has an array of labels (the labels property) and a nested array named clusters with recursive sub-clusters (the array is empty when no sub-clusters are present). Each label inside labels has a display label (label) and a weight (weight).
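Because the structure is recursive, client code usually needs a small recursive walk to reach every cluster. The following is a minimal Python sketch of such a walk. The function name iter_label_clusters and the sample_output data are hypothetical and only mirror the structure described above; they are not part of the API.

```python
from typing import Iterator


def iter_label_clusters(clusters: list, depth: int = 0) -> Iterator[tuple]:
    """Yield (depth, labels) for every cluster, visiting sub-clusters depth-first."""
    for cluster in clusters:
        yield depth, cluster.get("labels", [])
        # The nested "clusters" array is empty when there are no sub-clusters.
        yield from iter_label_clusters(cluster.get("clusters", []), depth + 1)


# Hypothetical stage output, shaped like the structure described above.
sample_output = {
    "clusters": [
        {
            "clusters": [
                {"clusters": [], "labels": [{"label": "nested-label", "weight": 7}]}
            ],
            "labels": [
                {"label": "first-label", "weight": 44},
                {"label": "second-label", "weight": 12},
            ],
        },
        {"clusters": [], "labels": [{"label": "other-label", "weight": 3}]},
    ]
}

# Print one line per cluster, indented by nesting depth.
for depth, labels in iter_label_clusters(sample_output["clusters"]):
    names = ", ".join(entry["label"] for entry in labels)
    print("  " * depth + names)
```

The generator keeps the traversal separate from whatever you do with each cluster, so the same walk can feed a printer, a table or a tree widget.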
label​Clusters:​document​Cluster​Labels
Creates label clusters aligned with the document clusters you provide: each label cluster contains the labels that occur most frequently in the documents of the corresponding document cluster.
{
  "type": "labelClusters:documentClusterLabels",
  "clusters": {
    "type": "clusters:reference",
    "auto": true
  },
  "documents": {
    "type": "documents:reference",
    "auto": true
  },
  "labelAggregator": {
    "type": "labelAggregator:topWeight",
    "labelCollector": {
      "type": "labelCollector:topFromFeatureFields",
      "fields": {
        "type": "featureFields:reference",
        "auto": true
      },
      "labelFilter": {
        "type": "labelFilter:reference",
        "auto": true
      },
      "labelListFilter": {
        "type": "labelListFilter:truncatedPhrases"
      },
      "minTf": 0,
      "minTfMass": 1,
      "tieResolution": "AUTO"
    },
    "maxLabelsPerDocument": 10,
    "maxRelativeDf": 1,
    "minAbsoluteDf": 1,
    "minRelativeDf": 0,
    "minWeight": 0,
    "outputWeightFormula": "TF",
    "threads": "auto",
    "tieResolution": "AUTO"
  },
  "maxLabels": 3
}
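The idea of describing each document cluster by its most frequent labels can be illustrated with a simplified Python sketch. This is not how the stage is implemented: the sketch simply counts the number of documents containing each label, whereas the real weighting depends on the labelAggregator settings (for example outputWeightFormula). The function top_labels_per_cluster and the toy data are hypothetical.

```python
from collections import Counter


def top_labels_per_cluster(document_labels, clusters, max_labels=3):
    """For each cluster (a list of document ids), return up to max_labels
    (label, document_count) pairs, most frequent first."""
    result = []
    for doc_ids in clusters:
        counts = Counter()
        for doc_id in doc_ids:
            # Count each label once per document (document frequency).
            counts.update(set(document_labels[doc_id]))
        # Highest count first; ties are broken alphabetically for determinism.
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        result.append(ranked[:max_labels])
    return result


# Hypothetical toy data: document id -> labels found in that document.
document_labels = {
    0: ["photon", "cavity", "atoms"],
    1: ["cavity", "atoms"],
    2: ["cavity", "laser"],
    3: ["camera", "pixel"],
    4: ["camera"],
}
# Two hypothetical document clusters, given as lists of document ids.
clusters = [[0, 1, 2], [3, 4]]

print(top_labels_per_cluster(document_labels, clusters, max_labels=2))
# -> [[('cavity', 3), ('atoms', 2)], [('camera', 2), ('pixel', 1)]]
```

The max_labels argument plays the same role as maxLabels in the configuration above: it caps how many labels are kept for each cluster.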
In the example below, we request the top documents matching the query photon, compute their clusters and describe them with cluster labels.
{
  "name": "Document clusters by More-Like-This similarity",
  "comment": "Clusters a set of top documents matching the provided query, based on the common labels the documents share. Attempts to describe the clusters by top-frequency labels from each cluster's documents. Fetches the content of clustered documents.",
  "variables": {
    "query": {
      "name": "Documents query",
      "comment": "Defines the set of documents to cluster.",
      "value": "photon"
    },
    "limit": {
      "name": "Max documents",
      "comment": "The maximum number of documents matching the query to select for clustering.",
      "value": 2000
    },
    "clusterCreationPreference": {
      "name": "Cluster creation preference",
      "comment": "How many clusters to create. The more negative the preference, the fewer clusters. The closer the preference to 0, the more clusters.",
      "value": -1000
    },
    "clusterLinkingPreference": {
      "name": "Cluster linking preference",
      "comment": "How many links to create between clusters. Softening of 0 creates unlinked, flat structure of clusters. Softening of 1.0 creates a highly-linked structure of clusters.",
      "value": 0
    },
    "maxSimilarDocuments": {
      "name": "Max similar documents",
      "comment": "How many similar documents to find for each document in the similarity matrix. The larger the number of similar documents, the larger and more general the clusters and the longer clustering time.",
      "value": 10
    },
    "maxClusterLabels": {
      "name": "Max cluster labels",
      "comment": "How many labels to use to label each cluster.",
      "value": 3
    }
  },
  "components": {
    "query": {
      "type": "query:string",
      "query": {
        "@var": "query"
      }
    }
  },
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:reference",
        "use": "query"
      },
      "limit": {
        "@var": "limit"
      }
    },
    "content": {
      "type": "documentContent",
      "limit": {
        "@var": "limit"
      }
    },
    "clusters": {
      "type": "clusters:ap",
      "matrix": {
        "type": "matrix:keywordDocumentSimilarity",
        "maxNeighbors": {
          "@var": "maxSimilarDocuments"
        }
      },
      "inputPreference": {
        "@var": "clusterCreationPreference"
      },
      "softening": {
        "@var": "clusterLinkingPreference"
      }
    },
    "labelClusters": {
      "type": "labelClusters:documentClusterLabels",
      "maxLabels": {
        "@var": "maxClusterLabels"
      },
      "labelAggregator": {
        "type": "labelAggregator:topWeight",
        "labelCollector": {
          "type": "labelCollector:topFromFeatureFields",
          "labelFilter": {
            "type": "labelFilter:dictionary",
            "exclude": [
              {
                "type": "dictionary:queryTerms",
                "query": {
                  "type": "query:reference",
                  "use": "query"
                }
              }
            ]
          }
        }
      }
    }
  },
  "output": {
    "stages": [
      "content",
      "clusters",
      "labelClusters"
    ]
  }
}
The label clusters for the first four document clusters are shown below:
"clusters": [
  {
    "clusters": [],
    "labels": [
      {
        "label": "image",
        "weight": 9
      }
    ]
  },
  {
    "clusters": [],
    "labels": [
      {
        "label": "two-photon",
        "weight": 25
      },
      {
        "label": "cavity",
        "weight": 19
      },
      {
        "label": "atoms",
        "weight": 18
      }
    ]
  },
  {
    "clusters": [],
    "labels": [
      {
        "label": "camera",
        "weight": 9
      },
      {
        "label": "mu",
        "weight": 8
      },
      {
        "label": "pair",
        "weight": 7
      }
    ]
  },
  {
    "clusters": [],
    "labels": [
      {
        "label": "π",
        "weight": 20
      },
      {
        "label": "p_(T)",
        "weight": 11
      },
      {
        "label": "K",
        "weight": 9
      }
    ]
  }
]
clusters
The document clusters to create label clusters for.
documents
The source documents of the clusters referenced in clusters.
labelAggregator
The labelAggregator:* component used to filter and aggregate labels from each document cluster.
maxLabels
The maximum number of labels for each cluster, retrieved from labelAggregator.
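If you assemble requests programmatically, the stage can be built as a plain dictionary and serialized to JSON. The Python sketch below sets only maxLabels and, optionally, the query-term exclusion filter used in the example request above. The helper name document_cluster_labels_stage and its parameters are hypothetical, and the sketch assumes, as the example request suggests, that properties left out of the stage do not need to be spelled out.

```python
import json


def document_cluster_labels_stage(max_labels=3, exclude_query_component=None):
    """Build a labelClusters:documentClusterLabels stage as a plain dict."""
    stage = {
        "type": "labelClusters:documentClusterLabels",
        "maxLabels": max_labels,
    }
    if exclude_query_component is not None:
        # Mirror the example request: drop labels that repeat the query terms.
        stage["labelAggregator"] = {
            "type": "labelAggregator:topWeight",
            "labelCollector": {
                "type": "labelCollector:topFromFeatureFields",
                "labelFilter": {
                    "type": "labelFilter:dictionary",
                    "exclude": [
                        {
                            "type": "dictionary:queryTerms",
                            "query": {
                                "type": "query:reference",
                                "use": exclude_query_component,
                            },
                        }
                    ],
                },
            },
        }
    return stage


# "query" is the name of the query component defined in the example request.
stage = document_cluster_labels_stage(max_labels=5, exclude_query_component="query")
print(json.dumps({"labelClusters": stage}, indent=2))
```

The printed fragment is meant to be placed under the stages section of a request, next to the stages that produce the documents and the document clusters.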