labelListFilter
A labelListFilter:* component accepts or rejects labels based on their relation to other labels on the label list.
You can use label list filters to shape the label lists returned by label collectors.
You can use the following label list filters in your analysis requests:
- labelListFilter:acceptAll - Accepts all labels on the list.
- labelListFilter:composite - Accepts labels if they are accepted by all or any of the label list filters you provide.
- labelListFilter:diversified - Prunes semantically similar labels from the list.
- labelListFilter:switch - Enables or disables the label list filter you provide.
- labelListFilter:truncatedPhrases - Rejects truncated labels, that is, labels that are word-wise prefixes or suffixes of longer labels of exactly the same frequency.
- labelListFilter:reference - References a labelListFilter:* component defined in the request or in the project's default components.
labelListFilter:acceptAll
Accepts all labels on the list.
{
  "type": "labelListFilter:acceptAll"
}
labelListFilter:composite
Accepts labels if they are accepted by all or any of the label list filters you provide.
{
  "type": "labelListFilter:composite",
  "labelListFilters": [],
  "operator": "AND",
  "sortOrder": "DESCENDING",
  "weightAggregation": "MEAN"
}
To compute the output list of labels, Lingo4G applies each label list filter you provide to the input list and aggregates the results based on the operator and weightAggregation properties.
labelListFilters
The label list filters to apply.
operator
Determines how to combine labels accepted by each of the labelListFilters. The operator property supports the following values:
- OR - Returns labels accepted by at least one of the filters.
- AND - Returns labels accepted by all the filters.
sortOrder
Determines the sorting order of the output labels. Lingo4G sorts the labels by their weight after aggregation, in ascending or descending order. See sortOrder in the documentation of common types for the list of possible values.
weightAggregation
Determines how to aggregate weights for labels accepted by more than one label list filter. See weightAggregation in the documentation of common types for the list of possible values.
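The way the operator, weightAggregation and sortOrder properties interact can be sketched in Python. This is a simplified illustration, not Lingo4G's implementation. It assumes each filter's result is a dictionary mapping accepted labels to weights, supports only the MEAN aggregation shown in the example above, and averages over the filters that accepted a given label. The combine function and the "ASCENDING" value are hypothetical names used for illustration only.

```python
from statistics import mean


def combine(results, operator="AND", sort_order="DESCENDING"):
    """Combine per-filter results (dicts of label -> weight) into a sorted list.

    Hypothetical helper: weights are aggregated with the mean over the
    filters that accepted a given label.
    """
    if not results:
        return []
    label_sets = [set(result) for result in results]
    if operator == "AND":
        accepted = set.intersection(*label_sets)  # accepted by all filters
    elif operator == "OR":
        accepted = set.union(*label_sets)  # accepted by at least one filter
    else:
        raise ValueError(f"unsupported operator: {operator}")

    aggregated = {
        label: mean(result[label] for result in results if label in result)
        for label in accepted
    }
    # Sort by aggregated weight; ties are broken by label for a stable output.
    return sorted(
        aggregated.items(),
        key=lambda item: (item[1], item[0]),
        reverse=(sort_order == "DESCENDING"),
    )


# Made-up weights, for illustration only.
first = {"globular clusters": 4.0, "dwarf galaxies": 2.0}
second = {"globular clusters": 2.0, "Virgo cluster": 1.0}

both = combine([first, second], operator="AND")
either = combine([first, second], operator="OR")
```

With the AND operator only globular clusters survives, with the averaged weight. With OR, all three labels appear, ordered by their aggregated weights.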
labelListFilter:diversified
Attempts to prune semantically similar labels from the list, replacing a group of related labels with one representative.
{
  "type": "labelListFilter:diversified",
  "diversity": -1,
  "limit": "unlimited"
}
Application to cluster labeling
This filter is particularly useful when labeling document clusters using the labelClusters:documentClusterLabels stage, which tries to describe a cluster of documents with a handful of labels appearing in the cluster's documents. Without the diversification filter, a four-label cluster description may consist of repetitive labels, such as: globular clusters, globular cluster system, GC, and globular clusters (GCs). With the label list diversification filter, Lingo4G can prune the repetitive labels, leaving room for a broader set of meanings, such as: globular clusters, Fornax cluster, Virgo cluster, dwarf galaxies.
The following request shows how to apply label list diversification as part of document cluster labeling.
{
  "stages": {
    "docs": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "clustering"
      }
    },
    "clusters": {
      "type": "clusters:cd",
      "matrix": {
        "type": "matrix:knnVectorsSimilarity",
        "vectors": {
          "type": "vectors:precomputedDocumentEmbeddings"
        }
      }
    },
    "clusterLabels": {
      "type": "labelClusters:documentClusterLabels",
      "labelListFilter": {
        "type": "labelListFilter:diversified",
        "limit": 4,
        "diversity": -0.5
      },
      "maxLabels": 20
    }
  }
}
Note that to collect a four-label description for each cluster, we need to ensure that the label diversification filter receives a larger number of labels to prune. In this example, we provide 20 (possibly repetitive) labels on input using the maxLabels property and ask the filter to narrow those down to 4 labels.
Algorithm
The filter processes the labels in the order they appear on input, taking the following steps:
- Add the first input label to the result.
- For the second and subsequent labels on the input list:
  - Compute the embedding similarity of the input label against the labels already included in the result.
  - If the similarity to any of the labels in the result is larger than the following threshold, discard the input label; otherwise, add it to the result:
    t = μ - d · σ
    where:
    t: the similarity threshold,
    μ: the average similarity of the input label against all other input labels,
    σ: the standard deviation of the similarity of the input label against all other input labels,
    d: the diversity property value.
diversity
Determines how aggressive the filtering of repetitive labels is.
The larger the value of this property, the stronger the filtering. With very large diversity values, the filtering may leave just one label.
See the filtering algorithm for the technical details.
limit
The maximum number of labels to output.
Note that for large diversity values, the number of output labels may be smaller than the limit.
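The greedy pruning procedure and the effect of the diversity and limit properties can be sketched in Python. This is an illustration under stated assumptions, not Lingo4G's code. The threshold is assumed to take the form mean - diversity * standard deviation, which is consistent with larger diversity values filtering more strongly. The word_overlap function is a toy stand-in for embedding similarity, and the diversify function name and signature are hypothetical.

```python
from statistics import mean, pstdev


def word_overlap(a, b):
    """Toy stand-in for embedding similarity: Jaccard overlap of lowercased words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)


def diversify(labels, similarity, diversity=-1.0, limit=None):
    """Greedily keep labels that are not too similar to the ones already kept."""
    result = []
    for i, label in enumerate(labels):
        if limit is not None and len(result) >= limit:
            break
        if not result:
            result.append(label)  # the first input label is always kept
            continue
        # Similarity of this label against all other input labels.
        others = [similarity(label, other) for j, other in enumerate(labels) if j != i]
        # Assumed threshold form: mean - diversity * standard deviation.
        threshold = mean(others) - diversity * pstdev(others)
        # Discard the label if it is too similar to any label already kept.
        if not any(similarity(label, kept) > threshold for kept in result):
            result.append(label)
    return result


labels = ["globular clusters", "globular clusters (GCs)", "Virgo cluster", "dwarf galaxies"]
pruned = diversify(labels, word_overlap)
lenient = diversify(labels, word_overlap, diversity=-3.0)
top_two = diversify(labels, word_overlap, limit=2)
```

In this toy run, the default diversity of -1 drops globular clusters (GCs) as a near-duplicate of globular clusters, while a diversity of -3 raises the threshold enough to keep all four labels.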
labelListFilter:switch
Enables or disables the label list filter you provide.
{
  "type": "labelListFilter:switch",
  "enabled": true,
  "labelListFilter": null
}
You can use this filter to dynamically activate or deactivate any label list filter based on the value of a boolean variable passed to the enabled property. Without the labelListFilter:switch component, the only way to deactivate a filter would be to remove the filter from the request.
The following request shows how to control two label list filters with two boolean variables. Achieving this requires three elements:
- Defining boolean variables, called removeIncompletePhrases and removeRedundantPhrases in our request.
- Wrapping the label list filters to control with a labelListFilter:switch filter.
- Referencing the boolean variables in the enabled properties of the labelListFilter:switch filters.
{
  "name": "Enabling / disabling a label list filter using a variable.",
  "variables": {
    "removeIncompletePhrases": {
      "name": "Remove incomplete phrases",
      "value": true
    },
    "removeRedundantPhrases": {
      "name": "Remove redundant phrases",
      "value": true
    }
  },
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "clustering"
      },
      "limit": 1000
    },
    "labels": {
      "type": "labels:filtered",
      "labels": {
        "type": "labels:fromDocuments"
      },
      "labelListFilter": {
        "type": "labelListFilter:composite",
        "labelListFilters": [
          {
            "type": "labelListFilter:switch",
            "enabled": {
              "@var": "removeIncompletePhrases"
            },
            "labelListFilter": {
              "type": "labelListFilter:truncatedPhrases"
            }
          },
          {
            "type": "labelListFilter:switch",
            "enabled": {
              "@var": "removeRedundantPhrases"
            },
            "labelListFilter": {
              "type": "labelListFilter:diversified",
              "diversity": -3
            }
          }
        ]
      }
    }
  }
}
The above request exposes two variables to enable or disable the labelListFilter:truncatedPhrases and labelListFilter:diversified filters. Try changing the values of the variables and observe how the label list changes.
With the labelListFilter:switch filter, you can dynamically control label list filters without changing the structure of the request. Additionally, boolean variables get represented as check boxes in the request variable editor.
enabled
Enables or disables the label list filter you provide.
If true, applies the labelListFilter to the input labels. If false, returns the input labels without any filtering.
labelListFilter
The label list filter to control.
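The behavior of the enabled property amounts to choosing between the wrapped filter and a pass-through. A minimal Python sketch, assuming a filter can be modeled as a function from a list of labels to a list of labels; the switch and drop_single_words functions are hypothetical and serve only to illustrate the semantics:

```python
def switch(enabled, label_list_filter):
    """Return the wrapped filter when enabled, otherwise a pass-through."""
    if enabled:
        return label_list_filter
    # Disabled: return the input labels without any filtering.
    return lambda labels: list(labels)


def drop_single_words(labels):
    """Hypothetical example filter: rejects single-word labels."""
    return [label for label in labels if len(label.split()) > 1]


labels = ["clusters", "globular clusters", "dwarf galaxies"]
filtered = switch(True, drop_single_words)(labels)
unfiltered = switch(False, drop_single_words)(labels)
```

Because the wrapper keeps the same input and output shape in both states, the surrounding request structure does not need to change when the filter is turned off.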
labelListFilter:truncatedPhrases
Rejects truncated labels, that is, labels that are word-wise prefixes or suffixes of longer labels of exactly the same frequency.
{
  "type": "labelListFilter:truncatedPhrases"
}
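The rejection rule can be sketched in Python as follows. This is an illustration of the rule as described above, not Lingo4G's implementation. It assumes labels are non-empty, compared word by word after splitting on whitespace, and that frequencies are available as a dictionary. The reject_truncated function and the frequency values are made up for the example.

```python
def reject_truncated(label_frequencies):
    """Return labels that are not truncated versions of longer, equally frequent labels.

    label_frequencies: dict mapping each label to its frequency.
    """
    tokenized = {label: label.split() for label in label_frequencies}
    kept = []
    for label, words in tokenized.items():
        n = len(words)
        truncated = any(
            len(longer) > n
            and label_frequencies[other] == label_frequencies[label]
            # Word-wise prefix or suffix of the longer label.
            and (longer[:n] == words or longer[-n:] == words)
            for other, longer in tokenized.items()
        )
        if not truncated:
            kept.append(label)
    return kept


# Made-up frequencies, for illustration only.
frequencies = {
    "precomputed document": 12,
    "precomputed document embeddings": 12,
    "document embeddings": 15,
    "embeddings": 15,
}
kept = reject_truncated(frequencies)
```

Here precomputed document is rejected as a prefix of precomputed document embeddings with the same frequency, and embeddings is rejected as a suffix of document embeddings. The label document embeddings stays because the only longer label containing it has a different frequency.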
Consumers of labelListFilter:*
The following stages and components take labelListFilter:* as input: