labelListFilter

label​List​Filter:​* accepts or rejects labels based on their relation to other labels on the label list. You can use label list filters to shape the label lists returned by label collectors.

You can use the following label list filters in your analysis requests:

label​List​Filter:​accept​All

Accepts all labels on the list.

label​List​Filter:​composite

Accepts labels if they are accepted by all or any of the label list filters you provide.

label​List​Filter:​diversified

Prunes semantically similar labels from the list.

label​List​Filter:​switch

Enables or disables the label list filter you provide.

label​List​Filter:​truncated​Phrases

Rejects truncated labels, that is, labels that are word-wise prefixes or suffixes of longer labels with exactly the same frequency.


label​List​Filter:​reference

References a label​List​Filter:​* component defined in the request or in the project's default components.


label​List​Filter:​accept​All

Accepts all labels on the list.

{
  "type": "labelListFilter:acceptAll"
}

label​List​Filter:​composite

Accepts labels if they are accepted by all or any of the label list filters you provide.

{
  "type": "labelListFilter:composite",
  "labelListFilters": [],
  "operator": "AND",
  "sortOrder": "DESCENDING",
  "weightAggregation": "MEAN"
}

To compute the output list of labels, Lingo4G applies each label list filter you provide to the input list and aggregates the results based on the operator and weightAggregation properties.

label​List​Filters

Type
array of labelListFilter
Default
[]
Required
no

The label list filters to apply.

operator

Type
string
Default
"AND"
Constraints
one of [OR, AND]
Required
no

Determines how to combine labels accepted by each of the labelListFilters. The operator property supports the following values:

OR

Returns labels accepted by at least one of the filters.

AND

Returns labels accepted by all the filters.

sort​Order

Type
sortOrder
Default
"DESCENDING"
Required
no

Determines the sorting order of the output labels. Labels are sorted by their weight after aggregation, in ascending or descending order.

See sort​Order in the documentation of common types for the list of possible values.

weight​Aggregation

Type
weightAggregation
Default
"MEAN"
Required
no

Determines how to aggregate weights for labels accepted by more than one label list filter.

See weight​Aggregation in the documentation of common types for the list of possible values.
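As a rough illustration of how labelListFilter:composite combines results, the Python sketch below merges the outputs of several filters. It is not Lingo4G code. The function name combine, the representation of each filter's output as a dict mapping label to weight, the averaging over only those filters that accepted a label, and the rejection of an empty filter list are all assumptions made for the example. Only the MEAN aggregation is shown.

```python
from statistics import mean


def combine(filter_outputs, operator="AND", sort_order="DESCENDING"):
    """Combine per-filter results, each a dict of label -> weight (assumed shape)."""
    if not filter_outputs:
        # Assumption: the sketch does not model the empty-list case.
        raise ValueError("sketch requires at least one filter output")

    accepted_sets = [set(output) for output in filter_outputs]
    if operator == "AND":
        accepted = set.intersection(*accepted_sets)  # accepted by all filters
    elif operator == "OR":
        accepted = set.union(*accepted_sets)  # accepted by at least one filter
    else:
        raise ValueError(f"unsupported operator: {operator}")

    # MEAN aggregation, assumed to average over the filters that accepted the label.
    aggregated = {
        label: mean(output[label] for output in filter_outputs if label in output)
        for label in accepted
    }

    # Sort by aggregated weight; the label text only breaks ties deterministically.
    return sorted(
        aggregated.items(),
        key=lambda item: (item[1], item[0]),
        reverse=(sort_order == "DESCENDING"),
    )


first = {"globular clusters": 4.0, "dwarf galaxies": 2.0}
second = {"globular clusters": 2.0, "Virgo cluster": 1.0}
print(combine([first, second], operator="AND"))
print(combine([first, second], operator="OR"))
```

With the AND operator, only globular clusters survives because it is the only label both hypothetical filters accepted. With OR, all three labels are returned, ordered by aggregated weight.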

label​List​Filter:​diversified

Attempts to prune semantically similar labels from the list, replacing a group of related labels with one representative.

{
  "type": "labelListFilter:diversified",
  "diversity": -1,
  "limit": "unlimited"
}

Application to cluster labeling

This filter is particularly useful when labeling document clusters using the label​Clusters:​document​Cluster​Labels stage, which tries to describe a cluster of documents with a handful of labels appearing in the cluster's documents. Without the diversification filter, a four-label cluster description may consist of repetitive labels, such as: globular clusters, globular cluster system, GC, and globular clusters (GCs). With the label list diversification filter, Lingo4G can prune the repetitive labels leaving room for a broader set of meanings, such as: globular clusters, Fornax cluster, Virgo cluster, dwarf galaxies.

The following request shows how to apply label list diversification as part of document cluster labeling.

{
  "stages": {
    "docs": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "clustering"
      }
    },
    "clusters": {
      "type": "clusters:cd",
      "matrix": {
        "type": "matrix:knnVectorsSimilarity",
        "vectors": {
          "type": "vectors:precomputedDocumentEmbeddings"
        }
      }
    },
    "clusterLabels": {
      "type": "labelClusters:documentClusterLabels",
      "labelListFilter": {
        "type": "labelListFilter:diversified",
        "limit": 4,
        "diversity": -0.5
      },
      "maxLabels": 20
    }
  }
}

Note that to collect a four-label description for each cluster, we need to ensure that the label diversification filter receives a larger number of labels to prune. In this example, we provide 20 (possibly repetitive) labels on input using the max​Labels property and ask the filter to narrow those down to 4 labels.

Algorithm

The filter processes the labels in the order they appear on input, taking the following steps:

  1. Add the first input label to the result.

  2. For the second and subsequent labels on the input list:

    1. Compute the embedding similarity of the input label against the labels already included in the result.

    2. If the similarity to any of the labels in the result is larger than the threshold defined below, reject the input label. Otherwise, add the input label to the result.

      $T_{sim} = sim_{avg} - diversity \times sim_{dev}$

      where:

      $T_{sim}$: the similarity threshold,

      $sim_{avg}$: the average similarity of the input label against all other input labels,

      $sim_{dev}$: the standard deviation of the similarity of the input label against all other input labels,

      $diversity$: the diversity property value.
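The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Lingo4G implementation: it uses cosine similarity and the population standard deviation, it assumes that a label whose similarity exceeds the threshold is dropped, and it applies limit by stopping once enough labels are collected. The function names and the toy two-dimensional embeddings are hypothetical.

```python
import math
from statistics import mean, pstdev


def cosine(u, v):
    """Cosine similarity of two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def diversify(labels, embeddings, diversity=-1.0, limit=None):
    """Keep labels, in input order, that are not too similar to those already kept."""
    result = []
    for label in labels:
        if limit is not None and len(result) >= limit:
            break
        if not result:
            result.append(label)  # step 1: the first label is always kept
            continue
        # Similarity of this label against all other input labels.
        others = [cosine(embeddings[label], embeddings[o]) for o in labels if o != label]
        threshold = mean(others) - diversity * pstdev(others)
        # Assumption: exceeding the threshold against any kept label means rejection.
        too_similar = any(
            cosine(embeddings[label], embeddings[kept]) > threshold for kept in result
        )
        if not too_similar:
            result.append(label)
    return result


# Hypothetical embeddings: the first two labels point in nearly the same direction.
toy = {
    "globular clusters": [1.0, 0.0],
    "globular cluster system": [1.0, 0.1],
    "dwarf galaxies": [0.0, 1.0],
}
print(diversify(list(toy), toy, diversity=0))
print(diversify(list(toy), toy, diversity=10))
```

In this toy data, a diversity of 0 drops globular cluster system as a near-duplicate of globular clusters and keeps dwarf galaxies, while a diversity of 10 lowers the threshold so far that only the first label remains.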

diversity

Type
number
Default
-1
Required
no

Determines how aggressive the filtering of repetitive labels is.

The larger the value of this property, the stronger the filtering. With very large diversity values, the filtering may leave just one label.

See the filtering algorithm for the technical details.

limit

Type
limit
Default
"unlimited"
Required
no

The maximum number of labels to output.

Note that for large diversity values, the number of output labels may be smaller than the limit.

label​List​Filter:​switch

Enables or disables the label list filter you provide.

{
  "type": "labelListFilter:switch",
  "enabled": true,
  "labelListFilter": null
}

You can use this filter to dynamically activate or deactivate any label list filter based on the value of a boolean variable passed to the enabled property. Without the label​List​Filter:​switch component, the only way to deactivate a filter would be to remove the filter from the request.

The following request shows how to control two label filters with two boolean variables. Achieving this requires three elements:

  1. Defining boolean variables, called remove​Incomplete​Phrases and remove​Redundant​Phrases in our request.

  2. Wrapping the label list filters to control with a label​List​Filter:​switch filter.

  3. Referencing the boolean variables in the enabled properties of the labelListFilter:switch filters.

{
  "name": "Enabling / disabling a label list filter using a variable.",
  "variables": {
    "removeIncompletePhrases": {
      "name": "Remove incomplete phrases",
      "value": true
    },
    "removeRedundantPhrases": {
      "name": "Remove redundant phrases",
      "value": true
    }
  },
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "clustering"
      },
      "limit": 1000
    },
    "labels": {
      "type": "labels:filtered",
      "labels": {
        "type": "labels:fromDocuments"
      },
      "labelListFilter": {
        "type": "labelListFilter:composite",
        "labelListFilters": [
          {
            "type": "labelListFilter:switch",
            "enabled": {
              "@var": "removeIncompletePhrases"
            },
            "labelListFilter": {
              "type": "labelListFilter:truncatedPhrases"
            }
          },
          {
            "type": "labelListFilter:switch",
            "enabled": {
              "@var": "removeRedundantPhrases"
            },
            "labelListFilter": {
              "type": "labelListFilter:diversified",
              "diversity": -3
            }
          }
        ]
      }
    }
  }
}

The above request exposes two variables that enable or disable the labelListFilter:truncatedPhrases and labelListFilter:diversified filters. Try changing the values of the variables and observe how the label list changes.

With labelListFilter:switch, you can dynamically control label list filters without changing the structure of the request. Additionally, the request variable editor presents boolean variables as check boxes.

enabled

Type
boolean
Default
true
Required
no

Enables or disables the label list filter you provide.

If true, applies the labelListFilter to the input labels. If false, returns the input labels without any filtering.

label​List​Filter

Type
labelListFilter
Default
null
Required
yes

The label list filter to control.
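The behavior of the enabled property can be pictured with a small Python sketch. It models a label list filter as a plain function from a list of labels to a list of labels, which is an assumption made for illustration. The names switch and drop_single_words are hypothetical and not part of Lingo4G.

```python
def switch(enabled, label_list_filter):
    """Wrap a filter function so that a boolean flag turns it on or off."""
    def apply(labels):
        # Enabled: delegate to the wrapped filter. Disabled: pass labels through unchanged.
        return label_list_filter(labels) if enabled else list(labels)
    return apply


def drop_single_words(labels):
    # Hypothetical stand-in filter: keeps only multi-word labels.
    return [label for label in labels if " " in label]


labels = ["clustering", "document clustering", "k-means"]
print(switch(True, drop_single_words)(labels))
print(switch(False, drop_single_words)(labels))
```

Because a disabled switch returns its input unchanged, the structure of the surrounding request stays the same whether the flag is on or off.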

label​List​Filter:​truncated​Phrases

Rejects truncated labels, that is, labels that are word-wise prefixes or suffixes of longer labels with exactly the same frequency.

{
  "type": "labelListFilter:truncatedPhrases"
}
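To make the rejection rule concrete, here is a minimal Python sketch. It assumes labels are split into words on whitespace and that each label comes with a single frequency count. The function names and the frequencies in the example are hypothetical, and the sketch does not claim to reproduce how Lingo4G computes frequencies.

```python
def is_truncation(short, long):
    """True if `short` is a word-wise prefix or suffix of the longer label `long`."""
    s, l = short.split(), long.split()
    if not 0 < len(s) < len(l):
        return False
    return l[:len(s)] == s or l[-len(s):] == s


def reject_truncated(label_frequencies):
    """Keep labels that are not truncations of a longer label with the same frequency."""
    kept = []
    for label, freq in label_frequencies.items():
        truncated = any(
            other_freq == freq and is_truncation(label, other)
            for other, other_freq in label_frequencies.items()
        )
        if not truncated:
            kept.append(label)
    return kept


# Hypothetical frequencies for illustration only.
frequencies = {
    "support vector": 12,
    "support vector machine": 12,
    "vector machine": 12,
    "machine": 40,
    "support": 15,
}
print(reject_truncated(frequencies))
```

Here support vector and vector machine are rejected because they occur exactly as often as support vector machine, which suggests they never appear outside the longer phrase. The labels machine and support have different frequencies, so they are kept.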

Consumers of label​List​Filter:​*

The following stages and components take label​List​Filter:​* as input:

Stage or component                             Property
labelClusters:documentClusterLabels            labelListFilter
labelCollector:allFromFeatureFields            labelListFilter
labelCollector:topEmbeddingNearestNeighbors    labelListFilter
labelCollector:topFromFeatureFields            labelListFilter
labelListFilter:composite                      labelListFilters
labelListFilter:switch                         labelListFilter
labels:filtered                                labelListFilter
labels:fromText                                labelListFilter