Label filtering

Label filtering lets you shape the labels you retrieve by excluding labels based on various criteria, such as the number of words.

Most label retrieval stages offer an option to filter the label lists they produce. If a label retrieval stage supports filtering, it usually exposes the labelFilter property, in which you can provide a labelFilter component.

For example, the following request retrieves labels similar to the word photon, but excludes every label that contains the string photon, such as photon itself, photons, or photon energy:

{
  "stages": {
    "similarLabels": {
      "type": "labels:embeddingNearestNeighbors",
      "vector": {
        "type": "vector:labelEmbedding",
        "labels": {
          "type": "labels:direct",
          "labels": [
            {
              "label": "photon"
            }
          ]
        }
      },
      "labelFilter": {
        "type": "labelFilter:dictionary",
        "exclude": [
          {
            "type": "dictionary:regex",
            "entries": [
              ".*photon.*"
            ]
          }
        ]
      }
    }
  }
}
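To see why the ".*photon.*" pattern removes more than the single word photon, the following Python sketch imitates the exclusion step on a hypothetical list of nearest-neighbor candidates. It assumes that a regex entry must match the whole label (which is why the pattern is wrapped in .*) and that matching is case-sensitive. Both are assumptions for illustration, not a description of the actual dictionary:regex implementation.

```python
import re


def filter_by_regex(labels, exclude_patterns):
    """Drop every label that fully matches any exclusion pattern.

    Assumption: a pattern must match the entire label (re.fullmatch),
    and matching is case-sensitive as written.
    """
    compiled = [re.compile(p) for p in exclude_patterns]
    return [label for label in labels
            if not any(rx.fullmatch(label) for rx in compiled)]


# Hypothetical candidates returned by the nearest-neighbor search for "photon".
candidates = ["photon", "photons", "photon energy", "electron", "quantum", "laser"]

print(filter_by_regex(candidates, [".*photon.*"]))
# ['electron', 'quantum', 'laser']
```

Under the same full-match assumption, the plain pattern photon would drop only the exact label photon and keep photons and photon energy.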

Label list filters

You can use the labelFilter:acceptLabels and labelFilter:rejectLabels filters to filter against a closed, explicit list of labels: acceptLabels keeps only the labels on the list, while rejectLabels removes them. You can use any labels stage to provide the closed list of labels to filter by.
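The difference between the two list filters can be sketched in a few lines of Python. This sketch assumes that labels are compared by exact string equality and that the filtered list keeps its original order; the function names and sample labels are hypothetical and only illustrate the accept/reject semantics.

```python
def accept_labels(labels, allowed):
    """Keep only the labels that appear in the closed list `allowed`."""
    allowed_set = set(allowed)  # set lookup keeps membership checks O(1)
    return [label for label in labels if label in allowed_set]


def reject_labels(labels, banned):
    """Remove the labels that appear in the closed list `banned`."""
    banned_set = set(banned)
    return [label for label in labels if label not in banned_set]


# Hypothetical labels produced by some retrieval stage.
candidates = ["electron", "electric field", "election", "electrode"]
# Hypothetical closed list, e.g. produced by another labels stage.
closed_list = ["electron", "electrode", "magnetic field"]

print(accept_labels(candidates, closed_list))  # ['electron', 'electrode']
print(reject_labels(candidates, closed_list))  # ['electric field', 'election']
```

For any closed list, the two filters partition the input: every label ends up in exactly one of the two results.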

This kind of filtering is useful, for example, when you want to limit the analysis to the labels appearing in a specific set of documents. The following example returns labels matching the elec prefix, but limits the search to the labels appearing in the documents selected by the set:math query.

{
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "set:math"
      },
      "limit": 20000
    },
    "documentLabels": {
      "type": "labels:fromDocuments",
      "maxLabels": {
        "type": "labelCount:progressive",
        "min": 10000
      }
    },
    "prefixLabels": {
      "type": "labels:byPrefix",
      "prefix": "elec",
      "limit": 100,
      "labelFilter": {
        "type": "labelFilter:acceptLabels",
        "labels": {
          "type": "labels:reference",
          "use": "documentLabels"
        }
      }
    }
  },
  "output": {
    "stages": [
      "prefixLabels"
    ]
  }
}

The request starts by selecting the documents matching the set:math query and extracting a sizeable set of labels from those documents. Then, the request searches for labels starting with the elec prefix, but limits the results to the labels appearing in the selected documents.
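The data flow of this request can be approximated with the Python sketch below, using a tiny hypothetical document collection. The helper functions are hypothetical stand-ins for the documents:byQuery, labels:fromDocuments and labels:byPrefix stages; they do not model the progressive label count, and they assume that the accept filter is applied before the 100-label limit and that prefix matches come back in alphabetical order.

```python
def select_documents(documents, set_name, limit):
    """Stand-in for documents:byQuery with the query 'set:<set_name>'."""
    return [d for d in documents if d["set"] == set_name][:limit]


def labels_from_documents(documents):
    """Stand-in for labels:fromDocuments: collect every label in the selection."""
    labels = set()
    for d in documents:
        labels.update(d["labels"])
    return labels


def labels_by_prefix(index, prefix, limit, accept=None):
    """Stand-in for labels:byPrefix; the accept filter runs before the limit."""
    result = []
    for label in sorted(index):  # assumption: alphabetical order
        if not label.startswith(prefix):
            continue
        if accept is not None and label not in accept:
            continue  # label does not occur in the selected documents
        result.append(label)
        if len(result) == limit:
            break
    return result


# Hypothetical collection: two math documents and one physics document.
documents = [
    {"set": "math", "labels": ["electrical network", "eigenvalue"]},
    {"set": "math", "labels": ["elliptic curve", "electrical network"]},
    {"set": "physics", "labels": ["electron", "electric field"]},
]
index = labels_from_documents(documents)  # all labels in the collection

math_labels = labels_from_documents(select_documents(documents, "math", 20000))
print(labels_by_prefix(index, "elec", 100))                      # unfiltered
print(labels_by_prefix(index, "elec", 100, accept=math_labels))  # ['electrical network']
```

Without the filter, the prefix search also returns electric field and electron, which occur only in the physics document.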