labels
The labels:* stages group various ways of producing lists of labels. You can display the labels directly or feed them as input to other stages, such as similarity matrix computation and, subsequently, clustering and 2D embedding.
You can use the following labels stages in your analysis requests:
- labels:byPrefix: Returns labels with a string prefix you provide.
- labels:composite: Returns a union or intersection of label lists you provide, aggregating their weights according to the provided criteria.
- labels:direct: Returns a list of labels whose text you provide directly.
- labels:embeddingNearestNeighbors: Selects labels that are most similar to the multidimensional vector you provide.
- labels:filtered: Applies the label filters of your choice to the list of labels you provide.
- labels:fromDocuments: Collects labels occurring in the documents you provide.
- labels:fromText: Extracts labels from the raw text you provide.
- labels:scored: Computes new weights for the labels you provide using the label scorer of your choice.
- labels:reference: References the results of another labels:* stage defined in the request.
labels:byPrefix
Returns labels containing at least one term starting with the string prefix you provide.
{
  "type": "labels:byPrefix",
  "fields": {
    "type": "featureFields:reference",
    "auto": true
  },
  "labelFilter": {
    "type": "labelFilter:reference",
    "auto": true
  },
  "limit": 30,
  "prefix": ""
}
This stage can be used to return a list of suggestions for labels present in a set of provided fields. Typically, the returned result will contain labels where the first word starts with the provided prefix. It is possible for the suggestion engine to return results where the prefixed word is in the middle of the label.
For example, here is a request fetching the ten labels present in the title field and containing the prefix pha:
The result of the above request, on the reference Arxiv index:
fields
One or more sources of labels (a featureFields:* component).
labelFilter
An optional labelFilter:* component used to filter out undesired labels.
limit
The maximum number of labels to return.
prefix
Case-insensitive prefix of at least one word contained in the label. The suggestion engine will favor labels starting with this prefix but may also return labels where the prefix is in the middle of the label.
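The matching rule described for prefix can be illustrated with a small local sketch. This is not the suggestion engine itself: the function name suggest_by_prefix, the whitespace word splitting, and the sample labels are assumptions made for illustration only.

```python
def suggest_by_prefix(labels, prefix, limit=30):
    """Return up to `limit` labels in which at least one word starts with `prefix`.

    Hypothetical helper: labels whose first word matches are listed before
    labels where the matching word appears later in the label.
    """
    needle = prefix.lower()  # the prefix is matched case-insensitively
    starts, middle = [], []
    for label in labels:
        words = label.lower().split()
        if not words:
            continue
        if words[0].startswith(needle):
            starts.append(label)
        elif any(word.startswith(needle) for word in words[1:]):
            middle.append(label)
    return (starts + middle)[:limit]


# Made-up candidate labels, not taken from any real index.
candidates = ["phase transition", "Photon", "quantum phase", "graphene", "Phase space"]
print(suggest_by_prefix(candidates, "pha", limit=10))
```

Note that "graphene" is not returned: the prefix has to start a word, not merely occur inside one.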
labels:composite
Returns a union or intersection of label lists you provide, aggregating their weights according to the provided criteria.
{
  "type": "labels:composite",
  "operator": "OR",
  "sortOrder": "DESCENDING",
  "sources": [],
  "weightAggregation": "SUM"
}
operator
Declares the way labels from sources are combined. The operator property supports the following values:
- OR: Produces the union of all unique labels from all sources.
- AND: Produces the intersection of all labels from all sources. A label must appear in all sources to appear in the output.
sortOrder
Controls the sort order of the output list of labels. Labels are sorted by their weight after aggregation, in ascending or descending order.
See sortOrder in the documentation of common types for the list of possible values.
sources
A list of other labels:* components whose labels are combined.
weightAggregation
Controls how weights are aggregated for labels that occur in more than one source (or more than once within a single source).
See weightAggregation in the documentation of common types for the list of possible values.
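As a rough illustration of how operator, sortOrder and weightAggregation interact, the sketch below combines label lists locally. It assumes labels are (text, weight) pairs and implements only SUM aggregation; the function name combine_labels and the sample data are hypothetical, and the actual stage may differ in details such as tie handling.

```python
def combine_labels(sources, operator="OR", descending=True):
    """Combine label lists; each source is a list of (label, weight) pairs.

    Weights of repeated labels are summed, mirroring the SUM aggregation
    default shown in the stage definition.
    """
    totals = {}   # label -> summed weight across all sources
    seen_in = {}  # label -> indexes of the sources containing the label
    for index, source in enumerate(sources):
        for label, weight in source:
            totals[label] = totals.get(label, 0.0) + weight
            seen_in.setdefault(label, set()).add(index)

    if operator == "OR":
        kept = list(totals)  # union of all unique labels
    elif operator == "AND":
        # intersection: the label must appear in every source
        kept = [label for label in totals if len(seen_in[label]) == len(sources)]
    else:
        raise ValueError(f"unsupported operator: {operator}")

    return sorted(((label, totals[label]) for label in kept),
                  key=lambda pair: pair[1], reverse=descending)


# Made-up sample lists.
source_a = [("photon", 2.0), ("laser", 1.0)]
source_b = [("photon", 3.0), ("solar cell", 4.0)]
print(combine_labels([source_a, source_b], operator="OR"))
print(combine_labels([source_a, source_b], operator="AND"))
```

With these inputs, OR yields three labels with "photon" first (weight 5.0), while AND keeps only "photon".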
labels:direct
Returns a list of labels whose text you provide directly.
{
  "type": "labels:direct",
  "labels": []
}
labels
An array of labels and their optional weights, for example:
labels:embeddingNearestNeighbors
Selects labels that are most similar to the multidimensional embedding vector you provide.
{
  "type": "labels:embeddingNearestNeighbors",
  "failIfEmbeddingsNotAvailable": true,
  "labelFilter": {
    "type": "labelFilter:acceptAll"
  },
  "limit": 10,
  "vector": {
    "type": "vector:reference",
    "auto": true
  }
}
This stage requires label embeddings to be present in the index.
This example request searches for the closest embedding-space neighbors of an explicit label (synonymous or related labels):
The result of the above request, on the reference Arxiv index:
failIfEmbeddingsNotAvailable
Determines the behavior of this stage if the index does not contain label embeddings. If the index does not contain label embeddings and failIfEmbeddingsNotAvailable is:
- true: this stage fails and logs an error.
- false: this stage returns an empty set of label embeddings.
labelFilter
An optional labelFilter:* component used to filter out undesired labels.
limit
The maximum number of labels to return.
vector
The source vector for which neighboring labels should be returned.
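To make the roles of vector, limit and failIfEmbeddingsNotAvailable concrete, here is a minimal local sketch of a nearest-neighbor lookup. Cosine similarity is an assumption, since the similarity measure is not stated here; the function name nearest_labels, the in-memory dictionary of embeddings, and the toy two-dimensional vectors are also hypothetical.

```python
import math


def nearest_labels(vector, label_embeddings, limit=10, fail_if_missing=True):
    """Return up to `limit` (label, similarity) pairs closest to `vector`.

    `label_embeddings` maps label text to its embedding vector. An empty
    mapping stands in for an index without label embeddings.
    """
    if not label_embeddings:
        if fail_if_missing:
            raise RuntimeError("label embeddings are not available")
        return []

    def cosine(a, b):
        # Assumed similarity measure: cosine of the angle between vectors.
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scored = [(label, cosine(vector, emb)) for label, emb in label_embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy embeddings, made up for illustration.
embeddings = {"photon": [1.0, 0.0], "laser": [0.8, 0.6], "galaxy": [0.0, 1.0]}
neighbors = nearest_labels([1.0, 0.0], embeddings, limit=2)
print([label for label, _ in neighbors])
```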
labels:filtered
Applies the label filters of your choice to the list of labels you provide.
{
  "type": "labels:filtered",
  "labelFilter": {
    "type": "labelFilter:reference",
    "auto": true
  },
  "labelListFilter": {
    "type": "labelListFilter:acceptAll"
  },
  "labels": {
    "type": "labels:reference",
    "auto": true
  }
}
In combination with the labelFilter:acceptLabels and labelFilter:rejectLabels filters, you can use this stage to compare two lists of labels.
The following request uses two different methods to extract labels from the same set of documents. The request uses the labels:filtered stage to compare the methods by taking the intersection and the differences between label lists produced by the two methods.
labelFilter
The label filter to apply to the labels.
labelListFilter
The label list filter to apply to the labels.
labels
The labels to filter.
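The comparison of two label lists described above boils down to accept and reject filtering. The sketch below shows the idea on plain Python lists; filter_labels and the two sample lists are hypothetical stand-ins, not the labelFilter:acceptLabels or labelFilter:rejectLabels components themselves.

```python
def filter_labels(labels, accept=None, reject=None):
    """Keep labels present in `accept` (if given) and absent from `reject` (if given)."""
    return [
        label for label in labels
        if (accept is None or label in accept)
        and (reject is None or label not in reject)
    ]


# Made-up outputs of two label extraction methods run on the same documents.
method_a = ["photon", "laser", "quantum dot"]
method_b = ["laser", "solar cell", "photon"]

common = filter_labels(method_a, accept=set(method_b))  # intersection
only_a = filter_labels(method_a, reject=set(method_b))  # found only by method A
only_b = filter_labels(method_b, reject=set(method_a))  # found only by method B
print(common, only_a, only_b)
```

Filtering one list while accepting the other gives the intersection; rejecting the other gives each method's unique labels.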
labels:fromDocuments
Collects and aggregates labels occurring in the documents you provide, using the selected label aggregator and label count limits.
{
  "type": "labels:fromDocuments",
  "documents": {
    "type": "documents:reference",
    "auto": true
  },
  "labelAggregator": {
    "type": "labelAggregator:topWeight",
    "labelCollector": {
      "type": "labelCollector:topFromFeatureFields",
      "failIfEmbeddingsNotAvailable": true,
      "fields": {
        "type": "featureFields:reference",
        "auto": true
      },
      "labelFilter": {
        "type": "labelFilter:reference",
        "auto": true
      },
      "labelListFilter": {
        "type": "labelListFilter:truncatedPhrases"
      },
      "labelWeighting": "EMBEDDING",
      "minWeight": 0,
      "minWeightMass": 1,
      "tieResolution": "AUTO"
    },
    "maxLabelsPerDocument": 10,
    "maxRelativeDf": 1,
    "minAbsoluteDf": 1,
    "minRelativeDf": 0,
    "minWeight": 0,
    "outputWeightFormula": "TF",
    "threads": "auto",
    "tieResolution": "AUTO"
  },
  "maxLabels": {
    "type": "labelCount:fixed",
    "value": 10000
  }
}
documents
The reference to the source list of documents:* from which labels should be retrieved. The labelAggregator property specifies which fields are used as label sources and how labels from these fields should be aggregated.
labelAggregator
The label aggregator used to aggregate labels from input documents into the final list.
maxLabels
The maximum number of labels to be returned after aggregation.
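The general shape of collecting and aggregating labels can be sketched as follows. The reading of maxLabelsPerDocument, minAbsoluteDf and maxLabels used here is inferred from the property names alone, and the sketch reports plain document counts rather than the configured output weight formula; the function aggregate_labels and the sample documents are hypothetical.

```python
from collections import Counter


def aggregate_labels(documents, max_labels_per_document=10,
                     min_absolute_df=1, max_labels=10000):
    """Aggregate per-document labels into one list of (label, document count).

    `documents` is a list of dicts mapping label text to its weight in that
    document. All parameter semantics are assumptions for illustration.
    """
    df = Counter()
    for doc_labels in documents:
        # Keep only the highest-weighted labels of each document.
        top = sorted(doc_labels.items(), key=lambda pair: pair[1],
                     reverse=True)[:max_labels_per_document]
        df.update(label for label, _ in top)

    # Drop labels occurring in too few documents, then cap the list size.
    kept = [(label, count) for label, count in df.items() if count >= min_absolute_df]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:max_labels]


# Made-up per-document label weights.
docs = [
    {"photon": 0.9, "laser": 0.5},
    {"photon": 0.7, "solar cell": 0.6},
    {"laser": 0.8},
]
print(aggregate_labels(docs, min_absolute_df=2))
```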
labels:fromText
Extracts labels from the raw text you provide.
{
  "type": "labels:fromText",
  "analyzer": "english",
  "featureExtractor": "",
  "labelFilter": {
    "type": "labelFilter:reference",
    "auto": true
  },
  "text": ""
}
This stage can be used to retrieve labels that would be produced by the referenced feature extractor from a snippet of text, if it were indexed as a document. For example:
The result of the above request, on the reference Arxiv index:
analyzer
The analyzer pipeline to use when splitting the input text into words.
featureExtractor
The feature extractor to use.
labelFilter
An optional labelFilter:* component used to filter out undesired labels.
text
The text to extract labels from.
labels:scored
Computes new weights for the labels you provide using the label scorer of your choice.
{
  "type": "labels:scored",
  "labels": {
    "type": "labels:reference",
    "auto": true
  },
  "scorer": {
    "type": "labelScorer:identity"
  }
}
This stage can be used to recompute the weights of labels retrieved from one source, with statistics coming from another source. In this example request, we compute the document frequency of labels occurring in documents matching the query photon with occurrence statistics from a set of documents matching the query solar power.
labels
The source of labels to recompute weights for.
scorer
A labelScorer:* component used to recompute weights of the source labels.
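The idea of rescoring labels from one source with statistics from another can be sketched locally. The helper rescore_with_df below is hypothetical: it takes labels from one set of documents and assigns each a document frequency counted in a different, made-up set of texts. Case-insensitive substring matching is a simplification assumed for illustration, not the scorer's actual matching logic.

```python
def rescore_with_df(labels, reference_documents):
    """Return (label, df) pairs, where df is the number of reference
    documents whose text contains the label (case-insensitive substring)."""
    lowered = [text.lower() for text in reference_documents]
    return [
        (label, sum(1 for text in lowered if label.lower() in text))
        for label in labels
    ]


# Labels imagined to come from documents matching "photon".
photon_labels = ["photon", "energy conversion", "quantum dot"]

# Made-up texts standing in for documents matching "solar power".
solar_documents = [
    "Quantum dot solar cells improve energy conversion.",
    "Solar power plants and energy conversion efficiency.",
    "Grid storage for solar power.",
]
print(rescore_with_df(photon_labels, solar_documents))
```

The labels keep their identity and order; only their weights change to reflect the second document set.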
Consumers of labels:*
The following stages and components take labels:* as input: