vector
The vector:* stages retrieve multidimensional vectors based on various criteria. You can use these vectors to search for semantically similar labels and documents.
You can use the following vector stages in your analysis requests:
- vector:composite
  Computes a composite of the vectors you provide.
- vector:direct
  Returns directly provided vector data.
- vector:documentEmbedding
  Returns a vector with weights corresponding to the composition of embedding vectors for the document set you provide.
- vector:estimateFromContext
  Estimates an embedding vector based on the embedding vector of the surrounding context.
- vector:fromEmbeddingService
  Returns an embedding vector acquired from an external embedding service.
- vector:fromVectorField
  Returns a vector from an explicit field (requires that an external vector field is added to documents during indexing).
- vector:labelEmbedding
  Returns a vector with weights corresponding to the composition of embedding vectors for the set of labels you provide.
- vector:reference
  References the results of another vector:* stage defined in the request.
The JSON output of the vector stage has the following structure:
{
  "values": [
    // a list of N floating point numbers: vector weights.
  ]
}
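As a minimal sketch of consuming this output, the Python snippet below parses a stage result and reads its weights. The sample payload and the helper name `vector_values` are made up for illustration; only the `values` key comes from the structure above.

```python
import json


def vector_values(stage_output: str) -> list[float]:
    """Parse the JSON output of a vector stage and return its weights."""
    payload = json.loads(stage_output)
    return [float(v) for v in payload["values"]]


# Hypothetical stage output; real embedding vectors are much longer.
sample_output = '{"values": [0.25, -0.5, 1.0]}'
weights = vector_values(sample_output)
print(len(weights), weights)
```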
vector:​composite
Creates a composite vector from the vectors you provide by summing their corresponding components.
{
  "type": "vector:composite",
  "vectors": []
}
vectors
The input vectors to compose. All input vectors must have the same size.
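The component-wise sum can be illustrated with a short Python sketch. It assumes that the `vectors` array accepts nested vector:* stages such as vector:direct, which the text above implies but does not show. The helper `composite` is hypothetical and only mirrors the arithmetic; raising an error on empty input is a choice of this sketch, not documented behavior.

```python
def composite(vectors: list[list[float]]) -> list[float]:
    """Sum equally sized vectors component by component."""
    if not vectors:
        # Assumption of this sketch: empty input is treated as an error.
        raise ValueError("at least one input vector is required")
    size = len(vectors[0])
    if any(len(v) != size for v in vectors):
        raise ValueError("all input vectors must have the same size")
    return [sum(components) for components in zip(*vectors)]


# A stage fragment that composes two directly provided vectors.
stage = {
    "type": "vector:composite",
    "vectors": [
        {"type": "vector:direct", "vector": [1.0, 2.0, 3.0]},
        {"type": "vector:direct", "vector": [0.5, 0.5, 0.5]},
    ],
}

expected = composite([v["vector"] for v in stage["vectors"]])
print(expected)  # [1.5, 2.5, 3.5]
```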
vector:​direct
Returns directly provided vector data.
{
  "type": "vector:direct",
  "vector": null
}
vector
Contains an array of numbers with vector component data.
vector:​document​Embedding
Returns a vector with weights corresponding to the composition of embedding vectors for the document set you provide.
{
  "type": "vector:documentEmbedding",
  "documents": {
    "type": "documents:reference",
    "auto": true
  },
  "failIfEmbeddingsNotAvailable": true
}
The composite vector is a sum of the individual document embedding vectors, weighted by document score.
Unless failIfEmbeddingsNotAvailable is set to false, this stage requires document embeddings to be present in the index.
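The score-weighted sum can be sketched in plain Python as follows. The embeddings and scores are invented for illustration, the helper `weighted_sum` is hypothetical, and whether Lingo4G additionally normalizes the result is not stated here, so the sketch does not normalize.

```python
def weighted_sum(embeddings: list[list[float]], scores: list[float]) -> list[float]:
    """Return sum(score_i * embedding_i), computed per component."""
    if len(embeddings) != len(scores):
        raise ValueError("one score per embedding is required")
    if not embeddings:
        return []
    result = [0.0] * len(embeddings[0])
    for vector, score in zip(embeddings, scores):
        for i, component in enumerate(vector):
            result[i] += score * component
    return result


# Two made-up document embeddings with made-up document scores.
doc_embeddings = [[1.0, 0.0], [0.0, 1.0]]
doc_scores = [2.0, 0.5]
combined = weighted_sum(doc_embeddings, doc_scores)
print(combined)  # [2.0, 0.5]
```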
For example, a request can use this stage to return the embedding vector of the first document matching the photon query and then retrieve the three documents most similar to that vector. The response then contains the embedding vector and the list of similar documents retrieved for it.
documents
One or more input documents for which the embedding vector should be returned.
fail​If​Embeddings​Not​Available
Determines the behavior of this stage if the index does not contain document embeddings.
If the index does not contain document embeddings and failIfEmbeddingsNotAvailable is:
- true: this stage fails and logs an error.
- false: this stage returns an empty set of document embeddings.
vector:​estimate​From​Context
Estimates an embedding vector based on the embedding vector of the surrounding context.
{
  "type": "vector:estimateFromContext",
  "contextVector": {
    "type": "vector:reference",
    "auto": true
  },
  "failIfEmbeddingsNotAvailable": true
}
You can use this stage to estimate the embedding of a phrase, passage, or a paragraph, based on the embeddings of the surrounding context.
A typical use case for this stage is retrieving documents similar to a paragraph of text.
In such a request, the labelsFromText stage extracts labels from the input text and the similarDocuments stage retrieves documents based on embedding vector similarity. The vector:estimateFromContext stage wraps the vector:labelEmbedding stage, which computes the average embedding vector of the labels Lingo4G extracted from the input text.
The vector:estimateFromContext stage improves the match between the document embedding vectors Lingo4G stores in the index and the embedding vector computed for the input paragraph. If you remove the vector:estimateFromContext stage and pass the vector:labelEmbedding stage result directly to the document search stage, the results will be similar but will match the input less closely.
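The wrapping described above can be sketched as a nested stage definition, built here as a Python dictionary and serialized to JSON. Only the two stage types and their properties come from this page. The `"auto": true` reference is copied from the default JSON shown for vector:labelEmbedding; in a real request it would presumably point at the stage that extracts labels from the text. The surrounding request is omitted.

```python
import json

# Inner stage: embedding of the labels that form the surrounding context.
label_embedding = {
    "type": "vector:labelEmbedding",
    "labels": {"type": "labels:reference", "auto": True},
}

# Outer stage: estimates the paragraph embedding from the context vector.
estimate = {
    "type": "vector:estimateFromContext",
    "contextVector": label_embedding,
}

print(json.dumps(estimate, indent=2))
```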
context​Vector
The vector of the surrounding context to use for estimation.
To compute the context vector for a set of labels, use the vector:labelEmbedding stage applied to the list of labels comprising the surrounding context.
fail​If​Embeddings​Not​Available
Determines the behavior of this stage if the index does not contain label embeddings.
If the index does not contain label embeddings and failIfEmbeddingsNotAvailable is:
- true: this stage fails and logs an error.
- false: this stage returns an empty set of label embeddings.
vector:​from​Embedding​Service
Returns an embedding vector acquired from an external embeddingService:* component. You can use this stage to retrieve embeddings from the same large language models as those used during indexing.
{
  "type": "vector:fromEmbeddingService",
  "embeddingService": {
    "type": "embeddingService:reference",
    "auto": true
  },
  "text": null
}
embedding​Service
A reference to an embedding service component. Embedding service components may only be declared in the project descriptor's shared components section.
text
A non-empty snippet of text for which the embedding vector should be returned.
vector:​from​Vector​Field
Retrieves vector data from an explicit document field. You must provide the data during document indexing; it is typically computed by an external source, such as an external vector model.
{
  "type": "vector:fromVectorField",
  "documents": {
    "type": "documents:reference",
    "auto": true
  },
  "fieldName": null
}
documents
One or more documents containing a field with vector data. If more than one document is provided by the selector, vectors are averaged and a single result is returned.
field​Name
Document field containing external vector data added during indexing.
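The averaging applied when the selector provides several documents can be sketched as below. The sketch assumes a plain, unweighted arithmetic mean per component; the text only says the vectors are averaged. The helper `average` and the sample field values are hypothetical.

```python
def average(vectors: list[list[float]]) -> list[float]:
    """Return the component-wise arithmetic mean of equally sized vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    size = len(vectors[0])
    if any(len(v) != size for v in vectors):
        raise ValueError("all vectors must have the same size")
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


# Made-up vector field values of two selected documents.
field_vectors = [[1.0, 3.0], [3.0, 5.0]]
mean_vector = average(field_vectors)
print(mean_vector)  # [2.0, 4.0]
```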
vector:​label​Embedding
Returns a vector with weights corresponding to the composition of embedding vectors for the label set you provide.
{
  "type": "vector:labelEmbedding",
  "failIfEmbeddingsNotAvailable": true,
  "labels": {
    "type": "labels:reference",
    "auto": true
  }
}
The composite vector is a sum of the individual label embedding vectors, each weighted by its label's weight.
Unless failIfEmbeddingsNotAvailable is set to false, this stage requires label embeddings to be present in the index.
For example, a request can use this stage to return the embedding vector of the label oil and then retrieve the three labels most similar to that vector. The response then contains the embedding vector and the list of similar labels retrieved for it.
fail​If​Embeddings​Not​Available
Determines the behavior of this stage if the index does not contain label embeddings.
If the index does not contain label embeddings and failIfEmbeddingsNotAvailable is:
- true: this stage fails and logs an error.
- false: this stage returns an empty set of label embeddings.
labels
One or more input labels for which the embedding vector should be returned.
Consumers of vector:*
The following stages and components take vector:* as input: