vector

The vector:* stages retrieve multidimensional vectors based on various criteria. You can use these vectors to search for semantically similar labels and documents.

You can use the following vector stages in your analysis requests:

vector:​composite

Computes a composite of the vectors you provide.

vector:​direct

Returns directly provided vector data.

vector:​document​Embedding

Returns a vector with weights corresponding to the composition of embedding vectors for the document set you provide.

vector:​estimate​From​Context

Estimates an embedding vector based on the embedding vector of the surrounding context.

vector:​from​Embedding​Service

Returns an embedding vector acquired from an external embedding service.

vector:​from​Vector​Field

Returns a vector from an explicit document field. This requires that an external vector field is added to documents during indexing.

vector:​label​Embedding

Returns a vector with weights corresponding to the composition of embedding vectors for the set of labels you provide.


vector:​reference

References the results of another vector:​* stage defined in the request.


The JSON output of every vector:* stage has the following structure:

{
  "values": [
    // a list of N floating point numbers: vector weights.
  ]
}
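Because every vector stage returns the same shape, client code can read the weights uniformly. The following is a minimal Python sketch, assuming the stage outputs have already been parsed into a dict keyed by stage name. The helper name vector_values and the sample fragment are hypothetical and only illustrate the structure above.

```python
import json
from typing import List


def vector_values(stage_outputs: dict, stage_name: str) -> List[float]:
    """Return the weights of one vector stage output as a list of floats."""
    values = stage_outputs[stage_name]["values"]
    for v in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise ValueError(f"stage {stage_name!r} returned a non-numeric weight: {v!r}")
    return [float(v) for v in values]


# Hypothetical, shortened fragment used for illustration only.
fragment = json.loads('{"myVector": {"values": [0.25, -0.5, 1.0]}}')
weights = vector_values(fragment, "myVector")
print(len(weights), weights)
```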

vector:​composite

Creates a composite vector from the vectors you provide by summing up their individual components.

{
  "type": "vector:composite",
  "vectors": []
}

vectors

Type
array of vector
Default
[]
Required
no

The input vectors to compose. All input vectors must have the same size.
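To make the summation concrete, here is a small Python sketch of a component-wise sum with the same-size check. It illustrates the arithmetic described above and is not the stage's actual implementation. The function name composite is hypothetical, and returning an empty vector for empty input is an assumption.

```python
from typing import List, Sequence


def composite(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Sum vectors component by component."""
    if not vectors:
        return []  # assumption: no input vectors yields an empty vector
    size = len(vectors[0])
    if any(len(v) != size for v in vectors):
        raise ValueError("all input vectors must have the same size")
    # zip(*vectors) groups the i-th components of all vectors together.
    return [sum(components) for components in zip(*vectors)]


print(composite([[1.0, 2.0, 3.0], [0.5, -2.0, 1.0]]))  # prints [1.5, 0.0, 4.0]
```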

vector:​direct

Returns directly provided vector data.

{
  "type": "vector:direct",
  "vector": null
}

vector

Type
array of number
Default
null
Required
yes

Contains an array of numbers with vector component data.
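If you build this stage programmatically, it helps to validate the numbers before serializing them. The Python sketch below is one way to do that. The helper name direct_vector_stage is hypothetical, and rejecting empty or non-finite input is an assumption rather than documented behavior.

```python
import json
import math
from typing import Sequence


def direct_vector_stage(components: Sequence[float]) -> dict:
    """Build a vector:direct stage from a sequence of numbers."""
    values = [float(c) for c in components]
    if not values:
        raise ValueError("vector is required and should not be empty")  # assumption
    if not all(math.isfinite(v) for v in values):
        raise ValueError("vector components must be finite numbers")
    return {"type": "vector:direct", "vector": values}


stage = direct_vector_stage([0.1, -0.2, 0.3])
# allow_nan=False makes json.dumps raise instead of emitting NaN or Infinity,
# which are not valid JSON.
print(json.dumps(stage, allow_nan=False, indent=2))
```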

vector:​document​Embedding

Returns a vector with weights corresponding to the composition of embedding vectors for the document set you provide.

{
  "type": "vector:documentEmbedding",
  "documents": {
    "type": "documents:reference",
    "auto": true
  },
  "failIfEmbeddingsNotAvailable": true
}

The composite vector is a document score-weighted sum of individual embedding vectors.

Unless failIfEmbeddingsNotAvailable is set to false, this stage will require document embeddings to be present in the index.
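The score-weighted sum can be sketched in Python as follows. This is an illustration of the arithmetic only: the function name weighted_sum and the sample scores and embeddings are made up, and the sketch assumes no normalization is applied to the result.

```python
from typing import List, Sequence, Tuple


def weighted_sum(scored: Sequence[Tuple[float, Sequence[float]]]) -> List[float]:
    """Sum embedding vectors, scaling each one by its document score."""
    if not scored:
        return []
    size = len(scored[0][1])
    result = [0.0] * size
    for score, embedding in scored:
        if len(embedding) != size:
            raise ValueError("embedding vectors must have the same size")
        for i, component in enumerate(embedding):
            result[i] += score * component
    return result


# Two made-up documents, each given as (score, embedding).
docs = [(2.0, [1.0, 0.0]), (0.5, [0.0, 4.0])]
print(weighted_sum(docs))  # prints [2.0, 2.0]
```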

Consider the following request, which returns the embedding vector for the first document matching the photon query and then finds the three documents most similar to that embedding vector.

{
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "photon"
      },
      "limit": 1
    },
    "documentEmbedding": {
      "type": "vector:documentEmbedding",
      "documents": {
        "type": "documents:reference",
        "use": "documents"
      },
      "failIfEmbeddingsNotAvailable": true
    },
    "similarDocuments": {
      "type": "documents:embeddingNearestNeighbors",
      "vector": {
        "type": "vector:reference",
        "use": "documentEmbedding"
      },
      "limit": 3
    }
  }
}

Retrieving a document embedding vector and using it to find similar documents.

The embedding vector part of the response is shown below:

"documentEmbedding": {
  "values": [
    -0.57422113,
    -0.068582624,
    0.20096491,
    -0.31315422,
    1.2161951,
    -1.2248037,
    -1.2113761,
    -0.2941759,
    0.48545796,
    -0.7712026,
    -0.22011408,
    -0.608371,
    0.90569186,
    -0.84938866,
    0.21414198,
    0.6553282,
    -0.5452179,
    0.35045803,
    0.3903727,
    1.0092365,
    -0.20052889,
    0.4890626,
    0.7744073,
    0.5591404,
    0.32964814,
    0.60655844,
    1.0742085,
    0.06991704,
    -0.71906906,
    0.69443387,
    -0.118437186,
    -0.7543861,
    0.33848658,
    -0.9806077,
    0.43837917,
    -1.358122,
    0.9023791,
    -0.77267313,
    0.40730473,
    0.7971989,
    0.48312485,
    0.79833794,
    -0.22340186,
    -0.8068244,
    -0.7050206,
    0.99506307,
    0.34527537,
    0.5570518,
    -0.11054978,
    -0.046042304,
    0.21246919,
    -0.2658922,
    -0.4778983,
    -0.3066737,
    -1.660988,
    -0.89577734,
    0.5269679,
    -0.4207494,
    0.92221236,
    -0.4347983,
    0.8746356,
    -0.90795076,
    -0.1511306,
    -0.31478116,
    0.7130491,
    -0.40270978,
    0.24806926,
    -0.17758742,
    -0.19879192,
    0.14168695,
    1.5389348,
    0.27244258,
    -0.07646382,
    0.13408655,
    0.20799705,
    0.03196173,
    1.2283746,
    0.51575106,
    0.9134348,
    1.7040988,
    0.006191571,
    -0.50861776,
    -0.6366052,
    0.46974674,
    -0.21639216,
    0.13149987,
    0.0041523827,
    -0.28997338,
    -0.09312176,
    -0.52090764,
    0.15886594,
    0.8034965,
    0.2687663,
    -0.34138128,
    0.5532988,
    -1.5849437
  ]
}

And this is the list of similar documents retrieved for the vector above:

"similarDocuments": {
  "documents": [
    {
      "id": 482237,
      "weight": 0.9999999
    },
    {
      "id": 120499,
      "weight": 0.9680746
    },
    {
      "id": 451357,
      "weight": 0.9576727
    }
  ]
}

documents

Type
documents
Default
{
  "type": "documents:reference",
  "auto": true
}
Required
no

One or more input documents for which the embedding vector should be returned.

fail​If​Embeddings​Not​Available

Type
boolean
Default
true
Required
no

Determines the behavior of this stage if the index does not contain document embeddings.

If the index does not contain document embeddings and fail​If​Embeddings​Not​Available is:

true
this stage fails and logs an error.
false
this stage returns an empty set of document embeddings.

vector:​estimate​From​Context

Estimates an embedding vector based on the embedding vector of the surrounding context.

{
  "type": "vector:estimateFromContext",
  "contextVector": {
    "type": "vector:reference",
    "auto": true
  },
  "failIfEmbeddingsNotAvailable": true
}

You can use this stage to estimate the embedding of a phrase, passage, or a paragraph, based on the embeddings of the surrounding context.

A typical use case for this stage is retrieving documents similar to a paragraph of text:

{
  "stages": {
    "labelsFromText": {
      "type": "labels:fromText",
      "text": "This paper introduces a la carte embedding, a simple and general alternative to the usual word2vec-based approaches for building such representations that is based upon recent theoretical results for GloVe-like embeddings. Our method relies mainly on a linear transformation that is efficiently learnable using pretrained word vectors and linear regression. This transform is applicable on the fly in the future when a new text feature or rare word is encountered, even if only a single usage example is available."
    },
    "similarDocuments": {
      "type": "documents:embeddingNearestNeighbors",
      "vector": {
        "type": "vector:estimateFromContext",
        "contextVector": {
          "type": "vector:labelEmbedding",
          "labels": {
            "type": "labels:reference",
            "use": "labelsFromText"
          }
        }
      }
    },
    "similarDocumentsContent": {
      "type": "documentContent"
    }
  }
}

The labels​From​Text stage extracts labels from the input text. The similar​Documents stage retrieves documents based on the embedding vector similarity. Notice how the vector:​estimate​From​Context stage wraps the vector:​label​Embedding stage, which computes the average embedding vector of the labels Lingo4G extracted from the input text.

The vector:​estimate​From​Context stage improves the match between the document embedding vectors Lingo4G stores in the index and the embedding vector we compute for the input paragraph. If you remove the vector:​estimate​From​Context stage and pass the vector:​label​Embedding stage result directly to the document search stage, you will get similar but worse-matching results.
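To compare the two variants side by side, you can generate both requests from one function. The Python sketch below only builds the request JSON shown above. The function name similar_documents_request and its estimate_from_context flag are hypothetical, and submitting the request is left out.

```python
import json


def similar_documents_request(text: str, estimate_from_context: bool = True) -> dict:
    """Build the paragraph-similarity request, with or without estimateFromContext."""
    label_embedding = {
        "type": "vector:labelEmbedding",
        "labels": {"type": "labels:reference", "use": "labelsFromText"},
    }
    if estimate_from_context:
        # Wrap the label embedding, as in the request above.
        vector = {"type": "vector:estimateFromContext", "contextVector": label_embedding}
    else:
        # Pass the label embedding directly to the document search stage.
        vector = label_embedding
    return {
        "stages": {
            "labelsFromText": {"type": "labels:fromText", "text": text},
            "similarDocuments": {
                "type": "documents:embeddingNearestNeighbors",
                "vector": vector,
            },
            "similarDocumentsContent": {"type": "documentContent"},
        }
    }


wrapped = similar_documents_request("Some paragraph of text.")
direct = similar_documents_request("Some paragraph of text.", estimate_from_context=False)
print(json.dumps(wrapped, indent=2))
print(json.dumps(direct, indent=2))
```

Running both requests against the same index lets you inspect how the returned documents differ.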

context​Vector

Type
vector
Default
{
  "type": "vector:reference",
  "auto": true
}
Required
no

The vector of the surrounding context to use for estimation.

To compute the context vector for a set of labels, use the vector:​label​Embedding stage applied to the list of labels comprising the surrounding context.

fail​If​Embeddings​Not​Available

Type
boolean
Default
true
Required
no

Determines the behavior of this stage if the index does not contain label embeddings.

If the index does not contain label embeddings and fail​If​Embeddings​Not​Available is:

true
this stage fails and logs an error.
false
this stage returns an empty set of label embeddings.

vector:​from​Embedding​Service

Returns an embedding vector acquired from an external embedding​Service:​* component. This can be used to retrieve embeddings from the same large language models as those used during indexing.

{
  "type": "vector:fromEmbeddingService",
  "embeddingService": {
    "type": "embeddingService:reference",
    "auto": true
  },
  "text": null
}

embedding​Service

Type
embeddingService
Default
{
  "type": "embeddingService:reference",
  "auto": true
}
Required
no

A reference to an embedding service component. Embedding service components may only be declared in the project descriptor's shared components section.

text

Type
string
Default
null
Required
yes

A non-empty snippet of text for which the embedding vector should be returned.
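Since text is required and must be non-empty, a client can check it before sending the request. The following Python sketch builds the stage shown above. The helper name embedding_service_vector is hypothetical, and treating whitespace-only text as empty is an assumption.

```python
import json


def embedding_service_vector(text: str) -> dict:
    """Build a vector:fromEmbeddingService stage for the given text."""
    if not text or not text.strip():  # assumption: whitespace-only counts as empty
        raise ValueError("text must be a non-empty snippet")
    return {
        "type": "vector:fromEmbeddingService",
        # Default reference, as in the stage definition above.
        "embeddingService": {"type": "embeddingService:reference", "auto": True},
        "text": text,
    }


service_stage = embedding_service_vector("photon detectors")
print(json.dumps(service_stage, indent=2))
```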

vector:​from​Vector​Field

Retrieves vector data from an explicit document field. The data must be provided during document indexing and is typically computed by an external source, such as an external vector model.

{
  "type": "vector:fromVectorField",
  "documents": {
    "type": "documents:reference",
    "auto": true
  },
  "fieldName": null
}

documents

Type
documents
Default
{
  "type": "documents:reference",
  "auto": true
}
Required
no

One or more documents containing a field with vector data. If more than one document is provided by the selector, vectors are averaged and a single result is returned.

field​Name

Type
project:vectorFields
Default
null
Required
yes

Document field containing external vector data added during indexing.
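When the documents selector yields more than one document, the field vectors are averaged into a single result. The Python sketch below illustrates this, assuming an unweighted arithmetic mean per component. The function name average_vectors is hypothetical.

```python
from typing import List, Sequence


def average_vectors(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average vectors component by component (unweighted mean, an assumption)."""
    if not vectors:
        raise ValueError("at least one vector is required")
    size = len(vectors[0])
    if any(len(v) != size for v in vectors):
        raise ValueError("all vectors must have the same size")
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


print(average_vectors([[1.0, 2.0], [3.0, 6.0]]))  # prints [2.0, 4.0]
```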

vector:​label​Embedding

Returns a vector with weights corresponding to the composition of embedding vectors for the label set you provide.

{
  "type": "vector:labelEmbedding",
  "failIfEmbeddingsNotAvailable": true,
  "labels": {
    "type": "labels:reference",
    "auto": true
  }
}

The composite vector is a weighted sum of individual label embedding vectors (weighted by each label's weight).

Unless failIfEmbeddingsNotAvailable is set to false, this stage will require label embeddings to be present in the index.

Consider the following request, which returns the embedding vector for the label oil and then finds the five labels most similar to oil's embedding vector.

{
  "stages": {
    "labels": {
      "type": "labels:direct",
      "labels": [
        {
          "label": "oil",
          "weight": 1
        }
      ]
    },
    "labelEmbedding": {
      "type": "vector:labelEmbedding",
      "labels": {
        "type": "labels:reference",
        "use": "labels"
      },
      "failIfEmbeddingsNotAvailable": true
    },
    "similarLabels": {
      "type": "labels:embeddingNearestNeighbors",
      "vector": {
        "type": "vector:reference",
        "use": "labelEmbedding"
      },
      "limit": 5
    }
  }
}

Retrieving a label embedding vector and using it to find similar labels.

The embedding vector part of the response is shown below:

"labelEmbedding": {
  "values": [
    -0.12831993,
    0.01152784,
    0.1956976,
    -0.15225223,
    0.060247205,
    0.07064087,
    -0.016338939,
    -0.12616406,
    0.012088387,
    -0.24430434,
    -0.14441729,
    0.06686689,
    -0.107107274,
    -0.044391025,
    -0.046843033,
    0.21312226,
    0.105229,
    -0.079795375,
    -0.056593914,
    -0.032695055,
    0.034506015,
    0.036390483,
    0.07200203,
    0.018575044,
    -0.08143535,
    -0.0713344,
    -0.10609374,
    -0.02981584,
    -0.08283925,
    0.25963604,
    0.09551317,
    -0.014513606,
    -0.016406003,
    -0.14880298,
    0.047252104,
    -0.15333739,
    0.049600437,
    0.10121724,
    -0.058828235,
    0.065453626,
    -0.06516631,
    -0.124355644,
    -0.02075785,
    -0.060544737,
    0.101969235,
    -0.12816632,
    -0.031221058,
    0.034108836,
    -0.013031488,
    0.117349006,
    -0.12670065,
    -0.03494617,
    -0.053229712,
    0.03090798,
    0.04805559,
    0.077390105,
    -0.2854027,
    -0.1219978,
    -0.004126611,
    -0.0039055229,
    -0.010549579,
    -0.1936508,
    0.06262723,
    0.05931132,
    -0.025006529,
    -0.17581728,
    0.017531814,
    0.048754454,
    0.1519659,
    0.06530251,
    0.0074464735,
    0.012522171,
    -0.07628108,
    0.1371478,
    0.041159574,
    -0.023005126,
    -0.072386354,
    -0.028989486,
    -0.045731183,
    -0.086477764,
    0.15788856,
    -0.025388826,
    0.029495971,
    0.0889432,
    0.17877185,
    0.037153076,
    0.05789343,
    -0.10531316,
    -0.13218993,
    -0.030996056,
    -0.29946643,
    -0.024343388,
    0.026959129,
    0.015948545,
    0.104478784,
    -0.02320289
  ]
}

And this is the list of similar labels retrieved for the vector above:

"similarLabels": {
  "labels": [
    {
      "label": "oil",
      "weight": 1
    },
    {
      "label": "spill",
      "weight": 0.86921465
    },
    {
      "label": "aquifer",
      "weight": 0.83875275
    },
    {
      "label": "cement",
      "weight": 0.8105788
    },
    {
      "label": "soil",
      "weight": 0.8037261
    }
  ]
}
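Note that the seed label oil is returned as its own nearest neighbor, with a weight of 1. If you only want the other labels, you can filter it out on the client side. A minimal Python sketch, using the response fragment above as input; the helper name neighbors_excluding is hypothetical.

```python
import json

# The similarLabels part of the response shown above.
similar_labels = json.loads("""
{
  "labels": [
    {"label": "oil", "weight": 1},
    {"label": "spill", "weight": 0.86921465},
    {"label": "aquifer", "weight": 0.83875275},
    {"label": "cement", "weight": 0.8105788},
    {"label": "soil", "weight": 0.8037261}
  ]
}
""")


def neighbors_excluding(stage_output: dict, seed: str) -> list:
    """Return (label, weight) pairs, skipping the seed label itself."""
    return [
        (entry["label"], entry["weight"])
        for entry in stage_output["labels"]
        if entry["label"] != seed
    ]


neighbors = neighbors_excluding(similar_labels, "oil")
for label, weight in neighbors:
    print(f"{label}\t{weight:.3f}")
```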

fail​If​Embeddings​Not​Available

Type
boolean
Default
true
Required
no

Determines the behavior of this stage if the index does not contain label embeddings.

If the index does not contain label embeddings and fail​If​Embeddings​Not​Available is:

true
this stage fails and logs an error.
false
this stage returns an empty set of label embeddings.

labels

Type
labels
Default
{
  "type": "labels:reference",
  "auto": true
}
Required
no

One or more input labels for which the embedding vector should be returned.

Consumers of vector:​*

The following stages and components take vector:​* as input:

Stage or component                       Property
documents:embeddingNearestNeighbors      vector
documents:vectorFieldNearestNeighbors    vector
labels:embeddingNearestNeighbors         vector
vector:composite                         vectors
vector:estimateFromContext               contextVector