vector

The vector:* stages retrieve multidimensional vectors based on various criteria. You can use these vectors to search for semantically similar labels and documents.

You can use the following vector stages in your analysis requests:

vector:​composite

Computes a composite of the vectors you provide.

vector:​document​Embedding

Returns a vector with weights corresponding to the composition of embedding vectors for the document set you provide.

vector:​label​Embedding

Returns a vector with weights corresponding to the composition of embedding vectors for the set of labels you provide.


vector:​reference

References the results of another vector:​* stage defined in the request.


The JSON output of every vector:* stage has the following structure:

{
  "values": [
    // a list of N floating point numbers: vector weights.
  ]
}
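As a rough illustration of consuming this output on the client side, the Python sketch below validates a stage result and returns its weights as a list of floats. The helper name parse_vector and the sample data are hypothetical and not part of the API; only the values property comes from the structure above.

```python
import json


def parse_vector(stage_output):
    """Return the weights of a vector:* stage result as a list of floats.

    Hypothetical helper: assumes the result is a JSON object with a
    "values" array, as in the structure shown above.
    """
    values = stage_output.get("values")
    if not isinstance(values, list):
        raise ValueError('expected a "values" array in the stage output')
    return [float(v) for v in values]


# A shortened, made-up stage result used only for illustration.
raw = '{"values": [0.25, -0.5, 1.0]}'
vector = parse_vector(json.loads(raw))
print(len(vector), vector)
```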

vector:​composite

Creates a composite vector from the vectors you provide by summing up their individual components.

{
  "type": "vector:composite",
  "vectors": []
}

vectors

Type
array of vector
Default
[]
Required
no

The input vectors to compose. All input vectors must have the same size.
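The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of component-wise summation with the same-size check, not the stage's actual implementation; the function name composite is hypothetical, and returning an empty list for empty input is an assumption of this sketch rather than documented behavior.

```python
def composite(vectors):
    """Sum equally sized vectors component by component."""
    if not vectors:
        # Assumption of this sketch: no input vectors yields an empty vector.
        return []
    size = len(vectors[0])
    if any(len(v) != size for v in vectors):
        raise ValueError("all input vectors must have the same size")
    # zip(*vectors) groups the i-th components of all input vectors together.
    return [sum(components) for components in zip(*vectors)]


a = [1.0, 2.0, 3.0]
b = [0.5, -2.0, 1.0]
print(composite([a, b]))  # [1.5, 0.0, 4.0]
```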

vector:​document​Embedding

Returns a vector with weights corresponding to the composition of embedding vectors for the document set you provide.

{
  "type": "vector:documentEmbedding",
  "documents": {
    "type": "documents:reference",
    "auto": true
  },
  "failIfEmbeddingsNotAvailable": true
}

The composite vector is the sum of the individual document embedding vectors, each weighted by the score of its document.
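To make the score weighting concrete, here is a minimal Python sketch of a score-weighted sum. The function name weighted_sum, the two-dimensional embeddings and the scores are all made up for illustration. The sketch applies no normalization, because the text above does not say whether the stage normalizes the result.

```python
def weighted_sum(embeddings, scores):
    """Sum embedding vectors, scaling each one by its document score."""
    if len(embeddings) != len(scores):
        raise ValueError("need exactly one score per embedding")
    size = len(embeddings[0]) if embeddings else 0
    result = [0.0] * size
    for embedding, score in zip(embeddings, scores):
        if len(embedding) != size:
            raise ValueError("all embeddings must have the same size")
        for i, component in enumerate(embedding):
            result[i] += score * component
    return result


# Made-up embeddings and scores for two documents.
embeddings = [[1.0, 0.0], [0.0, 2.0]]
scores = [2.0, 0.5]
print(weighted_sum(embeddings, scores))  # [2.0, 1.0]
```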

Unless failIfEmbeddingsNotAvailable is set to false, this stage will require document embeddings to be present in the index.

Consider the following request, which returns the embedding vector for the first document matching the photon query and then finds the three documents most similar to that embedding vector.

{
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "photon"
      },
      "limit": 1
    },
    "documentEmbedding": {
      "type": "vector:documentEmbedding",
      "documents": {
        "type": "documents:reference",
        "use": "documents"
      },
      "failIfEmbeddingsNotAvailable": true
    },
    "similarDocuments": {
      "type": "documents:embeddingNearestNeighbors",
      "vector": {
        "type": "vector:reference",
        "use": "documentEmbedding"
      },
      "limit": 3
    }
  }
}

Retrieving a document embedding vector and using it to find similar documents.
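If you generate such requests programmatically, the same structure can be assembled from a template. The Python sketch below rebuilds the request above for an arbitrary query string. The function name and its parameters are hypothetical, and the sketch only produces the JSON body; how you submit it to the server is out of scope here.

```python
import json


def similar_documents_request(query, neighbors=3):
    """Build the request body shown above for an arbitrary query string."""
    return {
        "stages": {
            # Select the first document matching the query.
            "documents": {
                "type": "documents:byQuery",
                "query": {"type": "query:string", "query": query},
                "limit": 1,
            },
            # Compute the embedding vector of that document.
            "documentEmbedding": {
                "type": "vector:documentEmbedding",
                "documents": {"type": "documents:reference", "use": "documents"},
                "failIfEmbeddingsNotAvailable": True,
            },
            # Find the documents closest to the embedding vector.
            "similarDocuments": {
                "type": "documents:embeddingNearestNeighbors",
                "vector": {"type": "vector:reference", "use": "documentEmbedding"},
                "limit": neighbors,
            },
        }
    }


body = json.dumps(similar_documents_request("photon"), indent=2)
print(body)
```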

Shown below is the embedding vector part of the response:

"documentEmbedding": {
  "values": [
    -0.99103373,
    -0.61993295,
    0.4042985,
    -1.1614705,
    -0.52302164,
    0.6941693,
    0.27720472,
    -0.89477026,
    -0.054978706,
    -0.27918375,
    -0.21151417,
    0.6429526,
    0.69916975,
    -0.2415477,
    0.40307912,
    -0.78268766,
    0.17965193,
    -0.4961045,
    -0.31955218,
    0.52410066,
    1.3476398,
    -0.5696117,
    -0.1011139,
    0.36646006,
    -0.0368462,
    0.42020366,
    0.22188598,
    0.49885812,
    0.38663238,
    -0.9042661,
    -1.0002325,
    0.3590454,
    0.37262416,
    -0.16460687,
    0.050458044,
    0.6337493,
    0.0632156,
    0.29189587,
    -0.86135393,
    -0.0827778,
    -1.9721417,
    0.0946002,
    0.83832526,
    1.4163167,
    -0.6701371,
    0.9395537,
    -1.0429033,
    -0.6764844,
    -0.11551779,
    1.035179,
    0.4344454,
    0.5452566,
    0.47005108,
    -0.73416495,
    0.44311583,
    0.014774237,
    1.319659,
    -0.69617754,
    -0.33849755,
    -0.017449401,
    -0.0040018084,
    0.9872341,
    1.2851647,
    -0.13578914,
    -0.43303326,
    -0.69944334,
    0.98159957,
    0.19036204,
    -0.6602206,
    -0.9635043,
    -0.14981964,
    -1.1410289,
    0.29195544,
    -0.31277227,
    -0.09985383,
    0.6923386,
    -1.1024474,
    0.46135065,
    0.37878135,
    1.5208758,
    1.1232415,
    0.79433787,
    -0.12940468,
    0.17998278,
    0.108538546,
    -0.23473981,
    -1.07629,
    -0.15014234,
    0.4499113,
    1.218994,
    -0.2030959,
    -0.885567,
    -0.041270923,
    0.70257646,
    0.30352822,
    0.9245882
  ]
}

And this is the list of similar documents retrieved for the vector above:

"similarDocuments": {
  "documents": [
    {
      "id": 184486,
      "weight": 0.9531418
    },
    {
      "id": 314007,
      "weight": 0.95071065
    },
    {
      "id": 245260,
      "weight": 0.94943094
    }
  ]
}

documents

Type
documents
Default
{
  "type": "documents:reference",
  "auto": true
}
Required
no

One or more input documents for which the embedding vector should be returned.

fail​If​Embeddings​Not​Available

Type
boolean
Default
true
Required
no

Determines the behavior of this stage if the index does not contain document embeddings.

If the index does not contain document embeddings and fail​If​Embeddings​Not​Available is:

true
this stage fails and logs an error.
false
this stage returns an empty set of document embeddings.

vector:​label​Embedding

Returns a vector with weights corresponding to the composition of embedding vectors for the label set you provide.

{
  "type": "vector:labelEmbedding",
  "failIfEmbeddingsNotAvailable": true,
  "labels": {
    "type": "labels:reference",
    "auto": true
  }
}

The composite vector is a weighted sum of individual label embedding vectors (weighted by each label's weight).

Unless failIfEmbeddingsNotAvailable is set to false, this stage will require label embeddings to be present in the index.

Consider the following request, which returns the embedding vector for the specific label oil and then finds the five labels most similar to oil's embedding vector.

{
  "stages": {
    "labels": {
      "type": "labels:direct",
      "labels": [
        {
          "label": "oil",
          "weight": 1
        }
      ]
    },
    "labelEmbedding": {
      "type": "vector:labelEmbedding",
      "labels": {
        "type": "labels:reference",
        "use": "labels"
      },
      "failIfEmbeddingsNotAvailable": true
    },
    "similarLabels": {
      "type": "labels:embeddingNearestNeighbors",
      "vector": {
        "type": "vector:reference",
        "use": "labelEmbedding"
      },
      "limit": 5
    }
  }
}

Retrieving a label embedding vector and using it to find similar labels.

Shown below is the embedding vector part of the response:

"labelEmbedding": {
  "values": [
    -0.056218036,
    -0.016219355,
    -0.097506955,
    0.03419229,
    -0.14090592,
    0.012211461,
    -0.06729116,
    0.13687253,
    -0.12202141,
    0.05840475,
    0.10308239,
    -0.039139103,
    -0.066286676,
    -0.04229747,
    0.062129702,
    -0.0709934,
    -0.1390585,
    0.074060366,
    0.11323412,
    -0.053741228,
    -0.09797654,
    -0.00074452395,
    0.12603402,
    -0.11076641,
    -0.01178393,
    -0.1688074,
    0.030520564,
    -0.07905724,
    0.004189321,
    0.0035251991,
    -0.096744716,
    -0.070698775,
    -0.035969727,
    -0.053766362,
    0.06178642,
    0.21652527,
    -0.18458892,
    0.080813296,
    0.082455836,
    -0.07065346,
    0.085473984,
    -0.036417063,
    0.08940274,
    -0.11169317,
    -0.11553197,
    -0.014741239,
    0.028184947,
    0.027740661,
    -0.20679004,
    -0.063633166,
    -0.13676593,
    -0.077558,
    0.15755467,
    -0.045371544,
    0.047832448,
    -0.02211245,
    -0.020387521,
    -0.029218199,
    -0.0649806,
    -0.058419973,
    0.08937007,
    -0.084519245,
    -0.1271999,
    0.031995956,
    -0.08992219,
    -0.038587146,
    0.20926896,
    -0.122392155,
    -0.015908016,
    -0.18183255,
    -0.013469522,
    0.1941475,
    -0.09968846,
    0.104315154,
    0.115674205,
    0.022810362,
    0.14583807,
    0.2034118,
    -0.11636415,
    -0.3147494,
    0.04710036,
    -0.020315854,
    0.05568123,
    0.010184015,
    -0.18691273,
    -0.14869832,
    0.083398715,
    -0.12028383,
    -0.06248831,
    0.07073338,
    -0.095944166,
    0.11422471,
    -0.021139272,
    -0.04514896,
    0.07884139,
    -0.031363823
  ]
}

And this is the list of similar labels retrieved for the vector above:

"similarLabels": {
  "labels": [
    {
      "label": "oil",
      "weight": 1
    },
    {
      "label": "moisture",
      "weight": 0.789073
    },
    {
      "label": "saline",
      "weight": 0.77412647
    },
    {
      "label": "geothermal",
      "weight": 0.77246994
    },
    {
      "label": "relative humidity",
      "weight": 0.76045656
    }
  ]
}

fail​If​Embeddings​Not​Available

Type
boolean
Default
true
Required
no

Determines the behavior of this stage if the index does not contain label embeddings.

If the index does not contain label embeddings and fail​If​Embeddings​Not​Available is:

true
this stage fails and logs an error.
false
this stage returns an empty set of label embeddings.

labels

Type
labels
Default
{
  "type": "labels:reference",
  "auto": true
}
Required
no

One or more input labels for which the embedding vector should be returned.

Consumers of vector:​*

The following stages and components take vector:​* as input:

Stage or component                     Property
documents:embeddingNearestNeighbors    vector
labels:embeddingNearestNeighbors       vector
vector:composite                       vectors
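As one possible way of chaining these consumers, the Python sketch below assembles a request in which vector:composite sums the embeddings of several labels and labels:embeddingNearestNeighbors searches with the result. It uses only stage types and properties that appear on this page. The function name, the generated stage names (labels0, embedding0 and so on) and the second label gas are hypothetical, and the request has not been verified against a running instance.

```python
import json


def composite_label_request(labels, neighbors=5):
    """Build a request that sums per-label embeddings with vector:composite."""
    stages = {}
    references = []
    for index, label in enumerate(labels):
        labels_stage = f"labels{index}"      # hypothetical stage name
        vector_stage = f"embedding{index}"   # hypothetical stage name
        stages[labels_stage] = {
            "type": "labels:direct",
            "labels": [{"label": label, "weight": 1}],
        }
        stages[vector_stage] = {
            "type": "vector:labelEmbedding",
            "labels": {"type": "labels:reference", "use": labels_stage},
        }
        references.append({"type": "vector:reference", "use": vector_stage})
    # vector:composite takes the per-label vectors through its "vectors" property.
    stages["composite"] = {"type": "vector:composite", "vectors": references}
    # The nearest-neighbors stage takes the composite through its "vector" property.
    stages["similarLabels"] = {
        "type": "labels:embeddingNearestNeighbors",
        "vector": {"type": "vector:reference", "use": "composite"},
        "limit": neighbors,
    }
    return {"stages": stages}


request = composite_label_request(["oil", "gas"])
print(json.dumps(request, indent=2))
```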