vectors

The vectors:* stages return sets of multidimensional embedding vectors, which you can then use to compute vector-based similarity matrices.

The matrix:knnVectorsSimilarity stage is the most likely consumer of the results computed by the vectors:* stages. For example, to compute vector-based similarities between a set of documents matching a query, your request first uses vectors:precomputedDocumentEmbeddings to prepare the subset of embedding vectors corresponding to your documents and then passes that subset to the matrix computation stage.


You can use the following vectors source stages in your analysis requests:

vectors:precomputedDocumentEmbeddings

Returns the precomputed document embedding vectors, narrowed down to the list of documents you provide.

vectors:precomputedLabelEmbeddings

Returns the precomputed label embedding vectors, narrowed down to the list of labels you provide.

vectors:reference

References the results of another vectors:* stage defined in the request.
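
As a rough illustration of vectors:reference, the sketch below defines the document embeddings once as a named stage and points the similarity stage at it. The stage name embeddings and the use property that selects the referenced stage are assumptions made for this example and are not confirmed by this section; the remaining stage types are the ones used in the examples later in this section.

```json
{
  "name": "Reusing document embeddings through vectors:reference",
  "stages": {
    "embeddings": {
      "type": "vectors:precomputedDocumentEmbeddings",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "clustering"
        }
      }
    },
    "similarities": {
      "type": "matrix:knnVectorsSimilarity",
      "vectors": {
        "type": "vectors:reference",
        "use": "embeddings"
      }
    }
  }
}
```

If the property names hold, declaring the embeddings as a separate stage would let several consumers share the same subset of vectors instead of repeating the nested definition.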


To avoid large responses, the JSON output of a vectors:* stage omits the actual vectors and includes only the total number of vectors and a list of undefined vectors.

vectors:precomputedDocumentEmbeddings

Returns the precomputed document embedding vectors, narrowed down to the list of documents you provide.

{
  "type": "vectors:precomputedDocumentEmbeddings",
  "documents": {
    "type": "documents:reference",
    "auto": true
  },
  "maxDocumentsForSubIndex": 0.05
}

You can use this stage to compute vector-based similarities between a set of documents:

{
  "name": "Computing vector-based similarities between documents matching a query",
  "stages": {
    "similarities": {
      "type": "matrix:knnVectorsSimilarity",
      "vectors": {
        "type": "vectors:precomputedDocumentEmbeddings",
        "documents": {
          "type": "documents:byQuery",
          "query": {
            "type": "query:string",
            "query": "clustering"
          }
        }
      }
    }
  }
}

Using vectors:precomputedDocumentEmbeddings to build a similarity matrix between documents matching the clustering query.

The request combines the vectors:precomputedDocumentEmbeddings stage with documents:byQuery to build a subset of document embeddings. Then, the matrix:knnVectorsSimilarity stage uses the subset of vectors to compute the similarity matrix.

documents

Type
documents
Default
{
  "type": "documents:reference",
  "auto": true
}
Required
no

The documents whose embedding vectors this stage returns.

maxDocumentsForSubIndex

Type
number
Default
0.05
Constraints
value >= 0 and value <= 1
Required
no

Determines the threshold for creating a temporary kNN index.

Lingo4G can significantly speed up the computation of vector-based similarities for a small subset of documents by creating and querying a temporary kNN index containing just the vectors corresponding to the input documents. Lingo4G creates the temporary index only when the number of input documents divided by the total number of documents in the index is less than or equal to the value of this property.

For example, if maxDocumentsForSubIndex is 0.3 and the input document set contains at most 30% of all documents in the index, Lingo4G creates a temporary index to speed up the computation of similarities. We don't recommend setting this property to 0.0 or 1.0 in production.
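
To make the threshold concrete, here is a sketch of a stage configuration that raises maxDocumentsForSubIndex to 0.3, the value from the example above (an illustration, not a recommended setting). Assuming a hypothetical index of 1,000,000 documents, Lingo4G would create the temporary kNN index only if the clustering query matched at most 300,000 documents (300,000 / 1,000,000 = 0.3).

```json
{
  "type": "vectors:precomputedDocumentEmbeddings",
  "documents": {
    "type": "documents:byQuery",
    "query": {
      "type": "query:string",
      "query": "clustering"
    }
  },
  "maxDocumentsForSubIndex": 0.3
}
```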

vectors:precomputedLabelEmbeddings

Returns the precomputed label embedding vectors, narrowed down to the list of labels you provide.

{
  "type": "vectors:precomputedLabelEmbeddings",
  "labels": {
    "type": "labels:reference",
    "auto": true
  },
  "maxLabelsForSubIndex": 0.05
}

You can use this stage to compute vector-based similarities between a set of labels:

{
  "name": "Computing vector-based similarities between labels",
  "stages": {
    "similarities": {
      "type": "matrix:knnVectorsSimilarity",
      "vectors": {
        "type": "vectors:precomputedLabelEmbeddings",
        "labels": {
          "type": "labels:fromDocuments",
          "documents": {
            "type": "documents:byQuery",
            "query": {
              "type": "query:string",
              "query": "clustering"
            }
          },
          "maxLabels": {
            "type": "labelCount:fixed",
            "value": 200
          }
        }
      }
    }
  }
}

Using vectors:precomputedLabelEmbeddings to build a similarity matrix between labels related to clustering.

The request combines the vectors:precomputedLabelEmbeddings stage with labels:fromDocuments to build a subset of label embeddings for labels related to the query clustering. Then, the matrix:knnVectorsSimilarity stage uses the subset of vectors to compute the similarity matrix.

labels

Type
labels
Default
{
  "type": "labels:reference",
  "auto": true
}
Required
no

The labels whose embedding vectors this stage returns.

maxLabelsForSubIndex

Type
number
Default
0.05
Constraints
value >= 0 and value <= 1
Required
no

Determines the threshold for creating a temporary kNN index.

Lingo4G can significantly speed up the computation of vector-based similarities for a small subset of labels by creating and querying a temporary kNN index containing just the vectors corresponding to the input labels. Lingo4G creates the temporary index only when the number of input labels divided by the total number of labels in the index is less than or equal to the value of this property.

For example, if maxLabelsForSubIndex is 0.3 and the input labels list contains at most 30% of all labels in the index, Lingo4G creates a temporary index to speed up the computation of similarities. We don't recommend setting this property to 0.0 or 1.0 in production.
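
As a rough worked example with the default threshold: assuming a hypothetical index containing 50,000 labels, the configuration below passes at most 200 labels to the stage, a ratio of 200 / 50,000 = 0.004. That is below the default maxLabelsForSubIndex of 0.05, so under the rule described above Lingo4G would create the temporary kNN index. The figure of 50,000 labels is an assumption made for illustration; the property values come from the examples in this section.

```json
{
  "type": "vectors:precomputedLabelEmbeddings",
  "labels": {
    "type": "labels:fromDocuments",
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "clustering"
      }
    },
    "maxLabels": {
      "type": "labelCount:fixed",
      "value": 200
    }
  },
  "maxLabelsForSubIndex": 0.05
}
```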

Consumers of vectors:*

The following stages and components take vectors:* as input:

Stage or component                 Property
matrix:knnVectorsSimilarity        vectors
matrixRows:knnVectorsSimilarity    rows, columns