documentNeighbors
documentNeighbors:* components compute documents that are similar to one input document. You can use the document neighborhood components to score documents based on complex criteria, such as results of per-document queries combined with embedding vector similarity filtering.
You can use the following documentNeighbors:* components in your analysis requests:
- documentNeighbors:filteredByEmbeddingVectorSimilarity
  Rejects neighbors whose similarity to the seed document is lower than the threshold.
- documentNeighbors:fromQueryBuilder
  Executes per-document search queries to compute the document's neighbors.
- documentNeighbors:reference
  References a documentNeighbors:* component defined in the request or in the project's default components.

documentNeighbors:filteredByEmbeddingVectorSimilarity
Rejects neighbors whose similarity to the seed document is lower than the threshold.
{
  "type": "documentNeighbors:filteredByEmbeddingVectorSimilarity",
  "documentNeighbors": {
    "type": "documentNeighbors:reference",
    "auto": true
  },
  "failIfEmbeddingsNotAvailable": true,
  "ifVectorUndefined": "REJECT",
  "minSimilarity": 0.7
}
You can use this component to filter the results produced by the provided documentNeighbors:* component based on each neighbor's embedding similarity to the seed document.

documentNeighbors
The document neighbors to filter.

failIfEmbeddingsNotAvailable
Determines the behavior of this stage if the index does not contain document embeddings.
If the index does not contain document embeddings and failIfEmbeddingsNotAvailable is:
- true: this stage fails and logs an error.
- false: this stage performs filtering based on the value of the ifVectorUndefined property.
If your request combines keyword- and embedding-based processing, you can set failIfEmbeddingsNotAvailable to false to have Lingo4G degrade gracefully to keyword-based processing if the index does not contain document embeddings.
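
As an illustration of the degradation setup described above, the following Python sketch assembles a component specification that does not fail when the index lacks embeddings. It uses only property names documented on this page. Pairing failIfEmbeddingsNotAvailable set to false with ifVectorUndefined set to ACCEPT is an assumption about a sensible combination (with REJECT, every neighbor would presumably be filtered out when no vectors exist), and the helper name is hypothetical.

```python
import json


def degradable_similarity_filter(min_similarity: float) -> dict:
    """Build a filteredByEmbeddingVectorSimilarity spec (hypothetical helper).

    Assumption: ACCEPT lets keyword-based neighbors through unchanged
    when the index has no document embeddings.
    """
    return {
        "type": "documentNeighbors:filteredByEmbeddingVectorSimilarity",
        "documentNeighbors": {
            "type": "documentNeighbors:reference",
            "auto": True,
        },
        # Do not fail the stage if the index has no embeddings.
        "failIfEmbeddingsNotAvailable": False,
        # Assumed choice: keep neighbors whose vectors are undefined.
        "ifVectorUndefined": "ACCEPT",
        "minSimilarity": min_similarity,
    }


spec = degradable_similarity_filter(0.7)
print(json.dumps(spec, indent=2))
```

The printed JSON can be pasted wherever a documentNeighbors:* component is expected in a request.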

ifVectorUndefined
Determines what happens when an embedding vector is not available for the documents being filtered.
Embedding vectors may be unavailable for specific documents being filtered or for the seed document. The ifVectorUndefined property determines the result of filtering in these cases:
- ACCEPT: documents with undefined embedding vectors pass the filtering and are included in the resulting neighbor list.
- REJECT: documents with undefined embedding vectors don't pass the filtering and are excluded from the resulting neighbor list.

minSimilarity
The minimum embedding vector similarity to the seed document that each filtered document must have to be included in the resulting list of neighbors.
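
The interaction between minSimilarity and ifVectorUndefined can be modeled with a short Python sketch. This is not Lingo4G's implementation: the use of cosine similarity is an assumption (this page does not name the similarity measure), applying the ifVectorUndefined policy to every neighbor when the seed vector is missing is likewise an assumption, and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Optional[Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed measure, not confirmed by the docs)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_neighbors(
    seed_vector: Vector,
    neighbors: List[Tuple[str, Vector]],
    min_similarity: float,
    if_vector_undefined: str = "REJECT",
) -> List[str]:
    """Return ids of neighbors that pass the similarity filter."""
    if if_vector_undefined not in ("ACCEPT", "REJECT"):
        raise ValueError("if_vector_undefined must be ACCEPT or REJECT")
    accepted = []
    for doc_id, vector in neighbors:
        if seed_vector is None or vector is None:
            # Undefined vector: the policy alone decides the outcome.
            if if_vector_undefined == "ACCEPT":
                accepted.append(doc_id)
            continue
        # Neighbors below the threshold are rejected.
        if cosine(seed_vector, vector) >= min_similarity:
            accepted.append(doc_id)
    return accepted


seed = [1.0, 0.0]
candidates = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", None)]
rejecting = filter_neighbors(seed, candidates, 0.7, "REJECT")
accepting = filter_neighbors(seed, candidates, 0.7, "ACCEPT")
print(rejecting)  # ['a']
print(accepting)  # ['a', 'c']
```

Document "b" is dropped in both runs because its similarity to the seed is below 0.7, while document "c", which has no vector, is kept only under ACCEPT.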

documentNeighbors:fromQueryBuilder
Executes per-document search queries to compute the document's neighbors.
{
  "type": "documentNeighbors:fromQueryBuilder",
  "limit": 1000,
  "queryBuilder": {
    "type": "queryBuilder:reference",
    "auto": true
  }
}
The queries to execute come from the queryBuilder you provide. The query builder can build queries specific to each seed document, based on the values of the seed document's fields.
limit
The maximum number of neighbors to produce for each seed document.
queryBuilder
The query builder that provides the query to execute for each seed document.
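
The introduction mentions combining per-document queries with embedding vector similarity filtering. A minimal Python sketch of such a combination follows. It assumes that a documentNeighbors:fromQueryBuilder component can be supplied inline in the documentNeighbors property of the filtering component, which this page suggests but does not show. The numeric values are illustrative and the helper name is hypothetical.

```python
import json


def query_neighbors_filtered_by_similarity(
    limit: int, min_similarity: float
) -> dict:
    """Nest a fromQueryBuilder component inside the similarity filter
    (hypothetical helper; inline nesting is an assumption)."""
    from_query_builder = {
        "type": "documentNeighbors:fromQueryBuilder",
        "limit": limit,
        "queryBuilder": {"type": "queryBuilder:reference", "auto": True},
    }
    return {
        "type": "documentNeighbors:filteredByEmbeddingVectorSimilarity",
        # Neighbors found by per-document queries are filtered next.
        "documentNeighbors": from_query_builder,
        "failIfEmbeddingsNotAvailable": True,
        "ifVectorUndefined": "REJECT",
        "minSimilarity": min_similarity,
    }


component = query_neighbors_filtered_by_similarity(1000, 0.7)
print(json.dumps(component, indent=2))
```

With this arrangement, the query builder produces up to limit candidate neighbors per seed document, and the outer component keeps only those that meet the minSimilarity threshold.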

Consumers of documentNeighbors:*
The following stages and components take documentNeighbors:* as input:
Stage or component | Property |
---|---|
documentNeighbors:filteredByEmbeddingVectorSimilarity | documentNeighbors |
documentScorer:byDocumentNeighbors | documentNeighbors |