documentNeighbors

documentNeighbors:* components find documents that are similar to a single input (seed) document. You can use them to score documents based on complex criteria, such as the results of per-document queries combined with filtering by embedding vector similarity.

You can use the following documentNeighbors:* components in your analysis requests:

document​Neighbors:​filtered​By​Embedding​Vector​Similarity

Rejects neighbors whose similarity to the seed document is lower than the threshold.

document​Neighbors:​from​Query​Builder

Executes per-document search queries to compute the document's neighbors.


document​Neighbors:​reference

References a document​Neighbors:​* component defined in the request or in the project's default components.


document​Neighbors:​filtered​By​Embedding​Vector​Similarity

Rejects neighbors whose similarity to the seed document is lower than the threshold.

{
  "type": "documentNeighbors:filteredByEmbeddingVectorSimilarity",
  "documentNeighbors": {
    "type": "documentNeighbors:reference",
    "auto": true
  },
  "failIfEmbeddingsNotAvailable": true,
  "ifVectorUndefined": "REJECT",
  "minSimilarity": 0.7
}

Use this component to filter the neighbors produced by the provided documentNeighbors:* component, keeping only the documents whose embedding vector similarity to the seed document reaches the threshold.
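As an illustrative sketch, the fragment below nests a documentNeighbors:fromQueryBuilder component directly in the documentNeighbors property instead of referencing one. It assumes the property accepts an inline component in the same way it accepts a reference. The limit of 200 and the minSimilarity of 0.8 are arbitrary values chosen for the example, not recommendations.

```json
{
  "type": "documentNeighbors:filteredByEmbeddingVectorSimilarity",
  "documentNeighbors": {
    "type": "documentNeighbors:fromQueryBuilder",
    "limit": 200,
    "queryBuilder": {
      "type": "queryBuilder:reference",
      "auto": true
    }
  },
  "minSimilarity": 0.8
}
```

Read this way, the query builder proposes up to 200 candidate neighbors per seed document, and the outer component keeps only the candidates whose similarity to the seed document is at least 0.8.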

document​Neighbors

Type
documentNeighbors
Default
{
  "type": "documentNeighbors:reference",
  "auto": true
}
Required
no

The document neighbors to filter.

fail​If​Embeddings​Not​Available

Type
boolean
Default
true
Required
no

Determines the behavior of this stage if the index does not contain document embeddings.

If the index does not contain document embeddings and fail​If​Embeddings​Not​Available is:

true
this stage fails and logs an error.
false
this stage filters neighbors based on the value of the ifVectorUndefined property.

If your request combines keyword- and embedding-based processing, you can set failIfEmbeddingsNotAvailable to false to have Lingo4G degrade gracefully to keyword-based processing when the index does not contain document embeddings.
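A possible configuration for this fallback is sketched below. It rests on our reading of the property descriptions on this page: with failIfEmbeddingsNotAvailable set to false and no embeddings in the index, the outcome depends on ifVectorUndefined, so ACCEPT should let the keyword-based neighbors through unfiltered, while REJECT would presumably discard them all.

```json
{
  "type": "documentNeighbors:filteredByEmbeddingVectorSimilarity",
  "documentNeighbors": {
    "type": "documentNeighbors:reference",
    "auto": true
  },
  "failIfEmbeddingsNotAvailable": false,
  "ifVectorUndefined": "ACCEPT",
  "minSimilarity": 0.7
}
```

Note that ACCEPT also applies when the index does contain embeddings but a particular document has no vector, so such documents stay in the neighbor list.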

if​Vector​Undefined

Type
string
Default
"REJECT"
Constraints
one of [ACCEPT, REJECT]
Required
no

Determines what happens when an embedding vector is not available for a document being filtered or for the seed document.

Embedding vectors may be missing for specific documents being filtered or for the seed document itself.

The ifVectorUndefined property determines the result of filtering in these cases:

ACCEPT

Documents with undefined embedding vectors pass the filtering and are included in the resulting neighbor list.

REJECT

Documents with undefined vectors don't pass the filtering and are excluded from the resulting neighbor list.

min​Similarity

Type
number
Default
0.7
Constraints
value >= 0
Required
no

The minimum embedding vector similarity to the seed document each filtered document must have to be included in the resulting list of neighbors.

document​Neighbors:​from​Query​Builder

Executes per-document search queries to compute the document's neighbors.

{
  "type": "documentNeighbors:fromQueryBuilder",
  "limit": 1000,
  "queryBuilder": {
    "type": "queryBuilder:reference",
    "auto": true
  }
}

The queries to execute come from the query​Builder you provide. The query builder can build queries specific to each seed document, based on the values of the seed document's fields.

limit

Type
limit
Default
1000
Required
no

The maximum number of neighbors to produce for each seed document.

query​Builder

Type
queryBuilder
Default
{
  "type": "queryBuilder:reference",
  "auto": true
}
Required
no

The query builder that provides the query to execute for each seed document.

Consumers of document​Neighbors:​*

The following stages and components take document​Neighbors:​* as input:

Stage or component                                       Property
documentNeighbors:filteredByEmbeddingVectorSimilarity    documentNeighbors
documentScorer:byDocumentNeighbors                       documentNeighbors
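To illustrate the second consumer, the hypothetical fragment below passes a filtered neighbor list to documentScorer:byDocumentNeighbors. Only the documentNeighbors property name is taken from this page. Any other properties the scorer may need are omitted and assumed to have defaults, so treat this as a sketch rather than a complete request.

```json
{
  "type": "documentScorer:byDocumentNeighbors",
  "documentNeighbors": {
    "type": "documentNeighbors:filteredByEmbeddingVectorSimilarity",
    "documentNeighbors": {
      "type": "documentNeighbors:fromQueryBuilder"
    },
    "minSimilarity": 0.7
  }
}
```

The inner documentNeighbors:fromQueryBuilder component omits limit and queryBuilder, so the defaults listed above apply (1000 and the automatic queryBuilder reference).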