Polymorphism
Polymorphism is a feature of analysis JSON that lets you set any component- or stage-typed property to any definition, as long as the definition matches the type the property requires.
To explain stage and component polymorphism, let's circle back to the basic document search request we introduced at the start of this tutorial:
{
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "photon"
      },
      "limit": 50
    }
  }
}
The query property accepts any query component, that is, a component whose type starts with the query: prefix. By replacing the string query with a different query component, we can turn the basic document search into an example-based similar document search.
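As a mental model of the prefix rule above, here is a minimal Python sketch. The accepts helper is hypothetical and not part of Lingo4G, which performs its own validation; the sketch only assumes that a type name has the form family:variant, as the type names in this tutorial suggest.

```python
def accepts(family: str, component: dict) -> bool:
    """Return True if the component's type belongs to the given family.

    Assumption for this sketch: a type name looks like "<family>:<variant>"
    and a property of a given family accepts any variant of that family.
    """
    component_family, _, _variant = component.get("type", "").partition(":")
    return component_family == family


string_query = {"type": "query:string", "query": "photon"}
# Abbreviated: a complete query:forLabels also needs a "labels" property.
labels_query = {"type": "query:forLabels"}
documents_stage = {"type": "documents:byQuery", "query": string_query}

# Both query components fit the "query" property of documents:byQuery...
print(accepts("query", string_query))
print(accepts("query", labels_query))
# ...while a documents stage does not.
print(accepts("query", documents_stage))
```
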
Similar document search
Let's modify the simple document search request by replacing query:string with query:forLabels, a query that matches documents containing a list of labels:
{
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:forLabels",
        "labels": {
          "type": "labels:fromDocuments",
          "documents": {
            "type": "documents:byQuery",
            "query": {
              "type": "query:string",
              "query": "photon"
            },
            "limit": 1
          }
        }
      },
      "limit": 50
    }
  }
}
The query:forLabels component requires a list of labels in its labels property, and, again, Lingo4G accepts any labels stage there. Our request uses labels:fromDocuments to extract labels from the highest-scoring document matching the query photon.
The updated request implements a simple, keyword-based approach to finding documents that are semantically similar to the "seed" document provided on input. (In the Lucene, Solr and Elasticsearch world, searching for semantically similar documents is often called More-Like-This querying.)
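If you generate requests programmatically, the nesting above can be assembled from small functions. The following is a minimal Python sketch: the function names are hypothetical helpers, not a Lingo4G client API, and only the type and property names are taken from the request above.

```python
import json


def string_query(text: str) -> dict:
    return {"type": "query:string", "query": text}


def documents_by_query(query: dict, limit: int) -> dict:
    # "query" may hold any query component, which is what makes the
    # nesting below possible.
    return {"type": "documents:byQuery", "query": query, "limit": limit}


def similar_documents_request(seed_text: str, limit: int = 50) -> dict:
    # Inner stage: the single highest-scoring document for the seed query.
    seed = documents_by_query(string_query(seed_text), limit=1)
    # Labels extracted from the seed document drive the outer query.
    labels = {"type": "labels:fromDocuments", "documents": seed}
    query = {"type": "query:forLabels", "labels": labels}
    return {"stages": {"documents": documents_by_query(query, limit)}}


request = similar_documents_request("photon")
print(json.dumps(request, indent=2))
```

The printed JSON has the same structure as the request above, so swapping the seed query or the limit becomes a one-argument change.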
To make the request and its results easier to follow, let's promote inlined stages into named stages and add fetching of document content.
{
  "stages": {
    "seed": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "electron"
      },
      "limit": 1
    },
    "seedLabels": {
      "type": "labels:fromDocuments",
      "documents": {
        "type": "documents:reference",
        "use": "seed"
      },
      "maxLabels": {
        "type": "labelCount:fixed",
        "value": 20
      },
      "labelAggregator": {
        "type": "labelAggregator:topWeight",
        "minWeight": 1,
        "tieResolution": "EXTEND",
        "labelCollector": {
          "type": "labelCollector:topFromFeatureFields",
          "tieResolution": "EXTEND"
        }
      }
    },
    "similar": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:forLabels",
        "labels": {
          "type": "labels:reference",
          "use": "seedLabels"
        }
      },
      "limit": 50
    },
    "seedContent": {
      "type": "documentContent",
      "limit": 10,
      "documents": {
        "type": "documents:reference",
        "use": "seed"
      }
    },
    "similarContent": {
      "type": "documentContent",
      "limit": 10,
      "documents": {
        "type": "documents:reference",
        "use": "similar"
      }
    }
  }
}
For a better overview of the request, execute it in the JSON Sandbox app and then switch to the diagram tab.


Keyword-based finding of similar documents, request diagram.
The data flow of the similar document search is the following:
- First, the seed stage selects the "seed" document. We'll then look for documents that are semantically similar to that seed document. In our request, we use the documents:byQuery stage limited to one result, but you can replace this stage with any other document selection stage, such as one that selects documents by an internal identifier.
- Then, the seedLabels stage extracts labels that characterize the seed documents.
- Finally, the similar stage builds a query from the seed labels and finds the semantically similar documents.
- Additionally, the seedContent and similarContent stages fetch the content, such as title and abstract, of the seed and similar documents.
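The data flow described above can also be read off the request itself by following its reference components. The Python sketch below is illustrative only: it assumes, based on the request in this section, that a reference is any definition whose type ends with :reference and that names its target stage in the use property. The stage definitions are abbreviated.

```python
def referenced_stages(definition) -> set:
    """Collect the names of stages a definition points to via *:reference."""
    found = set()
    if isinstance(definition, dict):
        is_reference = str(definition.get("type", "")).endswith(":reference")
        if is_reference and "use" in definition:
            found.add(definition["use"])
        for value in definition.values():
            found |= referenced_stages(value)
    elif isinstance(definition, list):
        for item in definition:
            found |= referenced_stages(item)
    return found


def data_flow(request: dict) -> dict:
    """Map each named stage to the sorted list of stages it reads from."""
    return {
        name: sorted(referenced_stages(stage))
        for name, stage in request["stages"].items()
    }


# Abbreviated version of the request above (label tuning options omitted).
request = {
    "stages": {
        "seed": {
            "type": "documents:byQuery",
            "query": {"type": "query:string", "query": "electron"},
            "limit": 1,
        },
        "seedLabels": {
            "type": "labels:fromDocuments",
            "documents": {"type": "documents:reference", "use": "seed"},
        },
        "similar": {
            "type": "documents:byQuery",
            "query": {
                "type": "query:forLabels",
                "labels": {"type": "labels:reference", "use": "seedLabels"},
            },
            "limit": 50,
        },
        "seedContent": {
            "type": "documentContent",
            "limit": 10,
            "documents": {"type": "documents:reference", "use": "seed"},
        },
        "similarContent": {
            "type": "documentContent",
            "limit": 10,
            "documents": {"type": "documents:reference", "use": "similar"},
        },
    }
}

flow = data_flow(request)
for name, inputs in flow.items():
    print(name, "<-", inputs)
```
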
You can switch to the labels list tab to view the seed labels and to the docs list tab to view the documents. Because our request has two document content stages, use the Documents dropdown to choose which result to show.


Keyword-based finding of similar documents, similar document list.
Filtering similar documents
Let's extend our request with filtering of the similar document list, so that the list contains only documents whose category field is not astro-ph. This helps to illustrate the concept of component composition, where you build a compound component from a number of components of the same type.
The similar stage of the original request was the following:
"similar": {
  "type": "documents:byQuery",
  "query": {
    "type": "query:forLabels",
    "labels": {
      "type": "labels:reference",
      "use": "seedLabels"
    }
  },
  "limit": 50
}
To apply filtering to the similar document search result, let's replace the query:forLabels component with query:filter:
"similar": {
  "type": "documents:byQuery",
  "query": {
    "type": "query:filter",
    "query": {
      "type": "query:forLabels",
      "labels": {
        "type": "labels:reference",
        "use": "seedLabels"
      }
    },
    "filter": {
      "type": "query:complement",
      "query": {
        "type": "query:string",
        "query": "category:astro-ph"
      }
    }
  },
  "limit": 50
}
query:filter composes two queries you provide in the query and filter properties. The former is the query for our unfiltered similar document list, the latter is the additional query each document must also meet. Note that our filter query is itself a composite that "negates" the provided string query.
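Because every composite is itself a query component, composition maps naturally onto nested function calls. The Python sketch below rebuilds the filtered similar stage from three hypothetical helpers (string_query, complement and filtered are names made up for this illustration, not Lingo4G functions); the type and property names come from the request above.

```python
def string_query(text: str) -> dict:
    return {"type": "query:string", "query": text}


def complement(query: dict) -> dict:
    # Wraps any query component; the result is again a query component.
    return {"type": "query:complement", "query": query}


def filtered(query: dict, filter_query: dict) -> dict:
    # Both arguments are query components, and so is the result, which is
    # why composites can be nested to any depth.
    return {"type": "query:filter", "query": query, "filter": filter_query}


seed_labels_query = {
    "type": "query:forLabels",
    "labels": {"type": "labels:reference", "use": "seedLabels"},
}

similar = {
    "type": "documents:byQuery",
    "query": filtered(
        seed_labels_query,
        complement(string_query("category:astro-ph")),
    ),
    "limit": 50,
}
```

Replacing the filter then means changing only the second argument of filtered, for example to another composite built the same way.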
Composition and polymorphism are powerful patterns you can use to build complex analysis requests. For more ideas on searching for semantically similar documents, see the Similar document retrieval chapter.