Document selection
Document selection produces a set of documents you can use at later stages of your analyses, for example for clustering or 2d mapping.
Most of your analysis requests will operate on some subset of documents stored in the Lingo4G index. The documents:* category of analysis stages groups various ways of specifying which documents to select for processing.
This article is an overview of the available document selector stages and their typical use cases. For in-depth descriptions of specific selectors and their properties, see the document selector API reference.
This article assumes you are familiar with the structure and concepts behind Lingo4G analysis request JSONs.
Common document selectors
The following stages should cover the document selection needs of most typical analysis requests.
documents:​by​Query
The documents:byQuery stage selects documents that match the query you provide. Coupled with the query:string component, which parses Lucene-like query syntax, documents:byQuery is the most likely source of documents in your analyses.
Let's use the documents:byQuery stage to select the top 100 arXiv abstracts that contain the dark energy phrase and were created in 2016 or later.
{
"stages": {
"documents": {
"type": "documents:byQuery",
"query": {
"type": "query:string",
"query": "\"dark energy\" AND created:[2016-01-01 TO *]"
},
"limit": 100
}
}
}
Using the documents:byQuery stage and the query:string component to select the top 100 documents matching a string query.
If you execute the above request in the JSON Sandbox app, you should see a result similar to the following JSON. Note that we present only the top 5 results for brevity. In the real response, the number of elements in the documents array is not greater than the limit value you set in the request.
{
"result" : {
"documents" : {
"matches" : {
"value" : 1023,
"relation" : "GREATER_OR_EQUAL"
},
"documents" : [
{
"id" : 435659,
"weight" : 10.30529
},
{
"id" : 494588,
"weight" : 10.237257
},
{
"id" : 175646,
"weight" : 10.211353
},
{
"id" : 91095,
"weight" : 10.153916
},
{
"id" : 228105,
"weight" : 10.114384
}
]
}
}
}
Document selection JSON response.
The document selection JSON output contains the documents array, which holds the internal identifier and weight (importance) of each selected document. The semantics of the document's weight property depend on the specific document selection stage. In the case of the documents:byQuery stage, each document's weight is the search score returned by Apache Lucene, which Lingo4G uses to perform query-based searches.
Some document selection stages may add extra information on top of the list of selected documents. In our case, documents:byQuery adds the matches section, which shows the total number of documents matching the query. The number of matches may be larger than the document selection limit you provide in the request. See the documents:byQuery reference documentation for a detailed description of its output JSON.
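If you post-process the response in your own code, the structure described above is easy to walk. The following Python sketch parses an abbreviated copy of the response shown earlier and extracts the identifiers, weights and the optional matches section. The summarize_selection helper is a hypothetical name used for illustration only; it is not part of Lingo4G.

```python
import json

# Abbreviated copy of the response shown above (first two documents only).
response_text = """
{
  "result": {
    "documents": {
      "matches": {"value": 1023, "relation": "GREATER_OR_EQUAL"},
      "documents": [
        {"id": 435659, "weight": 10.30529},
        {"id": 494588, "weight": 10.237257}
      ]
    }
  }
}
"""


def summarize_selection(response, stage_name):
    """Return (ids, weights, matches) for one document selection stage."""
    stage = response["result"][stage_name]
    ids = [doc["id"] for doc in stage["documents"]]
    weights = [doc["weight"] for doc in stage["documents"]]
    # "matches" is optional: not every selector outputs this section.
    matches = stage.get("matches")
    return ids, weights, matches


response = json.loads(response_text)
ids, weights, matches = summarize_selection(response, "documents")
print(ids)
if matches is not None:
    print(matches["value"], matches["relation"])
```

Reading matches with get() keeps the same helper usable for selectors that return only the documents array.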
documents:embeddingNearestNeighbors
The documents:embeddingNearestNeighbors stage selects the documents that are semantically most similar to the multidimensional embedding vector you provide. In contrast to documents:byQuery, which requires certain words to be present in the selected documents, the embedding-based document selection performs a more "fuzzy", semantics-based matching. You may use the embedding-based document selection to discover documents that are hard to find using keyword-based methods.
To use embedding-based selectors, your index must contain label and document embedding vectors. See the learning embeddings article for detailed instructions.
Document-based selection
Let's select documents that are semantically similar to one specific seed document. We'll break the request down into three stages:
- Selecting the seed document using the documents:byQuery stage.
- Retrieving the embedding vector of the seed document using the vector:documentEmbedding stage.
- Retrieving the semantically similar documents using the documents:embeddingNearestNeighbors stage.
{
"name": "Selecting documents semantically-similar to another document",
"stages": {
"seedDocuments": {
"type": "documents:byQuery",
"query": {
"type": "query:string",
"query": "id:1703.01028"
}
},
"seedVector": {
"type": "vector:documentEmbedding",
"documents": {
"type": "documents:reference",
"use": "seedDocuments"
}
},
"similarDocuments": {
"type": "documents:embeddingNearestNeighbors",
"vector": {
"type": "vector:reference",
"use": "seedVector"
},
"limit": 20
}
},
"output": {
"stages": [
"seedDocuments",
"similarDocuments"
]
}
}
Selecting documents that are semantically-similar to one specific seed document (request).
The request contains three named stages corresponding to the above list: seedDocuments, seedVector and similarDocuments. Note how the request uses stage references to pass results from one stage to another. We use the $.output.stages property to output only the results of the document stages; the output of the vector stage is not relevant.
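Because references are plain strings, a typo in a use property is easy to make. The following Python sketch walks a request and lists references that do not point to any declared stage or component. It assumes that every object whose type ends with :reference names its target in the use property, as in the requests in this article. The find_references and unresolved_references helpers are hypothetical names and not part of Lingo4G.

```python
def find_references(node):
    """Yield the 'use' value of every '*:reference' object nested in node."""
    if isinstance(node, dict):
        type_name = node.get("type", "")
        if isinstance(type_name, str) and type_name.endswith(":reference") and "use" in node:
            yield node["use"]
        for value in node.values():
            yield from find_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_references(item)


def unresolved_references(request):
    """Return the sorted names that are referenced but never declared."""
    declared = set(request.get("stages", {})) | set(request.get("components", {}))
    return sorted({name for name in find_references(request) if name not in declared})


# The three-stage request from this section, written as a Python dictionary.
request = {
    "stages": {
        "seedDocuments": {
            "type": "documents:byQuery",
            "query": {"type": "query:string", "query": "id:1703.01028"},
        },
        "seedVector": {
            "type": "vector:documentEmbedding",
            "documents": {"type": "documents:reference", "use": "seedDocuments"},
        },
        "similarDocuments": {
            "type": "documents:embeddingNearestNeighbors",
            "vector": {"type": "vector:reference", "use": "seedVector"},
            "limit": 20,
        },
    }
}

print(unresolved_references(request))
```

This is a client-side sanity check only; Lingo4G performs its own validation when it receives the request.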
If you run the above request in the JSON Sandbox app, you should see a response similar to the following JSON.
{
"result" : {
"seedDocuments" : {
"matches" : {
"value" : 1,
"relation" : "EXACT"
},
"documents" : [
{
"id" : 0,
"weight" : 5.78041
}
]
},
"similarDocuments" : {
"documents" : [
{
"id" : 0,
"weight" : 1.0
},
{
"id" : 270194,
"weight" : 0.8336917
},
{
"id" : 49358,
"weight" : 0.82552254
},
{
"id" : 210544,
"weight" : 0.80131805
},
{
"id" : 326177,
"weight" : 0.79910874
}
]
}
}
}
Selecting documents that are semantically-similar to one specific seed document (response).
The $.result.seedDocuments section contains the identifier of the seed document, while the $.result.similarDocuments section contains the documents whose embedding vectors lie close to the seed document's vector. The documents:embeddingNearestNeighbors stage computes document weights as the dot product between the search vector and the result document's vector, normalized to the 0...1 range. In our case, the first document on the list is the same as the seed document, hence the weight of 1.0. Also note that the embedding-based document selector does not output the matches section.
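To build an intuition for why the seed document scores 1.0, consider the following Python sketch. It is illustrative only: this article does not specify the normalization Lingo4G applies, so the sketch assumes length-normalized vectors (cosine similarity) mapped linearly from the -1...1 range to the 0...1 range. The vectors and the normalized_similarity function are made up for the example.

```python
import math


def normalized_similarity(a, b):
    """Cosine of the angle between a and b, mapped from [-1, 1] to [0, 1].

    Assumption: this mapping is for illustration only and may differ
    from the normalization Lingo4G actually uses.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = dot / (norm_a * norm_b)
    return (cosine + 1.0) / 2.0


# Made-up three-dimensional vectors; real embeddings have many more dimensions.
seed = [0.2, -0.5, 0.8]
other = [0.1, -0.4, 0.9]

print(normalized_similarity(seed, seed))   # a vector compared with itself
print(normalized_similarity(seed, other))  # a nearby but different vector
```

Under this assumption, a vector compared with itself always reaches the top of the range, which matches the weight of the seed document in the response above.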
Looking at the response to the original request, it is not possible to tell if the selected documents are indeed semantically similar to the seed document. You can use the documentContent stage to retrieve the contents, such as title or abstract, of the documents returned by the document selection stages. See the document content retrieval tutorial for detailed explanations and example requests.
Label-based selection
Lingo4G can also learn multidimensional embedding vectors for labels. Therefore, instead of the seed document vector, you can pass a label vector to the documents:embeddingNearestNeighbors stage. In this arrangement, Lingo4G selects documents that are semantically similar to one or more labels you provide.
The following request returns 20 documents whose embedding vectors lie closest to the embedding vector of the LIGO label (which stands for Laser Interferometer Gravitational-Wave Observatory).
{
"name": "Selecting documents semantically-similar to a label",
"stages": {
"similarDocuments": {
"type": "documents:embeddingNearestNeighbors",
"vector": {
"type": "vector:labelEmbedding",
"labels": {
"type": "labels:direct",
"labels": [
{
"label": "LIGO"
}
]
}
},
"limit": 20
}
}
}
Selecting documents that are semantically-similar to the LIGO label.
This request in-lines all the necessary dependencies into the documents:embeddingNearestNeighbors stage. The vector property contains the vector:labelEmbedding stage, which in turn uses the labels:direct stage to provide a literal label.
If you run the above request in the JSON Sandbox app, you should see a list of matching document identifiers, along with the 0...1 similarity scores. Again, to verify that the resulting documents relate to the seed label, you can use a documentContent stage to retrieve the titles and abstracts of the papers. See the Document content retrieval article for a complete tutorial.
We can extend the above request to return the embedding-wise similar documents, but only those that do not contain the LIGO keyword. These would be the related documents that are impossible to find using the traditional keyword-based method.
{
"name": "Selecting documents semantically-similar to a label but not containing that label",
"stages": {
"similarDocuments": {
"type": "documents:embeddingNearestNeighbors",
"vector": {
"type": "vector:labelEmbedding",
"labels": {
"type": "labels:direct",
"labels": [
{
"label": "LIGO"
}
]
}
},
"limit": 100
},
"similarDocumentsWithoutKeywords": {
"type": "documents:byQuery",
"query": {
"type": "query:composite",
"queries": [
{
"type": "query:fromDocuments",
"documents": {
"type": "documents:reference",
"use": "similarDocuments"
}
},
{
"type": "query:complement",
"query": {
"type": "query:string",
"query": "LIGO"
}
}
],
"operator": "AND"
}
},
"similarDocumentsWithoutKeywordsContent": {
"type": "documentContent",
"documents": {
"type": "documents:reference",
"use": "similarDocumentsWithoutKeywords"
},
"fields": {
"type": "contentFields:grouped",
"groups": [
{
"fields": ["title", "abstract"],
"config": {
"maxValueLength": 2048
}
}
]
}
}
},
"output": {
"stages": [
"similarDocumentsWithoutKeywordsContent",
"similarDocuments"
]
}
}
Selecting documents that are semantically-similar to the LIGO label but do not contain the LIGO word.
The similarDocuments stage is the same as in the previous request, with the retrieval limit increased to 100.
The similarDocumentsWithoutKeywords stage uses the documents:byQuery stage to remove from similarDocuments those documents that contain the LIGO word. To this end, the request uses the query:composite, query:fromDocuments and query:complement components to intersect the list of all similar documents with those documents that do not contain the LIGO word.
Finally, the similarDocumentsWithoutKeywordsContent stage retrieves the titles and abstracts of the selected documents to confirm that they are related to the seed label but do not contain it.
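Conceptually, the composite query computes a set intersection: the documents returned by the nearest-neighbor search, restricted to the complement of the keyword matches. The following Python sketch shows the same logic on plain lists of identifiers. The identifiers and the without_keyword helper are hypothetical and serve only as an illustration; Lingo4G evaluates the real query inside its index.

```python
def without_keyword(similar_ids, keyword_ids):
    """Keep the similar documents that are not among the keyword matches."""
    excluded = set(keyword_ids)
    return [doc_id for doc_id in similar_ids if doc_id not in excluded]


# Hypothetical identifiers, for illustration only.
similar_ids = [101, 102, 103, 104, 105]  # result of the nearest-neighbor search
keyword_ids = [102, 105, 999]            # documents containing the keyword

print(without_keyword(similar_ids, keyword_ids))  # [101, 103, 104]
```

The sketch preserves the order of the input list, whereas the real stage orders its results according to the weights documents:byQuery assigns.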
Let's examine the top results Lingo4G returns for this request.
{
"result" : {
"similarDocumentsWithoutKeywordsContent" : {
"documents" : [
{
"id" : 11882,
"fields" : {
"title" : {
"values" : [
"Black hole spectroscopy for KAGRA future prospect in O5"
]
},
"abstract" : {
"values" : [
" Ringdown gravitational waves of compact binary mergers are an important target to test general relativity. The main components of the ringdown waveform after merger are black hole quasinormal modes. In general relativity, all multipolar quasinormal modes of a black hole should give the same values of black hole parameters. Although the observed binary black hole events so far are not significant enough to perform the test with ringdown gravitational waves, it is expected that the test will be achieved in third generation detectors. The Japanese gravitational wave detector KAGRA, called bKAGRA for the current configuration, has started observation, and discussions for the future upgrade plans have also started. In this study, we consider which KAGRA upgrade plan is the best to detect the subdominant quasinormal modes of black holes in the aim of testing general relativity. We use a numerical relativity waveform as injected signals that contains two multipolar modes and analyze each mode by matched filtering. Our results suggest that the plan FDSQZ, which improves the sensitivity of KAGRA for broad frequency range, is the most suitable configuration for black hole spectroscopy. "
]
}
}
},
{
"id" : 19993,
"fields" : {
"title" : {
"values" : [
"Constraining the evolutionary history of Newton's constant with gravitational wave observations"
]
},
"abstract" : {
"values" : [
"Space-borne gravitational wave detectors, such as the proposed Laser\nInterferometer Space Antenna, are expected to observe black hole coalescences\nto high redshift and with large signal-to-noise ratios, rendering their\ngravitational waves ideal probes of fundamental physics. The promotion of\nNewton's constant to a time-function introduces modifications to the binary's\nbinding energy and the gravitational wave luminosity, leading to corrections in\nthe chirping frequency. Such corrections propagate into the response function\nand, given a gravitational wave observation, they allow for constraints on the\nfirst time-derivative of Newton's constant at the time of merger. We find that\nspace-borne detectors could indeed place interesting constraints on this\nquantity as a function of sky position and redshift, providing a\n{\\emph{constraint map}} over the entire range of redshifts where binary black\nhole mergers are expected to occur. A LISA observation of an equal-mass\ninspiral event with total redshifted mass of 10^5 solar masses for three years\nshould be able to measure Ġ/G at the time of merger to better than\n10^(-11)/yr."
]
}
}
},
{
"id" : 33106,
"fields" : {
"title" : {
"values" : [
"New Class of Gravitational Wave Templates for Inspiralling Compact Binaries"
]
},
"abstract" : {
"values" : [
" Compact binaries inspiralling along quasi-circular orbits are the most plausible gravitational wave (GW) sources for the operational, planned and proposed laser interferometers. We provide new class of restricted post-Newtonian accurate GW templates for non-spinning compact binaries inspiralling along PN accurate quasi-circular orbits. Arguments based on data analysis, theoretical and astrophysical considerations are invoked to show why these time-domain Taylor approximants should be interesting to various GW data analysis communities. "
]
}
}
},
{
"id" : 39706,
"fields" : {
"title" : {
"values" : [
"Sensing and vetoing loud transient noises for the gravitational-wave detection"
]
},
"abstract" : {
"values" : [
" Since the first detection of gravitational-wave (GW), GW150914, September 14th 2015, the multi-messenger astronomy added a new way of observing the Universe together with electromagnetic (EM) waves and neutrinos. After two years, GW together with its EM counterpart from binary neutron stars, GW170817 and GRB170817A, has been observed. The detection of GWs opened a new window of astronomy/astrophysics and will be an important messenger to understand the Universe. In this article, we briefly review the gravitational-wave and the astrophysical sources and introduce the basic principle of the laser interferometer as a gravitational-wave detector and its noise sources to understand how the gravitational-waves are detected in the laser interferometer. Finally, we summarize the search algorithms currently used in the gravitational-wave observatories and the detector characterization algorithms used to suppress noises and to monitor data quality in order to improve the reach of the astrophysical searches. "
]
}
}
},
{
"id" : 102607,
"fields" : {
"title" : {
"values" : [
"Gravitational wave polarization from combined Earth-space detectors"
]
},
"abstract" : {
"values" : [
" In this paper, we investigate the sensitivity to additional gravitational wave polarization modes of future detectors. We first look at the upcoming Einstein Telescope and its combination with existing or planned Earth-based detectors in the case of a stochastic gravitational wave background. We then study its correlation with a possible future space-borne detector sensitive to high-frequencies, like DECIGO. Finally, we adapt those results for a single GW source and establish the sensitivity of the modes, as well as the localization on the sky. "
]
}
}
}
]
},
"similarDocuments" : {
"documents" : [
{
"id" : 1613,
"weight" : 0.8930617
},
{
"id" : 383026,
"weight" : 0.89125544
},
{
"id" : 37221,
"weight" : 0.8881976
},
{
"id" : 268469,
"weight" : 0.8876746
},
{
"id" : 425691,
"weight" : 0.8848885
}
]
}
}
}
Documents that are semantically-similar to the LIGO label but do not contain the LIGO word.
Document 29454 talks about "Next Generation Gravitational Wave Detectors", so it is closely related: LIGO is also a gravitational wave detector. Document 40400 does not contain the LIGO word, but does contain the acronym spelled out. Further documents discuss various aspects of gravitational wave detection, which is again closely related to what LIGO does.
documents:sample
The documents:sample stage takes a random sample of the documents matching the query you provide. In many cases you can save time and resources by processing a random subset of a large document set instead of the whole set.
One natural use case for documents:sample is computing the occurrence statistics for a list of labels. The following request computes the numbers of occurrences of the photon, electron and proton labels across papers published between 2006 and 2008.
{
"name": "Frequency estimates for a list of labels",
"components": {
"scope": {
"type": "query:string",
"query": "created:[2006-01-01 TO 2008-12-31]"
}
},
"stages": {
"labels": {
"type": "labels:direct",
"labels": [
{
"label": "photon"
},
{
"label": "electron"
},
{
"label": "proton"
}
]
},
"tfSample": {
"type": "labels:scored",
"scorer": {
"type": "labelScorer:tf",
"scope": {
"type": "documents:sample",
"samplingRatio": 0.1,
"query": {
"type": "query:reference",
"use": "scope"
}
}
},
"labels": {
"type": "labels:reference",
"use": "labels"
}
},
"tf": {
"type": "labels:scored",
"scorer": {
"type": "labelScorer:tf",
"scope": {
"type": "documents:byQuery",
"query": {
"type": "query:reference",
"use": "scope"
},
"limit": "unlimited"
}
},
"labels": {
"type": "labels:reference",
"use": "labels"
}
}
},
"output": {
"stages": [
"labels",
"tfSample",
"tf"
]
}
}
Computing the numbers of occurrences of the photon, electron and proton labels across papers published between 2006 and 2008.
The scope component is the query defining the subset of documents for which to compute the occurrence frequencies. The labels stage uses the labels:direct stage to provide the list of labels for which to compute frequencies. Finally, the tfSample stage computes the estimated occurrence counts. Notice how we use the documents:sample stage in the scope property to take a 10% sample of all the documents matched by the scope query. For comparison, the request also computes the same statistics using all documents in scope.
If you run the above request in JSON Sandbox, you should see a result similar to the following JSON.
{
"result" : {
"labels" : {
"labels" : [
{
"label" : "photon"
},
{
"label" : "electron"
},
{
"label" : "proton"
}
]
},
"tfSample" : {
"labels" : [
{
"label" : "photon",
"weight" : 2190.346
},
{
"label" : "electron",
"weight" : 3340.5276
},
{
"label" : "proton",
"weight" : 940.1485
}
]
},
"tf" : {
"labels" : [
{
"label" : "photon",
"weight" : 2073.0
},
{
"label" : "electron",
"weight" : 3462.0
},
{
"label" : "proton",
"weight" : 705.0
}
]
}
},
"status" : {
"status" : "AVAILABLE",
"elapsedMs" : 1945,
"tasks" : [
{
"name" : "tf",
"status" : "DONE",
"progress" : 1.0,
"startedAt" : 1678701214708,
"elapsedMs" : 1759,
"tasks" : [
{
"name" : "Computing term frequencies",
"status" : "DONE",
"progress" : 1.0,
"startedAt" : 1678701214755,
"elapsedMs" : 1712,
"tasks" : [ ],
"attributes" : [ ]
}
],
"attributes" : [ ]
},
{
"name" : "tfSample",
"status" : "DONE",
"progress" : 1.0,
"startedAt" : 1678701216468,
"elapsedMs" : 185,
"tasks" : [
{
"name" : "Computing term frequencies",
"status" : "DONE",
"progress" : 1.0,
"startedAt" : 1678701216472,
"elapsedMs" : 181,
"tasks" : [ ],
"attributes" : [ ]
}
],
"attributes" : [ ]
}
]
}
}
The numbers of occurrences of the photon, electron and proton labels across papers published between 2006 and 2008.
The tfSample section contains the estimated numbers of occurrences (the weight property), while the tf section shows the accurate values computed using all documents in scope. Notice that the estimates can be either larger or smaller than the actual value, sometimes by a noticeable margin, as is the case with the proton label (about 940 estimated versus 705 actual). Also, in most cases the estimates will contain fractional parts due to the scaling Lingo4G applies as part of the sampling process.
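The following Python sketch illustrates the general idea behind sample-based estimates and why they tend to be fractional. It is not Lingo4G's actual implementation: the sketch assumes that each document enters the sample independently with the given probability and that the sampled total is scaled up by the ratio actually realized. The per-document counts are synthetic and estimate_total is a hypothetical helper.

```python
import random


def estimate_total(per_document_counts, sampling_ratio, rng):
    """Estimate the sum of per_document_counts from a random sample."""
    sample = [c for c in per_document_counts if rng.random() < sampling_ratio]
    if not sample:
        return 0.0
    # Scale by the ratio actually realized, so the result is usually fractional.
    realized_ratio = len(sample) / len(per_document_counts)
    return sum(sample) / realized_ratio


rng = random.Random(42)
# Synthetic label occurrence counts for 10,000 documents.
counts = [rng.choice([0, 0, 0, 1, 2]) for _ in range(10_000)]

exact = sum(counts)
estimate = estimate_total(counts, 0.1, rng)
print("exact:", exact)
print("estimate:", round(estimate, 3))
```

Running the sketch with different seeds shows the estimate landing on either side of the exact value, much like the tfSample and tf values above.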
The response also contains the status section, which describes the specific tasks Lingo4G performed to process the request. The elapsedMs property shows the time Lingo4G took to complete the specific task. Notice that computing the estimated frequencies (185 ms) was roughly 9 times faster than computing the accurate result (1759 ms). For large scopes, this may mean a reduction from minutes to seconds.
Common use cases
The output of a document selection stage contains very limited information on its own: just a list of internal document identifiers and their weights. Practical requests will usually combine document selection with other stages to obtain results meaningful to end users.
Source of documents for other stages
Typically, the documents:* stages provide input for other types of stages, for example:
- Content retrieval. Use the documentContent stage to retrieve the textual contents, such as title, abstract or category tags, for a list of documents.
- Label collection. Use the labels:fromDocuments stage to collect labels that characterize the documents in your list.
- Document similarities for clustering and 2d mapping. For a list of documents, you can compute a matrix of similarities between pairs of documents on the list. Matrices are not very useful on their own, but you can pass them as input for clustering and 2d embedding.
Counting documents
You can use the documents:byQuery stage with its limit property set to 0 to count the numbers of documents based on different criteria.
The following request counts the arXiv articles that contain the deep learning phrase and were published in 2012, 2014, 2016 and 2018.
{
"name": "Computing the number of papers containing the 'deep learning' phrase in 2012, 2014, 2016 and 2018.",
"components": {
"query": {
"type": "query:string",
"query": "\"deep learning\""
}
},
"stages": {
"2012": {
"type": "documents:byQuery",
"query": {
"type": "query:filter",
"query": {
"type": "query:reference",
"use": "query"
},
"filter": {
"type": "query:string",
"query": "created:2012*"
}
},
"limit": 0
},
"2014": {
"type": "documents:byQuery",
"query": {
"type": "query:filter",
"query": {
"type": "query:reference",
"use": "query"
},
"filter": {
"type": "query:string",
"query": "created:2014*"
}
},
"limit": 0
},
"2016": {
"type": "documents:byQuery",
"query": {
"type": "query:filter",
"query": {
"type": "query:reference",
"use": "query"
},
"filter": {
"type": "query:string",
"query": "created:2016*"
}
},
"limit": 0
},
"2018": {
"type": "documents:byQuery",
"query": {
"type": "query:filter",
"query": {
"type": "query:reference",
"use": "query"
},
"filter": {
"type": "query:string",
"query": "created:2018*"
}
},
"limit": 0
}
}
}
Counting the papers containing the deep learning phrase and published in 2012, 2014, 2016 and 2018.
The request defines the search phrase part of the query, which is common to all counting periods, in the components section, so that all stages can reuse it. In the stages section, the request defines four stages corresponding to the annual periods in which we want to count documents. Each such stage uses the query:filter component to intersect the phrase part of the query with the counting period. Each stage sets the limit property to zero, so that Lingo4G only counts the matches, which is usually faster than selecting the identifiers of the matching documents.
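The four counting stages differ only in the year, so if you assemble requests in code, you can generate them in a loop. The following Python sketch builds a request with the same structure as the one above. The build_counting_request function is a hypothetical helper; only the JSON it produces follows the request shown in this article.

```python
import json


def build_counting_request(phrase_query, years):
    """Build a request with one zero-limit documents:byQuery stage per year."""
    stages = {}
    for year in years:
        stages[str(year)] = {
            "type": "documents:byQuery",
            "query": {
                "type": "query:filter",
                # Reuse the shared phrase query declared in "components".
                "query": {"type": "query:reference", "use": "query"},
                "filter": {"type": "query:string", "query": f"created:{year}*"},
            },
            # A limit of 0 makes the stage count matches without selecting documents.
            "limit": 0,
        }
    return {
        "components": {"query": {"type": "query:string", "query": phrase_query}},
        "stages": stages,
    }


request = build_counting_request('"deep learning"', [2012, 2014, 2016, 2018])
print(json.dumps(request, indent=2))
```

Generating the stages this way keeps the per-year filters consistent and makes it easy to add further years.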
If you run the request in the JSON Sandbox app, you should get a response similar to the following JSON.
{
"result" : {
"2012" : {
"matches" : {
"value" : 0,
"relation" : "EXACT"
},
"documents" : [ ]
},
"2014" : {
"matches" : {
"value" : 24,
"relation" : "EXACT"
},
"documents" : [ ]
},
"2016" : {
"matches" : {
"value" : 153,
"relation" : "EXACT"
},
"documents" : [ ]
},
"2018" : {
"matches" : {
"value" : 763,
"relation" : "EXACT"
},
"documents" : [ ]
}
}
}
Numbers of papers containing the deep learning phrase and published in 2012, 2014, 2016 and 2018.
As expected, the number of papers containing the deep learning phrase grows rapidly after 2012, from 24 in 2014 to 763 in 2018.