/v2/analysis/*
Validates and executes a new analysis request.
You can use this endpoint in asynchronous or synchronous (blocking) mode. The synchronous mode is simpler: the request blocks until the result (or an error) is returned. In asynchronous mode, Lingo4G returns a new analysis resource URI. This URI is the prefix of all resource-specific API endpoints for polling the analysis status and progress, and eventually retrieving the result.
This chapter is split into the following subsections for clarity:
- analysis request and response JSONs,
- submitting requests in synchronous mode,
- submitting requests in asynchronous mode and the description of analysis resource-specific endpoints,
- error responses and their handling.
Request and Response JSONs
The REST API is essentially about submitting a valid analysis request JSON to a remote service that computes its result and returns the analysis response JSON.
Regardless of the mode of execution (synchronous or asynchronous), an analysis request always includes the specification of stages to be processed. Here is an example request returning the title field of the first three documents matching the query photon:
{
  "stages": {
    "documents": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      },
      "fields": {
        "type": "contentFields:simple",
        "fields": {
          "title": {}
        }
      }
    }
  }
}
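The same request can also be assembled programmatically. The snippet below is a minimal sketch in Python; the function name build_query_request and its parameters are hypothetical helpers introduced for illustration, not part of Lingo4G. It only mirrors the JSON structure shown above.

```python
import json


def build_query_request(query, limit=3, fields=("title",)):
    """Build a request dict mirroring the example above (hypothetical helper).

    The stage is named "documents", as in the example; the stage types
    and property names are copied verbatim from the request shown above.
    """
    return {
        "stages": {
            "documents": {
                "type": "documentContent",
                "documents": {
                    "type": "documents:byQuery",
                    "query": {"type": "query:string", "query": query},
                    "limit": limit,
                },
                "fields": {
                    "type": "contentFields:simple",
                    # Each requested field maps to an empty configuration object.
                    "fields": {name: {} for name in fields},
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the request JSON, ready to be saved to a file or posted.
    print(json.dumps(build_query_request("photon"), indent=2))
```

Keeping the request as a plain dictionary makes it easy to serialize with json.dumps right before sending it.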
The full analysis API response for this request is shown below for quick skimming:
{
"result" : {
"documents" : {
"documents" : [
{
"id" : 482237,
"fields" : {
"title" : {
"values" : [
"Jets in ⁌q⁍Photon⁌\\q⁍-⁌q⁍Photon⁌\\q⁍ Collisions"
]
}
}
},
{
"id" : 298152,
"fields" : {
"title" : {
"values" : [
"Studying 750 GeV Di-⁌q⁍photon⁌\\q⁍ Resonance at ⁌q⁍Photon⁌\\q⁍-⁌q⁍Photon⁌\\q⁍ Collider"
]
}
}
},
{
"id" : 275187,
"fields" : {
"title" : {
"values" : [
"Two-⁌q⁍photon⁌\\q⁍ interference of temporally separated ⁌q⁍photons⁌\\q⁍"
]
}
}
}
]
}
},
"status" : {
"status" : "AVAILABLE",
"elapsedMs" : 1,
"tasks" : [
{
"name" : "documents → documents:byQuery",
"status" : "DONE",
"progress" : 1.0,
"startedAt" : 1705010276498,
"elapsedMs" : 0,
"tasks" : [
{
"name" : "Selecting documents (byQuery)",
"status" : "SKIPPED",
"tasks" : [ ],
"attributes" : [
{
"name" : "Skipped",
"value" : "cached"
}
]
}
],
"attributes" : [
{
"name" : "Skipped",
"value" : "cached"
}
]
},
{
"name" : "documents",
"status" : "DONE",
"progress" : 1.0,
"startedAt" : 1705010276497,
"elapsedMs" : 0,
"tasks" : [ ],
"attributes" : [ ]
}
]
},
"log" : [ ]
}
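As a rough illustration of how a client might read a response shaped like the one above, the following Python sketch collects the values of one field per document. The helper name field_values is hypothetical, and the code assumes the result of a documentContent stage is laid out exactly as in this example.

```python
def field_values(response, stage, field):
    """Map document id to the list of values of `field` (hypothetical helper).

    Assumes the layout shown above: result -> stage name -> "documents",
    where each document carries "id" and "fields" -> field -> "values".
    """
    documents = response["result"][stage]["documents"]
    return {doc["id"]: doc["fields"][field]["values"] for doc in documents}


if __name__ == "__main__":
    sample = {
        "result": {
            "documents": {
                "documents": [
                    {"id": 1, "fields": {"title": {"values": ["First title"]}}},
                    {"id": 2, "fields": {"title": {"values": ["Second title"]}}},
                ]
            }
        },
        "status": {"status": "AVAILABLE"},
        "log": [],
    }
    for doc_id, titles in field_values(sample, "documents", "title").items():
        print(doc_id, titles)
```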
An analysis request JSON can get quite complex: it can include multiple stages, nested structures or cross-reference links. In the (relatively simple) example above, we use the top-level documentContent stage, which has a nested documents:byQuery component.
The detailed description of how analysis requests and responses are structured goes beyond the scope of this REST API reference. See the Analysis JSON overview and the following chapters for a tutorial on how to build Lingo4G analysis requests. See the Analysis response chapter for a detailed description of the analysis response JSON. We highly recommend going through those parts of the documentation first, then experimenting with various requests in the JSON Sandbox app before writing HTTP REST API clients. A solid understanding of the request/response JSON structure will make reading through the documentation of the REST API much easier.
Synchronous mode
This section describes the analysis endpoint in blocking (synchronous) mode.
Access Methods
POST
URL Parameters
The following URL parameters are available.
- async
  Must be set to false to force blocking mode. Note that the default is true, which implies asynchronous mode.
- download
  If true, the server will add the Content-Disposition HTTP header with a suggested file name for saving the result of the analysis.
  Default value: false
Request Body
The request body must contain exactly one JSON object with a complete analysis request to be executed.
The request should specify an appropriate Content-Type header equal to application/json.
Response
The HTTP OK (200) status code is returned upon successful validation and execution of the analysis request. The response will contain the analysis result JSON.
Analysis progress is not available in synchronous mode. Make sure the HTTP connection has long timeouts or use the asynchronous mode.
Errors
See the analysis error handling section.
Examples
Given the following valid analysis request in a file named analysis-synchronous.request.json:
{
  "stages": {
    "documents": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      },
      "fields": {
        "type": "contentFields:simple",
        "fields": {
          "title": {}
        }
      }
    }
  }
}
this curl command posts it to the analysis endpoint in blocking mode (the URL is quoted so that the shell does not interpret the ? character):
curl -XPOST -H "Content-Type: application/json" --max-time 180 --data @analysis-synchronous.request.json "http://localhost:8080/api/v2/analysis?async=false"
Note the --max-time parameter: because the request blocks, an increased timeout may be required to prevent the client from terminating the connection before Lingo4G finishes computing the result of a long-running analysis.
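For clients written in Python, the blocking call could be sketched with the standard library alone. The helper names build_sync_request and run_sync are hypothetical, and the base URL (http://localhost:8080/api) is simply the one used in the curl example. Note that the urlopen timeout applies to individual socket operations, so it is not an exact equivalent of curl's --max-time.

```python
import json
import urllib.request


def build_sync_request(base_url, analysis):
    """Prepare a blocking POST to the analysis endpoint (hypothetical helper)."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/v2/analysis?async=false",
        data=json.dumps(analysis).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(base_url, analysis, timeout_s=180.0):
    """Send the request and return the parsed analysis response JSON.

    timeout_s is a socket-level timeout: choose a generous value, because
    the server sends nothing until the analysis has finished.
    """
    request = build_sync_request(base_url, analysis)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


if __name__ == "__main__":
    # Assumes a Lingo4G server listening at this address, as in the curl example.
    with open("analysis-synchronous.request.json", encoding="utf-8") as f:
        result = run_sync("http://localhost:8080/api", json.load(f))
    print(result["status"]["status"])
```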
The HTTP request sent to the server looks like this:
POST /api/v2/analysis?async=false HTTP/1.1
Content-Type: application/json

{
  "stages": {
    "documents": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      },
      "fields": {
        "type": "contentFields:simple",
        "fields": {
          "title": {}
        }
      }
    }
  }
}
And the server replies with the following response:
HTTP/1.1 200 OK
content-type: application/json
transfer-encoding: chunked
{
"result" : {
"documents" : {
"documents" : [
{
"id" : 188201,
"fields" : {
"title" : {
"values" : [
"⁌q⁍Photons⁌\\q⁍, ⁌q⁍Photon⁌\\q⁍ Jets and Dark ⁌q⁍Photons⁌\\q⁍ at 750 GeV and Beyond"
]
}
}
},
{
"id" : 62168,
"fields" : {
"title" : {
"values" : [
"Final States in ⁌q⁍Photon⁌\\q⁍-⁌q⁍Photon⁌\\q⁍ and ⁌q⁍Photon⁌\\q⁍-Proton Interactions"
]
}
}
},
{
"id" : 252264,
"fields" : {
"title" : {
"values" : [
"Two-⁌q⁍Photon⁌\\q⁍ Processes and ⁌q⁍Photon⁌\\q⁍ Structure"
]
}
}
}
]
}
},
"status" : {
"status" : "AVAILABLE",
"elapsedMs" : 54,
"tasks" : [
{
"name" : "documents → documents:byQuery",
"status" : "DONE",
"progress" : 1.0,
"startedAt" : 1682335062066,
"elapsedMs" : 36,
"tasks" : [
{
"name" : "Selecting documents (byQuery)",
"status" : "DONE",
"startedAt" : 1682335062069,
"elapsedMs" : 32,
"tasks" : [ ],
"attributes" : [
{
"name" : "Limit",
"value" : "3"
},
{
"name" : "Document scores",
"value" : "yes"
},
{
"name" : "Accurate hit count",
"value" : "no"
},
{
"name" : "Total hits (approximation)",
"value" : "1,008"
}
]
}
],
"attributes" : [ ]
},
{
"name" : "documents",
"status" : "DONE",
"progress" : 1.0,
"startedAt" : 1682335062048,
"elapsedMs" : 54,
"tasks" : [ ],
"attributes" : [ ]
}
]
},
"log" : [ ]
}
Asynchronous mode
This section describes the analysis endpoint in asynchronous mode. This mode is more suitable for production purposes as it allows tracking partial completion progress for an analysis running on the server.
Access Methods
POST
URL Parameters
The following URL parameters are available.
- async
  Must be set to true or omitted entirely (the default value is true).
- download
  Not supported in asynchronous mode.
Request Body
The request body must contain exactly one JSON object with a complete analysis request to be executed.
The request should specify an appropriate Content-Type header equal to application/json.
Response
The HTTP Accepted (202) status code is returned upon successful validation of the request. The response body is empty and the HTTP Location header points to the newly created resource URI for tracking the analysis executing asynchronously.
The status of the asynchronously running analysis can be checked with the /v2/analysis/{id} endpoint.
The result of the asynchronously running analysis can be fetched with the /v2/analysis/{id}/result endpoint.
Once the result has been downloaded and the analysis is no longer needed, it should be deleted to release associated caches (see the DELETE HTTP method on /v2/analysis/{id}).
Errors
The asynchronous endpoint can return an error response immediately if validation errors occurred, or later, from the returned analysis resource URI. See the analysis error handling section for more information.
Examples
Given the following valid analysis request in a file named analysis-asynchronous.request.json:
{
  "stages": {
    "documents": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      },
      "fields": {
        "type": "contentFields:simple",
        "fields": {
          "title": {}
        }
      }
    },
    "delay": {
      "type": "debug:progress",
      "tasks": [
        {
          "name": "Delay request execution",
          "durationMs": 1000
        }
      ]
    }
  }
}
this curl command posts it to the analysis endpoint in asynchronous mode:
curl --include -XPOST -H "Content-Type: application/json" --data @analysis-asynchronous.request.json http://localhost:8080/api/v2/analysis
The --include option forces curl to display the Location and other response headers, so you can see the URI of the newly created analysis resource. The above curl command results in the following request sent to the server:
POST /api/v2/analysis HTTP/1.1
Content-Type: application/json

{
  "stages": {
    "documents": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      },
      "fields": {
        "type": "contentFields:simple",
        "fields": {
          "title": {}
        }
      }
    },
    "delay": {
      "type": "debug:progress",
      "tasks": [
        {
          "name": "Delay request execution",
          "durationMs": 1000
        }
      ]
    }
  }
}
And the server replies with the following response (note the Location header; the exact analysis URI will differ from request to request):
HTTP/1.1 202
content-length: 0
location: http://localhost:58845/api/v2/analysis/a0cf0c53d2c32eed
The returned Location URI is the prefix of all other API endpoints you can use to poll for the status and result of this analysis. Observe that, in the example above, we intentionally added the debug:progress stage that takes 1 second to compute. We can now ask for the result of the analysis we have just started with a timeout of 100 milliseconds:
GET /api/v2/analysis/a0cf0c53d2c32eed/result?timeoutMs=100 HTTP/1.1
As expected, the result for this analysis is not yet available, as shown by the returned status (PROCESSING):
HTTP/1.1 200 OK
content-type: application/json
transfer-encoding: chunked
{
"status" : {
"status" : "PROCESSING",
"elapsedMs" : 107,
"tasks" : [
{
"name" : "delay",
"status" : "STARTED",
"progress" : 0.0,
"startedAt" : 1682335062168,
"elapsedMs" : 107,
"tasks" : [
{
"name" : "Delay request execution",
"status" : "STARTED",
"progress" : 0.0,
"startedAt" : 1682335062169,
"elapsedMs" : 106,
"tasks" : [ ],
"attributes" : [ ]
}
],
"attributes" : [ ]
},
{
"name" : "documents → documents:byQuery",
"status" : "NEW",
"tasks" : [
{
"name" : "Selecting documents (byQuery)",
"status" : "NEW",
"tasks" : [ ],
"attributes" : [ ]
}
],
"attributes" : [ ]
},
{
"name" : "documents",
"status" : "NEW",
"tasks" : [ ],
"attributes" : [ ]
}
]
},
"log" : [ ]
}
If the analysis has completed or completes within the provided deadline, the result is returned as part of the response. Let's query the same analysis again, this time with an indefinite timeout:
GET /api/v2/analysis/a0cf0c53d2c32eed/result HTTP/1.1
The returned response now contains the result and the status is AVAILABLE:
HTTP/1.1 200 OK
content-type: application/json
transfer-encoding: chunked
{
"result" : {
"documents" : {
"documents" : [
{
"id" : 188201,
"fields" : {
"title" : {
"values" : [
"⁌q⁍Photons⁌\\q⁍, ⁌q⁍Photon⁌\\q⁍ Jets and Dark ⁌q⁍Photons⁌\\q⁍ at 750 GeV and Beyond"
]
}
}
},
{
"id" : 62168,
"fields" : {
"title" : {
"values" : [
"Final States in ⁌q⁍Photon⁌\\q⁍-⁌q⁍Photon⁌\\q⁍ and ⁌q⁍Photon⁌\\q⁍-Proton Interactions"
]
}
}
},
{
"id" : 252264,
"fields" : {
"title" : {
"values" : [
"Two-⁌q⁍Photon⁌\\q⁍ Processes and ⁌q⁍Photon⁌\\q⁍ Structure"
]
}
}
}
]
},
"delay" : {
"completed" : true
}
},
"status" : {
"status" : "AVAILABLE",
"elapsedMs" : 1027,
"tasks" : [
{
"name" : "delay",
"status" : "DONE",
"progress" : 1.0,
"startedAt" : 1682335062168,
"elapsedMs" : 1023,
"tasks" : [
{
"name" : "Delay request execution",
"status" : "DONE",
"progress" : 1.0,
"startedAt" : 1682335062169,
"elapsedMs" : 1022,
"tasks" : [ ],
"attributes" : [ ]
}
],
"attributes" : [ ]
},
{
"name" : "documents → documents:byQuery",
"status" : "DONE",
"progress" : 1.0,
"startedAt" : 1682335063195,
"elapsedMs" : 0,
"tasks" : [
{
"name" : "Selecting documents (byQuery)",
"status" : "SKIPPED",
"tasks" : [ ],
"attributes" : [
{
"name" : "Skipped",
"value" : "cached"
}
]
}
],
"attributes" : [
{
"name" : "Skipped",
"value" : "cached"
}
]
},
{
"name" : "documents",
"status" : "DONE",
"progress" : 1.0,
"startedAt" : 1682335063192,
"elapsedMs" : 3,
"tasks" : [ ],
"attributes" : [ ]
}
]
},
"log" : [ ]
}
The analysis is no longer needed, so we can release its resources on the server early by deleting it:
DELETE /api/v2/analysis/a0cf0c53d2c32eed/ HTTP/1.1
The server responds:
HTTP/1.1 200 OK
content-length: 0
It is important to mention again that the /analysis endpoint can return an error immediately when the analysis is started, or later, from any endpoint specific to the returned analysis URI.
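The submit, poll and delete steps walked through above could be combined in a client along the following lines. This is a sketch under stated assumptions, not a reference implementation: all function names are hypothetical, only the Python standard library is used, and the loop treats any status other than PROCESSING as final, which is an assumption based on the statuses visible in the examples.

```python
import json
import urllib.request


def submit_async(base_url, analysis):
    """POST the request in asynchronous mode; return the Location header value."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v2/analysis",
        data=json.dumps(analysis).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.headers["Location"]


def fetch_result(location, timeout_ms):
    """GET {location}/result, waiting on the server for at most timeout_ms."""
    url = f"{location.rstrip('/')}/result?timeoutMs={timeout_ms}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def delete_analysis(location):
    """DELETE the analysis resource to release its server-side resources."""
    request = urllib.request.Request(location, method="DELETE")
    with urllib.request.urlopen(request):
        pass


def wait_for_result(fetch, timeout_ms=1000, max_polls=60, on_progress=None):
    """Call fetch(timeout_ms) until the status is no longer PROCESSING.

    No client-side sleep is needed: each call already waits on the server
    for up to timeout_ms. Treating every other status as final is an
    assumption of this sketch.
    """
    for _ in range(max_polls):
        response = fetch(timeout_ms)
        if response["status"]["status"] != "PROCESSING":
            return response
        if on_progress is not None:
            on_progress(response["status"])
    raise TimeoutError(f"analysis still processing after {max_polls} polls")


def run_async(base_url, analysis):
    """Full workflow: submit, poll for the result, then always clean up."""
    location = submit_async(base_url, analysis)
    try:
        return wait_for_result(lambda t: fetch_result(location, t))
    finally:
        delete_analysis(location)
```

Passing the fetch function as a parameter keeps the polling loop independent of the HTTP layer, so it can be exercised with canned responses.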
Asynchronous Analysis Endpoints
API endpoints in this section serve results and status information for analyses started in asynchronous mode. The {id} element of their URI is returned in the Location header of the analysis endpoint response.
/v2/analysis/{id}
Returns just the status block of an analysis resource started in asynchronous mode.
An alternative to using this endpoint is to call the /v2/analysis/{id}/result endpoint with a timeout: the status block is included in that endpoint's response, even if the analysis hasn't completed yet.
Access Methods
GET or DELETE
URL Parameters
None
Request Body
None
Response
For the HTTP GET method, this API endpoint returns just the status block of the typical analysis response JSON.
For the HTTP DELETE method, HTTP status code 200 (OK) is returned and the analysis is permanently deleted from the server, releasing its resources. We advise always cleaning up analyses that will no longer be used to keep server resource usage low.
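A status block like the ones shown in the asynchronous examples contains a nested tree of tasks. The sketch below flattens that tree into a simple progress report. The helper names are hypothetical, and the code assumes it receives the status block itself (the object with the status, elapsedMs and tasks properties), where progress may be absent for tasks that have not started.

```python
def flatten_tasks(tasks, depth=0):
    """Yield (depth, name, status, progress) for each task, depth-first."""
    for task in tasks:
        yield depth, task["name"], task["status"], task.get("progress")
        yield from flatten_tasks(task.get("tasks", []), depth + 1)


def format_status(status_block):
    """Render a status block as indented text, one line per task."""
    lines = [f'{status_block["status"]} after {status_block.get("elapsedMs", 0)} ms']
    for depth, name, state, progress in flatten_tasks(status_block.get("tasks", [])):
        # Progress is omitted when the task does not report it.
        percent = "" if progress is None else f" {progress:.0%}"
        lines.append(f'{"  " * (depth + 1)}{name}: {state}{percent}')
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {
        "status": "PROCESSING",
        "elapsedMs": 107,
        "tasks": [
            {
                "name": "delay",
                "status": "STARTED",
                "progress": 0.0,
                "tasks": [
                    {"name": "Delay request execution", "status": "STARTED",
                     "progress": 0.0, "tasks": [], "attributes": []}
                ],
                "attributes": [],
            },
            {"name": "documents", "status": "NEW", "tasks": [], "attributes": []},
        ],
    }
    print(format_status(sample))
```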
Errors
See the error response handling section.
Examples
See the full asynchronous analysis workflow example in the Asynchronous mode section.
/v2/analysis/{id}/result
Returns the full result of an analysis started in asynchronous mode.
This endpoint can be called in blocking mode or with a timeout, in which case it returns a partial response (status, logs) even if the analysis is still ongoing.
Access Methods
GET or POST
URL Parameters
The following URL parameters are available.
- timeoutMs
  A timeout value in milliseconds. If the analysis completes prior to the timeout, the result (or an error) is returned. A partial response including the job status, task progress and logs is returned otherwise.
  Default value: infinite (blocking call)
- download
  If true, the server will add the Content-Disposition HTTP header with a suggested file name for saving the result of the analysis.
  Default value: false
Request Body
None
Response
The full analysis response for completed analyses, a partial response (log and status) for analyses in progress, or an error response in case of validation or execution errors.
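To show how the two URL parameters combine with the analysis resource URI, here is a small Python sketch that builds the result endpoint URL. The helper name result_url is hypothetical; location stands for the URI returned in the Location header.

```python
from urllib.parse import urlencode


def result_url(location, timeout_ms=None, download=False):
    """Build the /result URL for an analysis resource (hypothetical helper).

    Omitting timeout_ms leaves the parameter out, which, per the
    description above, makes the call block until the analysis completes.
    """
    params = {}
    if timeout_ms is not None:
        params["timeoutMs"] = timeout_ms
    if download:
        params["download"] = "true"
    url = location.rstrip("/") + "/result"
    return url + ("?" + urlencode(params) if params else "")


if __name__ == "__main__":
    location = "http://localhost:8080/api/v2/analysis/a0cf0c53d2c32eed"
    print(result_url(location, timeout_ms=100))
```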
Errors
See the error response handling section.
Examples
See the full asynchronous analysis workflow example in the Asynchronous mode section.
Handling Errors
All analysis endpoints can return a variety of different errors, depending on what caused them and when the error occurred. Certain errors are signalled early (for example, invalid JSON or incorrect reference structure), other errors may occur later, at the time of computing results (certain errors are only identifiable during execution time).
In synchronous mode, errors are returned directly in the response. In asynchronous mode, errors can be returned when the analysis is started or when the returned analysis resource endpoints (/analysis/{id}/*) are accessed (status, result polling).
The following sections discuss the potential HTTP status codes that can be returned by the API and the response messages associated with these errors.
400 (Bad Request)
Indicates a problem with parsing the request, validation of arguments or an unrecoverable problem during request execution.
The body of the HTTP response will contain an analysis response JSON with status and log elements containing more details. For example, an invalid input (JSON parsing exception) could return the following message:
{
  "status" : {
    "status" : "FAILED",
    "error" : "Invalid analysis request configuration"
  },
  "log" : [
    {
      "level" : "ERROR",
      "code" : "E001",
      "message" : "JSON parse error.",
      "details" : {
        "description" : "Unexpected character (',' (code 44)): was expecting a colon to separate field name and value",
        "json" : "{\n Ooops, this is not valid json.\n}\n",
        "line" : 2,
        "column" : 9,
        "offset" : 10
      }
    }
  ]
}
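Note that the error log can contain more than one entry. For example, the following request violates two parameter constraints (note the negative limit value in each stage), so the error log in its response can be expected to contain more than one entry: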
{
  "stages": {
    "stage1": {
      "type" : "documents:byQuery",
      "limit": -1,
      "query": {
        "type": "query:string",
        "query": "photon"
      }
    },
    "stage2": {
      "type" : "documents:byQuery",
      "limit": -2,
      "query": {
        "type": "query:string",
        "query": "ray"
      }
    }
  }
}
A full reference of potential error log codes is provided in the analysis response documentation.
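Because the log can hold several entries, a client would typically iterate over all of them rather than read only the first. The following Python sketch turns an error response into a list of readable messages. The helper name summarize_errors is hypothetical, and the code assumes only the properties visible in the example error response above (level, code, message, details.description).

```python
def summarize_errors(response):
    """Collect human-readable error lines from an analysis response.

    Returns the top-level status error (if any) followed by one line per
    log entry whose level is ERROR.
    """
    error = response.get("status", {}).get("error")
    lines = [] if error is None else [error]
    for entry in response.get("log", []):
        if entry.get("level") != "ERROR":
            continue
        line = f'{entry.get("code", "?")}: {entry.get("message", "")}'
        description = entry.get("details", {}).get("description")
        if description:
            line += f" ({description})"
        lines.append(line)
    return lines


if __name__ == "__main__":
    sample = {
        "status": {"status": "FAILED", "error": "Invalid analysis request configuration"},
        "log": [
            {"level": "ERROR", "code": "E001", "message": "JSON parse error.",
             "details": {"description": "Unexpected character", "line": 2}}
        ],
    }
    for line in summarize_errors(sample):
        print(line)
```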
404 (Not Found)
This error code can be returned if the analysis resource created in asynchronous mode expires from internal caches before its result is accessed. You should poll the result of asynchronous analyses at regular intervals to prevent analyses from being deleted from server caches.
This type of error has an empty response body.
500 (Internal Server Error)
This type of error signals an unexpected exception in Lingo4G that is most likely a software defect. Please report it to info@carrotsearch.com, ideally together with the request that caused the problem.
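The three status codes discussed in this section could be handled in a Python client roughly as follows. This is a sketch with hypothetical helper names, built on the standard urllib module, which raises urllib.error.HTTPError for non-2xx responses. The wording of the returned messages is illustrative only.

```python
import json
import urllib.error
import urllib.request


def describe_http_error(code, body):
    """Translate an HTTP error status and raw body into a short message."""
    if code == 400:
        # The body should be an analysis response JSON with status and log.
        try:
            response = json.loads(body)
        except ValueError:
            return "400: bad request (unparseable error body)"
        if isinstance(response, dict) and isinstance(response.get("status"), dict):
            return "400: " + response["status"].get("error", "bad request")
        return "400: bad request"
    if code == 404:
        # Empty body: the analysis may have expired from server caches.
        return "404: analysis not found; it may have expired from server caches"
    if code == 500:
        return "500: internal server error; consider reporting it with the request"
    return f"{code}: unexpected HTTP status"


def open_json(request, timeout_s=None):
    """Open a URL or Request and parse the JSON body, raising on HTTP errors."""
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        raise RuntimeError(describe_http_error(error.code, error.read())) from error
```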