/v2/analysis/*

Validates and executes a new analysis request.

You can use this endpoint in asynchronous or synchronous (blocking) mode. The synchronous mode is much easier: the request blocks until the result (or an error) is returned. In the asynchronous mode, Lingo4G returns a new analysis resource URI. This URI is the prefix of all resource-specific API endpoints for polling analysis status and progress, and eventually retrieving the result.

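This chapter is split into the following subsections for clarity: Request and Response JSONs, Synchronous mode, Asynchronous mode, Asynchronous Analysis Endpoints, and Handling Errors.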

Request and Response JSONs

The REST API is essentially about submitting a valid analysis request JSON to a remote service that computes its result and returns the analysis response JSON.

Regardless of the mode of execution (synchronous or asynchronous), an analysis request always includes the specification of stages to be processed. Here is an example request returning the title field of the first three documents matching the query photon:

{
  "stages": {
    "documents": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      },

      "fields":{
        "type": "contentFields:simple",
        "fields": {
          "title": {}
        }
      }
    }
  }
}
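Client code often builds such a request programmatically rather than keeping it in a file. The following is a minimal sketch in Python; the function name build_title_request is hypothetical and not part of Lingo4G, and the structure simply mirrors the example request above.

```python
import json


def build_title_request(query: str, limit: int = 3) -> dict:
    """Build a request returning the title field of the first `limit` matches."""
    return {
        "stages": {
            "documents": {
                "type": "documentContent",
                # Nested component selecting the documents to return.
                "documents": {
                    "type": "documents:byQuery",
                    "query": {"type": "query:string", "query": query},
                    "limit": limit,
                },
                # Fields to include for each selected document.
                "fields": {
                    "type": "contentFields:simple",
                    "fields": {"title": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    # Serialize to the JSON text that would be sent as the request body.
    print(json.dumps(build_title_request("photon"), indent=2))
```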

The full analysis API response for this request is shown below for quick skimming:

{
  "result" : {
    "documents" : {
      "documents" : [
        {
          "id" : 482237,
          "fields" : {
            "title" : {
              "values" : [
                "Jets in ⁌q⁍Photon⁌\\q⁍-⁌q⁍Photon⁌\\q⁍ Collisions"
              ]
            }
          }
        },
        {
          "id" : 298152,
          "fields" : {
            "title" : {
              "values" : [
                "Studying 750 GeV Di-⁌q⁍photon⁌\\q⁍ Resonance at ⁌q⁍Photon⁌\\q⁍-⁌q⁍Photon⁌\\q⁍ Collider"
              ]
            }
          }
        },
        {
          "id" : 275187,
          "fields" : {
            "title" : {
              "values" : [
                "Two-⁌q⁍photon⁌\\q⁍ interference of temporally separated ⁌q⁍photons⁌\\q⁍"
              ]
            }
          }
        }
      ]
    }
  },
  "status" : {
    "status" : "AVAILABLE",
    "elapsedMs" : 1,
    "tasks" : [
      {
        "name" : "documents → documents:byQuery",
        "status" : "DONE",
        "progress" : 1.0,
        "startedAt" : 1705010276498,
        "elapsedMs" : 0,
        "tasks" : [
          {
            "name" : "Selecting documents (byQuery)",
            "status" : "SKIPPED",
            "tasks" : [ ],
            "attributes" : [
              {
                "name" : "Skipped",
                "value" : "cached"
              }
            ]
          }
        ],
        "attributes" : [
          {
            "name" : "Skipped",
            "value" : "cached"
          }
        ]
      },
      {
        "name" : "documents",
        "status" : "DONE",
        "progress" : 1.0,
        "startedAt" : 1705010276497,
        "elapsedMs" : 0,
        "tasks" : [ ],
        "attributes" : [ ]
      }
    ]
  },
  "log" : [ ]
}
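As an illustration of consuming such a response, the sketch below collects document identifiers and titles from the parsed JSON. The function name extract_titles is hypothetical. The highlight marker pattern is an assumption inferred only from the sample output above (an opening and a closing marker around each query match); consult the analysis response documentation for the authoritative format.

```python
import re

# Assumption: markers look like those in the sample response, where one marker
# opens a highlighted match and a variant with a backslash closes it.
_HIGHLIGHT = re.compile(r"⁌\\?q⁍")


def extract_titles(response: dict, strip_highlights: bool = True) -> list:
    """Return (document id, title) pairs from the "documents" stage result."""
    pairs = []
    for doc in response["result"]["documents"]["documents"]:
        for title in doc["fields"]["title"]["values"]:
            if strip_highlights:
                title = _HIGHLIGHT.sub("", title)
            pairs.append((doc["id"], title))
    return pairs
```

Here "documents" is the stage name chosen in the request; a differently named stage would appear under its own key in the result block.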

An analysis request JSON can get quite complex: it may include multiple stages, nested structures or cross-reference links. In the (relatively simple) example above, we use the top-level documentContent stage, which has a nested documents:byQuery component.

The detailed description of how analysis requests and responses are structured goes beyond the scope of this REST API reference. See the Analysis JSON overview and the following chapters for a tutorial on how to build Lingo4G analysis requests. See the Analysis response chapter for a detailed description of the analysis response JSON. We highly recommend going through those parts of the documentation first, then experimenting with various requests in the JSON Sandbox app before writing HTTP REST API clients. A solid understanding of the request/response JSON structure will make the REST API documentation much easier to follow.

Synchronous mode

This section describes the analysis endpoint in blocking (synchronous) mode.

Access Methods

POST

URL Parameters

The following URL parameters are available.

async

Must be set to false to force blocking mode. Note the default is true, which implies asynchronous mode.

download

If true, the server adds a Content-Disposition HTTP header with a suggested file name for saving the analysis result.

Default value: false

Request Body

The request body must contain exactly one JSON object with a complete analysis request to be executed.

The request should specify a Content-Type header equal to application/json.

Response

The HTTP OK (200) status code is returned upon successful validation and execution of the analysis request. The response will contain the analysis result JSON.

Analysis progress is not available in synchronous mode. Make sure the HTTP connection has long timeouts or use the asynchronous mode.

Errors

See the analysis error handling section.

Examples

Given the following valid analysis request in a file named analysis-synchronous.request.json:

{
  "stages": {
    "documents": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      },

      "fields":{
        "type": "contentFields:simple",
        "fields": {
          "title": {}
        }
      }
    }
  }
}

this curl command posts it to the analysis endpoint in blocking mode:

curl -X POST -H "Content-Type: application/json" --max-time 180 --data @analysis-synchronous.request.json "http://localhost:8080/api/v2/analysis?async=false"

Note the --max-time parameter: because the request blocks, an increased timeout may be required to prevent the client from terminating the connection before Lingo4G finishes computing the result of a long-running analysis.
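For comparison, the same blocking call can be sketched with Python's standard library. The helper names build_sync_request and run_sync are hypothetical, and the base URL is taken from the curl example above. Note that the timeout passed to urlopen applies to individual blocking socket operations, so it is only a rough counterpart of curl's --max-time.

```python
import json
import urllib.request

# Assumption: same server address as in the curl example; adjust as needed.
BASE_URL = "http://localhost:8080/api/v2/analysis"


def build_sync_request(analysis: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a blocking POST (async=false) carrying the analysis request JSON."""
    return urllib.request.Request(
        base_url + "?async=false",
        data=json.dumps(analysis).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(analysis: dict, timeout_s: float = 180.0) -> dict:
    """Send the request and block until the analysis response JSON arrives."""
    request = build_sync_request(analysis)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx status codes.
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it possible to inspect the method, URL and headers without a running server.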

The HTTP request sent to the server looks like this:

POST /api/v2/analysis?async=false HTTP/1.1
Content-Type: application/json

{
  "stages": {
    "documents": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      },

      "fields":{
        "type": "contentFields:simple",
        "fields": {
          "title": {}
        }
      }
    }
  }
}

And the server replies with the following response:

HTTP/1.1 200 OK
content-type: application/json
transfer-encoding: chunked

{
  "result" : {
    "documents" : {
      "documents" : [
        {
          "id" : 188201,
          "fields" : {
            "title" : {
              "values" : [
                "⁌q⁍Photons⁌\\q⁍, ⁌q⁍Photon⁌\\q⁍ Jets and Dark ⁌q⁍Photons⁌\\q⁍ at 750 GeV and Beyond"
              ]
            }
          }
        },
        {
          "id" : 62168,
          "fields" : {
            "title" : {
              "values" : [
                "Final States in ⁌q⁍Photon⁌\\q⁍-⁌q⁍Photon⁌\\q⁍ and ⁌q⁍Photon⁌\\q⁍-Proton Interactions"
              ]
            }
          }
        },
        {
          "id" : 252264,
          "fields" : {
            "title" : {
              "values" : [
                "Two-⁌q⁍Photon⁌\\q⁍ Processes and ⁌q⁍Photon⁌\\q⁍ Structure"
              ]
            }
          }
        }
      ]
    }
  },
  "status" : {
    "status" : "AVAILABLE",
    "elapsedMs" : 54,
    "tasks" : [
      {
        "name" : "documents → documents:byQuery",
        "status" : "DONE",
        "progress" : 1.0,
        "startedAt" : 1682335062066,
        "elapsedMs" : 36,
        "tasks" : [
          {
            "name" : "Selecting documents (byQuery)",
            "status" : "DONE",
            "startedAt" : 1682335062069,
            "elapsedMs" : 32,
            "tasks" : [ ],
            "attributes" : [
              {
                "name" : "Limit",
                "value" : "3"
              },
              {
                "name" : "Document scores",
                "value" : "yes"
              },
              {
                "name" : "Accurate hit count",
                "value" : "no"
              },
              {
                "name" : "Total hits (approximation)",
                "value" : "1,008"
              }
            ]
          }
        ],
        "attributes" : [ ]
      },
      {
        "name" : "documents",
        "status" : "DONE",
        "progress" : 1.0,
        "startedAt" : 1682335062048,
        "elapsedMs" : 54,
        "tasks" : [ ],
        "attributes" : [ ]
      }
    ]
  },
  "log" : [ ]
}

Asynchronous mode

This section describes the analysis endpoint in asynchronous mode. This mode is more suitable for production purposes as it allows tracking partial completion progress for an analysis running on the server.

Access Methods

POST

URL Parameters

The following URL parameters are available.

async

Must be set to true or omitted entirely (the default value is true).

download

Not supported in asynchronous mode.

Request Body

The request body must contain exactly one JSON object with a complete analysis request to be executed.

The request should specify a Content-Type header equal to application/json.

Response

The HTTP Accepted (202) status code is returned upon successful validation of the request. The response body is empty and the HTTP Location header points at the newly created resource URI to track the analysis executing asynchronously.

The status of the asynchronously running analysis can be checked with the /v2/analysis/{id} endpoint.

The result of the asynchronously running analysis can be fetched with the /v2/analysis/{id}/result endpoint.

Once the result has been downloaded and the analysis is no longer needed, it should be deleted to release associated caches (see the DELETE HTTP method on /v2/analysis/{id}).
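Because every follow-up endpoint is derived from the Location header, a client can compute the relevant URIs once and reuse them. The sketch below is one way to do that; the function name analysis_endpoints and the returned dictionary keys are hypothetical.

```python
from urllib.parse import urlencode


def analysis_endpoints(location: str, timeout_ms=None) -> dict:
    """Derive status, result and delete URIs from the Location header value."""
    base = location.rstrip("/")
    result = base + "/result"
    if timeout_ms is not None:
        # Without timeoutMs, the result endpoint blocks until completion.
        result += "?" + urlencode({"timeoutMs": timeout_ms})
    return {
        "status": base,   # GET /v2/analysis/{id}
        "result": result, # GET /v2/analysis/{id}/result
        "delete": base,   # DELETE /v2/analysis/{id}
    }
```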

Errors

The asynchronous endpoint can return an error response immediately if validation errors occurred, or later, from the returned analysis resource URI. See the analysis error handling section for more information.

Examples

Given the following valid analysis request in a file named analysis-asynchronous.request.json:

{
  "stages": {
    "documents": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      },

      "fields":{
        "type": "contentFields:simple",
        "fields": {
          "title": {}
        }
      }
    },
    "delay": {
      "type": "debug:progress",
      "tasks": [
        {
          "name": "Delay request execution",
          "durationMs": 1000
        }
      ]
    }
  }
} 

this curl command posts it to the analysis endpoint in asynchronous mode:

curl --include -XPOST -H "Content-Type: application/json" --data @analysis-asynchronous.request.json http://localhost:8080/api/v2/analysis

The --include option forces curl to display the Location and other response headers, so you can see the URI to the newly created analysis resource. The above curl command results in the following request sent to the server:

POST /api/v2/analysis HTTP/1.1
Content-Type: application/json

{
  "stages": {
    "documents": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      },

      "fields":{
        "type": "contentFields:simple",
        "fields": {
          "title": {}
        }
      }
    },
    "delay": {
      "type": "debug:progress",
      "tasks": [
        {
          "name": "Delay request execution",
          "durationMs": 1000
        }
      ]
    }
  }
} 

And the server replies with the following response (note the Location header; the exact analysis URI will differ from request to request):

HTTP/1.1 202 
content-length: 0
location: http://localhost:58845/api/v2/analysis/a0cf0c53d2c32eed

The returned Location URI is the prefix of all other API endpoints you can use to poll for the status and result of this analysis. Observe that, in the example above, we intentionally added the debug:progress stage that takes 1 second to compute. We can now ask for the result of the analysis we have just started with a timeout of 100 milliseconds:

GET /api/v2/analysis/a0cf0c53d2c32eed/result?timeoutMs=100 HTTP/1.1

As expected, the result for this analysis is not yet available, as shown by the returned status (PROCESSING):

HTTP/1.1 200 OK
content-type: application/json
transfer-encoding: chunked

{
  "status" : {
    "status" : "PROCESSING",
    "elapsedMs" : 107,
    "tasks" : [
      {
        "name" : "delay",
        "status" : "STARTED",
        "progress" : 0.0,
        "startedAt" : 1682335062168,
        "elapsedMs" : 107,
        "tasks" : [
          {
            "name" : "Delay request execution",
            "status" : "STARTED",
            "progress" : 0.0,
            "startedAt" : 1682335062169,
            "elapsedMs" : 106,
            "tasks" : [ ],
            "attributes" : [ ]
          }
        ],
        "attributes" : [ ]
      },
      {
        "name" : "documents → documents:byQuery",
        "status" : "NEW",
        "tasks" : [
          {
            "name" : "Selecting documents (byQuery)",
            "status" : "NEW",
            "tasks" : [ ],
            "attributes" : [ ]
          }
        ],
        "attributes" : [ ]
      },
      {
        "name" : "documents",
        "status" : "NEW",
        "tasks" : [ ],
        "attributes" : [ ]
      }
    ]
  },
  "log" : [ ]
}
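The nested tasks arrays in the status block lend themselves to a simple recursive traversal, for example to render a progress display. The following sketch assumes only the structure visible in the response above; the function name flatten_tasks is hypothetical.

```python
def flatten_tasks(status: dict):
    """Yield (depth, name, status, progress) for every task in a status block."""

    def walk(tasks, depth):
        for task in tasks:
            # In the sample above, tasks with status NEW carry no "progress"
            # value, so None is reported for them.
            yield depth, task["name"], task["status"], task.get("progress")
            yield from walk(task.get("tasks", []), depth + 1)

    yield from walk(status.get("tasks", []), 0)


def print_progress(status: dict) -> None:
    """Print an indented task tree, one task per line."""
    for depth, name, state, progress in flatten_tasks(status):
        shown = "" if progress is None else f" {progress:.0%}"
        print("  " * depth + f"{name}: {state}{shown}")
```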

If the analysis has completed or completes within the provided deadline, the result is returned as part of the response. Let's query the same analysis again, this time with an indefinite timeout:

GET /api/v2/analysis/a0cf0c53d2c32eed/result HTTP/1.1

The returned response now contains the result and the status is AVAILABLE:

HTTP/1.1 200 OK
content-type: application/json
transfer-encoding: chunked

{
  "result" : {
    "documents" : {
      "documents" : [
        {
          "id" : 188201,
          "fields" : {
            "title" : {
              "values" : [
                "⁌q⁍Photons⁌\\q⁍, ⁌q⁍Photon⁌\\q⁍ Jets and Dark ⁌q⁍Photons⁌\\q⁍ at 750 GeV and Beyond"
              ]
            }
          }
        },
        {
          "id" : 62168,
          "fields" : {
            "title" : {
              "values" : [
                "Final States in ⁌q⁍Photon⁌\\q⁍-⁌q⁍Photon⁌\\q⁍ and ⁌q⁍Photon⁌\\q⁍-Proton Interactions"
              ]
            }
          }
        },
        {
          "id" : 252264,
          "fields" : {
            "title" : {
              "values" : [
                "Two-⁌q⁍Photon⁌\\q⁍ Processes and ⁌q⁍Photon⁌\\q⁍ Structure"
              ]
            }
          }
        }
      ]
    },
    "delay" : {
      "completed" : true
    }
  },
  "status" : {
    "status" : "AVAILABLE",
    "elapsedMs" : 1027,
    "tasks" : [
      {
        "name" : "delay",
        "status" : "DONE",
        "progress" : 1.0,
        "startedAt" : 1682335062168,
        "elapsedMs" : 1023,
        "tasks" : [
          {
            "name" : "Delay request execution",
            "status" : "DONE",
            "progress" : 1.0,
            "startedAt" : 1682335062169,
            "elapsedMs" : 1022,
            "tasks" : [ ],
            "attributes" : [ ]
          }
        ],
        "attributes" : [ ]
      },
      {
        "name" : "documents → documents:byQuery",
        "status" : "DONE",
        "progress" : 1.0,
        "startedAt" : 1682335063195,
        "elapsedMs" : 0,
        "tasks" : [
          {
            "name" : "Selecting documents (byQuery)",
            "status" : "SKIPPED",
            "tasks" : [ ],
            "attributes" : [
              {
                "name" : "Skipped",
                "value" : "cached"
              }
            ]
          }
        ],
        "attributes" : [
          {
            "name" : "Skipped",
            "value" : "cached"
          }
        ]
      },
      {
        "name" : "documents",
        "status" : "DONE",
        "progress" : 1.0,
        "startedAt" : 1682335063192,
        "elapsedMs" : 3,
        "tasks" : [ ],
        "attributes" : [ ]
      }
    ]
  },
  "log" : [ ]
}

Since the analysis is no longer needed, we can release its resources on the server early by deleting it:

DELETE /api/v2/analysis/a0cf0c53d2c32eed/ HTTP/1.1

The server responds:

HTTP/1.1 200 OK
content-length: 0

Note again that the /analysis endpoint can return an error immediately when the analysis is started, or later, from any endpoint specific to the returned analysis URI.
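The polling part of the workflow above can be sketched as a small loop that repeatedly calls the result endpoint with a timeout. This is an illustration under stated assumptions, not a reference client: the names fetch_json and wait_for_result are hypothetical, and only the status values shown in this chapter (PROCESSING, AVAILABLE, FAILED) are handled explicitly.

```python
import json
import urllib.request


def fetch_json(url: str) -> dict:
    """GET a URL and parse the JSON body (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_result(location: str, fetch=fetch_json, timeout_ms: int = 1000,
                    max_polls: int = 60) -> dict:
    """Poll {location}/result until the analysis is AVAILABLE."""
    url = location.rstrip("/") + "/result?timeoutMs=" + str(timeout_ms)
    for _ in range(max_polls):
        # The server holds each call for up to timeout_ms, so no client-side
        # sleep is needed between polls.
        response = fetch(url)
        state = response["status"]["status"]
        if state == "AVAILABLE":
            return response
        if state == "FAILED":
            raise RuntimeError(response["status"].get("error", "analysis failed"))
        # Any other state (for example PROCESSING): keep polling.
    raise TimeoutError(f"analysis not available after {max_polls} polls")
```

The fetch parameter is injectable so the loop can be exercised without a server, and each partial response could also be used to report task progress.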

Asynchronous Analysis Endpoints

API endpoints in this section serve results and status information for analyses started in asynchronous mode. The {id} element of their URI is returned in the Location header of the analysis endpoint response.

/v2/analysis/{id}

Returns just the status block of an analysis resource started in asynchronous mode.

An alternative to using this endpoint is to call the /v2/analysis/{id}/result endpoint with a timeout: the status block is included in that endpoint's response, even if the analysis hasn't completed yet.

Access Methods

GET or DELETE

URL Parameters

None

Request Body

None

Response

For the HTTP GET method, this API endpoint returns just the status block of the typical analysis response JSON.

For the HTTP DELETE method, HTTP status code 200 (OK) is returned and the analysis is permanently deleted from the server, releasing its resources. We advise always cleaning up analyses that will no longer be used, to keep server resource usage low.
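One way to make sure the cleanup always happens is to tie the DELETE call to a context manager, so the analysis is released even if result processing fails. The sketch below uses only the Python standard library; the names delete_analysis and analysis_resource are hypothetical.

```python
import contextlib
import urllib.request


def delete_analysis(location: str) -> int:
    """Send DELETE to the analysis URI and return the HTTP status code."""
    request = urllib.request.Request(location, method="DELETE")
    with urllib.request.urlopen(request) as resp:
        return resp.status


@contextlib.contextmanager
def analysis_resource(location: str, delete=delete_analysis):
    """Yield the analysis URI and delete the analysis when the block exits."""
    try:
        yield location
    finally:
        # Runs on normal exit and on exceptions alike.
        delete(location)
```

A typical use would be `with analysis_resource(location) as uri:` around the code that polls and downloads the result.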

Errors

See the Handling Errors section.

Examples

See the full asynchronous analysis workflow example in the Asynchronous mode section.

/v2/analysis/{id}/result

Returns the full result of an analysis started in asynchronous mode.

This endpoint can be called in blocking mode or with a timeout, in which case it returns a partial response (status, logs) even if the analysis is still ongoing.

Access Methods

GET or POST

URL Parameters

The following URL parameters are available.

timeoutMs

A timeout value in milliseconds. If the analysis completes prior to the timeout, the result (or an error) is returned. A partial response including the job status, task progress and logs is returned otherwise.

Default value: infinite (blocking call)

download

If true, the server adds a Content-Disposition HTTP header with a suggested file name for saving the analysis result.

Default value: false

Request Body

None

Response

The full analysis response for completed analyses, a partial response (log and status) for analyses in progress, or an error response in case of validation or execution errors.

Errors

See the Handling Errors section.

Examples

See the full asynchronous analysis workflow example in the Asynchronous mode section.

Handling Errors

All analysis endpoints can return a variety of errors, depending on the cause and on when the error occurred. Some errors are signalled early (for example, invalid JSON or incorrect reference structure); others can only be identified at execution time, while results are being computed.

In synchronous mode, errors are returned directly in the response. In asynchronous mode, errors can be returned when the analysis is started or when the returned analysis resource endpoints (/analysis/{id}/*) are accessed (status, result polling).

The following sections discuss the potential HTTP status codes that can be returned by the API and the response messages associated with these errors.

400 (Bad Request)

Indicates a problem with parsing the request, validation of arguments or an unrecoverable problem during request execution.

The body of the HTTP response will contain an analysis response JSON with status and log elements containing more details. For example, an invalid input (JSON parsing exception) could return the following message:

{
  "status" : {
    "status" : "FAILED",
    "error" : "Invalid analysis request configuration"
  },
  "log" : [
    {
      "level" : "ERROR",
      "code" : "E001",
      "message" : "JSON parse error.",
      "details" : {
        "description" : "Unexpected character (',' (code 44)): was expecting a colon to separate field name and value",
        "json" : "{\n  Ooops, this is not valid json.\n}\n",
        "line" : 2,
        "column" : 9,
        "offset" : 10
      }
    }
  ]
}

Note that the error log can contain more than one entry. Here is an example request that violates two parameter constraints (note the negative limit value in each stage):

{
  "stages": {
    "stage1": {
      "type" : "documents:byQuery",
      "limit": -1,
      "query": {
        "type": "query:string",
        "query": "photon"
      }
    },
    "stage2": {
      "type" : "documents:byQuery",
      "limit": -2,
      "query": {
        "type": "query:string",
        "query": "ray"
      }
    }
  }
}

A full reference of potential error log codes is provided in the analysis response documentation.
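A client can turn the log block of an error response into short, readable messages. The sketch below relies only on the fields visible in the JSON parse error example above (level, code, message, and the optional line and column details); the function name summarize_errors is hypothetical.

```python
def summarize_errors(response: dict) -> list:
    """Return one line of text per ERROR-level log entry in the response."""
    lines = []
    for entry in response.get("log", []):
        if entry.get("level") != "ERROR":
            continue
        text = f'{entry.get("code", "?")}: {entry.get("message", "")}'
        details = entry.get("details") or {}
        # Position information is present for JSON parse errors in the sample;
        # other entries may not carry it.
        if "line" in details and "column" in details:
            text += f' (line {details["line"]}, column {details["column"]})'
        lines.append(text)
    return lines
```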

404 (Not Found)

This error code can be returned if the asynchronous analysis resource expires from internal caches before its result is accessed. You should poll the result of asynchronous analyses at regular intervals to prevent analyses from being deleted from server caches.

This type of error has an empty response body.

500 (Internal Server Error)

This type of error signals an unexpected exception in Lingo4G that is most likely a software defect. Please report it to info@carrotsearch.com, ideally together with the request that caused the problem.
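The status codes discussed in this chapter can be summarized in a small lookup that a client might use for logging or dispatching. This is only a sketch: the function name describe_status is hypothetical, and the descriptions paraphrase this reference rather than any server output.

```python
# Status codes covered by this chapter and how a client might react to them.
_ACTIONS = {
    200: "success: the body contains an analysis response or status JSON",
    202: "accepted: read the Location header and poll the analysis resource",
    400: "bad request: inspect the status and log elements of the body",
    404: "not found: the analysis may have expired from server caches",
    500: "internal error: likely a software defect, report it with the request",
}


def describe_status(code: int) -> str:
    """Map an HTTP status code to a short handling hint."""
    return _ACTIONS.get(code, "status code not covered by this reference")
```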