Stages

A Lingo4G analysis consists of one or more stages. A stage represents one specific text processing operation, such as query-based document search, extraction of labels from a list of documents, or clustering or 2D mapping of documents or labels.

Each stage produces one type of result, such as a list of documents, a list of labels or a cluster tree. Most real-world analyses consist of a pipeline of stages, where one stage produces results consumed by stages further down the chain.

One-stage request

An example one-stage analysis request JSON may look like this:

{
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "photon"
      },
      "limit": 1000
    }
  }
}

The above request consists of the following properties:

stages

A JSON object containing all stages of the request. Property names in the object are stage identifiers, values are stage definitions.

documents

Defines a stage that performs a query-based document search.

type

The type determines the specific operation the stage performs. Each stage definition object must contain the type property.

By convention, most Lingo4G stage types consist of two parts separated by the : character, for example documents:byId or labels:fromDocuments. The first part indicates the general type of result the stage provides, such as a document set or a list of labels. The second part determines the specific type. For stages returning documents, this could be documents:byQuery or documents:sample.

query

Defines the query to execute. This property is specific to the documents:byQuery stage. Note that query itself also has a type property, because there are a number of different kinds of queries you can use.

limit

Determines the maximum number of documents to return. Again, this property is specific to the documents:byQuery stage.
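When requests are generated by a program rather than typed by hand, the same structure can be assembled from ordinary dictionaries. The following is a minimal Python sketch, not part of Lingo4G: the helper names documents_by_query and split_stage_type are hypothetical, and only the JSON they produce mirrors the request shown above.

```python
import json
from typing import Tuple


def documents_by_query(query: str, limit: int = 1000) -> dict:
    """Build a documents:byQuery stage definition (hypothetical helper)."""
    return {
        "type": "documents:byQuery",
        "query": {"type": "query:string", "query": query},
        "limit": limit,
    }


def split_stage_type(stage_type: str) -> Tuple[str, str]:
    """Split a stage type into its general and specific parts at the first ':'."""
    general, _, specific = stage_type.partition(":")
    return general, specific


# Property names under "stages" are stage identifiers chosen by the caller.
request = {"stages": {"documents": documents_by_query("photon")}}

if __name__ == "__main__":
    # Print the request as JSON, ready to paste into a request editor.
    print(json.dumps(request, indent=2))
    print(split_stage_type(request["stages"]["documents"]["type"]))
```

Keeping the stage definition in a small function makes it easy to change the query text or the limit without editing the JSON by hand.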

To execute the analysis request, perform the following steps:

  1. Index a data set and start the Lingo4G server.

  2. Open the JSON Sandbox app in a modern browser: http://localhost:8080/#/code.

  3. Paste the analysis JSON into the code editor. You may need to modify the search query to match the data set you are processing. Press the Execute button to run the request.

Once the request executes, you should see a screen similar to:

Lingo4G JSON sandbox app, documents by query request results.

Lingo4G JSON Sandbox app showing a simple analysis request (on the left) and response JSON (on the right).

The panel on the right shows the response JSON received from the Lingo4G server. The result object contains the results of each stage defined in the analysis request. In our case, the only property of the object is documents, which represents the result of the document search stage we defined. The structure of each stage's result depends on the type of the stage. In the case of the documents:byQuery stage, the result contains the number of matches and a list of document identifiers along with search scores.
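A client that consumes the response can look up each stage's output by the identifier used in the request. The Python sketch below assumes the response has already been parsed into a dictionary whose result property holds the per-stage results, as described above. The stage_result helper is hypothetical, and the contents of sample_response (including the matches field name) are made up for illustration; they are not the actual documents:byQuery result schema.

```python
def stage_result(response: dict, stage_id: str) -> dict:
    """Return the result of one stage, keyed by its identifier from the request."""
    results = response.get("result", {})
    if stage_id not in results:
        available = sorted(results)
        raise KeyError(f"no result for stage {stage_id!r}; available: {available}")
    return results[stage_id]


# Illustrative stand-in for a parsed response; the inner field name is an assumption.
sample_response = {"result": {"documents": {"matches": 3}}}

documents = stage_result(sample_response, "documents")
```

Raising an error that lists the available identifiers helps catch a mismatch between the stage name in the request and the name used when reading the response.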

Multi-stage request

Non-trivial analysis requests usually require a chain of connected stages. For example, if we wanted to extract a list of phrases from a set of documents, we'd need two stages: one stage to return a list of documents and another one to extract labels from those documents.

One way to connect analysis stages is to in-line one stage definition as part of the definition of another stage. The following request contains a top-level stage named labels of type labels:fromDocuments, which requires a list of documents in its documents property. In the example, we in-line the documents:byQuery stage definition there.

{
  "stages": {
    "labels": {
      "type": "labels:fromDocuments",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 1000
      }
    }
  }
}

A request that extracts a list of labels from the set of documents matching the photon query.
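In-lining amounts to nesting one stage definition inside a property of another. As a rough Python sketch, the request above could be composed from two small functions. The helper names documents_by_query and labels_from_documents are hypothetical and not part of Lingo4G; only the resulting JSON follows the example.

```python
import json


def documents_by_query(query: str, limit: int = 1000) -> dict:
    """Build a documents:byQuery stage definition (hypothetical helper)."""
    return {
        "type": "documents:byQuery",
        "query": {"type": "query:string", "query": query},
        "limit": limit,
    }


def labels_from_documents(documents: dict) -> dict:
    """Build a labels:fromDocuments stage with an in-lined documents stage."""
    return {"type": "labels:fromDocuments", "documents": documents}


# The only top-level stage is "labels"; the document search is nested inside it.
request = {
    "stages": {"labels": labels_from_documents(documents_by_query("photon"))}
}

if __name__ == "__main__":
    print(json.dumps(request, indent=2))
```

Because the document search is nested rather than declared as its own top-level stage, the request has a single stage identifier, labels.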

When you execute the request, you should see a list of labels in the labels property of the result object in the response. You can also click the labels tab to see the labels in graphical form.

Lingo4G JSON sandbox app, labels from documents.

A request that extracts a list of labels from a list of documents along with a tabular presentation of the labels.

Default property values