Stages
A Lingo4G analysis consists of one or more stages. A stage represents one specific text processing operation, such as query-based document search, extracting labels from a list of documents, clustering, or 2D mapping of documents or labels.
Each stage produces one type of result, such as a list of documents, a list of labels or a cluster tree. Most real-world analyses consist of a pipeline of stages, where one stage produces results consumed by stages further down the chain.
One-stage request
An example one-stage analysis request JSON may look like this:
{
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "photon"
      },
      "limit": 1000
    }
  }
}
The above request consists of the following properties:
stages
-
A JSON object containing all stages of the request. Property names in the object are stage identifiers, values are stage definitions.
documents
-
Defines a stage that performs a query-based document search.
type
-
The type determines the specific operation the stage performs. Each stage definition object must contain the type property. By convention, most Lingo4G stage types consist of two parts separated by the : character, for example documents:byId or labels:fromDocuments. The first part indicates the general type of results provided by the stage, such as a document set or a list of labels. The second part determines the specific type. For stages returning documents, this could be documents:byQuery or documents:sample.
query
-
Defines the query to execute. This property is specific to the documents:byQuery stage. Note that the query itself also has a type property – there are a number of different kinds of queries you can use.
limit
-
Determines the maximum number of documents to return. Again, this property is specific to the documents:byQuery stage.
To execute the analysis request, perform the following steps:
-
Index a data set and start the Lingo4G server.
-
Open the JSON Sandbox app in a modern browser: http://localhost:8080/#/code.
-
Paste the analysis JSON into the code editor. You may need to modify the search query to match the dataset you are processing. Press the Execute button to run the request.
Once the request executes, the panel on the right shows the response JSON received from the Lingo4G server. The result object contains the results of each stage defined in the analysis request. In our case, the only property of the object is documents, which represents the result of the document search stage we defined. The structure of the result object again depends on the type of the stage. In the case of the documents:byQuery stage, the object contains the number of matches and a list of document identifiers along with search scores.
Multi-stage request
Non-trivial analysis requests usually require a chain of connected stages. For example, if we wanted to extract a list of phrases from a set of documents, we'd need two stages: one stage to return a list of documents and another one to extract labels from those documents.
One way to connect analysis stages is to in-line one stage definition as part of the definition of another stage.
The following request contains a top-level stage named labels of type labels:fromDocuments, which requires a list of documents in its documents property. In the example, we in-line the documents:byQuery stage definition there.
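A request matching this description might look like the sketch below. The layout follows the text: a labels stage of type labels:fromDocuments whose documents property holds an in-lined documents:byQuery stage. The query and limit values are simply reused from the one-stage example as an assumption, and any further properties that labels:fromDocuments may accept are left out.

```json
{
  "stages": {
    "labels": {
      "type": "labels:fromDocuments",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 1000
      }
    }
  }
}
```

As with the one-stage request, you may need to adjust the search query to match the data set you indexed.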
When you execute the request, you should see a list of labels in the labels property of the result response object. You can also click the labels tab to see the labels in a graphical form.