REST API basics

You can use the REST API to call Lingo3G clustering from non-Java languages. This article explains the basic REST API workflow.

Installation and running

  1. Install Lingo3G on your machine.

  2. Start the Lingo3G Document Clustering Server (DCS) application located in the dcs/ folder of your Lingo3G installation.

    • On Windows, run the dcs.cmd script.
    • On Linux and Mac, run the dcs script.

    If the DCS starts successfully, you should see a terminal window with messages similar to the following:

    16:59:55: DCS context initialized [algorithms: [Lingo3G], templates: [frontend-default]]
    16:59:55: Service started on port 8080.
    16:59:55: The following contexts are available:
      http://localhost:8080/          DCS Root
      http://localhost:8080/doc       Documentation
      http://localhost:8080/frontend  End-user apps
      http://localhost:8080/javadoc   Java API Javadoc
      http://localhost:8080/service   REST API

    The DCS binds to port 8080 by default. To select a different port, pass the --port option to the launch script, for example:

    > dcs --port 8081

REST endpoints

The Lingo3G REST API is available under the /service URL prefix and exposes the following endpoints:

/cluster
Clusters the documents you provide.
/list
Returns the list of clustering algorithms and languages the REST API supports.

The Lingo3G REST API is stateless: the DCS keeps no state between requests, so the clustering results depend only on the contents of the request you send. This means you can easily load-balance multiple DCS instances for redundancy and performance.
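Because any DCS instance can answer any request, clients need no session affinity. As an illustration, the following Java sketch rotates requests across several instances in round-robin order. The DcsRoundRobin class and the instance URLs are hypothetical; a dedicated HTTP load balancer placed in front of the instances achieves the same effect.

```java
import java.net.URI;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: client-side round-robin over several DCS instances.
// This works only because the REST API is stateless, so any instance
// can serve any clustering request.
final class DcsRoundRobin {
  private final List<URI> instances;
  private final AtomicInteger next = new AtomicInteger();

  DcsRoundRobin(List<URI> instances) {
    if (instances.isEmpty()) {
      throw new IllegalArgumentException("At least one DCS instance is required.");
    }
    this.instances = List.copyOf(instances);
  }

  /** Returns the /service/cluster endpoint of the next instance in rotation. */
  URI nextClusterEndpoint() {
    // floorMod keeps the index non-negative even after the counter overflows.
    int index = Math.floorMod(next.getAndIncrement(), instances.size());
    return instances.get(index).resolve("/service/cluster");
  }
}
```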

The following sections show how to invoke Lingo3G clustering through the REST API. For detailed information and more examples, see the OpenAPI documentation of the REST API in Swagger (alternatively, in RapiDoc, in ReDoc, or as the raw OpenAPI YAML).

/cluster

Clusters the documents you provide. To invoke clustering, make a POST request to the /cluster endpoint with a JSON object like this:

POST /service/cluster HTTP/1.1
Host: localhost:8080
Content-Type: text/json

{
  "language": "English",
  "algorithm": "Lingo3G",
  "parameters": {
    "clusters": {
      "maxClusterSize": 0.8
    }
  },
  "documents": [
    {
      "title": "Data Mining in Python",
      "content": "Collection of libraries for machine learning."
    },
    {
      "title" : "KDD Lab: knowledge discovery from large spatial data."
    },
    {
      "title": "Data Mining",
      "snippet": [
        "Data mining uses machine learning ...",
        "... automated knowledge discovery tools."
      ]
    }
  ]
}

Example /cluster request headers and body.

The request JSON object can contain the following properties:

language

The language in which to perform clustering. Use the /list endpoint to get the list of supported languages.

algorithm

The clustering algorithm to use. Set it to Lingo3G.

parameters

Lingo3G parameter overrides. The object you provide must follow the structure of the Lingo3G parameters object. You only need to provide the parameters you want to customize.

You can export the parameters JSON object from the Lingo3G Clustering Workbench.

documents

An array of documents for clustering. Each array element must be an object representing one document. Each document can define one or more fields to cluster. Field names can be arbitrary, such as title or body; field values must be strings or arrays of strings.

Lingo3G clusters all of the document content you provide. If your documents contain additional fields used only for presentation, exclude them from the clustering request.

See Documents for clustering for recommendations about the content you submit for clustering.

All properties of the request JSON are optional in the standard configuration of the Lingo3G DCS. By default, language is English, algorithm is Lingo3G, and parameters and documents are empty. Therefore, a minimal meaningful request needs to contain only the documents array. To change the request property defaults or create custom request templates, see Request templates.

When making the request, you must set the Content-Type header to text/json.
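Because every property except documents has a default, a minimal request body can consist of the documents array alone. The following sketch builds such a body in Java without a JSON library and enforces the rule that field values are strings or arrays of strings. The MinimalClusterRequest class and its toJson method are hypothetical; in real code, a JSON library such as Jackson is a better choice.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds a minimal /cluster request body, a JSON object
// with only the "documents" array. Each document maps field names to a String
// or a List<String>; any other value type is rejected.
// Hand-written JSON for illustration only; prefer a JSON library in real code.
final class MinimalClusterRequest {

  static String toJson(List<Map<String, Object>> documents) {
    StringBuilder sb = new StringBuilder("{\"documents\":[");
    for (int i = 0; i < documents.size(); i++) {
      if (i > 0) sb.append(',');
      sb.append('{');
      boolean firstField = true;
      for (Map.Entry<String, Object> field : documents.get(i).entrySet()) {
        if (!firstField) sb.append(',');
        firstField = false;
        sb.append(quote(field.getKey())).append(':');
        appendValue(sb, field.getKey(), field.getValue());
      }
      sb.append('}');
    }
    return sb.append("]}").toString();
  }

  // Appends a string or an array of strings; rejects everything else.
  private static void appendValue(StringBuilder sb, String name, Object value) {
    if (value instanceof String) {
      sb.append(quote((String) value));
    } else if (value instanceof List) {
      List<?> items = (List<?>) value;
      sb.append('[');
      for (int j = 0; j < items.size(); j++) {
        Object item = items.get(j);
        if (!(item instanceof String)) {
          throw new IllegalArgumentException("Array values of field '" + name + "' must be strings.");
        }
        if (j > 0) sb.append(',');
        sb.append(quote((String) item));
      }
      sb.append(']');
    } else {
      throw new IllegalArgumentException("Field '" + name + "' must be a string or an array of strings.");
    }
  }

  // Quotes a string, escaping quotes, backslashes and control characters.
  private static String quote(String s) {
    StringBuilder sb = new StringBuilder("\"");
    for (char c : s.toCharArray()) {
      switch (c) {
        case '"' -> sb.append("\\\"");
        case '\\' -> sb.append("\\\\");
        case '\n' -> sb.append("\\n");
        case '\r' -> sb.append("\\r");
        case '\t' -> sb.append("\\t");
        default -> {
          if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
          else sb.append(c);
        }
      }
    }
    return sb.append('"').toString();
  }
}
```

Passing LinkedHashMap instances keeps each document's fields in insertion order in the generated JSON.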

If clustering is successful, the response is a JSON object like this:

{
  "clusters" : [
    {
      "labels" : [
        "Data Mining"
      ],
      "documents" : [
        0,
        2
      ],
      "clusters" : [ ],
      "score" : 1.0
    },
    {
      "labels" : [
        "Knowledge Discovery"
      ],
      "documents" : [
        1,
        2
      ],
      "clusters" : [ ],
      "score" : 0.9829125404836723
    }
  ]
}

Example /cluster response.

The response JSON has the following properties:

clusters

An array of top-level clusters. Each cluster has the following properties:

labels

One or more labels describing the cluster.

documents

An array of indices of documents in the cluster. Each element is a 0-based index of the document in the documents array you provided in the clustering request.

clusters

An array of subclusters of this cluster. The array may be empty if there was not enough data to create a cluster hierarchy or if you disabled hierarchical clustering.

score

The cluster's quality score. The score is not normalized in any way; it is only meaningful for comparing the relative quality of clusters within the same response.

The example JSON response contains two clusters: Data Mining and Knowledge Discovery, each containing two documents. Document 2 is present in both clusters.
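Because documents holds 0-based indices into the documents array of the request and clusters can nest, client code typically resolves the indices back to its own documents while walking the hierarchy recursively. The following Java sketch does this for a response already parsed into a hypothetical ResponseCluster record that mirrors the JSON properties above; the parsing step itself (for example, with Jackson) is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record mirroring one cluster object of the /cluster response.
record ResponseCluster(
    List<String> labels, List<Integer> documents, List<ResponseCluster> clusters, double score) {}

// Hypothetical helper: renders the cluster tree as indented text lines.
final class ClusterPrinter {

  /** Renders each cluster as "labels: title; title", indenting subclusters. */
  static List<String> render(List<ResponseCluster> clusters, List<String> requestTitles) {
    List<String> lines = new ArrayList<>();
    render(clusters, requestTitles, 0, lines);
    return lines;
  }

  private static void render(
      List<ResponseCluster> clusters, List<String> titles, int depth, List<String> out) {
    for (ResponseCluster cluster : clusters) {
      // Resolve 0-based indices against the documents sent in the request.
      List<String> docTitles = new ArrayList<>();
      for (int index : cluster.documents()) {
        docTitles.add(titles.get(index));
      }
      out.add("  ".repeat(depth) + String.join(" / ", cluster.labels())
          + ": " + String.join("; ", docTitles));
      // An empty subcluster array simply ends the recursion.
      render(cluster.clusters(), titles, depth + 1, out);
    }
  }
}
```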

To send a clustering request to the REST API running on localhost:8080 with curl, save the request JSON in the cluster-request.json file and use the following command:

curl -X POST --header "Content-Type: text/json" --data-binary @cluster-request.json "http://localhost:8080/service/cluster?indent"

Calling REST API using curl.
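To make the same call from Java without curl, you can use the JDK's built-in java.net.http client (Java 11 and later). The following sketch uses a hypothetical DcsClusterCall helper that posts a request file with the text/json content type. Treating any non-2xx status as an error is an assumption about how you may want to handle failures.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical helper: the Java equivalent of the curl command above.
final class DcsClusterCall {

  /** Builds a POST request that sends the JSON file to /service/cluster. */
  static HttpRequest buildRequest(URI dcsBase, Path requestJson) throws IOException {
    return HttpRequest.newBuilder(dcsBase.resolve("/service/cluster?indent"))
        // The REST API expects the request body as text/json.
        .header("Content-Type", "text/json")
        // Throws FileNotFoundException if the file does not exist.
        .POST(HttpRequest.BodyPublishers.ofFile(requestJson))
        .build();
  }

  /** Sends the request and returns the response JSON as a string. */
  static String send(URI dcsBase, Path requestJson) throws IOException, InterruptedException {
    // In a real application, create one HttpClient and reuse it.
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(buildRequest(dcsBase, requestJson), HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() / 100 != 2) {
      throw new IOException("DCS returned HTTP " + response.statusCode() + ": " + response.body());
    }
    return response.body();
  }
}
```

For example, DcsClusterCall.send(URI.create("http://localhost:8080/"), Path.of("cluster-request.json")) returns the same indented JSON that the curl command prints.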

/list

Returns the list of supported clustering algorithm-language pairs. To fetch the list, make a GET request to the /list endpoint. The response is a JSON object like this:

{
  "algorithms" : {
    "Lingo3G" : [
      "Dutch",
      "English"
    ]
  },
  "templates" : [
    "frontend-default"
  ]
}

Example /list response.

Each algorithm has an associated list of languages it supports. The templates array lists the available request templates.
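A client can use /list to check that an algorithm-language pair is available before sending a clustering request. The following sketch builds the GET request with java.net.http and checks a pair against the algorithms object of the response. It assumes you have already parsed that object into a Map (for example, with Jackson); the DcsList class is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.Map;

// Hypothetical helper for the /list endpoint.
final class DcsList {

  /** Builds the GET request for /service/list. */
  static HttpRequest buildListRequest(URI dcsBase) {
    return HttpRequest.newBuilder(dcsBase.resolve("/service/list")).GET().build();
  }

  /**
   * Checks an algorithm-language pair against the "algorithms" object of the
   * /list response, already parsed into a map of algorithm name to languages.
   */
  static boolean supports(Map<String, List<String>> algorithms, String algorithm, String language) {
    return algorithms.getOrDefault(algorithm, List.of()).contains(language);
  }
}
```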

Java models

If your software is Java-based, you can call the Lingo3G REST API from your Java code instead of using the Lingo3G Java API directly. In this case, instead of creating and parsing JSON by hand, you can use the Lingo3G model classes like this:

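import java.util.stream.Collectors;
import java.util.stream.Stream;
// Imports of Lingo3GClusteringAlgorithm, ClusterRequest and Attrs are omitted here;
// see the examples-dcs folder of the Lingo3G distribution for complete code.

// Configure the algorithm instance: here, the maximum cluster size.
Lingo3GClusteringAlgorithm algorithm = new Lingo3GClusteringAlgorithm();
algorithm.clusters.maxClusterSize.set(1.0);

// Build the request model that mirrors the /cluster request JSON.
ClusterRequest request = new ClusterRequest();
request.algorithm = Lingo3GClusteringAlgorithm.NAME;
request.language = "English";
// Convert the algorithm's attribute values into the "parameters" object.
request.parameters = Attrs.extract(algorithm);
// Wrap each string in a document with a single "title" field.
request.documents =
    Stream.of(
            "Data Mining in Python",
            "Knowledge Discovery and Data Mining Lab",
            "Knowledge Discovery tools")
        .map(
            value -> {
              ClusterRequest.Document doc = new ClusterRequest.Document();
              doc.setField("title", value);
              return doc;
            })
        .collect(Collectors.toList());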

Using Java model classes to build Lingo3G REST API request JSONs.

You can serialize model instances into JSON using the Jackson library. See the examples-dcs folder of the Lingo3G distribution for complete working examples of this approach.