Dictionaries

Lingo3G uses dictionaries to improve the quality of clustering for a specific language. This article shows how to customize dictionaries in the REST API.

Customizing global dictionaries

To customize the global dictionaries for REST API clustering, go to the DCS dictionaries folder and edit the files as required. Once you finish editing, restart the DCS for the changes to take effect.

Using per-request dictionaries

You can provide extra per-request dictionary entries for a specific clustering request. Lingo3G applies these entries in addition to the default dictionaries. For example, if the end user wants to remove specific labels from the clustering result they are currently viewing, your software can add those labels to the per-request label dictionary and rerun the clustering.

The dictionaries section of the Lingo3G parameters object groups all per-request dictionaries. You can use any combination of per-request label, synonym and tag dictionaries.
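As an illustration of how the dictionaries section fits into a request, the following Python sketch assembles a request body and includes only the per-request dictionaries the caller supplies. The helper name build_cluster_request is hypothetical and not part of Lingo3G. The entries are passed through unchanged, so the sketch makes no assumption about the entry format of each dictionary.

```python
import json


def build_cluster_request(documents, labels=None, synonyms=None, tags=None,
                          language="English"):
    """Assemble a /cluster request body with optional per-request dictionaries.

    Hypothetical helper: dictionary entries are copied as given, without validation.
    """
    dictionaries = {}
    # Include only the dictionaries that were actually supplied.
    for name, entries in (("labels", labels), ("synonyms", synonyms), ("tags", tags)):
        if entries:
            dictionaries[name] = list(entries)

    request = {
        "language": language,
        "algorithm": "Lingo3G",
        "documents": list(documents),
    }
    if dictionaries:
        request["parameters"] = {"dictionaries": dictionaries}
    return request


body = build_cluster_request(
    documents=[{"title": "PDF Viewer configuration issue on Windows"}],
    labels=[{"glob": ["* issue *", "configuration"]}],
)
print(json.dumps(body, indent=2))
```

When no dictionaries are passed, the helper omits the parameters object entirely, so the request falls back to the default dictionaries.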

Label dictionaries

To add per-request entries to the label dictionary, use the dictionaries.labels parameter.

In the following example, without the per-request label dictionary, Lingo3G creates a cluster called Configuration Issue on Windows. Adding two entries to the per-request dictionary replaces the label with Windows:

{
  "language": "English",
  "algorithm": "Lingo3G",
  "parameters": {
    "clusters": {
      "maxClusterSize": 0.8
    },
    "dictionaries": {
      "labels": [
        {
          "glob": [
            "* issue *",
            "configuration"
          ]
        }
      ]
    }
  },
  "documents": [
    { "title": "PDF Viewer configuration issue on Windows" },
    { "title": "Firefox plugin configuration issue on Windows" },
    { "title": "CPU usage for flash in Firefox" }
  ]
}

Example /cluster request with per-request label dictionary entries.
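The label-removal scenario described earlier (the end user rejects some labels and the software reruns clustering) could be sketched in Python as follows. The function exclude_labels is a hypothetical helper. It assumes that a glob entry consisting of the lowercased label text, without wildcards, matches that label; check this against the glob syntax your Lingo3G version supports.

```python
import copy


def exclude_labels(request, unwanted_labels):
    """Return a copy of the request with extra label dictionary entries.

    Assumption: a lowercased label used as a glob matches that label.
    """
    updated = copy.deepcopy(request)  # leave the caller's request untouched
    dictionaries = updated.setdefault("parameters", {}).setdefault("dictionaries", {})
    globs = [label.lower() for label in unwanted_labels]
    # Append a new entry rather than editing entries that are already present.
    dictionaries.setdefault("labels", []).append({"glob": globs})
    return updated


original = {
    "language": "English",
    "algorithm": "Lingo3G",
    "documents": [
        {"title": "PDF Viewer configuration issue on Windows"},
        {"title": "Firefox plugin configuration issue on Windows"},
    ],
}
rerun = exclude_labels(original, ["Configuration Issue on Windows"])
```

The rerun request can then be submitted in place of the original one. Working on a copy keeps the original request available if the user later restores the removed labels.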

Synonym dictionaries

To add per-request entries to the synonym dictionary, use the dictionaries.synonyms parameter.

In the following example, adding Chrome and Firefox as synonyms puts documents containing either of these words in the same cluster.

{
  "language": "English",
  "algorithm": "Lingo3G",
  "parameters": {
    "clusters": {
      "maxClusterSize": 0.8,
      "allowOneDocumentClusters": true
    },
    "dictionaries": {
      "tags": [
        {
          "tag": "name",
          "words": [
            "chrome",
            "firefox"
          ]
        }
      ]
    }
  },
  "documents": [
    { "title": "PDF Viewer configuration on Chrome Windows" },
    { "title": "Chrome plugin configuration issue on Windows" },
    { "title": "CPU usage for flash in Firefox" }
  ]
}

Example /cluster request with per-request synonym dictionary entries.

Note that the example must set clusters.allowOneDocumentClusters to true because it contains so few documents. The parameter causes Lingo3G to keep words that occur only once, such as Firefox and Chrome in the example, during processing. If the example left allowOneDocumentClusters at its default value of false, documents containing Chrome and Firefox would not end up in the same cluster because Lingo3G would filter out the one-occurrence words before applying synonyms.

Tag dictionaries

To add per-request entries to the word tag dictionary, use the dictionaries.tags parameter.

In the following example, adding Chrome and Firefox to the tags dictionary with the name tag slightly promotes the two words in cluster labels. As a result, Lingo3G chooses Chrome and Firefox over some other words to label the clusters.

{
  "language": "English",
  "algorithm": "Lingo3G",
  "parameters": {
    "clusters": {
      "maxClusterSize": 0.8,
      "allowOneDocumentClusters": true
    },
    "dictionaries": {
      "tags": [
        {
          "tag": "name",
          "words": [
            "chrome",
            "firefox"
          ]
        }
      ]
    }
  },
  "documents": [
    { "title": "PDF Viewer configuration on Windows" },
    { "title": "Chrome plugin configuration issue on Windows" },
    { "title": "CPU usage for flash in Firefox" }
  ]
}

Example /cluster request with per-request tag dictionary entries.
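Any of the example request bodies in this article can be submitted to the /cluster endpoint with a plain HTTP POST. The Python sketch below uses only the standard library. The base URL http://localhost:8080/service is an assumption about a local DCS installation, and the function names are hypothetical; adjust both to your deployment.

```python
import json
import urllib.request

# Assumption: the DCS runs locally under this base URL.
DCS_BASE_URL = "http://localhost:8080/service"


def make_cluster_call(body, base_url=DCS_BASE_URL):
    """Build (but do not send) a POST request for the /cluster endpoint."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/cluster",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def cluster(body, base_url=DCS_BASE_URL):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(make_cluster_call(body, base_url)) as response:
        return json.load(response)


call = make_cluster_call({"language": "English", "algorithm": "Lingo3G", "documents": []})
```

Separating request construction from sending makes it possible to inspect or log the exact body, including the per-request dictionaries, before it reaches the DCS.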