Lingo3G uses dictionaries to improve the quality of clustering for a specific language. This article shows how to customize dictionaries in the Java API.
Lingo3G delegates the management of dictionaries and other language
resources to the
LanguageComponent instance. You can use its
methods to list the available languages, customize the location of
dictionaries or even replace Lingo3G's built-in language resources, such
as stemmers. This chapter shows the basic use cases related to dictionary
customization.
The simplest way to customize Lingo3G
dictionaries is to copy the default dictionaries
to your application-specific location, make the necessary changes and
provide a custom
ResourceLookup implementation when loading
language resources. The following example loads English resources from a
class-relative classpath location.
Using ephemeral dictionaries
You can provide extra ephemeral dictionary entries for a specific clustering request. Lingo3G applies these extra entries as an addition to the default dictionaries. For example, if the end-user wants to remove specific labels from the clustering result they are currently viewing, your software can add such labels to the ephemeral label dictionary and rerun the clustering.
The dictionaries field of the clustering algorithm
groups all ephemeral dictionaries. You can add entries to the label,
synonym and word dictionaries.
To add an ephemeral label dictionary, create a new instance of the
LabelMatcher class and add entries to its
matcher fields, such as glob or exact. Then
set the matcher instance on the ephemeral dictionaries of your clustering algorithm.
The following example adds two entries to the glob matcher of the label dictionary.
LabelMatcher instances assign a zero weight to
the labels they match, removing them from the result. You can set the
weight field of the matcher to a value larger than 1 to
promote the labels.
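The effect of the matcher weight can be pictured with a small self-contained sketch. It does not use the Lingo3G API: the LabelWeightSketch class, the applyWeight method and the multiplicative scoring model are all assumptions made for illustration, chosen only because they reproduce the behavior described above (zero removes a label, a value larger than 1 promotes it).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Conceptual stand-in, NOT the Lingo3G API: models how a matcher weight could affect label scores. */
public class LabelWeightSketch {

    /**
     * Multiplies the score of every matched label by the given weight
     * (an assumed model) and drops labels whose score is no longer positive.
     */
    public static Map<String, Double> applyWeight(
            Map<String, Double> scores, Set<String> matchedLabels, double weight) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            double score = matchedLabels.contains(entry.getKey())
                    ? entry.getValue() * weight
                    : entry.getValue();
            if (score > 0) {
                result.put(entry.getKey(), score);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("Data Mining", 2.0);
        scores.put("Click Here", 3.0);

        // Weight 0: the matched label disappears from the result.
        System.out.println(applyWeight(scores, Set.of("Click Here"), 0.0));
        // Weight 2: the matched label is promoted relative to the others.
        System.out.println(applyWeight(scores, Set.of("Data Mining"), 2.0));
    }
}
```

How Lingo3G combines the weight with its internal label scores is not specified here, so treat the sketch as a mental model only.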
To add an ephemeral synonym dictionary, do the following:
1. For each set of synonymous labels, create a SynonymSet instance, providing the list of labels Lingo3G should treat as synonymous. Note that Lingo3G allows only glob-style label matchers in synonym definitions. Additionally, you can provide one label to represent all the synonymous labels in cluster labels.
2. Set a list of synonym sets on the dictionaries.synonyms field of your clustering algorithm.
The following example adds one entry making the software and tools labels synonymous and represented by Tools & Software for cluster labeling purposes.
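To illustrate the effect of such a synonym set independently of the Lingo3G API, the following self-contained sketch merges per-label counts under a representative label. The SynonymSketch class and the mergeCounts method are hypothetical names, and the sketch uses plain case-insensitive string comparison instead of Lingo3G's glob-style matchers, which is a deliberate simplification.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Conceptual stand-in, NOT the Lingo3G API: collapses synonymous labels into one representative. */
public class SynonymSketch {

    /**
     * Merges label counts so that all labels of a synonym set are reported
     * under the set's representative label. The synonymSets map is keyed by
     * the representative and lists the labels it stands for.
     */
    public static Map<String, Integer> mergeCounts(
            Map<String, Integer> counts, Map<String, List<String>> synonymSets) {
        // Index each synonymous label (lower-cased) by its representative.
        Map<String, String> representativeOf = new HashMap<>();
        for (Map.Entry<String, List<String>> set : synonymSets.entrySet()) {
            for (String label : set.getValue()) {
                representativeOf.put(label.toLowerCase(Locale.ROOT), set.getKey());
            }
        }

        // Sum counts under the representative, keeping other labels unchanged.
        Map<String, Integer> merged = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            String target = representativeOf.getOrDefault(
                    entry.getKey().toLowerCase(Locale.ROOT), entry.getKey());
            merged.merge(target, entry.getValue(), Integer::sum);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("software", 3);
        counts.put("tools", 2);
        counts.put("java", 1);

        // "software" and "tools" are reported together as "Tools & Software".
        System.out.println(mergeCounts(
                counts, Map.of("Tools & Software", List.of("software", "tools"))));
    }
}
```

In Lingo3G itself the merging happens inside the clustering algorithm; the sketch only shows the intended outcome for cluster labels.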
To add an ephemeral tag dictionary, create one or more instances of the
Tag class defining the list of words and the part-of-speech
tag for those words. Then set the list of tags on the ephemeral dictionaries of your clustering algorithm.
The following example adds the word whereas together with its part-of-speech tag.
Listing supported languages
The following code lists all languages supported by Lingo3G:
Lingo3G supports: Arabic, Armenian, Bulgarian, Chinese-Simplified, Chinese-Traditional, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Thai, Turkish
Note that the code uses the
limitToAlgorithms method to limit
the list to the languages Lingo3G supports. The unfiltered list contains
all languages defined in the
framework; Lingo3G does not support some of those languages.