If you intend to run Lingo3G clustering in parallel threads, make sure your code follows the concurrency guidelines outlined in this article.
The Lingo3G Java API makes the following thread-safety guarantees:
Lingo3GClusteringAlgorithm instances are not thread-safe: your code must not share them among parallel threads.
LanguageComponents instances are thread-safe: your code should share and reuse them among parallel threads.
In other words, if your code needs to cluster data in parallel threads, each thread should "own" its own Lingo3GClusteringAlgorithm instance, while all threads reuse a single LanguageComponents instance loaded beforehand.
The following sections show two approaches to configuring a Lingo3G algorithm instance and reusing that configuration in subsequent, possibly concurrent, clustering calls.
The simplest way to ensure thread-safety is to create and configure a Lingo3G algorithm instance on the fly and discard it after the clustering completes.
The following example defines a function that transforms a stream of documents into a list of clusters:
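A minimal sketch of such a function, assuming hypothetical stand-in classes so that it runs without Lingo3G on the classpath: SharedResources plays the role of the thread-safe LanguageComponents instance, and ToyAlgorithm plays the role of the non-thread-safe Lingo3GClusteringAlgorithm. The toy "clustering" (distinct, sorted non-stop words) is purely illustrative and is not the Lingo3G API; only the ownership pattern carries over.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PerCallClustering {
  // Hypothetical stand-in for LanguageComponents: immutable after construction,
  // so a single instance can be shared by all threads.
  static final class SharedResources {
    final Set<String> stopWords;

    SharedResources(Set<String> stopWords) {
      this.stopWords = Set.copyOf(stopWords);
    }
  }

  // Hypothetical stand-in for Lingo3GClusteringAlgorithm: it holds mutable
  // configuration, so it must not be shared among threads.
  static final class ToyAlgorithm {
    int maxClusters = 10;

    // Toy "clustering": up to maxClusters distinct non-stop words, sorted.
    List<String> cluster(Stream<String> documents, SharedResources resources) {
      return documents
          .flatMap(doc -> Arrays.stream(doc.toLowerCase().split("\\s+")))
          .filter(word -> !word.isEmpty() && !resources.stopWords.contains(word))
          .distinct()
          .sorted()
          .limit(maxClusters)
          .collect(Collectors.toList());
    }
  }

  // Creates, configures and discards an algorithm instance on every call,
  // so no algorithm instance is ever visible to more than one thread.
  static List<String> clusterDocuments(Stream<String> documents, SharedResources resources) {
    ToyAlgorithm algorithm = new ToyAlgorithm();
    algorithm.maxClusters = 3;
    return algorithm.cluster(documents, resources);
  }

  public static void main(String[] args) throws Exception {
    // Load the shared, thread-safe resources once.
    SharedResources resources = new SharedResources(Set.of("the", "a", "of"));

    ExecutorService executor = Executors.newFixedThreadPool(4);
    try {
      List<Future<List<String>>> results = new ArrayList<>();
      for (int i = 0; i < 8; i++) {
        // Every task reuses the same resources but gets its own algorithm instance.
        results.add(executor.submit(() -> clusterDocuments(
            Stream.of("The data mining", "A text of clustering"), resources)));
      }
      for (Future<List<String>> result : results) {
        System.out.println(result.get());
      }
    } finally {
      executor.shutdown();
    }
  }
}
```

Running main prints [clustering, data, mining] eight times. Allocating a fresh instance per call costs a little garbage but needs no locking at all, which is why this is the simplest safe option.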
Note that the code loads a LanguageComponents instance once and then shares it among all parallel threads for reuse.
Cloning preconfigured instances
If the configuration of your algorithm instance is complex or you would like to decouple it from the actual clustering, your code can do the following:
Create and configure a "blueprint" algorithm instance.
Use the Attrs.toMap method to convert the "blueprint" instance into a Map for sharing among concurrent threads.
In each thread, use the Attrs.fromMap method to create a clone of the "blueprint" instance.
The following example demonstrates this approach:
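A minimal sketch of this approach, assuming hypothetical stand-ins so that it runs without Lingo3G: the toMap and fromMap helpers below only imitate the roles of Attrs.toMap and Attrs.fromMap and do not reproduce their real signatures, while ToyAlgorithm and SharedResources stand in for Lingo3GClusteringAlgorithm and LanguageComponents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BlueprintCloning {
  // Hypothetical stand-in for the shared, thread-safe LanguageComponents.
  static final class SharedResources {
    final Set<String> stopWords;

    SharedResources(Set<String> stopWords) {
      this.stopWords = Set.copyOf(stopWords);
    }
  }

  // Hypothetical stand-in for the non-thread-safe Lingo3GClusteringAlgorithm.
  static final class ToyAlgorithm {
    int maxClusters = 10;
    int minWordLength = 1;

    // Toy "clustering": up to maxClusters distinct, long-enough non-stop words.
    List<String> cluster(Stream<String> documents, SharedResources resources) {
      return documents
          .flatMap(doc -> Arrays.stream(doc.toLowerCase().split("\\s+")))
          .filter(word -> word.length() >= minWordLength && !resources.stopWords.contains(word))
          .distinct()
          .sorted()
          .limit(maxClusters)
          .collect(Collectors.toList());
    }
  }

  // Hypothetical stand-in for Attrs.toMap: copies the configuration into an immutable map.
  static Map<String, Object> toMap(ToyAlgorithm algorithm) {
    return Map.of("maxClusters", algorithm.maxClusters, "minWordLength", algorithm.minWordLength);
  }

  // Hypothetical stand-in for Attrs.fromMap: builds a fresh, independent instance.
  static ToyAlgorithm fromMap(Map<String, Object> config) {
    ToyAlgorithm clone = new ToyAlgorithm();
    clone.maxClusters = (Integer) config.get("maxClusters");
    clone.minWordLength = (Integer) config.get("minWordLength");
    return clone;
  }

  // Each call clusters with its own disposable clone of the blueprint.
  static List<String> clusterDocuments(
      Stream<String> documents, Map<String, Object> blueprintConfig, SharedResources resources) {
    return fromMap(blueprintConfig).cluster(documents, resources);
  }

  public static void main(String[] args) throws Exception {
    SharedResources resources = new SharedResources(Set.of("the", "a", "of"));

    // Configure the blueprint once, on a single thread.
    ToyAlgorithm blueprint = new ToyAlgorithm();
    blueprint.maxClusters = 2;
    blueprint.minWordLength = 5;
    // Only the immutable map is handed to worker threads, never the blueprint itself.
    Map<String, Object> blueprintConfig = toMap(blueprint);

    ExecutorService executor = Executors.newFixedThreadPool(4);
    try {
      List<Future<List<String>>> results = new ArrayList<>();
      for (int i = 0; i < 8; i++) {
        results.add(executor.submit(() -> clusterDocuments(
            Stream.of("The data mining", "A text of clustering"), blueprintConfig, resources)));
      }
      for (Future<List<String>> result : results) {
        System.out.println(result.get());
      }
    } finally {
      executor.shutdown();
    }
  }
}
```

Running main prints [clustering, mining] eight times. Sharing the map is safe because worker threads only read it; each thread then mutates nothing but its own clone.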
Note that the parallel threads do not use the "blueprint" instance directly, as it is not thread-safe. Instead, they create a disposable clone for each clustering call.