2.1.x release notes
Release notes for Lingo4G 2.1.x.
Version 2.1.2
Lingo4G 2.1.2 adds a network interface binding option, --host, to both the server and server-shutdown commands.
Compatibility
Lingo4G 2.1.2 is fully backward-compatible with the 2.1.1 release and works with indices created by the 2.1.1 release.
Version 2.1.1
Lingo4G 2.1.1 updates the dotAtlas visualization to fix label rendering artifacts and correctly handle browser viewport zooming in the 2d map visualizations.
Compatibility
Lingo4G 2.1.1 is fully backward-compatible with the 2.1.0 release and works with indices created by the 2.1.0 release.
Version 2.1.0
Lingo4G 2.1.0 significantly improves document indexing by suppressing truncated phrases (such as Association for Computing as opposed to Association for Computing Machinery) and learning high-quality embeddings for low-frequency labels.
Compatibility
- Project descriptor
-
Updates required. Lingo4G 2.1.0 comes with significant improvements in phrase extraction and embedding learning. As a result, it removes support for the project descriptor properties listed below. If your project uses any of those properties, remove them to make the descriptor compatible with Lingo4G 2.1.0.
Properties to remove (in the embeddings.labels.input block): maxLabels, minDf, minTopDf, minLabelsPercentPerDocument.
Explanation: In previous versions, those properties configured label embedding learning. Lingo4G 2.1.0 comes with an updated learning algorithm that does not require them.
- Strict JSON parsing
-
Updates required. Starting with version 2.1.0, Lingo4G requires strictly valid JSON. Lingo4G no longer accepts unquoted properties, comments and single-quoted strings.
Strict JSON parsing applies across all Lingo4G resources, including the project descriptor, analysis API v1 and API v2 requests, and external resources. If any of those files in your project contains non-standard JSON syntax, remove that syntax so that Lingo4G 2.1.0 accepts the files.
- Reindexing
-
Required. Lingo4G 2.1.0 changes the internal format of the index files and will not work with indices created by Lingo4G 2.0.x.
- Default heap size
-
Lingo4G 2.1.0 changes the default value of the L4G_OPTS variable from -Xmx4g (setting an explicit heap limit of 4 gigabytes) to an empty string. This causes Java to determine the default and maximum heap size according to garbage collector ergonomics, as a dynamically computed fraction of the memory available to the process.
This change does not require any action. If you experience problems, set the L4G_OPTS environment variable explicitly.
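The descriptor-related updates above (removed properties and strict JSON parsing) can be checked before upgrading with a small helper. The sketch below is not part of Lingo4G: it uses Python's standard json module as a stand-in for a strict JSON parser, so edge cases may differ from Lingo4G's own parser, and it assumes the removed properties are nested under embeddings, labels, input in the descriptor. The function names are hypothetical.

```python
import json

# Properties Lingo4G 2.1.0 no longer supports in the embeddings.labels.input block.
REMOVED_PROPERTIES = ("maxLabels", "minDf", "minTopDf", "minLabelsPercentPerDocument")


def _reject_constant(name):
    # Python's json module accepts NaN and Infinity by default; strict JSON does not.
    raise ValueError(f"non-standard JSON constant: {name}")


def parse_strict(text):
    """Parse text as JSON; raises ValueError on comments, unquoted
    properties, single-quoted strings and non-standard constants."""
    return json.loads(text, parse_constant=_reject_constant)


def strip_removed_properties(descriptor):
    """Delete the obsolete properties in place; return the names removed.

    Assumes (hypothetically) that the block is reachable as
    descriptor["embeddings"]["labels"]["input"].
    """
    block = descriptor.get("embeddings", {}).get("labels", {}).get("input", {})
    removed = [name for name in REMOVED_PROPERTIES if name in block]
    for name in removed:
        del block[name]
    return removed
```

Calling parse_strict on the text of each JSON resource in a project flags files that need cleanup, and strip_removed_properties reports which obsolete properties a parsed descriptor still contained.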
Improvements
- Suppression of truncated phrases
-
As of version 2.1.0, Lingo4G improves the quality of labels by suppressing incomplete phrases at indexing time. Previous versions were likely to extract sub-phrases of longer phrases, such as Association for Computing, in addition to the more meaningful longer phrases, such as Association for Computing Machinery.
Version 2.1.0 by default extracts only the longer and more meaningful phrases. Setting the skipSubphrases property to false reverts indexing to the previous behaviour. However, we recommend leaving the property at its default value of true for higher-quality labels.
- Label embedding improvements
-
Version 2.1.0 comes with significant improvements to learning label embedding vectors. Lingo4G now splits learning into two phases: direct learning of vectors for high-frequency labels and estimation of vectors for low-frequency labels.
Lingo4G 2.1.0 can compute high-quality embeddings for the long tail of low-frequency labels the previous versions would ignore.
The following properties configure the new label embedding learning algorithm: maxLabelsForDirectLearning, minLabelTfForDirectLearning and minLabelTfForEstimatedLearning.
- Faster query highlighting
-
Version 2.1.0 improves the performance of label and search term highlighting in text.
- Eager document content retrieval
-
Version 2.1.0 adds an option to choose between the streaming and eager document content retrieval modes.
- Dependency updates
-
Lingo4G 2.1.0 updates Lucene to version 9.9.2.
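To make the idea of suppressing truncated phrases more concrete, here is a deliberately naive illustration. It is not Lingo4G's algorithm: it simply drops any phrase whose words occur contiguously inside a longer phrase in the same list, and it ignores how often each phrase occurs on its own, which a real indexer would presumably take into account. The function names are hypothetical.

```python
def contains_subphrase(longer, shorter):
    """True if the words of `shorter` occur contiguously in `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def suppress_subphrases(phrases):
    """Keep only phrases that are not contained in a longer phrase from the list."""
    tokenized = [tuple(p.split()) for p in phrases]
    kept = []
    for phrase, words in zip(phrases, tokenized):
        inside_longer = any(
            len(other) > len(words) and contains_subphrase(other, words)
            for other in tokenized
        )
        if not inside_longer:
            kept.append(phrase)
    return kept
```

Given the phrases Association for Computing and Association for Computing Machinery, this filter keeps only the latter.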
Bug fixes
- --output option in run-request works incorrectly
-
The --output option incorrectly saved the request's result to the parent directory of the provided location.