2.2.x release notes

Release notes for Lingo4G 2.2.x.

Version 2.2.0

Lingo4G 2.2.0 comes with significant updates to the learning and storage of document embedding vectors, allowing indexes of any size to take advantage of vector-based analyses.

Compatibility

Project descriptor

Updates required. Lingo4G 2.2.0 comes with significant improvements in document embedding learning and storage. As a result, it removes support for the project descriptor properties listed below. If your project uses any of those properties, remove them to make the descriptor compatible with Lingo4G 2.2.0.

Properties to remove:

In the embeddings.documents.index block: constructionNeighborhoodSize

In the embeddings.labels.index block: constructionNeighborhoodSize

Explanation: Lingo4G 2.2.0 computes the appropriate value automatically.
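The clean-up described above can be scripted. The following is a minimal sketch, assuming the project descriptor is plain JSON and that the dotted block names map to nested JSON objects. The helper name strip_removed_properties, the example descriptor fragment and the value 24 are hypothetical and only serve as an illustration.

```python
import json

# Dotted paths of the two properties listed above, which Lingo4G 2.2.0
# no longer accepts in the project descriptor.
REMOVED_PROPERTIES = [
    "embeddings.documents.index.constructionNeighborhoodSize",
    "embeddings.labels.index.constructionNeighborhoodSize",
]


def strip_removed_properties(descriptor: dict) -> list:
    """Delete the removed properties in place; return the paths that were found."""
    removed = []
    for path in REMOVED_PROPERTIES:
        *parents, leaf = path.split(".")
        node = descriptor
        # Walk down the nested objects; stop if any level is missing.
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict) and leaf in node:
            del node[leaf]
            removed.append(path)
    return removed


# Hypothetical descriptor fragment, for illustration only.
example = json.loads("""
{
  "embeddings": {
    "documents": {"index": {"constructionNeighborhoodSize": 24}},
    "labels": {"index": {"constructionNeighborhoodSize": 24}}
  }
}
""")
removed_paths = strip_removed_properties(example)
print(removed_paths)
print(json.dumps(example, indent=2))
```

Properties that are absent are skipped silently, so the script is safe to run on descriptors that never set these values.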

Reindexing

Required. Lingo4G 2.2.0 changes the internal format of the index files and will not work with indexes created by Lingo4G 2.1.x.

New features

Off-heap document embeddings

Lingo4G now memory-maps document embedding vectors rather than loading them into the JVM heap (Java memory). As a result, learning and analyses based on document vectors are no longer limited by the available Java heap size.

If you used document embeddings and previously had to increase the JVM heap size for the embeddings to fit in Java memory, consider lowering the JVM heap size to give the operating system more space for memory-mapped disk buffers.
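To illustrate why memory mapping removes the heap-size limit, here is a generic sketch of reading a single vector from a memory-mapped file. It is not Lingo4G's implementation: the file layout (little-endian 32-bit floats stored back to back), the dimensionality and all names are assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile

DIMS = 4                  # hypothetical vector dimensionality
VECTOR_BYTES = DIMS * 4   # each component is a 32-bit float

# Write three example vectors to a temporary file that stands in for
# an on-disk vector store.
vectors = [[float(i + d) for d in range(DIMS)] for i in range(3)]
path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
with open(path, "wb") as f:
    for v in vectors:
        f.write(struct.pack(f"<{DIMS}f", *v))


def read_vector(mapped, index):
    """Decode one vector directly from the mapped file contents."""
    return list(struct.unpack_from(f"<{DIMS}f", mapped, index * VECTOR_BYTES))


# Map the file read-only; the operating system pages data in on demand,
# so only the parts that are actually read occupy physical memory.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
    second = read_vector(mapped, 1)

print(second)
```

Because the pages backing the mapping live in the operating system's file cache rather than in the process heap, leaving more free memory to the operating system lets more of the file stay cached.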

Incremental document embeddings

Lingo4G now stores document embeddings independently for each index segment and automatically computes embedding vectors for new documents added during incremental indexing.

Improvements

Improved kNN indexing

Lingo4G 2.2.0 significantly improves kNN index creation, both at indexing time and at analysis time, while maintaining high kNN search quality.

JSON document source parsing

The JSON document source is now less lenient in parsing input JSON files and will terminate if any of the input files is corrupted or cannot be read.

You can revert to the previous behavior by setting the failOnBrokenJson option to false on the document source.
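The difference between the strict and the lenient mode can be sketched as follows. This is an illustration of the behavior only, not Lingo4G code: the function load_json_files and its fail_on_broken_json parameter are hypothetical, and the sketch assumes that switching the option off makes the source skip broken files instead of terminating.

```python
import json
import os
import tempfile


def load_json_files(paths, fail_on_broken_json=True):
    """Parse each file; fail on the first broken one, or skip it when lenient."""
    documents, skipped = [], []
    for path in paths:
        try:
            with open(path, encoding="utf-8") as f:
                documents.append(json.load(f))
        except (OSError, ValueError):
            # json.JSONDecodeError is a subclass of ValueError.
            if fail_on_broken_json:
                raise
            skipped.append(path)
    return documents, skipped


# Two hypothetical input files: one valid, one truncated.
tmp = tempfile.mkdtemp()
good = os.path.join(tmp, "good.json")
bad = os.path.join(tmp, "bad.json")
with open(good, "w", encoding="utf-8") as f:
    f.write('{"title": "ok"}')
with open(bad, "w", encoding="utf-8") as f:
    f.write('{"title": ')

# Lenient mode: the truncated file is skipped and reported.
docs, skipped = load_json_files([good, bad], fail_on_broken_json=False)
print(len(docs), "parsed,", len(skipped), "skipped")
```

Calling load_json_files([good, bad]) with the default setting raises an exception on the truncated file, which corresponds to the strict behavior.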

Dependency updates

Lingo4G 2.2.0 updates HPPC to version 0.10.0.