2.2.x release notes
Release notes for Lingo4G 2.2.x.
Version 2.2.0
Lingo4G 2.2.0 comes with significant updates to the learning and storage of document embedding vectors, allowing indexes of any size to take advantage of vector-based analyses.
Compatibility
- Project descriptor
-
Updates required. Lingo4G 2.2.0 comes with significant improvements in document embedding learning and storage. As a result, it removes support for the project descriptor properties listed below. If your project uses any of those properties, remove them to make the descriptor compatible with Lingo4G 2.2.0.
| Properties to remove | Explanation |
| --- | --- |
| In the embeddings.documents.index block: constructionNeighborhoodSize | Lingo4G 2.2.0 computes the appropriate value automatically. |
| In the embeddings.labels.index block: constructionNeighborhoodSize | Lingo4G 2.2.0 computes the appropriate value automatically. |
- Reindexing
-
Required. Lingo4G 2.2.0 changes the internal format of the index files and will not work with indexes created by Lingo4G 2.1.x.
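The descriptor update described above can be sketched as a small Python script. This is only an illustration under stated assumptions: it assumes the project descriptor is plain JSON in which the embeddings.documents.index and embeddings.labels.index blocks are nested objects. The function name strip_removed_properties, the sample values and the "other" property are hypothetical and not part of Lingo4G.

```python
import json

# Blocks named in the release notes, each with the property to remove.
REMOVED = {
    ("embeddings", "documents", "index"): ["constructionNeighborhoodSize"],
    ("embeddings", "labels", "index"): ["constructionNeighborhoodSize"],
}


def strip_removed_properties(descriptor: dict) -> list:
    """Delete removed properties in place; return dotted paths of what was deleted."""
    deleted = []
    for path, names in REMOVED.items():
        block = descriptor
        # Walk down to the block; stop quietly if any level is missing.
        for key in path:
            block = block.get(key) if isinstance(block, dict) else None
            if block is None:
                break
        if not isinstance(block, dict):
            continue
        for name in names:
            if name in block:
                del block[name]
                deleted.append(".".join(path + (name,)))
    return deleted


# Hypothetical descriptor fragment, for illustration only.
sample = json.loads("""
{
  "embeddings": {
    "documents": {"index": {"constructionNeighborhoodSize": 64}},
    "labels": {"index": {"constructionNeighborhoodSize": 32, "other": 1}}
  }
}
""")
removed = strip_removed_properties(sample)
print(removed)
print(json.dumps(sample, indent=2))
```

Properties that are not listed as removed are left untouched, and a descriptor that never set the property passes through unchanged.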
New features
- Off-heap document embeddings
-
Lingo4G now memory-maps document embedding vectors rather than loading them into the JVM heap (Java memory). As a result, learning and analyses based on document vectors are not limited by the available Java heap size.
If you used document embeddings and previously had to increase the JVM heap size for the embeddings to fit in Java memory, consider lowering the JVM heap size to leave the operating system more space for memory-mapped disk buffers.
- Incremental document embeddings
-
Lingo4G now stores document embeddings independently for each index segment and automatically computes embedding vectors for new documents added during incremental indexing.
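To illustrate the general idea behind memory-mapped vectors, here is a minimal Python sketch. It is an analogy, not Lingo4G code: the file layout (fixed-size little-endian float32 records), the dimensionality DIM, the file name segment-0.vec and the helper names are all assumptions made for this example. The point it shows is that a vector is decoded straight from the mapped file, so the operating system pages data in on demand instead of the process holding every vector in its own memory.

```python
import mmap
import os
import struct
import tempfile

DIM = 4  # hypothetical vector dimensionality
RECORD = struct.Struct(f"<{DIM}f")  # one little-endian float32 vector


def write_vectors(path, vectors):
    """Write fixed-size float32 vectors back to back."""
    with open(path, "wb") as f:
        for v in vectors:
            f.write(RECORD.pack(*v))


def read_vector(mapped, doc_id):
    """Decode a single vector at its record offset within the mapping."""
    return RECORD.unpack_from(mapped, doc_id * RECORD.size)


# Hypothetical per-segment vector file, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "segment-0.vec")
write_vectors(path, [[float(i)] * DIM for i in range(1000)])

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
    vector = read_vector(mapped, 42)
    count = len(mapped) // RECORD.size

print(count, vector)
```

Because the mapped pages live in the operating system's page cache rather than in the process heap, leaving more free memory to the operating system lets it keep more of the file resident.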
Improvements
- Improved kNN indexing
-
Lingo4G 2.2.0 significantly improves kNN index creation at both indexing and analysis time while maintaining high kNN search quality.
- JSON document source parsing
-
The JSON document source is now less lenient in parsing input JSON files and will terminate if any of the input files is corrupted or cannot be read.
You can revert to previous behavior by setting the failOnBrokenJson option on the document source.
- Dependency updates
-
Lingo4G 2.2.0 updates HPPC to version 0.10.0.
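The difference between strict and lenient JSON input handling, as described under JSON document source parsing, can be sketched in Python as follows. This is a hedged illustration, not Lingo4G's implementation: the function load_documents and its fail_on_broken_json parameter are hypothetical, and the sketch assumes that lenient handling means skipping a broken file. The release notes do not state which value of failOnBrokenJson restores the earlier behavior.

```python
import json
import os
import tempfile


def load_documents(paths, fail_on_broken_json=True):
    """Parse each file; stop at the first broken file, or skip it when lenient."""
    documents, skipped = [], []
    for path in paths:
        try:
            with open(path, encoding="utf-8") as f:
                documents.append(json.load(f))
        except (OSError, ValueError) as e:  # JSONDecodeError is a ValueError
            if fail_on_broken_json:
                raise RuntimeError(f"Cannot read JSON input {path}") from e
            skipped.append(path)
    return documents, skipped


# Two hypothetical input files: one valid, one truncated.
tmp = tempfile.mkdtemp()
good = os.path.join(tmp, "good.json")
bad = os.path.join(tmp, "bad.json")
with open(good, "w", encoding="utf-8") as f:
    f.write('{"title": "ok"}')
with open(bad, "w", encoding="utf-8") as f:
    f.write('{"title": ')

# Lenient mode (assumed earlier behavior): the broken file is skipped.
docs, skipped = load_documents([good, bad], fail_on_broken_json=False)

# Strict mode: the first broken file aborts the whole run.
try:
    load_documents([good, bad])
    strict_failed = False
except RuntimeError:
    strict_failed = True

print(docs, skipped, strict_failed)
```

Failing fast makes corrupted input visible at once, whereas the lenient mode trades that visibility for the ability to process whatever files are readable.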