2.0.x release notes

Release notes for Lingo4G 2.0.x.

Version 2.0.5

The 2.0.5 release squashes a few minor bugs.

Compatibility

Lingo4G 2.0.5 is backward-compatible with previous 2.0.x releases and works with indices created by any 2.0.x release.

Bug fixes

Slow parsing of requests

API V2 requests containing large arrays of primitive types were slow to parse and process.

Version 2.0.4

The 2.0.4 release squashes a few minor bugs.

Compatibility

Lingo4G 2.0.4 is backward-compatible with previous 2.0.x releases and works with indices created by any 2.0.x release.

Bug fixes

--output option fixed in run-request

Previous versions of Lingo4G ignored the --output option of the run-request command. Version 2.0.4 fixes the issue.

Incorrect rounding of variable values in Explorer

Previous versions of the Lingo4G Explorer app could incorrectly round the values of certain variables in the request variables editor. Version 2.0.4 fixes the issue.

Improvements

Truncated JSON arrays warning

Lingo4G Explorer now displays a warning if the JSON response view truncates large arrays for better display performance.

Version 2.0.3

The 2.0.3 release brings minor improvements to example data sources and fixes for display problems in the Explorer (map view).

Compatibility

Lingo4G 2.0.3 is backward-compatible with previous 2.0.x releases and works with indices created by the 2.0.0, 2.0.1 and 2.0.2 releases.

Bug fixes

Potential assertion error in duplicate detection

Duplicate detection could throw an internal assertion error stating that internal iterators were not properly sorted.

Document map settings in Explorer v1 bug fix

Explorer v1 included in Lingo4G 2.0.2 would reset certain document map settings to their default on each new analysis. Version 2.0.3 fixes the issue.

Document map display fixes on fractional device pixel ratios

The document map could display truncated labels on devices with fractional device pixel ratios. This release adds a workaround for the problem.

Other changes

zstandard compression support

The JSON records document source now supports reading zstd-compressed files.

PubMed example shows up-to-date URLs automatically

The PubMed example now tries to fetch and parse the current bulk data file URLs automatically.

Version 2.0.2

The 2.0.2 release fixes reporting of multi-value field count in the documentContent stage and updates dotAtlas to fix jittery zooming of the document map.

Compatibility

Lingo4G 2.0.2 is backward-compatible with previous 2.0.x releases and works with indices created by the 2.0.0 and 2.0.1 releases.

Bug fixes

valueCount parameter was ignored

Lingo4G ignored the valueCount property and never emitted field value count. Version 2.0.2 fixes the issue and also corrects the documentation of the property.

Jittery zooming of the document map

Zooming of the document map view was jittery in both the legacy and the current version of Lingo4G Explorer. Version 2.0.2 fixes the issue.

Version 2.0.1

Version 2.0.1 improves the initial loading time of the Lingo4G documentation. It does not make any changes to the Lingo4G software.

Version 2.0.0

Lingo4G 2.0.0 adds analysis API v2: a new flexible API for building and running diverse text processing pipelines. It also comes with a modernized Lingo4G Explorer v2 application.

See the Version 1.x vs 2.x article for an overview of what has changed and what remains the same.

Compatibility

Project descriptor

Update recommended. Lingo4G 2.0.0 maintains compatibility with Lingo4G 1.x project descriptors. We recommend applying one change to your existing project descriptors to make the analysis API v2 easier to use.

Reindexing

Required. Lingo4G 2.0.0 does not work with indices created with earlier versions of Lingo4G. You need to perform full indexing to open your existing projects with Lingo4G 2.0.0.

REST API v1

Available, but enters maintenance mode. Lingo4G 2.0.0 preserves the REST API available in the 1.x line. All software you created against Lingo4G 1.x will also work with Lingo4G 2.0.0.

As of version 2.0.0, the REST API v1 enters maintenance mode: it will only receive critical bug fixes. All new analysis features, such as document embeddings introduced in version 2.0.0, will be exposed only in the analysis API v2.

New features

Analysis API v2

Version 2.0.0 adds a new, flexible way of executing analyses. You can use analysis API v2 to build requests of varying complexity, ranging from simple query-based document search, through clustering of documents or labels, to generating a time series of 2D document maps and finding near-duplicate documents.

For more information, see:

Lingo4G Explorer v2

Version 2.0.0 comes with a modernized Lingo4G Explorer v2. Currently, Lingo4G Explorer v2 offers the JSON Sandbox app for authoring, executing and debugging analysis API v2 requests.

Lingo4G Explorer running the JSON Sandbox app.

Duplicate detection

You can use Lingo4G 2.0.0 to identify pairs of documents with overlapping content. The degree of overlap can range from entire documents (exact duplicates), through almost all of the content (near duplicates), to partial overlap only (sentences, paragraphs). Lingo4G can also highlight the overlapping areas of documents for easier inspection of the results.
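
To make the notion of "degree of overlap" concrete, here is a minimal Python sketch that scores two texts by the share of word 3-grams (shingles) they have in common. This is a generic illustration, not Lingo4G's duplicate detection algorithm; the function names, the shingle size and the Jaccard measure are all assumptions made for this example.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = text.lower().split()
    if len(words) < size:
        # Texts shorter than one shingle are treated as a single unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(a, b, size=3):
    """Jaccard overlap of the two texts' shingle sets, in the 0..1 range."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Made-up sample texts, for illustration only.
original = "the quick brown fox jumps over the lazy dog"
near = "the quick brown fox jumps over the lazy cat"
unrelated = "completely different words appear in this sentence"

print(overlap(original, original))   # exact duplicate
print(overlap(original, near))       # near duplicate
print(overlap(original, unrelated))  # no shared shingles
```

A score close to 1 corresponds to exact or near duplicates, while intermediate values suggest partial overlap such as shared sentences or paragraphs.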

Document embeddings

Lingo4G 2.0.0 can learn multidimensional embeddings for documents. You can use analysis API v2 to compute embedding-based similarities between documents.

See the learning embeddings article for more information, including the limitations of the current implementation.
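
As a rough illustration of what an embedding-based similarity can look like, the Python sketch below computes the cosine similarity between two document vectors. It assumes cosine similarity as the measure and uses made-up 3-dimensional vectors; the measure Lingo4G uses internally and the dimensionality of its embeddings may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical document embeddings, for illustration only.
doc_a = [1.0, 0.0, 1.0]
doc_b = [1.0, 0.0, 1.0]
doc_c = [0.0, 1.0, 0.0]

print(cosine_similarity(doc_a, doc_b))  # identical direction
print(cosine_similarity(doc_a, doc_c))  # orthogonal vectors
```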

API changes

analysis_v2 project descriptor section

Version 2.0.0 adds the optional analysis_v2 section to the project descriptor to specify defaults, such as feature field names, for the analysis API v2.

We recommend adding an analysis_v2 section similar to the following to your existing project descriptor:

"analysis_v2": {
  "components": {
    "featureFields": {
      "type": "featureFields:simple",
      "fields": [
        "title$phrases",
        "abstract$phrases"
      ]
    },
    "contentFields": {
      "type": "contentFields:simple",
      "fields": {
        "id": {},
        "title": {},
        "abstract": {},
        "category": {}
      }
    },
    "labelFilter": {
      "type": "labelFilter:autoStopLabels"
    }
  }
}

Adapt the featureFields and contentFields blocks to the feature and content fields available in your project:

  • in the fields array of the featureFields component, provide names of the feature fields from which Lingo4G should extract labels. You can use the same feature fields as in the analysis.source.labels.fields section that should already exist in your descriptor.

    See the featureFields:simple reference for more details.

  • in the fields object of the contentFields component, provide names of content fields which you would like Lingo4G to retrieve when displaying the contents of documents.

    Objects inside the fields object configure the retrieval details, such as maximum length and label highlighting, for each field. Our example uses empty objects, sticking to default retrieval settings for each field.

    See the contentFields:simple reference for a complete example.
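
If you maintain many project descriptors, the adaptation described above could be scripted. The Python sketch below builds an analysis_v2 section from a descriptor that has already been parsed into a dictionary, reusing the feature fields listed under analysis.source.labels.fields. The helper name build_analysis_v2 is hypothetical, and the code assumes that each entry of that fields list is either a field name string or an object with a "name" key; check your own descriptor before relying on this.

```python
import json


def build_analysis_v2(descriptor, content_fields):
    """Return an analysis_v2 section for a parsed project descriptor (a dict)."""
    # Assumption: analysis.source.labels.fields is a list whose entries are
    # either field-name strings or objects with a "name" key.
    label_fields = (
        descriptor.get("analysis", {})
        .get("source", {})
        .get("labels", {})
        .get("fields", [])
    )
    feature_fields = [
        entry["name"] if isinstance(entry, dict) else entry
        for entry in label_fields
    ]
    return {
        "components": {
            "featureFields": {
                "type": "featureFields:simple",
                "fields": feature_fields,
            },
            "contentFields": {
                "type": "contentFields:simple",
                # Empty objects keep the default retrieval settings.
                "fields": {name: {} for name in content_fields},
            },
            "labelFilter": {"type": "labelFilter:autoStopLabels"},
        }
    }


# Hypothetical, heavily trimmed 1.x descriptor used only for illustration.
descriptor = {
    "analysis": {
        "source": {
            "labels": {
                "fields": [{"name": "title$phrases"}, {"name": "abstract$phrases"}]
            }
        }
    }
}

descriptor["analysis_v2"] = build_analysis_v2(
    descriptor, ["id", "title", "abstract", "category"]
)
print(json.dumps(descriptor["analysis_v2"], indent=2))
```

For the sample descriptor, the printed section matches the analysis_v2 example shown earlier. The sketch works on an in-memory dictionary only, so reading and writing the descriptor file is left to you.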

Previous releases

For Lingo4G 1.x release notes, see the v1 documentation.