2.4.x release notes

Release notes for Lingo4G 2.4.x.

Version 2.4.0

Release 2.4.0 comes with the following new features and improvements.

  • Date field improvements: more efficient indexing and searching of date-typed fields, and support for date math expressions in search queries.

  • Improved document cluster labeling: instead of the cluster's most frequent labels, Lingo4G now chooses labels that are specific to the cluster and infrequent outside the cluster.

  • Label filtering improvements, including the label diversification filter for suppressing semantically similar labels.

Compatibility

Project descriptor

Updates may be required. If your project descriptor customizes the date field format, see the date field changes for the updates you may need to apply.

Reindexing

Required. Date and time fields have a new internal representation, which increases the performance of indexing, storage and queries. This new storage is not compatible with previous versions, so full reindexing is required.

Analysis request JSONs

Updates may be required. See the document cluster labeling analysis API changes and query builders API changes for the required updates.

New features

Date field improvements

Date fields are now stored in the index as numbers (milliseconds since epoch). This greatly improves indexing and search performance.
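
To illustrate why a numeric representation helps, here is a minimal sketch that converts dates to milliseconds since the epoch and runs a range check as two integer comparisons. It is not Lingo4G's implementation: the function names and the treatment of dates as UTC midnight are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(iso_date: str) -> int:
    """Convert a yyyy-MM-dd date to milliseconds since the Unix epoch.

    Assumption: the date is interpreted as midnight UTC.
    """
    moment = datetime.strptime(iso_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def in_range(value_ms: int, start_ms: int, end_ms: int) -> bool:
    """Check membership in the half-open range [start_ms, end_ms)."""
    return start_ms <= value_ms < end_ms


start = to_epoch_millis("2021-02-01")
end = to_epoch_millis("2021-03-01")
print(in_range(to_epoch_millis("2021-02-14"), start, end))  # True
```

Once every value is a number, a date range query reduces to numeric comparisons and no longer involves parsing or comparing formatted strings.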

Date values in queries are now strictly validated against the indexFormat specified in the field's definition. Invalid or non-parseable values will cause request errors.

Date fields now support date math expressions in queries.

Improvements

Improved cluster labeling

Version 2.4.0 improves the cluster labels produced by the labelClusters:documentClusterLabels stage. The new implementation ensures that cluster labels are specific to the cluster and do not occur too frequently in documents from other clusters.
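
The idea of preferring labels that are frequent inside a cluster and rare outside it can be sketched as follows. This is only one simple way to express that idea, not the algorithm Lingo4G uses: the scoring formula (in-cluster document rate minus out-of-cluster document rate), the function name, and the input shape are all assumptions made for illustration.

```python
from collections import Counter


def specific_labels(clusters, max_labels=2):
    """Pick labels that are frequent in a cluster and rare elsewhere.

    clusters: dict mapping a cluster id to a list of documents,
    where each document is a set of labels (hypothetical input shape).
    """
    cluster_df = {}
    total_df = Counter()
    for cid, docs in clusters.items():
        # Document frequency of each label within this cluster.
        df = Counter(label for doc in docs for label in set(doc))
        cluster_df[cid] = df
        total_df.update(df)
    total_docs = sum(len(docs) for docs in clusters.values())

    result = {}
    for cid, docs in clusters.items():
        if not docs:
            result[cid] = []
            continue
        outside_docs = total_docs - len(docs)
        scores = {}
        for label, inside in cluster_df[cid].items():
            outside = total_df[label] - inside
            inside_rate = inside / len(docs)
            outside_rate = outside / outside_docs if outside_docs else 0.0
            scores[label] = inside_rate - outside_rate
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda label: (-scores[label], label))
        result[cid] = ranked[:max_labels]
    return result


clusters = {
    "astro": [{"galaxy", "data"}, {"galaxy", "star", "data"}, {"star", "data"}],
    "bio": [{"protein", "data"}, {"protein", "cell", "data"}],
}
print(specific_labels(clusters))
```

In this toy input the label "data" occurs in every document, so a purely frequency-based choice would rank it first in both clusters, while the specificity score ranks it last.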

You can further improve the document cluster labels by applying the labelListFilter:diversified label list filter, which conflates repetitive labels, such as globular clusters, globular cluster system, and GC, leaving room for a broader range of meanings. See the example request for more details.

Community Detection clustering stability

When applied to the same input similarity matrix, Community Detection clustering now returns the same clusters across different analysis runs.

Collection of content field values

Version 2.4.0 adds the labelCollector:allFromContentFields collector, which fetches values of documents' content fields.

You can use the new collector, for example, to label document clusters using content field values.

Label list filtering in labels:fromText

Version 2.4.0 adds the labelListFilter property to the labels:fromText stage, so that you can remove truncated or repetitive labels from the labels Lingo4G extracts from free text.

API changes

Date field changes

Queries now support date math in date field values and validate those values by default.

If your project descriptor contains a custom indexFormat specification on any date field, you may need to update the format specification for date prefix queries to work in version 2.4.0.

For example, if the indexFormat on date fields in your current descriptor is yyyy-MM-dd, prefix queries such as 2021-02 or 2021 will fail in Lingo4G 2.4.0. To allow date prefix searches, change the indexFormat to yyyy[-MM][-dd].
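
The following sketch shows how a format with optional sections, such as yyyy[-MM][-dd], can accept a year, a year and month, or a full date, while a value that matches none of these shapes is rejected. It emulates the pattern with a regular expression and is not Lingo4G's parser. The function name is hypothetical, and interpreting a prefix as the date range it covers is an assumption made for this example.

```python
import re
from datetime import date, timedelta

# Emulates yyyy[-MM][-dd]: the month and day parts are optional.
_DATE_PREFIX = re.compile(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?")


def parse_date_prefix(value: str):
    """Return the half-open [start, end) date range covered by value.

    Raises ValueError for values that do not match the format
    or that name a non-existent date.
    """
    match = _DATE_PREFIX.fullmatch(value)
    if match is None:
        raise ValueError(f"Value does not match yyyy[-MM][-dd]: {value!r}")
    year = int(match.group(1))
    month = int(match.group(2)) if match.group(2) else None
    day = int(match.group(3)) if match.group(3) else None

    if month is None:
        # Year only: the range spans the whole year.
        return date(year, 1, 1), date(year + 1, 1, 1)
    if day is None:
        # Year and month: the range spans the whole month.
        start = date(year, month, 1)
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        return start, end
    # Full date: the range spans a single day.
    start = date(year, month, day)
    return start, start + timedelta(days=1)


print(parse_date_prefix("2021-02"))
```

With a strict yyyy-MM-dd format only the last branch would be reachable, which is why shorter prefixes fail validation unless the format marks the month and day as optional.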

Document cluster labeling

Improvements to document cluster labeling require a small change to the properties of the labelClusters:documentClusterLabels stage. The 2.4.0 release removes the labelAggregator property of that stage and instead introduces the labelCollector property.

Updates are required for all your requests that set the labelAggregator property of the labelClusters:documentClusterLabels stage.

Typically, such requests use the labelAggregator property to apply additional filtering to the labels Lingo4G uses to describe clusters:

{
  "documentClusterLabels": {
    "type": "labelClusters:documentClusterLabels",
    "maxLabels": {
      "@var": "max_cluster_labels"
    },
    "labelAggregator": {
      "type": "labelAggregator:topWeight",
      "labelCollector": {
        "type": "labelCollector:topFromFeatureFields",
        "labelWeighting": "EMBEDDING",
        "labelFilter": {
          "type": "labelFilter:composite",
          "labelFilters": {
            "default": {
              "type": "labelFilter:reference",
              "auto": true
            },
            "wordCount": {
              "type": "labelFilter:tokenCount",
              "maxTokens": 8,
              "minTokens": 2
            }
          }
        }
      }
    }
  }
}
          

To update the request for the 2.4.0 release, remove the labelAggregator property and move its nested labelCollector configuration up to the top level of the labelClusters:documentClusterLabels stage specification:

{
  "documentClusterLabels": {
    "type": "labelClusters:documentClusterLabels",
    "maxLabels": {
      "@var": "max_cluster_labels"
    },
    "labelCollector": {
      "type": "labelCollector:topFromFeatureFields",
      "labelWeighting": "EMBEDDING",
      "labelFilter": {
        "type": "labelFilter:composite",
        "labelFilters": {
          "default": {
            "type": "labelFilter:reference",
            "auto": true
          },
          "wordCount": {
            "type": "labelFilter:tokenCount",
            "maxTokens": 8,
            "minTokens": 2
          }
        }
      }
    }
  }
}
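
If you maintain many request files, the same transformation can be scripted. The sketch below is a hypothetical helper, not a tool shipped with Lingo4G. It assumes the request is already parsed into Python dictionaries and lists (for example with json.load), and that the labelAggregator carries no settings other than its nested labelCollector that you need to keep.

```python
STAGE_TYPE = "labelClusters:documentClusterLabels"


def migrate_label_aggregator(node):
    """Return a migrated copy of a parsed request (or request fragment).

    In every labelClusters:documentClusterLabels stage, the labelAggregator
    property is removed and its nested labelCollector, if present, is moved
    to the top level of the stage. The input is not modified.
    """
    if isinstance(node, list):
        return [migrate_label_aggregator(item) for item in node]
    if not isinstance(node, dict):
        return node

    migrated = {key: migrate_label_aggregator(value) for key, value in node.items()}
    if migrated.get("type") == STAGE_TYPE and "labelAggregator" in migrated:
        aggregator = migrated.pop("labelAggregator")
        collector = aggregator.get("labelCollector") if isinstance(aggregator, dict) else None
        # Do not overwrite a labelCollector that is already at the top level.
        if collector is not None and "labelCollector" not in migrated:
            migrated["labelCollector"] = collector
    return migrated
```

Review the migrated requests before use: any other settings of the removed labelAggregator are dropped by this sketch.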
          
Queries from query builders

Lingo4G 2.4.0 renames the query:fromQueryBuilder component to query:forDocumentFields to better reflect what the component does.

If your requests use the query:fromQueryBuilder component, replace the component type with query:forDocumentFields.

Additionally, version 2.4.0 gives the query:fromQueryBuilder type a new implementation, which invokes the query builder with an empty set of inputs. The primary use cases of the new implementation are combining multiple user-provided variable values into a single, more complex query and reusing the same user input to build multiple queries.
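
For requests written for versions earlier than 2.4.0, the rename can be applied mechanically. The sketch below is a hypothetical helper that assumes the request is already parsed into Python dictionaries and lists. Apply it only to pre-2.4.0 requests, because in 2.4.0 the old type name refers to the new implementation described above.

```python
OLD_TYPE = "query:fromQueryBuilder"
NEW_TYPE = "query:forDocumentFields"


def rename_query_component(node):
    """Return a copy of a parsed request with the component type renamed.

    Only "type" properties whose value equals the old type name are changed.
    The input is not modified.
    """
    if isinstance(node, list):
        return [rename_query_component(item) for item in node]
    if isinstance(node, dict):
        return {
            key: NEW_TYPE if key == "type" and value == OLD_TYPE
            else rename_query_component(value)
            for key, value in node.items()
        }
    return node
```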