2.4.x release notes
Release notes for Lingo4G 2.4.x.
Version 2.4.0
Release 2.4.0 comes with the following new features and improvements.
- Date field improvements: more efficient indexing and searching of date-typed fields, and support for date math expressions in search queries.
- Improved document cluster labeling: instead of the cluster's most frequent labels, Lingo4G now chooses labels that are specific to the cluster and infrequent outside the cluster.
- Label filtering improvements, including the label diversification filter for suppressing semantically similar labels.
Compatibility
- Project descriptor
  Updates may be required. If your project descriptor customizes the date field format, see the date field changes for the updates you may need to apply.
- Reindexing
  Required. Date and time fields have a new internal representation, which increases the performance of indexing, storage and queries. This new storage is not compatible with previous versions, so full reindexing is required.
- Analysis request JSONs
  Updates may be required. See the document cluster labeling analysis API changes and query builders API changes for the required updates.
New features
- Date field improvements
  Date fields are now stored in the index as numbers (milliseconds since epoch). This greatly improves indexing and search performance.
  Date values in queries are now strictly validated against the indexFormat specified in the field's definition. Invalid or non-parseable values will cause request errors.
  Date fields now support date math expressions in queries.
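To make the "milliseconds since epoch" representation concrete, here is a minimal Python sketch of strict parsing followed by conversion to a number. It only illustrates the idea and is not Lingo4G's implementation: the function name to_epoch_millis, the UTC time zone and the Python format string %Y-%m-%d (standing in for an indexFormat such as yyyy-MM-dd) are all assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

# Reference point for the numeric representation (assumed to be UTC here).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value: str, fmt: str = "%Y-%m-%d") -> int:
    """Parse value strictly against fmt and return milliseconds since the epoch.

    Raises ValueError when the value does not match the format, which mirrors
    the idea of rejecting invalid date values instead of guessing.
    """
    parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    # Integer division by a one-millisecond timedelta avoids float rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


if __name__ == "__main__":
    print(to_epoch_millis("2021-02-01"))
    # Once dates are numbers, a date range check is a plain numeric comparison.
    start, end = to_epoch_millis("2021-01-01"), to_epoch_millis("2021-12-31")
    print(start <= to_epoch_millis("2021-02-01") <= end)
```

Storing dates as numbers means range conditions reduce to integer comparisons, which is one plausible reason for the performance gain described above.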
Improvements
- Improved cluster labeling
  Version 2.4.0 improves the cluster labels produced by the labelClusters:documentClusterLabels stage. The new implementation ensures that cluster labels are specific to the cluster and do not occur too frequently in documents from other clusters.
  You can further improve the document cluster labels by applying the labelListFilter:diversified label list filter, which conflates repetitive labels, such as "globular clusters", "globular cluster system" and "GC", leaving room for a broader range of meanings. See the example request for more details.
- Community Detection clustering stability
  When applied to the same input similarity matrix, Community Detection clustering returns the same clusters across different analysis runs.
- Collection of content field values
  Version 2.4.0 adds the labelCollector:allFromContentFields collector, which fetches values of documents' content fields.
  You can use the new collector, for example, to label document clusters using content field values.
- Label list filtering in labels:fromText
  Version 2.4.0 adds the labelListFilter property to the labels:fromText stage, so that you can remove truncated or repetitive labels from the labels Lingo4G extracts from free text.
API changes
- Date field changes
  Date field values in queries now support date math and are validated by default.
  If your project descriptor contains a custom indexFormat specification on any date field, you may need to update the format specification for date prefix queries to work in version 2.4.0.
  For example, if the indexFormat on date fields in your current descriptor is yyyy-MM-dd, prefix queries like 2021-02 or 2021 will fail in Lingo4G 2.4.0. To allow date prefix searches, change the indexFormat to yyyy[-MM][-dd].
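The difference between the two format specifications can be sketched in Python. This is only an emulation under stated assumptions: it assumes the square brackets in yyyy[-MM][-dd] mark optional sections, and it approximates them with a list of Python strptime formats tried from most to least specific. The names STRICT_FORMATS, PREFIX_FORMATS and parse_date are hypothetical and are not part of Lingo4G.

```python
from datetime import datetime

# Assumed Python approximations of the two indexFormat values discussed above.
STRICT_FORMATS = ["%Y-%m-%d"]                 # approximates yyyy-MM-dd
PREFIX_FORMATS = ["%Y-%m-%d", "%Y-%m", "%Y"]  # approximates yyyy[-MM][-dd]


def parse_date(value: str, formats: list) -> datetime:
    """Return the first successful strict parse of value, or raise ValueError."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # Try the next, less specific format.
    raise ValueError(f"{value!r} does not match any of {formats}")


if __name__ == "__main__":
    for query_value in ["2021-02-15", "2021-02", "2021"]:
        for name, formats in [("strict", STRICT_FORMATS), ("prefix", PREFIX_FORMATS)]:
            try:
                print(name, query_value, "->", parse_date(query_value, formats).date())
            except ValueError:
                print(name, query_value, "-> rejected")
```

With the strict format only the full date is accepted, while the format with optional sections also accepts the year-only and year-month prefixes.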
- Document cluster labeling
  Improvements to document cluster labeling require a small change to the properties of the labelClusters:documentClusterLabels stage. The 2.4.0 release removes the labelAggregator property of that stage and instead introduces the labelCollector property.
  Updates are required for all your requests that:
  - perform document cluster labeling using the labelClusters:documentClusterLabels stage, and at the same time
  - provide a custom labelAggregator property to the labelClusters:documentClusterLabels stage.
  Typically, your requests may use the labelAggregator property to apply additional filtering to the labels Lingo4G uses to describe clusters:
      {
        "documentClusterLabels": {
          "type": "labelClusters:documentClusterLabels",
          "maxLabels": { "@var": "max_cluster_labels" },
          "labelAggregator": {
            "type": "labelAggregator:topWeight",
            "labelCollector": {
              "type": "labelCollector:topFromFeatureFields",
              "labelWeighting": "EMBEDDING",
              "labelFilter": {
                "type": "labelFilter:composite",
                "labelFilters": {
                  "default": { "type": "labelFilter:reference", "auto": true },
                  "wordCount": { "type": "labelFilter:tokenCount", "maxTokens": 8, "minTokens": 2 }
                }
              }
            }
          }
        }
      }
  To update the request for the 2.4.0 release, remove the labelAggregator property and pull up the labelCollector configuration to the top level of the labelClusters:documentClusterLabels stage specification:
      {
        "documentClusterLabels": {
          "type": "labelClusters:documentClusterLabels",
          "maxLabels": { "@var": "max_cluster_labels" },
          "labelCollector": {
            "type": "labelCollector:topFromFeatureFields",
            "labelWeighting": "EMBEDDING",
            "labelFilter": {
              "type": "labelFilter:composite",
              "labelFilters": {
                "default": { "type": "labelFilter:reference", "auto": true },
                "wordCount": { "type": "labelFilter:tokenCount", "maxTokens": 8, "minTokens": 2 }
              }
            }
          }
        }
      }
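If you keep many saved requests, the labelAggregator update described above can be scripted. The following Python sketch is one possible approach, not a tool shipped with Lingo4G: the function name migrate_label_aggregator is hypothetical, and the sketch assumes that the only part of a labelAggregator worth keeping is its nested labelCollector, so any other aggregator settings are dropped.

```python
import json

STAGE_TYPE = "labelClusters:documentClusterLabels"


def migrate_label_aggregator(node):
    """Return a copy of a request (or fragment) updated for the 2.4.0 stage layout.

    For every object of the document cluster labels stage type, the
    labelAggregator property is removed and its nested labelCollector,
    if present, is moved to the top level of the stage.
    """
    if isinstance(node, list):
        return [migrate_label_aggregator(item) for item in node]
    if not isinstance(node, dict):
        return node
    migrated = {key: migrate_label_aggregator(value) for key, value in node.items()}
    if migrated.get("type") == STAGE_TYPE and "labelAggregator" in migrated:
        aggregator = migrated.pop("labelAggregator")
        if isinstance(aggregator, dict) and "labelCollector" in aggregator:
            migrated["labelCollector"] = aggregator["labelCollector"]
    return migrated


if __name__ == "__main__":
    old_request = {
        "documentClusterLabels": {
            "type": STAGE_TYPE,
            "maxLabels": {"@var": "max_cluster_labels"},
            "labelAggregator": {
                "type": "labelAggregator:topWeight",
                "labelCollector": {
                    "type": "labelCollector:topFromFeatureFields",
                    "labelWeighting": "EMBEDDING",
                },
            },
        }
    }
    print(json.dumps(migrate_label_aggregator(old_request), indent=2))
```

The function builds a new structure instead of modifying its input, so the original request stays available for comparison. Review the migrated requests before using them.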
- Queries from query builders
  Lingo4G 2.4.0 renames the query:fromQueryBuilder component to query:forDocumentFields to better reflect what the component does.
  If your requests use the query:fromQueryBuilder component, replace the component type with query:forDocumentFields.
  Additionally, version 2.4.0 changes the implementation of the query:fromQueryBuilder component to invoke the query builder with an empty set of inputs. The primary use case of the new implementation is combining multiple user-provided variable values into a single, more complex query or reusing the same user input to build multiple queries.
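The component type replacement described above can also be applied mechanically to requests written for earlier versions. The Python sketch below is a hypothetical helper, not part of Lingo4G: the function name rename_component_type and the exampleProperty key in the sample fragment are made up for illustration. Apply it only to pre-2.4.0 requests, because query:fromQueryBuilder remains a valid type in 2.4.0 with the new behavior.

```python
OLD_TYPE = "query:fromQueryBuilder"
NEW_TYPE = "query:forDocumentFields"


def rename_component_type(node, old=OLD_TYPE, new=NEW_TYPE):
    """Return a copy of node in which every "type" property equal to old is set to new."""
    if isinstance(node, dict):
        return {
            key: new if key == "type" and value == old
            else rename_component_type(value, old, new)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_component_type(item, old, new) for item in node]
    return node


if __name__ == "__main__":
    # Hypothetical pre-2.4.0 fragment; exampleProperty is a placeholder.
    fragment = {"query": {"type": OLD_TYPE, "exampleProperty": "value"}}
    print(rename_component_type(fragment))
```

Only exact matches of the old type name are replaced, so other component types and string values pass through unchanged.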