Features

Lingo3G offers the following features.

  • Quality. Lingo3G delivers high-quality semantic clustering with special emphasis placed on making cluster labels meaningful, concise and varied.
  • Performance. Lingo3G's internal architecture is designed for ultra-fast input document preprocessing and clustering. Lingo3G can cluster hundreds of documents in under a second.
  • Ease of deployment. Lingo3G is a simple stateless HTTP/REST service (or a single-method Java API call); it does not involve a steep learning curve or complex integration layers.
  • Scalability. When short response time is not a concern, Lingo3G can scale to thousands of documents and cluster them on commodity hardware. If this is still not enough, we recommend taking a look at Lingo4G, which provides large-scale clustering and analytical capabilities.
  • Tuning. A wide range of parameters is provided to tune the algorithm for the desired balance between clustering quality and performance.
  • 100% Java. The software is written in Java, with no native components. Lingo3G can be easily embedded into any Java application. If interoperability is required, a ready-to-use document clustering HTTP/REST service is also provided.
  • Domain language tuning. Lingo3G can (and should) be tuned to handle the domain-specific vocabulary of the input documents to improve the quality of the output clusters. This can be achieved with word exclusion lists, label filtering and boosting dictionaries, and synonym sets.
  • Integration with Open Source. Lingo3G plugs seamlessly into the Carrot2 project, adding the full power of a commercial clustering algorithm to all applications that support Carrot2, including integration layers with Apache Solr and Elasticsearch.
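To give a rough idea of what domain language tuning does to cluster labels, the following is a minimal, self-contained Java sketch. It is a hypothetical illustration only: the class LabelTuningSketch, the method tuneLabels and the in-memory exclusion set and synonym map are assumptions made for this example and are not Lingo3G's API or dictionary format. The sketch drops candidate labels that contain an excluded word and rewrites synonyms to one canonical form, so that near-duplicate labels collapse into one.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical illustration of domain-language tuning applied to candidate
 * cluster labels. This is NOT Lingo3G's API or dictionary format.
 */
class LabelTuningSketch {

    /**
     * Lower-cases each candidate label, drops it if any of its words is in
     * the exclusion set, otherwise rewrites each word through the synonym
     * map. A LinkedHashSet removes duplicates created by synonym rewriting
     * while keeping the original candidate order.
     */
    static List<String> tuneLabels(List<String> candidates,
                                   Set<String> excludedWords,
                                   Map<String, String> synonyms) {
        Set<String> result = new LinkedHashSet<>();
        for (String label : candidates) {
            String[] words = label.toLowerCase(Locale.ROOT).trim().split("\\s+");
            List<String> canonical = new ArrayList<>();
            boolean excluded = false;
            for (String word : words) {
                if (excludedWords.contains(word)) {
                    excluded = true; // one excluded word rejects the whole label
                    break;
                }
                canonical.add(synonyms.getOrDefault(word, word));
            }
            if (!excluded) {
                result.add(String.join(" ", canonical));
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        List<String> candidates = List.of(
                "Electric Car", "Electric Automobile", "Click Here", "Battery Range");
        Set<String> excluded = Set.of("click", "here");        // hypothetical exclusion list
        Map<String, String> synonyms = Map.of("automobile", "car"); // hypothetical synonym set
        System.out.println(tuneLabels(candidates, excluded, synonyms));
        // prints [electric car, battery range]
    }
}
```

In this toy run, "Click Here" is removed by the exclusion list and "Electric Automobile" merges with "Electric Car" through the synonym set. Lingo3G applies its own dictionaries during clustering rather than as a post-processing step like this, so the sketch only conveys the intent of each resource type.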