You can use Clustering Workbench to quickly try and tune Lingo3G clustering on your data. If Workbench suits your needs, you can use it as a text mining and research tool.
Installation and running
Install Lingo3G on your machine.
Start the Lingo3G Document Clustering Server (DCS) application located in the dcs/ folder of your Lingo3G installation.
- On Windows, run the
- On Linux and Mac, run the
If the DCS starts successfully, you should see a terminal window with messages similar to the following:
16:59:55: DCS context initialized [algorithms: [Lingo3G], templates: [frontend-default]]
16:59:55: Service started on port 8080.
16:59:55: The following contexts are available:
http://localhost:8080/ DCS Root
http://localhost:8080/doc Documentation
http://localhost:8080/frontend End-user apps
http://localhost:8080/javadoc Java API Javadoc
http://localhost:8080/service REST API
Open http://localhost:8080/frontend/#/workbench in a modern browser.
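If the browser cannot load Workbench, it can help to check whether the contexts listed in the DCS startup messages respond at all. The following is a minimal sketch, not part of Lingo3G: it assumes the default port 8080 shown in the log, and the helper names `context_urls` and `is_reachable` are made up for illustration.

```python
from urllib.error import URLError
from urllib.parse import urljoin
from urllib.request import urlopen

# Base address and context paths copied from the DCS startup messages.
# Adjust DCS_BASE if your DCS runs on a different host or port.
DCS_BASE = "http://localhost:8080/"
CONTEXTS = {
    "DCS Root": "",
    "Documentation": "doc",
    "End-user apps": "frontend",
    "Java API Javadoc": "javadoc",
    "REST API": "service",
}


def context_urls(base=DCS_BASE):
    """Return a mapping of context name to its full URL."""
    return {name: urljoin(base, path) for name, path in CONTEXTS.items()}


def is_reachable(url, timeout=2.0):
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (URLError, OSError):
        # Connection refused, timeout, HTTP error status and similar failures.
        return False
```

Calling `is_reachable(context_urls()["End-user apps"])` while the DCS is running should return `True`. A `False` result suggests the DCS is not running or listens on a different port.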
User interface highlights
If you'd like to learn how to use Workbench to cluster your own data, see the Trying Lingo3G section. For some Clustering Workbench tips and tricks, see below.
- Data source choice
Use the data source choice section to choose the data for clustering and to run the clustering process.
The Cluster button turns blue when you modify any parameter in the parameters panel. This indicates that you need to re-run clustering for the changes to take effect.
- Clusters view
Use the list, treemap and pie-chart tabs to choose how clusters are presented.
Use the icons to invoke additional tools for the current view, such as treemap interaction help, exporting of the visualization to JPEG and configuration of the visualization display.
- Documents view
The documents view shows the documents belonging to the cluster you select. Press the icon in the top right corner to configure which document fields to show for each document.
- Documents view configuration
If the documents you submit for clustering contain multiple fields, you can use the document view configuration to choose which fields to show.
For each field you can choose one of the following display roles:
- Shows in bold at the top of the document, works well for document title and other short textual fields.
- Shows under title fields, works well for document body and other longer textual fields. Workbench truncates body fields if they exceed the maximum number of characters per document you configure.
- Shows under body fields, use for document identifiers.
- Shows under the id fields, use for short multi-valued document fields, such as tags or list of authors.
- Shows under the tag fields, use for short single-valued document fields, such as dates, numbers or booleans.
Workbench tries to determine the best document display configuration based on the distribution of the field values in your data set. You can tune that configuration if needed.
- Parameters panel
Use the parameters panel to change parameters of the data source and the clustering algorithm. Click the (?) icon for a description of a specific parameter.
Once you finish changing parameter values, press the Cluster button to re-run clustering.
- Results export
Use the export tool to save the current documents and clusters in Excel, OpenOffice, CSV or JSON format.
- Label dictionaries
Use the text box in the Dictionaries section to edit label exclusion dictionaries. Use the glob, exact and regexp tabs to choose the label matching mechanism. The glob syntax should serve most label filtering needs.
Click syntax for a syntax overview. See Label matchers for the complete documentation.
Click Copy JSON to copy the JSON representation of the dictionaries to the clipboard, ready for pasting into JSON requests and code.
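To get a feel for how exact, glob and regexp matching differ, the sketch below filters a list of labels with each mechanism. It is only an analogy built on Python's standard library: `fnmatch` globs work on characters, matching here is case-insensitive by assumption, and Lingo3G's own matcher syntax and semantics may differ (see Label matchers for the authoritative description). All function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def matches_exact(label, entry):
    """Exact matching: the whole label must equal the entry (case ignored here)."""
    return label.lower() == entry.lower()


def matches_glob(label, pattern):
    """Glob matching with Python's character-based wildcards, not Lingo3G's syntax."""
    return fnmatchcase(label.lower(), pattern.lower())


def matches_regexp(label, pattern):
    """Regular expression matching against the entire label."""
    return re.fullmatch(pattern, label, flags=re.IGNORECASE) is not None


def filter_labels(labels, matcher, pattern):
    """Keep only the labels that the exclusion pattern does NOT match."""
    return [label for label in labels if not matcher(label, pattern)]


labels = ["Data Mining", "Data Mining Software", "Text Mining"]
print(filter_labels(labels, matches_exact, "data mining"))
print(filter_labels(labels, matches_glob, "data mining*"))
print(filter_labels(labels, matches_regexp, r".*mining"))
```

The exact entry removes a single label, while the glob pattern also removes labels that merely start with the same words, which is why glob patterns tend to cover most filtering needs with few entries.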
- Parameter search
Press the Filters button to enable filtering of parameters. Type part of a parameter name into the search box to show matching parameters.
- Advanced parameters
Toggle the advanced parameters button to access all the available parameters, including those designed for expert users.
- Parameters JSON export
Use the Parameters Export tool to copy the current Lingo3G parameters as JSON ready to paste into REST API requests.
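As a rough illustration of reusing the exported parameters in code, the sketch below combines pasted JSON with a list of documents into one request body. It rests on several assumptions that you should verify against the REST API documentation at http://localhost:8080/doc: that the exported JSON is a single object, that documents go under a `documents` key, and that the empty `parameters` object and the `title` and `body` field names in the example are acceptable placeholders. The function name `build_cluster_request` is hypothetical.

```python
import json


def build_cluster_request(exported_json, documents):
    """Combine parameters JSON copied from Workbench with documents to cluster.

    Assumption: the export is one JSON object that can serve as a request
    template, and the documents belong under a "documents" key.
    """
    template = json.loads(exported_json)
    if not isinstance(template, dict):
        raise ValueError("expected the exported parameters to be a JSON object")
    request = dict(template)  # shallow copy, leaves the parsed template intact
    request["documents"] = list(documents)
    return request


# Placeholder for the text pasted from the Parameters Export tool.
exported = '{"algorithm": "Lingo3G", "parameters": {}}'

# Illustrative documents; the field names are assumptions.
docs = [
    {"title": "First document", "body": "Text of the first document."},
    {"title": "Second document", "body": "Text of the second document."},
]

request_body = json.dumps(build_cluster_request(exported, docs), indent=2)
print(request_body)
```

Keeping the exported JSON as an opaque template means the code does not need to know individual parameter names, so you can re-export from Workbench after tuning and paste the new text without changing anything else.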