Commands

The l4g (Linux/macOS) or l4g.cmd (Windows) script is the single entry point to all Lingo4G commands.

The l4g Bash script (or the l4g.cmd batch script on Windows) takes care of locating the Lingo4G installation directory, setting up Java defaults and launching Lingo4G. Most interactions with Lingo4G, such as indexing, starting the Lingo4G REST API server or running analyses in batch mode, are done via the various l4g commands.

Lingo4G does not come with any special packaging for containers, service layers or other environments. If you need to embed Lingo4G in such an environment, it should be relatively simple using the existing l4g commands.

Multiple l4g commands may run in parallel. For example, one command may run the REST API server, while another is adding or indexing a fresh set of data.

l4g

Launch script for all Lingo4G commands. Usage:

l4g [options] command [command options]
Note for Cygwin users

When running Lingo4G in Cygwin, use the l4g script (Bash script). The Windows-specific l4g.cmd will leave stray processes running in the background when Ctrl+C is received in the terminal.

Running Lingo4G under MinGW or any other non-Cygwin POSIX shell on Windows is not officially supported.

options

The list of launcher options, optional.

-h, --help
Display the list of available commands.
command
The command to run, required. See the rest of this chapter for the available commands and their options.
command options
The list of command-specific options, optional.
Tip: reading command parameters from a file.

If your invocation of the l4g script contains a long list of parameters, such as when selecting documents to cluster by identifier, you may need to put all your parameters in a file, one per line:

cluster
-p
datasets/dataset-ohsumed
-v
-s
id=101416,101417, 101418, 101419, 10142, 101420,101421, 101422, 101423, 101424, 101425,101426, 101427, 101428, 101429, 10143,101430, 101431, 101432, 101433, 101434,101435, 101436, 101437, 101438, 101439,10144, 101440, 101441, 101442, 101443,101444, 101445, 101446, ...
      

and provide the file path to the l4g launcher script using the @ syntax:

l4g @parameters-file
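
For scripted invocations, the parameters file can be generated programmatically. The following is a minimal Python sketch, assuming the one-argument-per-line layout shown above. The helper names render_parameters and write_parameters_file are hypothetical, and the UTF-8 file encoding is an assumption; the script only produces the file and does not run l4g.

```python
from pathlib import Path
from typing import Iterable, Union


def render_parameters(project: str, field: str, values: Iterable[Union[int, str]]) -> str:
    """Return the content of an l4g parameters file, one argument per line."""
    # Build the <field-name>=<value1>,<value2>,... selector used in the example above.
    selector = f"{field}=" + ",".join(str(value) for value in values)
    arguments = ["cluster", "-p", project, "-v", "-s", selector]
    return "\n".join(arguments) + "\n"


def write_parameters_file(path: Union[str, Path], content: str) -> None:
    """Write the rendered arguments to disk (UTF-8 is an assumption)."""
    Path(path).write_text(content, encoding="utf-8")
```

After calling write_parameters_file("parameters-file", render_parameters(...)), the resulting file can be passed to the launcher as l4g @parameters-file.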

l4g analyze

Deprecated command.

This command is kept for backward compatibility and uses analysis API v1. To run an API v2 analysis request from the command line, use the run-request command.

Runs an analysis on the provided project using Lingo4G API v1. Usage:

l4g analyze [options]

The following options are supported:

-p, --project
Location of the project descriptor file, required.
-s, --select

A query that selects documents for analysis, optional. The syntax of the query depends on the analysis scope.type defined in the project descriptor.

  • For the byQuery scope type, Lingo4G will analyze all documents matching the provided query. The query must follow the syntax of the Lucene query parser configured in the project descriptor.

  • For the byFieldValues scope type, Lingo4G will select all documents whose specified field is equal to any of the provided values. The syntax in this case must be the following:

    <field-name>=<value1>,<value2>,...

If this parameter is not provided, the query specified in the project descriptor is used.

-m, --max-labels
The maximum number of labels to select, optional. If not provided, the default maximum number of labels defined in the project descriptor file will be assumed.
-ff, --feature-fields
The space-separated list of feature fields to use for analysis.
--format
Override the default format option specified in the descriptor.
-j, --analysis-json-override

The JSON override to apply to the analysis section of the project descriptor. You can use this option to temporarily change certain analysis parameters from their default values. The provided string must be a valid JSON object following the syntax of the analysis section of the project descriptor. The override JSON may contain only those parameters you wish to override. Make sure to properly escape any double quote characters that are part of your JSON override value. An easy way to get the proper override JSON string is to use the JSON export option of Lingo4G Explorer.

Some example JSON overrides:

l4g analyze -j "{ labels: { surface: { minLabelTokens: 2 } } }"
l4g analyze -j "{ labels: { frequencies: { minAbsoluteDf: 5 }, scorers: { idfScorerWeight: 0.4 } } }"
l4g analyze -j "{ output: { format: \"excel\" } }"
-o, --output

Target file name (or directory) to which analysis results should be saved, optional. The default value points at the project's results folder.

If the provided path points to an existing directory, the result will be written as a file in that directory. The file will follow this naming convention: analysis-{timestamp}.{format}.

If the provided path is not a directory, the result will be saved to that path, overwriting any previous content. All parent directories of the provided file path must exist.

--pretty
Override the default pretty option specified in the descriptor.
-v, --verbose
Output detailed logs, useful for problem solving.
-q, --quiet
Limit the amount of logging information.
--work-dir
Override the default work directory location.
-D

Sets a system property to the provided value. You can refer to such system properties in the project descriptor file.

Use JVM syntax to provide the values: -Dproperty=value, for example -Dinput.dir=/mnt/ssd/data/pubmed.
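
Quoting a JSON override by hand is error-prone, so it can help to generate the -j value from a data structure. The sketch below is one possible approach, not part of Lingo4G: the analyze_command helper is hypothetical, and it assumes that strict JSON with quoted keys, as produced by json.dumps, is accepted by the -j option.

```python
import json
import shlex
from typing import Any, Dict, List


def analyze_command(project: str, override: Dict[str, Any]) -> List[str]:
    """Build an l4g analyze argument list with a JSON override."""
    # json.dumps emits strict JSON; passing it as a single list element
    # means the double quotes do not need manual escaping.
    return ["l4g", "analyze", "-p", project, "-j", json.dumps(override)]


command = analyze_command(
    "datasets/dataset-ohsumed",
    {"labels": {"surface": {"minLabelTokens": 2}}},
)
# Shell-quoted form, suitable for pasting into a POSIX shell.
print(shlex.join(command))
```

The argument list can also be passed directly to subprocess.run, which avoids shell quoting altogether on POSIX systems.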

l4g delete

Removes one or more documents from the index (based on a Lucene query). Usage:

l4g delete [options]

The following options are available:

-p, --project
Location of the project descriptor file, required.
--query
A Lucene query which should be used to select all documents to be deleted from the index. The query text will be parsed using the project's default query parser or one indicated by the --query-parser option.
--query-parser
The query parser to use for parsing the --query text (document selector).
-v, --verbose
Output detailed logs, useful for problem solving.
-q, --quiet
Limit the amount of logging information.
--work-dir
Override the default work directory location.
-D

Sets a system property to the provided value. You can refer to such system properties in the project descriptor file.

Use JVM syntax to provide the values: -Dproperty=value, for example -Dinput.dir=/mnt/ssd/data/pubmed.

Document deletions and Lingo4G REST API.

When you run l4g delete to remove documents from the index, the deletions are not immediately visible to the Lingo4G REST API server. To make the deletions visible, call l4g index --incremental (incremental index update) or l4g reindex (full feature reindexing), followed by an index reload REST API call.

Note that incremental indexing, and especially full feature reindexing, carry a significant computational cost. Therefore, you may want to defer incremental indexing or reindexing until the index accumulates a larger number of modifications (document additions, updates and deletions).
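
The order of operations described above can be captured in a small script. This is a sketch only: the deletion_workflow helper is hypothetical, the query string in the usage note is made up, and the index reload REST API call is left out because its endpoint is not described in this chapter.

```python
from typing import List


def deletion_workflow(project: str, query: str, full_reindex: bool = False) -> List[List[str]]:
    """Return the l4g commands to delete documents and refresh the features."""
    delete = ["l4g", "delete", "-p", project, "--query", query]
    if full_reindex:
        # Full feature reindexing: the most expensive option.
        refresh = ["l4g", "reindex", "-p", project]
    else:
        # Incremental index update.
        refresh = ["l4g", "index", "-p", project, "--incremental"]
    # An index reload REST API call must follow these two commands.
    return [delete, refresh]
```

For example, deletion_workflow("datasets/dataset-ohsumed", "id:101416") returns two argument lists that can be run in sequence, for instance with subprocess.run(command, check=True).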

l4g index

Performs indexing of the provided project's data. Usage:

l4g index [options]

The following options are supported:

-p, --project
Location of the project descriptor file, required.
-f, --force
Lingo4G requires an explicit confirmation before clearing the contents of an existing index (in non-incremental mode). This option permits deletion of all documents from the index prior to running a full indexing pipeline.
--max-docs N
If present, Lingo4G will index only the provided number of documents. If the document source returns more than N documents, the extra documents will be ignored. For uniform random sampling of documents to index, see the --sampling-frequency option.
-v, --verbose
Output detailed logs, useful for problem solving.
-q, --quiet
Limit the amount of logging information.
--incremental
Enables incremental indexing mode if the document source supports it (or displays an error otherwise).
--sampling-frequency P
Indexes a random sample of the source documents. The P parameter, which must fall in the (0, 1] range, determines the sampling probability. For example, if P is 0.25, each source document has a 25% probability of making it into the index. As a result, the index will contain about 25% of the source documents.
--work-dir
Override the default work directory location.
-D

Sets a system property to the provided value. You can refer to such system properties in the project descriptor file.

Use JVM syntax to provide the values: -Dproperty=value, for example -Dinput.dir=/mnt/ssd/data/pubmed.
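
To build intuition for the --sampling-frequency option, the following Python sketch simulates per-document sampling with probability P. It illustrates the statistics only and is not Lingo4G's actual implementation; the sample_documents function and the seed parameter are assumptions made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_documents(docs: Iterable[T], p: float, seed: Optional[int] = None) -> List[T]:
    """Keep each document independently with probability p, where 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must fall in the (0, 1] range")
    rng = random.Random(seed)
    # rng.random() returns a value in [0, 1), so p = 1 keeps every document.
    return [doc for doc in docs if rng.random() < p]


kept = sample_documents(range(100_000), 0.25, seed=42)
print(f"kept {len(kept) / 100_000:.1%} of the documents")
```

Because each document is drawn independently, the size of the sample varies slightly between runs and only approximates P times the number of source documents.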

l4g learn-embeddings

Learns or updates label and/or document embeddings in an existing Lingo4G index with an existing feature commit. Usage:

l4g learn-embeddings [options]

You can pass the following options to this command:

-p, --project
Location of the project descriptor file, required.
--recompute-label-embeddings
Learns or updates label embeddings.
--recompute-document-embeddings
Learns or updates document embeddings.
-v, --verbose
Output detailed logs, useful for problem solving.
-q, --quiet
Limit the amount of logging information.
--work-dir
Override the default work directory location.
-D

Sets a system property to the provided value. You can refer to such system properties in the project descriptor file.

Use JVM syntax to provide the values: -Dproperty=value, for example -Dinput.dir=/mnt/ssd/data/pubmed.

l4g reindex

Performs from-scratch reindexing of all documents present in the search index. This command performs full feature extraction, extracts labels, updates the set of stop labels and recomputes embeddings, if the project descriptor defaults are set up to compute them. The newly created feature commit will also exclude any documents that have been deleted so far. Usage:

l4g reindex [options]

The following options are supported:

-p, --project
Location of the project descriptor file, required.
-v, --verbose
Output detailed logs, useful for problem solving.
-q, --quiet
Limit the amount of logging information.
--work-dir
Override the default work directory location.

l4g run-request

Execute one or more API v2 JSON requests without starting the HTTP REST API server. Usage:

l4g run-request [options] file [file ...]

The input consists of one or more API v2 JSON request files or directories. If you provide a directory, Lingo4G processes all files matching *.json in that directory and all subdirectories (recursively).

Lingo4G saves the output of each request to the request source's sibling file, with the .result suffix appended. You can use the --output option to write responses to a separate directory.

If any request results in an error, the entire command will return an error.

The following options are supported:

--no-output
Do not write any outputs, just run the requests.
--output
An output directory to write results to.
-p, --project
Location of the project descriptor file, required.
-q, --quiet
Limit the amount of logging information.
-v, --verbose
Output detailed logs, useful for problem solving.
--work-dir
Override the default work directory location.
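
When post-processing results in a script, it can be useful to predict which request files will be picked up and where the default output lands. The sketch below mirrors the rules stated above under two assumptions: the .result suffix is appended to the full file name (request.json becomes request.json.result), and the processing order is unspecified (the sorting here is only for reproducibility). Both function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Union


def collect_requests(inputs: Iterable[Union[str, Path]]) -> List[Path]:
    """Expand files and directories into a flat list of request files."""
    requests: List[Path] = []
    for item in map(Path, inputs):
        if item.is_dir():
            # Directories are searched recursively for *.json files.
            requests.extend(sorted(p for p in item.rglob("*.json") if p.is_file()))
        else:
            requests.append(item)
    return requests


def default_result_path(request: Path) -> Path:
    """Return the sibling file with the .result suffix appended."""
    return request.with_name(request.name + ".result")
```

A script can then check default_result_path(request).exists() for each collected request after l4g run-request finishes.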

l4g server

Starts the Lingo4G REST API and REST API v2 server. Usage:

l4g server [options]

The following options are supported:

-p, --project

Location of the project descriptor file to expose in the REST API, required.

You can repeat this option more than once (with different project descriptors) to serve multiple projects from the same server instance. Static resources and REST API endpoints are then prefixed with the corresponding project's identifier.

For example:

l4g server -p project1 -p project2

starts two project contexts at /project1/ and /project2/.

-r, --port
The port number the server will bind to, 8080 by default. When port number 0 is provided, a free port will be assigned automatically.
--host
The network interface the server will bind to. The server binds to all interfaces on the provided port by default.
-w, --web-server

Controls the built-in web server, enabled by default.

The HTTP server will return content from ${l4g.project.dir}/web and L4G_HOME/web. The first location to contain a given resource will be used.

Please take security into consideration when leaving this option enabled in production.

--cors origin

Enables serving CORS headers for the provided origin, disabled by default. If a non-empty origin value is provided, Lingo4G REST API will serve the following headers:

Access-Control-Allow-Origin: origin
Access-Control-Allow-Credentials: true
Access-Control-Allow-Headers: Content-Type, Origin, Accept
Access-Control-Expose-Headers: Location
Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS

Please take security into consideration when enabling this option in production.

--idle-time
Sets the default idle time on socket connections, in milliseconds. If large synchronous REST requests expire before the results are received, increasing the idle time with this option may solve the problem (alternatively, use the asynchronous API).
--so-linger-time
Sets the socket linger time to the given number of milliseconds.
--shutdown-token
An optional shutdown authorization token for the server-shutdown command (to close the server process gracefully).
--pid-file
An optional path to which the PID of the launched server process is written.
-v, --verbose
Output detailed logs, useful for problem solving.
-q, --quiet
Limit the amount of logging information.
--work-dir
Override the default work directory location.
--use-content-compression
Enable or disable HTTP response content compression. This option requires a boolean argument (--use-content-compression false). Content compression is enabled by default.
-D

Sets a system property to the provided value. You can refer to such system properties in the project descriptor file.

Use JVM syntax to provide the values: -Dproperty=value, for example -Dinput.dir=/mnt/ssd/data/pubmed.
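
When testing a client against a server started with --cors, it helps to know exactly which headers to expect. The following Python sketch reproduces the header set listed for the --cors option as a dictionary; the cors_headers function is a hypothetical test helper, not part of Lingo4G.

```python
from typing import Dict


def cors_headers(origin: str) -> Dict[str, str]:
    """Return the CORS headers served for the given origin."""
    if not origin:
        # CORS headers are disabled unless a non-empty origin is provided.
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Credentials": "true",
        "Access-Control-Allow-Headers": "Content-Type, Origin, Accept",
        "Access-Control-Expose-Headers": "Location",
        "Access-Control-Allow-Methods": "GET, POST, PUT, DELETE, OPTIONS",
    }
```

A test can compare this dictionary with the headers of an actual response, for example one received for an OPTIONS request sent from the configured origin.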

Heads up, public HTTP server!

Lingo4G's REST API starts and runs on top of an HTTP server. There is no way to configure limited access or HTTP authorization for this server, so you should ensure the server's security externally, for example by restricting public access to the HTTP port designated for Lingo4G on the machine or by layering a proxy server with proper authorization methods on top of the Lingo4G API.

The above remark is particularly important when l4g server is used together with the -w option, because the entire content of the L4G_HOME/web folder then becomes publicly available.

l4g server-shutdown

Attempts to stop a running Lingo4G REST API server.

l4g server-shutdown [options]

The following options are supported:

-r, --port
The port number the command will try to connect to, 8080 by default.
--host
The network interface of the server, if the --host option was used to bind it to a particular network interface address.
--shutdown-token
The shutdown token to send to the running server. For the shutdown to succeed, the token value must be equal to the one passed at server startup.
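
Since the port and the shutdown token must match between l4g server and l4g server-shutdown, it can be convenient to derive both command lines from the same values. This is a minimal sketch under that assumption: the server_commands helper is hypothetical, and generating the token with Python's secrets module is just one possible choice.

```python
import secrets
from typing import List, Optional, Tuple


def server_commands(
    project: str, port: int = 8080, token: Optional[str] = None
) -> Tuple[List[str], List[str]]:
    """Return matching (start, stop) argument lists for one server instance."""
    # Generate a random token if the caller did not provide one.
    token = token or secrets.token_urlsafe(16)
    start = ["l4g", "server", "-p", project, "--port", str(port), "--shutdown-token", token]
    stop = ["l4g", "server-shutdown", "--port", str(port), "--shutdown-token", token]
    return start, stop
```

The first list starts the server and the second one stops it gracefully later, using the same port and token.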

l4g show

Shows the project descriptor JSON with all default and resolved values. You can use this command to

  • verify the syntax of a project descriptor file,
  • check if all variables are correctly resolved,
  • view all option values that apply to the project, including the default ones that were not explicitly defined in the project file.

l4g show [options]

The following options are supported:

-p, --project
Location of the project descriptor file to show, required.
-v, --verbose
Output detailed logs, useful for problem solving.
-q, --quiet
Limit the amount of logging information.
--work-dir
Override the default work directory location.
-D

Sets a system property to the provided value. You can refer to such system properties in the project descriptor file.

Use JVM syntax to provide the values: -Dproperty=value, for example -Dinput.dir=/mnt/ssd/data/pubmed.

l4g stats

Shows basic statistics of the Lingo4G index associated with the provided project, including the size of the index, histograms of document lengths and term vector sizes, and a histogram of phrase frequencies. Usage:

l4g stats [options]

The following options are supported:

-p, --project
Location of the project descriptor file to generate the statistics for, required.
-a, --accuracy
Accuracy of document statistics fetching, optional, default: 0.1. Increase the value for more accurate but slower computation of the document length and term vector size histogram estimates. Use 1.0 for an exact computation.
-tf, --text-fields
The list of fields to use when computing the document length histogram, optional, default: all available text fields. Computation of the document length histogram is disabled by default; use the --analyze-text-fields option to enable it.
--analyze-text-fields
When provided, the histogram of the lengths of raw document text will be computed.
-ff, --feature-fields
The list of feature fields to use when computing phrase frequency histogram.
-ff-all, --feature-fields-all
Use all feature fields when computing the phrase frequency histogram. Overrides any explicit fields provided by the -ff option.
-t, --threads
The number of threads to use for processing, optional, default: the number of CPU cores available.
-v, --verbose
Output detailed logs, useful for problem solving.
-q, --quiet
Limit the amount of logging information.
--work-dir
Override the default work directory location.
-D

Sets a system property to the provided value. You can refer to such system properties in the project descriptor file.

Use JVM syntax to provide the values: -Dproperty=value, for example -Dinput.dir=/mnt/ssd/data/pubmed.

l4g unpack

Extracts files from ZIP and 7z archives. This command may be useful if the automatic download and extraction process does not work behind a firewall. Usage:

l4g unpack [options] [archive archive ...]

The following options are supported:

-f, --force
Overwrite files that already exist at the destination.
--delete
Deletes the source archive after the files are successfully extracted. Default value: false.
-o, --output-dir
Output folder to expand files from each archive to. If not specified, files are extracted relative to their source archive file.

l4g version

Prints Lingo4G version, revision and release date information.