Commands
The l4g (Linux/macOS) or l4g.cmd (Windows) script is the single entry point to all Lingo4G commands.
The l4g bash script (or the l4g.cmd batch script on Windows) takes care of locating the Lingo4G installation directory, setting up Java defaults and launching Lingo4G. Most interactions with Lingo4G, such as indexing, starting the Lingo4G REST API server or running analyses in batch mode, are done via various l4g commands.
Lingo4G does not come with any special packaging for containers, service layers or other environments. If you need to embed Lingo4G in such an environment, it should be relatively simple using the existing l4g commands.
Multiple l4g commands may run in parallel. For example, one command may run the REST API server, while another is adding or indexing a fresh set of data.
l4g
Launch script for all Lingo4G commands. Usage:
l4g [options] command [command options]
When running Lingo4G in Cygwin, use the l4g script (Bash script). The Windows-specific l4g.cmd script will leave stray processes running in the background when Ctrl+C is received in the terminal.
Running Lingo4G under MinGW or any other (non-Cygwin) POSIX shell under Windows is not officially supported.
- options
-
The list of launcher options, optional.
- -h, --help
- Display the list of available commands.
- command
- The command to run, required. See the rest of this chapter for the available commands and their options.
- command options
- The list of command-specific options, optional.
If your invocation of the l4g script contains a long list of parameters, such as when selecting documents to cluster by identifier, you may need to put all your parameters in a file, one per line:
cluster -p datasets/dataset-ohsumed -v -s id=101416,101417, 101418, 101419, 10142, 101420,101421, 101422, 101423, 101424, 101425,101426, 101427, 101428, 101429, 10143,101430, 101431, 101432, 101433, 101434,101435, 101436, 101437, 101438, 101439,10144, 101440, 101441, 101442, 101443,101444, 101445, 101446, ...
and provide the file path to the l4g launcher script using the @ syntax:
l4g @parameters-file
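A parameters file can also be generated by a script when the argument list is built programmatically. The following is a minimal Python sketch, assuming that each launcher argument goes on its own line as described above. The helper name write_parameters_file, the temporary file location and the document identifiers are hypothetical and only serve the illustration.

```python
import tempfile
from pathlib import Path


def write_parameters_file(path, args):
    """Write one l4g argument per line and return the '@file' launcher argument."""
    path = Path(path)
    path.write_text("\n".join(args) + "\n", encoding="utf-8")
    return "@" + str(path)


# Build a long by-field-values selector (identifiers are illustrative only).
ids = [str(101416 + i) for i in range(5)]
args = ["analyze", "-p", "datasets/dataset-ohsumed", "-v", "-s", "id=" + ",".join(ids)]

params_path = Path(tempfile.mkdtemp()) / "parameters-file"
launcher_arg = write_parameters_file(params_path, args)

# The resulting command line has the form: l4g @<path to parameters-file>
command = ["l4g", launcher_arg]
print(" ".join(command))
```

The sketch only writes the file and prints the command. Nothing is executed, so it can be tried without a Lingo4G installation.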
l4g analyze
This command is kept for backward compatibility and uses analysis API v1. To run an API v2 analysis request from the command line, use the run-request command.
Runs an analysis on the provided project using Lingo4G API v1. Usage:
l4g analyze [options]
The following options are supported:
- -p, --project
- Location of the project descriptor file, required.
- -s, --select
-
A query that selects documents for analysis, optional. The syntax of the query depends on the analysis scope.type defined in the project descriptor.
- For the byQuery scope type, Lingo4G will analyze all documents matching the provided query. The query must follow the syntax of the Lucene query parser configured in the project descriptor.
- For the byFieldValues scope type, Lingo4G will select all documents whose specified field is equal to any of the provided values. The syntax in this case must be the following: <field-name>=<value1>,<value2>,...
If this parameter is not provided, the query specified in the project descriptor is used.
- -m, --max-labels
- The maximum number of labels to select, optional. If not provided, the default maximum number of labels defined in the project descriptor file will be assumed.
- -ff, --feature-fields
- The space-separated list of feature fields to use for analysis.
- --format
- Override the default format option specified in the descriptor.
- -j, --analysis-json-override
-
The JSON override to apply to the analysis section of the project descriptor. You can use this option to temporarily change certain analysis parameters from their default values. The provided string must be a valid JSON object following the syntax of the analysis section of the project descriptor. The override JSON may contain only those parameters you wish to override. Make sure you properly quote the double quote characters that are part of your JSON override value. An easy way to get the proper override JSON string is to use the Lingo4G Explorer JSON export option.
Some example JSON overrides:
l4g analyze -j "{ labels: { surface: { minLabelTokens: 2 } } }"
l4g analyze -j "{ labels: { frequencies: { minAbsoluteDf: 5 }, scorers: { idfScorerWeight: 0.4 } } }"
l4g analyze -j "{ output: { format: \"excel\" } }"
- -o, --output
-
Target file name (or directory) to which analysis results should be saved, optional. The default value points at the project's results folder.
If the provided path points to an existing directory, the result will be written as a file in that directory. The file will follow this naming convention: analysis-{timestamp}.{format}.
If the provided path is not a directory, the result will be saved to that path, overwriting any previous content. All parent directories of the provided file path must exist.
- --pretty
- Override the default pretty option specified in the descriptor.
- -v, --verbose
- Output detailed logs, useful for problem solving.
- -q, --quiet
- Limit the amount of logging information.
- --work-dir
- Override the default work directory location.
- -D
-
Sets a system property to the provided value. You can refer to such system properties in the project descriptor file.
Use JVM syntax to provide the values: -Dproperty=value, for example -Dinput.dir=/mnt/ssd/data/pubmed.
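The selector and override options above can be assembled programmatically, which sidesteps shell quoting of the JSON value. The following is a minimal Python sketch: the helper names field_values_selector and analyze_command are hypothetical, and it assumes that the strict JSON produced by json.dumps is accepted wherever the option expects a valid JSON object.

```python
import json


def field_values_selector(field, values):
    """Format a byFieldValues selector: <field-name>=<value1>,<value2>,..."""
    return field + "=" + ",".join(str(v) for v in values)


def analyze_command(project, select=None, override=None, output=None):
    """Assemble an argument list for 'l4g analyze' (nothing is executed here)."""
    cmd = ["l4g", "analyze", "-p", project]
    if select is not None:
        cmd += ["-s", select]
    if override is not None:
        # Passing the JSON as a single list element means no shell is involved,
        # so the double quote characters need no extra escaping.
        cmd += ["-j", json.dumps(override)]
    if output is not None:
        cmd += ["-o", output]
    return cmd


cmd = analyze_command(
    "datasets/dataset-ohsumed",
    select=field_values_selector("id", [101416, 101417, 101418]),
    override={"labels": {"surface": {"minLabelTokens": 2}}},
)
print(cmd)
```

To actually run the analysis, the list could be passed to subprocess.run(cmd, check=True) on a machine where the l4g script is on the PATH, which is an assumption about the local setup.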
l4g delete
Removes one or more documents from the index (based on a Lucene query). Usage:
l4g delete [options]
The following options are available:
- -p, --project
- Location of the project descriptor file, required.
- --query
-
A Lucene query which should be used to select all documents to be deleted from the index. The query text will be parsed using the project's default query parser or the one indicated by the --query-parser option.
- --query-parser
- The query parser to use for parsing the --query text (document selector).
- -v, --verbose
- Output detailed logs, useful for problem solving.
- -q, --quiet
- Limit the amount of logging information.
- --work-dir
- Override the default work directory location.
- -D
-
Sets a system property to the provided value. You can refer to such system properties in the project descriptor file.
Use JVM syntax to provide the values: -Dproperty=value, for example -Dinput.dir=/mnt/ssd/data/pubmed.
When you run l4g delete to remove documents from the index, the deletions are not immediately visible to the Lingo4G REST API server. To make the deletions visible, call l4g index --incremental (incremental index update) or l4g reindex (full feature reindexing), followed by an index reload REST API call.
Note that incremental indexing, and especially full feature reindexing, carries a significant computational cost. Therefore, you may want to defer incremental indexing or reindexing until the index accumulates a larger number of modifications (document additions, updates and deletions).
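The order of operations described above can be sketched as a short Python script that assembles the two commands. The helper name delete_and_refresh_commands, the project path and the query title:obsolete are hypothetical. The final index reload REST API call is left out because its endpoint is not described in this section.

```python
def delete_and_refresh_commands(project, query, full_reindex=False):
    """Return the l4g commands that delete documents and then refresh the index."""
    delete = ["l4g", "delete", "-p", project, "--query", query]
    if full_reindex:
        refresh = ["l4g", "reindex", "-p", project]  # full feature reindexing
    else:
        refresh = ["l4g", "index", "-p", project, "--incremental"]
    return [delete, refresh]


commands = delete_and_refresh_commands("datasets/dataset-ohsumed", "title:obsolete")
for cmd in commands:
    print(" ".join(cmd))
# A running server sees the deletions only after an index reload REST API call,
# which has to follow these two commands and is not shown here.
```

Batching several deletions before running the refresh command once is one way to spread the computational cost mentioned above.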
l4g index
Performs indexing of the provided project's data. Usage:
l4g index [options]
The following options are supported:
- -p, --project
- Location of the project descriptor file, required.
- -f, --force
- Lingo4G requires an explicit confirmation before clearing the contents of an existing index (in non-incremental mode). This option permits deletion of all documents from the index prior to running a full indexing pipeline.
- --max-docs N
-
If present, Lingo4G will index only the provided number of documents. If the document source returns more than N documents, the extra documents will be ignored. For uniform random sampling of documents to index, see the --sampling-frequency option.
- -v, --verbose
- Output detailed logs, useful for problem solving.
- -q, --quiet
- Limit the amount of logging information.
- --incremental
- Enables incremental indexing mode if the document source supports it (or displays an error otherwise).
- --sampling-frequency P
-
Indexes a random sample of the source documents. The P parameter, which must fall in the (0, 1] range, determines the sampling probability. For example, if P is 0.25, each source document has a 25% probability of making it into the index. As a result, the index will contain about 25% of the source documents.
- --work-dir
- Override the default work directory location.
- -D
-
Sets a system property to the provided value. You can refer to such system properties in the project descriptor file.
Use JVM syntax to provide the values: -Dproperty=value, for example -Dinput.dir=/mnt/ssd/data/pubmed.
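The effect of the --sampling-frequency option can be illustrated with a small simulation. The Python sketch below is not Lingo4G's implementation. It only models the behavior described above, in which each document is kept independently with probability P, and the function name sample_documents and the seed are hypothetical.

```python
import random


def sample_documents(doc_ids, p, seed=0):
    """Keep each document independently with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must fall in the (0, 1] range")
    rng = random.Random(seed)
    # rng.random() returns a value in [0, 1), so p = 1 keeps every document.
    return [doc_id for doc_id in doc_ids if rng.random() < p]


docs = range(100_000)
kept = sample_documents(docs, p=0.25)
print(f"kept {len(kept) / len(docs):.1%} of the source documents")
```

Because every document is sampled independently, the resulting index size is only approximately P times the source size, and the approximation gets tighter as the number of source documents grows.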
l4g learn-embeddings
Learns or updates label and/or document embeddings in an existing Lingo4G index with an existing feature commit.
l4g learn-embeddings [options]
You can pass the following options to this command:
- -p, --project
- Location of the project descriptor file, required.
- --recompute-label-embeddings
- Learns or updates label embeddings.
- --recompute-document-embeddings
- Learns or updates document embeddings.
- -v, --verbose
- Output detailed logs, useful for problem solving.
- -q, --quiet
- Limit the amount of logging information.
- --work-dir
- Override the default work directory location.
- -D
-
Sets a system property to the provided value. You can refer to such system properties in the project descriptor file.
Use JVM syntax to provide the values: -Dproperty=value, for example -Dinput.dir=/mnt/ssd/data/pubmed.
l4g reindex
Performs from-scratch reindexing of all documents present in the search index. This performs full feature extraction, extracts labels, updates the set of stop labels and recomputes embeddings, if the project descriptor defaults are set up to compute them. The newly created feature commit will also exclude any documents that have been deleted so far.
l4g reindex [options]
The following options are supported:
- -p, --project
- Location of the project descriptor file, required.
- -v, --verbose
- Output detailed logs, useful for problem solving.
- -q, --quiet
- Limit the amount of logging information.
- --work-dir
- Override the default work directory location.
l4g run-request
Execute one or more API v2 JSON requests without starting the HTTP REST API server. Usage:
l4g run-request [options] file [file ...]
The input consists of one or more API v2 JSON request files or directories. If you provide a directory, Lingo4G processes all files matching *.json in that directory and all subdirectories (recursively).
Lingo4G saves the output of each request to the request source's sibling file, with the .result suffix appended. You can use the --output option to write responses to a separate directory.
If any request results in an error, the entire command will return an error.
The following options are supported:
- --no-output
- Do not write any outputs, just run the requests.
- --output
- An output directory to write results to.
- -p, --project
- Location of the project descriptor file, required.
- -q, --quiet
- Limit the amount of logging information.
- -v, --verbose
- Output detailed logs, useful for problem solving.
- --work-dir
- Override the default work directory location.
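Before running a batch, it can help to preview which request files a directory argument would expand to. The Python sketch below mimics the recursive *.json matching described above on a throwaway directory tree. The helper name find_request_files, the file names and the project path are hypothetical, and the sketch does not call Lingo4G.

```python
import tempfile
from pathlib import Path


def find_request_files(inputs):
    """Expand files and directories into a list of request files."""
    found = []
    for item in map(Path, inputs):
        if item.is_dir():
            # Directories contribute every *.json file, including subdirectories.
            found.extend(sorted(item.rglob("*.json")))
        else:
            found.append(item)
    return found


# Create a small throwaway directory tree to demonstrate the expansion.
root = Path(tempfile.mkdtemp())
(root / "nested").mkdir()
(root / "a.json").write_text("{}", encoding="utf-8")
(root / "nested" / "b.json").write_text("{}", encoding="utf-8")
(root / "notes.txt").write_text("ignored", encoding="utf-8")

requests = find_request_files([root])
command = ["l4g", "run-request", "-p", "datasets/dataset-ohsumed", str(root)]
print([p.name for p in requests])
print(" ".join(command))
```

When the command is executed from a script, for example with subprocess.run, a non-zero exit status is the likely signal that at least one request failed. This is an assumption about how the error is reported.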
l4g server
Starts the Lingo4G REST API and REST API v2 server.
l4g server [options]
The following options are supported:
- -p, --project
-
Location of the project descriptor file to expose in the REST API, required.
You can repeat this option more than once (with different project descriptors) to serve multiple projects from the same server instance. Static resources and REST API endpoints are then prefixed with the corresponding project's identifier.
For example:
l4g server -p project1 -p project2
starts two project contexts at /project1/ and /project2/.
- -r, --port
- The port number the server will bind to, 8080 by default. When port number 0 is provided, a free port will be assigned automatically.
- --host
- The network interface the server will bind to. The server binds to all interfaces on the provided port by default.
- -w, --web-server
-
Controls the built-in web server, enabled by default.
The HTTP server will return content from ${l4g.project.dir}/web and L4G_HOME/web. The first location to contain a given resource will be used.
Please take security into consideration when leaving this option enabled in production.
- --cors origin
-
Enables serving CORS headers, for the provided origin, disabled by default. If a non-empty origin value is provided, Lingo4G REST API will serve the following headers:
Access-Control-Allow-Origin: origin
Access-Control-Allow-Credentials: true
Access-Control-Allow-Headers: Content-Type, Origin, Accept
Access-Control-Expose-Headers: Location
Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
Please take security into consideration when enabling this option in production.
- --idle-time
- Sets the default idle time on socket connections, in milliseconds. If large synchronous REST requests expire before results are received, increasing the idle time with this option may solve the problem (alternatively, use the asynchronous API).
- --so-linger-time
- Sets the socket linger time to the given number of milliseconds.
- --shutdown-token
-
An optional shutdown authorization token for the server-shutdown command (to close the server process gracefully).
- --pid-file
- An optional path to which the PID of the launched server process is written.
- -v, --verbose
- Output detailed logs, useful for problem solving.
- -q, --quiet
- Limit the amount of logging information.
- --work-dir
- Override the default work directory location.
- --use-content-compression
-
Enable or disable HTTP response content compression. This option requires a boolean argument (--use-content-compression false). Content compression is enabled by default.
- -D
-
Sets a system property to the provided value. You can refer to such system properties in the project descriptor file.
Use JVM syntax to provide the values: -Dproperty=value, for example -Dinput.dir=/mnt/ssd/data/pubmed.
Lingo4G's REST API starts and runs on top of an HTTP server. There is no way to configure limited access or HTTP authorization to this server, so you should ensure the server's security externally, for example by restricting public access to the HTTP port designated for Lingo4G on the machine or by layering a proxy server with proper authorization methods on top of the Lingo4G API.
The above remark is particularly important when l4g server is used together with the -w option, as then the entire content of the L4G_HOME/web folder is made publicly available.
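A server command line that serves several projects and limits network exposure can be assembled as in the Python sketch below. The helper name server_command is hypothetical. Passing the loopback address 127.0.0.1 to --host is shown as one possible way to keep the port private behind a proxy, and it assumes the option accepts an interface address.

```python
def server_command(projects, port=8080, host=None, cors_origin=None):
    """Assemble an argument list for 'l4g server' (nothing is executed here)."""
    cmd = ["l4g", "server"]
    for project in projects:
        cmd += ["-p", project]  # -p may be repeated to serve several projects
    cmd += ["--port", str(port)]
    if host is not None:
        cmd += ["--host", host]
    if cors_origin:
        cmd += ["--cors", cors_origin]  # CORS headers stay disabled otherwise
    return cmd


# Port 0 asks the server to pick a free port automatically.
cmd = server_command(["project1", "project2"], port=0, host="127.0.0.1")
print(" ".join(cmd))
```

With this setup, an external proxy would be responsible for authorization and for forwarding requests to the locally bound port.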
l4g server-shutdown
Attempts to stop a running Lingo4G REST API server.
l4g server-shutdown [options]
The following options are supported:
- -r, --port
- The port number the command will try to connect to, 8080 by default.
- --host
-
The network interface of the server, if the --host option was used to bind it to a particular network interface address.
- --shutdown-token
- The shutdown token to send to the running server. For the shutdown to succeed, the token value must be equal to the one passed at server startup.
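Since the shutdown token must match the one given at startup, it is convenient to generate both command lines together. The Python sketch below does this with a random token. The helper name server_and_shutdown_commands, the project path and the PID file name are hypothetical, and nothing is executed.

```python
import secrets


def server_and_shutdown_commands(project, port=8080, pid_file=None):
    """Return matching 'l4g server' and 'l4g server-shutdown' argument lists."""
    token = secrets.token_hex(16)  # random token shared by both commands
    start = ["l4g", "server", "-p", project,
             "--port", str(port), "--shutdown-token", token]
    if pid_file is not None:
        start += ["--pid-file", pid_file]
    stop = ["l4g", "server-shutdown",
            "--port", str(port), "--shutdown-token", token]
    return start, stop


start, stop = server_and_shutdown_commands(
    "datasets/dataset-ohsumed", port=8080, pid_file="l4g-server.pid"
)
print(" ".join(start))
print(" ".join(stop))
```

A deployment script would typically store the token somewhere only the operator can read, so that the shutdown command can be issued later from a separate process.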
l4g show
Shows the project descriptor JSON with all default and resolved values. You can use this command to
- verify the syntax of a project descriptor file,
- check if all variables are correctly resolved,
- view all option values that apply to the project, including the default ones that were not explicitly defined in the project file.
l4g show [options]
The following options are supported:
- -p, --project
- Location of the project descriptor file to show, required.
- -v, --verbose
- Output detailed logs, useful for problem solving.
- -q, --quiet
- Limit the amount of logging information.
- --work-dir
- Override the default work directory location.
- -D
-
Sets a system property to the provided value. You can refer to such system properties in the project descriptor file.
Use JVM syntax to provide the values: -Dproperty=value, for example -Dinput.dir=/mnt/ssd/data/pubmed.
l4g stats
Shows some basic statistics of the Lingo4G index associated with the provided project, including the size of the index, histograms of document lengths and term vectors, and a histogram of phrase frequencies.
l4g stats [options]
The following options are supported:
- -p, --project
- Location of the project descriptor file to generate the statistics for, required.
- -a, --accuracy
- Accuracy of document statistics fetching, optional, default: 0.1. You can increase the accuracy for more accurate but slower computation of document length and term vector size histogram estimates. Use the value of 1.0 for an accurate computation.
- -tf, --text-fields
-
The list of fields to use when computing the document length histogram, optional, default: all available text fields. Computation of the document length histogram is disabled by default; use the --analyze-text-fields option to enable it.
- --analyze-text-fields
- When provided, the histogram of the lengths of raw document text will be computed.
- -ff, --feature-fields
- The list of feature fields to use when computing phrase frequency histogram.
- -ff-all, --feature-fields-all
-
Includes all feature fields when computing the phrase frequency histogram. Overrides any explicit fields provided by the -ff option.
- -t, --threads
- The number of threads to use for processing, optional, default: the number of CPU cores available.
- -v, --verbose
- Output detailed logs, useful for problem solving.
- -q, --quiet
- Limit the amount of logging information.
- --work-dir
- Override the default work directory location.
- -D
-
Sets a system property to the provided value. You can refer to such system properties in the project descriptor file.
Use JVM syntax to provide the values: -Dproperty=value, for example -Dinput.dir=/mnt/ssd/data/pubmed.
l4g unpack
Extracts files from ZIP and 7z archives. This command may be useful if the automatic download and extraction process does not work behind a firewall.
l4g unpack [options] [archive archive ...]
The following options are supported:
- -f, --force
- Overwrite any existing files, if they already exist.
- --delete
- Deletes the source archive after the files are successfully extracted. Default value: false.
- -o, --output-dir
- Output folder to expand files from each archive to. If not specified, files are extracted relative to their source archive file.
l4g version
Prints Lingo4G version, revision and release date information.