Full indexing

Before Lingo4G can analyze any documents, it must import them into the index.

To perform full document indexing, run the index command and point it at your project descriptor:

l4g index -p <project-descriptor-path>

A full indexing cycle runs all indexing steps from scratch. These include:

  • deleting any existing index data (only when the --force option is given, see note below),
  • importing documents,
  • creating search indexes,
  • performing feature discovery,
  • extracting stop labels,
  • building label and document embeddings (optional).

If the project already contains a document index, or if the existing index is incompatible with the version of Lingo4G you are running, the index command aborts with an error message. To have Lingo4G delete the existing index data and proceed, pass the --force option.
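
Because --force deletes existing index data, it can be safer to make forcing an explicit choice in scripts. The following is a minimal sketch, not part of Lingo4G: the reindex function and the L4G variable are hypothetical names, and placing --force after the -p option is an assumption about accepted option order.

```shell
# Hypothetical wrapper around the index command; reindex is not a Lingo4G command.
# L4G names the launcher to run; it defaults to "l4g" and can be overridden.
L4G="${L4G:-l4g}"

# Usage: reindex <project-descriptor-path> [force]
reindex() {
  descriptor="$1"
  if [ "${2:-}" = "force" ]; then
    # Explicitly requested: let Lingo4G delete any existing index data first.
    "$L4G" index -p "$descriptor" --force
  else
    # Default: the command aborts if an index exists or is incompatible.
    "$L4G" index -p "$descriptor"
  fi
}
```

Calling reindex with only the descriptor path keeps the default, non-destructive behavior; passing force as the second argument opts in to deleting the existing index.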

Once Lingo4G completes indexing, it creates a new feature commit and the project is ready to serve analysis requests through the HTTP REST API. You can inspect all feature commits in the project using the stats command.
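
As an illustration of that sequence, the sketch below runs a full indexing cycle and lists feature commits only if indexing succeeded. The index_then_stats function and the L4G variable are hypothetical names, and passing -p to the stats command is an assumption that it accepts the project descriptor the same way the index command does.

```shell
# Hypothetical helper; index_then_stats is not a Lingo4G command.
# L4G names the launcher to run; it defaults to "l4g" and can be overridden.
L4G="${L4G:-l4g}"

# Usage: index_then_stats <project-descriptor-path>
index_then_stats() {
  # Assumption: stats takes the same -p option as index.
  # The && ensures stats runs only when indexing exits successfully.
  "$L4G" index -p "$1" && "$L4G" stats -p "$1"
}
```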

A full indexing cycle may be time-consuming because it always scans through all documents and recreates the index's search structures. The incremental indexing strategy may be a better fit for dynamic document collections, where documents are added or deleted over time.