/v2/project/*

A group of endpoints for accessing sections of the project descriptor and controlling the document index.

These endpoints can be useful for building user interfaces like the JSON Sandbox app.

/v2/project/index/reload

Issues a request for the server to advance to the newest available index commit, including any document updates or the latest set of indexed features. This step is required to serve documents and features from a new commit after incremental indexing or feature reindexing.

Asynchronous analyses that are active when this endpoint is called run to completion and are served against the commit they were created on. The reload affects only analyses created after the endpoint returns.

Avoid calling this API endpoint too frequently: each call may leave multiple index commits open, which increases memory consumption and the number of memory-mapped index files. It may be desirable to clear server caches to expunge any stale analyses and release some memory.

Index reloading is currently only possible when there is no active indexing process running in the background (the index is not write-locked). If the index is write-locked, this method will return HTTP status code 503 (service unavailable).

Access Methods

POST

URL Parameters

None

Request Body

None

Response

This endpoint returns a JSON object describing the now-current index:

{
  "numDocs": ...,
  "numDeleted": ...,
  "metadata": {
    ...
  }
}

numDocs

The number of documents currently in the index.

numDeleted

The number of documents marked as deleted (an update to a document counts as a deletion of the old version and an addition of a new document).

metadata

Diagnostic metadata describing the index (not formally defined).
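
As an illustration of reading these fields, here is a minimal Python sketch that parses the response into a small record. The IndexInfo class and the parse_index_info function are hypothetical names introduced for this example, not part of the API. Because metadata is not formally defined, the sketch keeps it as an opaque dictionary and does not rely on any particular key being present.

```python
import json
from dataclasses import dataclass, field


@dataclass
class IndexInfo:
    """Hypothetical container for the reload response (not part of the API)."""
    num_docs: int
    num_deleted: int
    metadata: dict = field(default_factory=dict)


def parse_index_info(text):
    """Parse the JSON body returned by /v2/project/index/reload."""
    payload = json.loads(text)
    return IndexInfo(
        num_docs=payload["numDocs"],
        num_deleted=payload["numDeleted"],
        # metadata is diagnostic and not formally defined: keep it opaque.
        metadata=payload.get("metadata", {}),
    )


# Body copied from the example response in this section.
sample = """
{
  "numDocs" : 500001,
  "numDeleted" : 0,
  "metadata" : {
    "date-created" : "2023-03-13T09:42:55.726827400Z",
    "lucene-commit" : "segments_5",
    "feature-commit" : "data/commits/_3",
    "feature-set" : "data/feature_sets/_2"
  }
}
"""
info = parse_index_info(sample)
print(info.num_docs, info.num_deleted, info.metadata.get("lucene-commit"))
```

Reading metadata entries with get() rather than indexing keeps the code working if a diagnostic key is absent.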

Errors

HTTP 503 (Service Unavailable) if the index is write-locked by an active indexing process and cannot be reloaded.

Examples

This API endpoint can be invoked by running the following curl command:

curl -XPOST http://localhost:8080/api/v2/project/index/reload

which results in the following HTTP message sent to the server:

POST /api/v2/project/index/reload HTTP/1.1

and the server responds:

HTTP/1.1 200 OK
cache-control: no-transform, no-store
content-length: 242
content-type: application/json;charset=utf-8

{
  "numDocs" : 500001,
  "numDeleted" : 0,
  "metadata" : {
    "date-created" : "2023-03-13T09:42:55.726827400Z",
    "lucene-commit" : "segments_5",
    "feature-commit" : "data/commits/_3",
    "feature-set" : "data/feature_sets/_2"
  }
}
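
A client that triggers a reload right after indexing may hit the 503 response described above. The following Python sketch shows one possible way to handle it: retry a limited number of times with a pause in between. The function names, the number of attempts and the delay are assumptions made for this example, and the base URL is taken from the curl command; none of them are prescribed by the API.

```python
import json
import time
import urllib.error
import urllib.request

# Assumed base URL, copied from the curl example; adjust for your deployment.
BASE_URL = "http://localhost:8080/api/v2"


def post_reload(base_url=BASE_URL):
    """Send one POST to /project/index/reload.

    Returns (status, parsed JSON body), with body None on an error status.
    """
    request = urllib.request.Request(
        base_url + "/project/index/reload", data=b"", method="POST"
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as error:
        # urllib reports non-2xx statuses, including 503, as HTTPError.
        return error.code, None


def reload_index(send=post_reload, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Retry the reload while the server answers 503 (index write-locked).

    attempts and delay_seconds are illustrative values, not recommendations.
    """
    for attempt in range(attempts):
        status, body = send()
        if status == 200:
            return body
        if status != 503:
            raise RuntimeError(f"unexpected HTTP status {status}")
        if attempt < attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError("index still write-locked after all attempts")
```

Calling reload_index() with no arguments contacts the server at BASE_URL. The send and sleep parameters exist only so the retry logic can be exercised without a running server. Since frequent reloads are discouraged, a generous delay is probably preferable to a tight retry loop.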

/v2/project/defaults/analysis

Returns the analysis_v2 block of the project descriptor, which specifies the default components you can reference in analysis requests.

Access Methods

GET

URL Parameters

None

Request Body

None

Response

This endpoint returns a JSON object identical to that declared in the analysis_v2 block of the project descriptor.

Errors

No errors are returned from this endpoint.

Examples

This API endpoint can be invoked by running the following curl command:

curl http://localhost:8080/api/v2/project/defaults/analysis

which results in the following HTTP message sent to the server:

GET /api/v2/project/defaults/analysis HTTP/1.1

This is an example response from the arxiv example dataset's project descriptor:

HTTP/1.1 200 OK
content-length: 1646
content-type: application/json;charset=utf-8

{
  "components" : {
    "fields" : {
      "type" : "featureFields:simple",
      "fields" : [
        "title$phrases",
        "abstract$phrases"
      ]
    },
    "contentFields" : {
      "type" : "contentFields:simple",
      "fields" : {
        "id" : {
          "maxValues" : 3,
          "maxValueLength" : 250,
          "truncationMarker" : "…",
          "valueCount" : false,
          "highlighting" : {
            "enabled" : true,
            "startMarker" : "⁌%s⁍",
            "endMarker" : "⁌\\%s⁍"
          }
        },
        "title" : {
          "maxValues" : 3,
          "maxValueLength" : 250,
          "truncationMarker" : "…",
          "valueCount" : false,
          "highlighting" : {
            "enabled" : true,
            "startMarker" : "⁌%s⁍",
            "endMarker" : "⁌\\%s⁍"
          }
        },
        "abstract" : {
          "maxValues" : 3,
          "maxValueLength" : 250,
          "truncationMarker" : "…",
          "valueCount" : false,
          "highlighting" : {
            "enabled" : true,
            "startMarker" : "⁌%s⁍",
            "endMarker" : "⁌\\%s⁍"
          }
        },
        "category" : {
          "maxValues" : 3,
          "maxValueLength" : 250,
          "truncationMarker" : "…",
          "valueCount" : false,
          "highlighting" : {
            "enabled" : true,
            "startMarker" : "⁌%s⁍",
            "endMarker" : "⁌\\%s⁍"
          }
        }
      }
    },
    "labelFilter" : {
      "type" : "labelFilter:autoStopLabels",
      "minCoverage" : 0.4,
      "removalStrength" : 0.35
    }
  }
}
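
A user interface might only need the names and types of the default components. The Python sketch below shows one way to extract them. The function names are hypothetical, the base URL is assumed from the curl command, and the sample is an abbreviated copy of the response above so that the sketch runs without a server.

```python
import json
import urllib.request

# Assumed base URL, copied from the curl example; adjust for your deployment.
BASE_URL = "http://localhost:8080/api/v2"


def fetch_analysis_defaults(base_url=BASE_URL):
    """GET /project/defaults/analysis and return the parsed JSON object."""
    with urllib.request.urlopen(base_url + "/project/defaults/analysis") as response:
        return json.load(response)


def component_types(defaults):
    """Map each default component name to its declared "type" value."""
    return {
        name: spec.get("type")
        for name, spec in defaults.get("components", {}).items()
    }


# Abbreviated copy of the example response (only the "type" properties).
sample = {
    "components": {
        "fields": {"type": "featureFields:simple"},
        "contentFields": {"type": "contentFields:simple"},
        "labelFilter": {"type": "labelFilter:autoStopLabels"},
    }
}
for name, type_name in component_types(sample).items():
    print(f"{name}: {type_name}")
```

Against a running server, component_types(fetch_analysis_defaults()) would produce the same kind of mapping for the actual project descriptor.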

/v2/project/defaults/index/fields

Returns the fully resolved field definitions from the current project descriptor's fields section.

Access Methods

GET

URL Parameters

None

Request Body

None

Response

This endpoint returns a JSON object with fully resolved properties of each document field, as declared in the project descriptor.

Errors

No errors are returned from this endpoint.

Examples

This API endpoint can be invoked by running the following curl command:

curl http://localhost:8080/api/v2/project/defaults/index/fields

which results in the following HTTP message sent to the server:

GET /api/v2/project/defaults/index/fields HTTP/1.1

This is an example response from the arxiv example dataset's project descriptor:

HTTP/1.1 200 OK
content-length: 1756
content-type: application/json;charset=utf-8

{
  "title" : {
    "type" : "text",
    "id" : false,
    "indexPositions" : true,
    "analyzer" : "english",
    "featureAnalyzer" : "english"
  },
  "abstract" : {
    "type" : "text",
    "id" : false,
    "indexPositions" : true,
    "analyzer" : "english",
    "featureAnalyzer" : "english"
  },
  "author_name" : {
    "type" : "text",
    "id" : false,
    "indexPositions" : true,
    "analyzer" : "person",
    "featureAnalyzer" : "person"
  },
  "author_and_inst" : {
    "type" : "text",
    "id" : false,
    "indexPositions" : true,
    "analyzer" : "person",
    "featureAnalyzer" : "person"
  },
  "created" : {
    "type" : "date",
    "inputFormat" : "yyyy-MM-dd'T'HH:mm:ss[.SSS][X]",
    "indexFormat" : "yyyy-MM-dd"
  },
  "updated" : {
    "type" : "date",
    "inputFormat" : "yyyy-MM-dd'T'HH:mm:ss[.SSS][X]",
    "indexFormat" : "yyyy-MM-dd"
  },
  "set" : {
    "type" : "text",
    "id" : false,
    "indexPositions" : true,
    "analyzer" : "keyword",
    "featureAnalyzer" : "keyword"
  },
  "category" : {
    "type" : "text",
    "id" : false,
    "indexPositions" : true,
    "analyzer" : "keyword",
    "featureAnalyzer" : "keyword"
  },
  "msc" : {
    "type" : "text",
    "id" : false,
    "indexPositions" : true,
    "analyzer" : "keyword",
    "featureAnalyzer" : "keyword"
  },
  "acm" : {
    "type" : "text",
    "id" : false,
    "indexPositions" : true,
    "analyzer" : "keyword",
    "featureAnalyzer" : "keyword"
  },
  "doi" : {
    "type" : "text",
    "id" : false,
    "indexPositions" : true,
    "analyzer" : "keyword",
    "featureAnalyzer" : "keyword"
  },
  "id" : {
    "type" : "text",
    "id" : true,
    "indexPositions" : true,
    "analyzer" : "literal",
    "featureAnalyzer" : "literal"
  }
}
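
The resolved field definitions can be inspected programmatically, for example to find the identifier field or to group fields by type. The Python sketch below works on an already parsed response. The function names are hypothetical and the sample is an abbreviated copy of the response above. Note that in the example response the date fields carry no id property, so the code reads it with get() instead of indexing.

```python
from collections import defaultdict


def id_field(fields):
    """Return the name of the first field flagged "id": true, or None."""
    for name, spec in fields.items():
        if spec.get("id") is True:
            return name
    return None


def fields_by_type(fields):
    """Group field names by their "type" property, keeping response order."""
    grouped = defaultdict(list)
    for name, spec in fields.items():
        grouped[spec.get("type")].append(name)
    return dict(grouped)


# Abbreviated copy of the example response (three of the twelve fields).
sample = {
    "title": {"type": "text", "id": False, "analyzer": "english"},
    "created": {
        "type": "date",
        "inputFormat": "yyyy-MM-dd'T'HH:mm:ss[.SSS][X]",
        "indexFormat": "yyyy-MM-dd",
    },
    "id": {"type": "text", "id": True, "analyzer": "literal"},
}
print(id_field(sample))
print(fields_by_type(sample))
```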