Analysis (API v2)

The analysis_v2 section configures the default components you can reference in API v2 analysis requests.

Use this section to store shared definitions of components that are frequently referenced in analysis requests. For example, such defaults could include a set of content fields or feature fields. This is particularly useful with auto references because the defaults are resolved automatically, without any explicit reference links.

Here is an example shared component section from the dataset-arxiv project:

"analysis_v2": {
  "components": {
    "fields": {
      "type": "featureFields:simple",
      "fields": [
        "title$phrases",
        "abstract$phrases"
      ]
    },
    "contentFields": {
      "type": "contentFields:simple",
      "fields": {
        "id": {},
        "title": {},
        "abstract": {},
        "category": {}
      }
    },
    "labelFilter": {
      "type": "labelFilter:autoStopLabels"
    }
  }
}

It defines three shared components with the identifiers fields, contentFields and labelFilter. An analysis request can reference any of these components by identifier or, perhaps more intuitively, rely on automatic reference resolution based on the required component type, as shown in the example below.

This request uses the documentContent stage to fetch content fields for the top three documents matching the query photon. Note that the stage has no fields property: the reference is resolved automatically and points at the contentFields component declared in the analysis_v2 section of the project descriptor.

{
  "stages": {
    "documentContent": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      }
    }
  },
  "output": {
    "request": true
  }
}

The above request returns the following output:

"documentContent": {
  "documents": [
    {
      "id": 188201,
      "fields": {
        "id": {
          "values": [
            "1602.04692"
          ]
        },
        "title": {
          "values": [
            "⁌q⁍Photons⁌\\q⁍, ⁌q⁍Photon⁌\\q⁍ Jets and Dark ⁌q⁍Photons⁌\\q⁍ at 750 GeV and Beyond"
          ]
        },
        "abstract": {
          "values": [
            "…p → S → a a → 4γ, where S is a new scalar with a mass of 750 GeV and a is a light pseudoscalar decaying to two collinear ⁌q⁍photons⁌\\q⁍. ⁌q⁍Photon⁌\\q⁍ jets can be distinguished from isolated ⁌q⁍photons⁌\\q⁍ by exploiting the fact that a large fraction of ⁌q⁍photons⁌\\q⁍ convert…"
          ]
        },
        "category": {
          "values": [
            "hep-ph",
            "hep-ex"
          ]
        }
      }
    },
    {
      "id": 62168,
      "fields": {
        "id": {
          "values": [
            "hep-ex/9810011"
          ]
        },
        "title": {
          "values": [
            "Final States in ⁌q⁍Photon⁌\\q⁍-⁌q⁍Photon⁌\\q⁍ and ⁌q⁍Photon⁌\\q⁍-Proton Interactions"
          ]
        },
        "abstract": {
          "values": [
            " The total hadronic ⁌q⁍photon⁌\\q⁍-⁌q⁍photon⁌\\q⁍ cross-section measured by L3 and OPAL and the apparent discrepancy between the results are discussed. OPAL measurements of jet and charged hadron production in ⁌q⁍photon⁌\\q⁍-⁌q⁍photon⁌\\q⁍ scattering and preliminary H1 results on…"
          ]
        },
        "category": {
          "values": [
            "hep-ex"
          ]
        }
      }
    },
    {
      "id": 252264,
      "fields": {
        "id": {
          "values": [
            "hep-ph/0205301"
          ]
        },
        "title": {
          "values": [
            "Two-⁌q⁍Photon⁌\\q⁍ Processes and ⁌q⁍Photon⁌\\q⁍ Structure"
          ]
        },
        "abstract": {
          "values": [
            " In this article aspects of ⁌q⁍photon⁌\\q⁍-⁌q⁍photon⁌\\q⁍ physics related to the structure of real and virtual ⁌q⁍photons⁌\\q⁍ are reviewed. A re-calculation of the virtual ⁌q⁍photon⁌\\q⁍-⁌q⁍photon⁌\\q⁍ box is performed and some discrepancies in the literature are clarified. A useful…"
          ]
        },
        "category": {
          "values": [
            "hep-ph"
          ]
        }
      }
    }
  ]
}
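
An explicit reference by identifier would presumably produce the same result. The sketch below assumes that the contentFields:reference type and its use property, both of which appear in the fully resolved request later in this section, can also be written by hand in a request. Treat it as an illustration rather than a definitive form.

```json
{
  "stages": {
    "documentContent": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      },
      "fields": {
        "type": "contentFields:reference",
        "use": "contentFields"
      }
    }
  }
}
```

Here the use property names the contentFields identifier declared in the analysis_v2 section.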

The analysis_v2 section places no particular restrictions on the type or number of components declared in its body. However, use this feature with care: moving too many component definitions to the project descriptor makes individual requests harder to understand, because the requests are no longer self-contained.

If in doubt, you can ask for the fully resolved request to be included in the response by setting the output.request property to true. This is what the fully resolved request looks like for the above example (note the shared component definitions copied from the project descriptor):

"request": {
  "components": {
    "labelFilter": {
      "type": "labelFilter:autoStopLabels",
      "minCoverage": 0.4,
      "removalStrength": 0.35
    },
    "fields": {
      "type": "featureFields:simple",
      "fields": [
        "title$phrases",
        "abstract$phrases"
      ]
    },
    "contentFields": {
      "type": "contentFields:simple",
      "fields": {
        "id": {
          "maxValues": 3,
          "maxValueLength": 250,
          "truncationMarker": "…",
          "valueCount": false,
          "highlighting": {
            "enabled": true,
            "startMarker": "⁌%s⁍",
            "endMarker": "⁌\\%s⁍"
          }
        },
        "title": {
          "maxValues": 3,
          "maxValueLength": 250,
          "truncationMarker": "…",
          "valueCount": false,
          "highlighting": {
            "enabled": true,
            "startMarker": "⁌%s⁍",
            "endMarker": "⁌\\%s⁍"
          }
        },
        "abstract": {
          "maxValues": 3,
          "maxValueLength": 250,
          "truncationMarker": "…",
          "valueCount": false,
          "highlighting": {
            "enabled": true,
            "startMarker": "⁌%s⁍",
            "endMarker": "⁌\\%s⁍"
          }
        },
        "category": {
          "maxValues": 3,
          "maxValueLength": 250,
          "truncationMarker": "…",
          "valueCount": false,
          "highlighting": {
            "enabled": true,
            "startMarker": "⁌%s⁍",
            "endMarker": "⁌\\%s⁍"
          }
        }
      }
    }
  },
  "stages": {
    "documentContent": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "limit": 3,
        "query": {
          "type": "query:string",
          "queryParser": "",
          "query": "photon"
        },
        "accurateHitCount": false,
        "requireScores": true
      },
      "fields": {
        "type": "contentFields:reference",
        "use": "contentFields",
        "auto": false
      },
      "queries": {},
      "start": 0,
      "limit": "unlimited"
    }
  },
  "output": {
    "progress": true,
    "request": true
  }
}

Finally, note that if the analysis request contains component identifiers identical to those contained in the shared analysis_v2 block, the components declared in the request take precedence.
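
As a sketch of this precedence rule, the request below declares its own contentFields component that lists only the title field. It assumes that request-level components go into a top-level components block, as in the fully resolved request above, and that the contentFields:reference form shown there can be written by hand. Under those assumptions, the documentContent stage would return titles only and ignore the four-field definition in the project descriptor.

```json
{
  "components": {
    "contentFields": {
      "type": "contentFields:simple",
      "fields": {
        "title": {}
      }
    }
  },
  "stages": {
    "documentContent": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      },
      "fields": {
        "type": "contentFields:reference",
        "use": "contentFields"
      }
    }
  },
  "output": {
    "request": true
  }
}
```

Keeping output.request set to true makes it possible to check in the response which contentFields definition was actually used.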