Analysis (API v2)

The analysis_v2 section configures the default components you can reference in API v2 analysis request JSONs.

This section can be used to conveniently store shared definitions of components that are frequently referenced in analytical requests. For example, such defaults could include a set of content fields or feature fields. This is particularly useful with auto references because the defaults would be resolved automatically, without providing any explicit reference links.

Here is an example shared component section from the dataset-arxiv project.

"analysis_v2": {
  "components": {
    "fields": {
      "type": "featureFields:simple",
      "fields": [
        "title$phrases",
        "abstract$phrases"
      ]
    },
    "contentFields": {
      "type": "contentFields:simple",
      "fields": {
        "id": {},
        "title": {},
        "abstract": {},
        "category": {}
      }
    },
    "labelFilter": {
      "type": "labelFilter:composite",
      "labelFilters": {
        "auto": {
          "type": "labelFilter:autoStopLabels"
        },
        "project": {
          "type": "labelFilter:dictionary",
          "exclude": [
            {
              "type": "dictionary:all"
            }
          ]
        }
      }
    }
  }
}

It defines three shared components with the following identifiers: fields, contentFields and labelFilter. An analysis request can reference any of these components by identifier or, perhaps more intuitively, rely on automatic reference resolution based on the required component type, as shown in the example below.

This request uses the documentContent stage to fetch content fields for the top three documents matching the query photon. Note the absence of the fields property: it is resolved automatically and points at the contentFields component declared in the analysis_v2 section of the project descriptor.

{
  "stages": {
    "documentContent": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      }
    }
  },
  "output": {
    "request": true
  }
}

The above request returns the following output:

"documentContent": {
  "documents": [
    {
      "id": 482237,
      "fields": {
        "id": {
          "values": [
            "hep-ph/9406370"
          ]
        },
        "title": {
          "values": [
            "Jets in ⁌q⁍Photon⁌\\q⁍-⁌q⁍Photon⁌\\q⁍ Collisions"
          ]
        },
        "abstract": {
          "values": [
            " We study jet production in ⁌q⁍photon⁌\\q⁍-⁌q⁍photon⁌\\q⁍ reactions at the next-to-leading logarithm accuracy. The discussion of the theoretical uncertainties and the role of the quark and gluon distributions in the ⁌q⁍photon⁌\\q⁍ is emphasized. The phenomenology at TRISTAN…"
          ]
        },
        "category": {
          "values": [
            "hep-ph"
          ]
        }
      }
    },
    {
      "id": 298152,
      "fields": {
        "id": {
          "values": [
            "1601.01144"
          ]
        },
        "title": {
          "values": [
            "Studying 750 GeV Di-⁌q⁍photon⁌\\q⁍ Resonance at ⁌q⁍Photon⁌\\q⁍-⁌q⁍Photon⁌\\q⁍ Collider"
          ]
        },
        "abstract": {
          "values": [
            " Motivated by the recent LHC discovery of the di-⁌q⁍photon⁌\\q⁍ excess at the invariant mass of   750 GeV, we study the prospect of investigating the scalar resonance at a future ⁌q⁍photon⁌\\q⁍-⁌q⁍photon⁌\\q⁍ collider. We show that, if the di-⁌q⁍photon⁌\\q⁍ excess observed at the…"
          ]
        },
        "category": {
          "values": [
            "hep-ph",
            "hep-ex"
          ]
        }
      }
    },
    {
      "id": 275187,
      "fields": {
        "id": {
          "values": [
            "1607.03678"
          ]
        },
        "title": {
          "values": [
            "Two-⁌q⁍photon⁌\\q⁍ interference of temporally separated ⁌q⁍photons⁌\\q⁍"
          ]
        },
        "abstract": {
          "values": [
            " We present experimental demonstrations of two-⁌q⁍photon⁌\\q⁍ interference involving temporally separated ⁌q⁍photons⁌\\q⁍ within two types of interferometers: a Mach-Zehnder interferometer and a polarization-based Michelson interferometer. The two-⁌q⁍photon⁌\\q⁍ states are…",
            "…arms by introducing a large time delay between two input ⁌q⁍photons⁌\\q⁍; this state is composed of two temporally separated ⁌q⁍photons⁌\\q⁍, which are in two different or the same spatial modes. We then observe two-⁌q⁍photon⁌\\q⁍ interference fringes involving both…",
            "…the interference of path-entangled two-⁌q⁍photon⁌\\q⁍ states simultaneously in a single interferometric setup. The observed two-⁌q⁍photon⁌\\q⁍ interference fringes provide simultaneous observation of the interferometric properties of the single-⁌q⁍photon⁌\\q⁍ and…"
          ]
        },
        "category": {
          "values": [
            "quant-ph"
          ]
        }
      }
    }
  ]
}

The analysis_v2 section places no particular restrictions on the type or number of components declared in its body. However, this feature should be used with care: moving too many component definitions to the project descriptor makes individual requests harder to understand, because the requests are no longer self-contained.

If in doubt, set the output.request property to true to have the response include the fully resolved request for inspection. This is what the fully resolved request looks like for the above example (note the common component definitions copied from the project descriptor):

"request": {
  "components": {
    "labelFilter": {
      "type": "labelFilter:autoStopLabels",
      "minCoverage": 0.4,
      "removalStrength": 0.35
    },
    "fields": {
      "type": "featureFields:simple",
      "fields": [
        "title$phrases",
        "abstract$phrases"
      ]
    },
    "contentFields": {
      "type": "contentFields:simple",
      "fields": {
        "id": {
          "maxValues": 3,
          "maxValueLength": 250,
          "truncationMarker": "…",
          "valueCount": false,
          "highlighting": {
            "enabled": true,
            "startMarker": "⁌%s⁍",
            "endMarker": "⁌\\%s⁍"
          }
        },
        "title": {
          "maxValues": 3,
          "maxValueLength": 250,
          "truncationMarker": "…",
          "valueCount": false,
          "highlighting": {
            "enabled": true,
            "startMarker": "⁌%s⁍",
            "endMarker": "⁌\\%s⁍"
          }
        },
        "abstract": {
          "maxValues": 3,
          "maxValueLength": 250,
          "truncationMarker": "…",
          "valueCount": false,
          "highlighting": {
            "enabled": true,
            "startMarker": "⁌%s⁍",
            "endMarker": "⁌\\%s⁍"
          }
        },
        "category": {
          "maxValues": 3,
          "maxValueLength": 250,
          "truncationMarker": "…",
          "valueCount": false,
          "highlighting": {
            "enabled": true,
            "startMarker": "⁌%s⁍",
            "endMarker": "⁌\\%s⁍"
          }
        }
      }
    }
  },
  "stages": {
    "documentContent": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "limit": 3,
        "query": {
          "type": "query:string",
          "queryParser": {
            "type": "queryParser:project",
            "queryParserKey": ""
          },
          "query": "photon"
        },
        "accurateHitCount": false,
        "requireScores": true
      },
      "fields": {
        "type": "contentFields:reference",
        "use": "contentFields",
        "auto": false
      },
      "queries": {},
      "start": 0,
      "limit": "unlimited"
    }
  },
  "output": {
    "progress": true,
    "request": true
  }
}
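The fields property of the documentContent stage in the resolved request above shows what the automatically resolved reference looks like. As a minimal sketch, the same reference could presumably be written explicitly in the original request. This assumes that the contentFields:reference type and its use property are accepted in request input in the same form as they appear in the resolved output; the auto property is left out here on the assumption that it is optional.

```json
{
  "stages": {
    "documentContent": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      },
      "fields": {
        "type": "contentFields:reference",
        "use": "contentFields"
      }
    }
  }
}
```

An explicit reference of this kind keeps the request readable on its own, at the cost of a few extra lines, which may be worthwhile when the project descriptor declares several components of the same type.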

Finally, note that if an analysis request declares a component with the same identifier as one in the shared analysis_v2 block, the component declared in the request takes precedence.
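To illustrate the precedence rule, the sketch below declares its own contentFields component directly in the request. It assumes that request-level components are placed under a top-level components property, as in the fully resolved request shown earlier; the choice of the id and title fields is purely illustrative.

```json
{
  "components": {
    "contentFields": {
      "type": "contentFields:simple",
      "fields": {
        "id": {},
        "title": {}
      }
    }
  },
  "stages": {
    "documentContent": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      }
    }
  }
}
```

Under the rule above, the documentContent stage would be expected to use this request-level definition rather than the one from the project descriptor, so the abstract and category fields should not appear in the response. Setting output.request to true is a simple way to confirm which definition was applied.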