Analysis (API v2)
The analysis_v2 section configures the default components you can reference in API v2 analysis request JSONs.
This section can be used to conveniently store shared definitions of components that are frequently referenced in analytical requests. For example, such defaults could include a set of content fields or feature fields. This is particularly useful with auto references because the defaults would be resolved automatically, without providing any explicit reference links.
Here is an example shared component section from the dataset-arxiv project.
"analysis_v2": {
  "components": {
    "fields": {
      "type": "featureFields:simple",
      "fields": [
        "title$phrases",
        "abstract$phrases"
      ]
    },
    "contentFields": {
      "type": "contentFields:simple",
      "fields": {
        "id": {},
        "title": {},
        "abstract": {},
        "category": {}
      }
    },
    "labelFilter": {
      "type": "labelFilter:composite",
      "labelFilters": {
        "auto": {
          "type": "labelFilter:autoStopLabels"
        },
        "project": {
          "type": "labelFilter:dictionary",
          "exclude": [
            {
              "type": "dictionary:all"
            }
          ]
        }
      }
    }
  }
}
It defines three shared components, with the following identifiers: fields, contentFields and labelFilter. An analysis request could reference any of these components by their identifier or, perhaps more intuitively, rely on automatic reference resolution based on the required component type, as shown in the example below.
This request uses the documentContent stage to fetch content fields for the top three documents matching the query photon. Note the absence of the fields property: it is resolved automatically and points at the contentFields component declared in the analysis_v2 section of the project descriptor.
{
  "stages": {
    "documentContent": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      }
    }
  },
  "output": {
    "request": true
  }
}
The above request returns the following output:
"documentContent": {
  "documents": [
    {
      "id": 482237,
      "fields": {
        "id": {
          "values": [
            "hep-ph/9406370"
          ]
        },
        "title": {
          "values": [
            "Jets in ⁌q⁍Photon⁌\\q⁍-⁌q⁍Photon⁌\\q⁍ Collisions"
          ]
        },
        "abstract": {
          "values": [
            " We study jet production in ⁌q⁍photon⁌\\q⁍-⁌q⁍photon⁌\\q⁍ reactions at the next-to-leading logarithm accuracy. The discussion of the theoretical uncertainties and the role of the quark and gluon distributions in the ⁌q⁍photon⁌\\q⁍ is emphasized. The phenomenology at TRISTAN…"
          ]
        },
        "category": {
          "values": [
            "hep-ph"
          ]
        }
      }
    },
    {
      "id": 298152,
      "fields": {
        "id": {
          "values": [
            "1601.01144"
          ]
        },
        "title": {
          "values": [
            "Studying 750 GeV Di-⁌q⁍photon⁌\\q⁍ Resonance at ⁌q⁍Photon⁌\\q⁍-⁌q⁍Photon⁌\\q⁍ Collider"
          ]
        },
        "abstract": {
          "values": [
            " Motivated by the recent LHC discovery of the di-⁌q⁍photon⁌\\q⁍ excess at the invariant mass of 750 GeV, we study the prospect of investigating the scalar resonance at a future ⁌q⁍photon⁌\\q⁍-⁌q⁍photon⁌\\q⁍ collider. We show that, if the di-⁌q⁍photon⁌\\q⁍ excess observed at the…"
          ]
        },
        "category": {
          "values": [
            "hep-ph",
            "hep-ex"
          ]
        }
      }
    },
    {
      "id": 275187,
      "fields": {
        "id": {
          "values": [
            "1607.03678"
          ]
        },
        "title": {
          "values": [
            "Two-⁌q⁍photon⁌\\q⁍ interference of temporally separated ⁌q⁍photons⁌\\q⁍"
          ]
        },
        "abstract": {
          "values": [
            " We present experimental demonstrations of two-⁌q⁍photon⁌\\q⁍ interference involving temporally separated ⁌q⁍photons⁌\\q⁍ within two types of interferometers: a Mach-Zehnder interferometer and a polarization-based Michelson interferometer. The two-⁌q⁍photon⁌\\q⁍ states are…",
            "…arms by introducing a large time delay between two input ⁌q⁍photons⁌\\q⁍; this state is composed of two temporally separated ⁌q⁍photons⁌\\q⁍, which are in two different or the same spatial modes. We then observe two-⁌q⁍photon⁌\\q⁍ interference fringes involving both…",
            "…the interference of path-entangled two-⁌q⁍photon⁌\\q⁍ states simultaneously in a single interferometric setup. The observed two-⁌q⁍photon⁌\\q⁍ interference fringes provide simultaneous observation of the interferometric properties of the single-⁌q⁍photon⁌\\q⁍ and…"
          ]
        },
        "category": {
          "values": [
            "quant-ph"
          ]
        }
      }
    }
  ]
}
The analysis_v2 section has no particular restrictions on the type or number of components declared in its body. However, this feature should be used with care: moving too many component definitions to the project descriptor will make individual requests more difficult to understand (because the request will no longer be self-contained).
If in doubt, you can ask for the fully resolved request for inspection using the output.request property. This is what the fully resolved request looks like for the above example (note the common component definitions copied from the project descriptor):
"request": {
  "components": {
    "labelFilter": {
      "type": "labelFilter:autoStopLabels",
      "minCoverage": 0.4,
      "removalStrength": 0.35
    },
    "fields": {
      "type": "featureFields:simple",
      "fields": [
        "title$phrases",
        "abstract$phrases"
      ]
    },
    "contentFields": {
      "type": "contentFields:simple",
      "fields": {
        "id": {
          "maxValues": 3,
          "maxValueLength": 250,
          "truncationMarker": "…",
          "valueCount": false,
          "highlighting": {
            "enabled": true,
            "startMarker": "⁌%s⁍",
            "endMarker": "⁌\\%s⁍"
          }
        },
        "title": {
          "maxValues": 3,
          "maxValueLength": 250,
          "truncationMarker": "…",
          "valueCount": false,
          "highlighting": {
            "enabled": true,
            "startMarker": "⁌%s⁍",
            "endMarker": "⁌\\%s⁍"
          }
        },
        "abstract": {
          "maxValues": 3,
          "maxValueLength": 250,
          "truncationMarker": "…",
          "valueCount": false,
          "highlighting": {
            "enabled": true,
            "startMarker": "⁌%s⁍",
            "endMarker": "⁌\\%s⁍"
          }
        },
        "category": {
          "maxValues": 3,
          "maxValueLength": 250,
          "truncationMarker": "…",
          "valueCount": false,
          "highlighting": {
            "enabled": true,
            "startMarker": "⁌%s⁍",
            "endMarker": "⁌\\%s⁍"
          }
        }
      }
    }
  },
  "stages": {
    "documentContent": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "limit": 3,
        "query": {
          "type": "query:string",
          "queryParser": {
            "type": "queryParser:project",
            "queryParserKey": ""
          },
          "query": "photon"
        },
        "accurateHitCount": false,
        "requireScores": true
      },
      "fields": {
        "type": "contentFields:reference",
        "use": "contentFields",
        "auto": false
      },
      "queries": {},
      "start": 0,
      "limit": "unlimited"
    }
  },
  "output": {
    "progress": true,
    "request": true
  }
}
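The fields entry of the documentContent stage in the resolved request suggests what an explicit reference by identifier looks like. As a sketch, assuming a request accepts the same contentFields:reference form that the resolved output emits, the earlier request could name the shared component directly instead of relying on automatic resolution:

```json
{
  "stages": {
    "documentContent": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      },
      "fields": {
        "type": "contentFields:reference",
        "use": "contentFields"
      }
    }
  }
}
```

Here the use property carries the identifier of the shared component. The auto property shown in the resolved request is left out, on the assumption that it is optional.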
Finally, note that if the analysis request contains component identifiers identical to those contained in the shared analysis_v2 block, the components declared in the request take precedence.
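To illustrate the precedence rule, a request could redeclare contentFields in its own components block. This is only a sketch: it assumes a request may carry a top-level components block in the same shape as the one shown in the fully resolved request, and the single-field definition is made up for illustration.

```json
{
  "components": {
    "contentFields": {
      "type": "contentFields:simple",
      "fields": {
        "title": {}
      }
    }
  },
  "stages": {
    "documentContent": {
      "type": "documentContent",
      "documents": {
        "type": "documents:byQuery",
        "query": {
          "type": "query:string",
          "query": "photon"
        },
        "limit": 3
      }
    }
  }
}
```

Under the precedence rule, the documentContent stage would be expected to use the request-level contentFields definition, so only the title field would be fetched. The fields and labelFilter components from the project descriptor would remain available unchanged.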