documentLabels
The documentLabels stage retrieves labels contained in each of the documents you provide on input. Use the output of this stage for display purposes. To collect or aggregate labels from multiple documents, use labels:fromDocuments instead.
Note: use documentLabels stage results only for presentation purposes. If you need to collect an aggregate list of labels occurring in a set of documents, use the labels:fromDocuments stage.
documentLabels
Selects and returns the best-ranking labels for each document.
{
  "type": "documentLabels",
  "documents": {
    "type": "documents:reference",
    "auto": true
  },
  "labelCollector": {
    "type": "labelCollector:topFromFeatureFields",
    "fields": {
      "type": "featureFields:reference",
      "auto": true
    },
    "labelFilter": {
      "type": "labelFilter:reference",
      "auto": true
    },
    "labelListFilter": {
      "type": "labelListFilter:truncatedPhrases"
    },
    "minTf": 0,
    "minTfMass": 1,
    "tieResolution": "AUTO"
  },
  "limit": "unlimited",
  "maxLabels": 10,
  "start": 0
}
For example, the following request selects the top 3 documents matching the query photon and retrieves up to 3 of the most frequent labels contained in each document:
{
  "stages": {
    "documents": {
      "type": "documents:byQuery",
      "query": {
        "type": "query:string",
        "query": "photon"
      },
      "limit": 3
    },
    "documentLabels": {
      "type": "documentLabels",
      "maxLabels": 3
    }
  }
}
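When requests like the one above are generated by a program, the same structure can be assembled as a plain dictionary and serialized to JSON. The following is a minimal Python sketch: the helper name build_document_labels_request is hypothetical, and it uses only the stage types and properties that appear in the request above. How the request is submitted to the server is deliberately left out.

```python
import json


def build_document_labels_request(query, max_documents, max_labels):
    """Build the request body shown above as a dict (hypothetical helper)."""
    return {
        "stages": {
            # Select the top documents matching the query string.
            "documents": {
                "type": "documents:byQuery",
                "query": {"type": "query:string", "query": query},
                "limit": max_documents,
            },
            # Retrieve the best-ranking labels for each selected document.
            "documentLabels": {
                "type": "documentLabels",
                "maxLabels": max_labels,
            },
        }
    }


request = build_document_labels_request("photon", max_documents=3, max_labels=3)
print(json.dumps(request, indent=2))
```

Keeping the request as a dictionary until the last moment makes it easy to vary the query or the label limit without editing JSON text by hand.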
The response for the above request contains an array with entries referencing each document and listing top-scoring labels in that document:
"documentLabels": {
  "documents": [
    {
      "id": 188201,
      "labels": [
        {
          "label": "photon",
          "weight": 16
        },
        {
          "label": "photon-jet",
          "weight": 5
        },
        {
          "label": "e⁺e",
          "weight": 5
        }
      ]
    },
    {
      "id": 62168,
      "labels": [
        {
          "label": "photon-photon",
          "weight": 3
        },
        {
          "label": "photon-proton",
          "weight": 3
        }
      ]
    },
    {
      "id": 252264,
      "labels": [
        {
          "label": "photon",
          "weight": 4
        },
        {
          "label": "virtual",
          "weight": 4
        }
      ]
    }
  ]
}
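Because the output of this stage is meant for display, a client will usually turn each entry into a line of text. The Python sketch below shows one way to do that. It assumes the response has already been parsed into a dictionary with the structure shown above; the function name format_document_labels is hypothetical, and the sample data repeats two of the documents from the response above.

```python
def format_document_labels(response):
    """Return one display line per document: 'id: label (weight), ...'."""
    lines = []
    for doc in response["documentLabels"]["documents"]:
        labels = ", ".join(
            f'{entry["label"]} ({entry["weight"]})' for entry in doc["labels"]
        )
        lines.append(f'{doc["id"]}: {labels}')
    return lines


# Two entries copied from the example response above.
response = {
    "documentLabels": {
        "documents": [
            {
                "id": 62168,
                "labels": [
                    {"label": "photon-photon", "weight": 3},
                    {"label": "photon-proton", "weight": 3},
                ],
            },
            {
                "id": 252264,
                "labels": [
                    {"label": "photon", "weight": 4},
                    {"label": "virtual", "weight": 4},
                ],
            },
        ]
    }
}

lines = format_document_labels(response)
for line in lines:
    print(line)
```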
documents
A mandatory reference to any documents:* component or stage, providing documents for which labels should be retrieved.
labelCollector
A labelCollector:* component used to collect and score labels in each document.
limit
The maximum number of documents to return labels for.
The value must be an integer >= 0 or the string unlimited, in which case the stage will return labels for all documents returned by the documents component.
maxLabels
The maximum number of labels to return for each document. The actual number of labels may exceed this limit if the tail of the label list has ranking score ties: in that case, all labels sharing the tied score are returned.
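The tie rule can be modeled with a short Python sketch. This illustrates the behavior described above and is not the stage's actual implementation: the function name truncate_with_ties and the sample labels and scores are made up, and returning an empty list for a non-positive limit is an assumption. The effect of the tieResolution setting is not covered here.

```python
def truncate_with_ties(labels, max_labels):
    """Keep the first max_labels entries plus any entries tied with the last one.

    labels: list of (label, score) pairs sorted by descending score.
    """
    if max_labels <= 0:
        return []  # assumption: a non-positive limit yields no labels
    if len(labels) <= max_labels:
        return list(labels)
    cutoff_score = labels[max_labels - 1][1]
    end = max_labels
    # Extend the cut while the following labels share the cutoff score.
    while end < len(labels) and labels[end][1] == cutoff_score:
        end += 1
    return labels[:end]


# Made-up data: "beta" and "gamma" are tied at the cutoff position.
ranked = [("alpha", 16), ("beta", 5), ("gamma", 5), ("delta", 2)]
selected = truncate_with_ties(ranked, 2)
print(selected)
```

With a limit of 2, the sketch keeps three labels because the second and third share the same score.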
start
If greater than zero, skips the given number of initial documents returned by the documents component.
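Taken together, start and limit select a window of the documents returned by the documents component. The Python sketch below models that window under the assumption that start is applied before limit, which the text above suggests but does not state explicitly. The function name select_window is hypothetical.

```python
def select_window(documents, start=0, limit="unlimited"):
    """Model of the start/limit window (assumes start is applied first)."""
    remaining = list(documents[start:]) if start > 0 else list(documents)
    if limit == "unlimited":
        return remaining
    return remaining[:limit]


document_ids = list(range(10))
window = select_window(document_ids, start=2, limit=3)
print(window)
```

Calling the function with the default values returns every document, which matches the defaults of start 0 and limit unlimited shown in the stage definition.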