Representation of a textual document to be clustered.

Namespace: Org.Carrot2.Core
Assembly: Org.Carrot2.Core.NET (in Org.Carrot2.Core.NET.dll) Version:


public sealed class Document


A document is a collection of fields. For clustering purposes, a document typically consists of a title and content.

The content usually depends on what data is available: the document abstract, the first few paragraphs, a contextual snippet returned by a search engine, or the full document text. Note that full-text clustering may take significantly longer than snippet- or abstract-based clustering, and it does not always produce better results.

Document titles are optional but highly recommended when available. Clustering algorithms usually give titles more weight than the rest of the content, which improves clustering quality.

Optionally, a URL pointing to the document's source and the document's language can also be provided as hints for clustering algorithms.
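As a rough illustration of the field-collection model described above, the following C# sketch defines a hypothetical `FieldDocument` class. It is not the `Org.Carrot2.Core.Document` API: the class name, the field keys (`title`, `content`, `url`, `language`) and the `SetField`, `GetField` and `HasField` methods are assumptions made for this example only. Content is required, while the title, URL and language hints are stored only when supplied.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only: this is NOT the Org.Carrot2.Core.Document API.
// It models a document as a collection of named fields, where the title,
// URL and language are optional hints.
public sealed class FieldDocument
{
    // Hypothetical field names chosen for this sketch.
    public const string Title = "title";
    public const string Content = "content";
    public const string ContentUrl = "url";
    public const string Language = "language";

    private readonly Dictionary<string, object> fields = new Dictionary<string, object>();

    public FieldDocument(string content, string title = null, string url = null, string language = null)
    {
        // Content (abstract, snippet, or full text) is the only required field.
        SetField(Content, content ?? throw new ArgumentNullException(nameof(content)));

        // Optional hints are stored only when provided.
        if (title != null) SetField(Title, title);
        if (url != null) SetField(ContentUrl, url);
        if (language != null) SetField(Language, language);
    }

    public void SetField(string name, object value) => fields[name] = value;

    // Returns the field value, or null if the field was never set.
    public object GetField(string name) => fields.TryGetValue(name, out var value) ? value : null;

    public bool HasField(string name) => fields.ContainsKey(name);

    public IReadOnlyCollection<string> FieldNames => fields.Keys;
}

public static class Program
{
    public static void Main()
    {
        // A snippet-based document with all optional hints supplied.
        var titled = new FieldDocument(
            content: "Data mining is the process of discovering patterns in large data sets.",
            title: "Data mining",
            url: "http://example.com/data-mining",
            language: "en");

        // A document with content only; no title, URL, or language hint.
        var untitled = new FieldDocument("A contextual snippet returned by a search engine.");

        Console.WriteLine(titled.GetField(FieldDocument.Title));   // Data mining
        Console.WriteLine(untitled.HasField(FieldDocument.Title)); // False
    }
}
```

Storing fields in a dictionary keeps the model open-ended: additional fields can be attached with `SetField` without changing the class, which mirrors the "collection of fields" description rather than a fixed set of properties.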
