Corpus Reading Guide

The corpus contains clauses of spatial language rather than entire documents.  In a large document, spatial language may make up only a small proportion of the text, and the sentences that contain it can be difficult to pinpoint.  However, we provide a link to each source document, through which it can be accessed provided the content remains on the web site from which it was harvested.

The corpus contains the following columns:

Identifier: A unique identifier for the clause.
Score: The score assigned by our automated method, using the scoring method described in this paper.  The score is only an indication of the spatial content of the clause; because our automated method has some shortcomings (discussed in the paper), the clauses have also been filtered manually.
Source: The web site from which the content was harvested.
Date Harvested: The date on which the content was harvested.
Text: The actual geospatial clause.
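
The columns above could be read programmatically along the following lines.  This is a minimal sketch only: it assumes the corpus is available as a comma-separated file with a header row that uses the column names exactly as listed, and that Score is numeric.  Neither assumption is stated in this guide, and the function names read_clauses and at_least are hypothetical.

```python
import csv
from typing import Dict, Iterable, Iterator, List, TextIO

# Column names as listed in this guide (assumed to match the file header exactly).
COLUMNS = ["Identifier", "Score", "Source", "Date Harvested", "Text"]


def read_clauses(handle: TextIO) -> Iterator[Dict[str, object]]:
    """Yield one dict per clause from an open CSV file (assumed format)."""
    reader = csv.DictReader(handle)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for row in reader:
        # Assumption: Score is stored as a number.
        row["Score"] = float(row["Score"])
        yield row


def at_least(clauses: Iterable[Dict[str, object]], threshold: float) -> List[Dict[str, object]]:
    """Keep clauses whose automated score is at or above the threshold."""
    return [c for c in clauses if c["Score"] >= threshold]
```

Since the score is only an indication of spatial content, a threshold filter such as at_least is best treated as a rough first pass rather than a definitive selection.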