Creating the Corpus
As part of developing a method to weight the ‘spatial-ness’ of language, we created a set of syntactic templates that describe the word patterns commonly found in spatial language, based on a number of categories of spatial words.
The idea of developing a spatial corpus grew out of existing research into natural language, together with the outcomes of our previous work in the natural language discovery and query interface strand of the EuroGEOSS research. That earlier work used a restricted natural language; we wanted to extend it to a more complete view of natural language, one that would make human interaction with geographic information systems more intuitive, specifically in the context of spatial querying. To inform and focus this new work, we are collecting examples of the ways in which spatial location is described in everyday natural language. The corpus contains only English-language examples, but it has been built using an approach that could be extended to other languages in the future. A range of English dialects is covered, including American, British, Australian and New Zealand English.
The corpus contains only written language, harvested from a number of web sources covering news, tourism and travel, and heritage. Over time, we aim to make the corpus content more representative, so that it can serve as a resource for a range of geospatial language research, and we welcome contributions from other parties.
The process of creating the corpus is described in more detail in the following paper:
Stock, K., Pasley, R.C., Gardner, Z., Brindley, P., Morley, J. and Cialone, C. (2013) Creating a corpus of geospatial language. COSIT 2013: Conference on Spatial Information Theory, Scarborough, UK, 2-6 September 2013.