Geospatial Language

Search Help

The corpus search tool (Concordancer) allows you to search, sort and view the contents of the corpus.

The corpus input form includes the following fields:

Identifier A unique identifier for the clause.
Search Text The text you would like to search for.  You can enter a single word, part of a word or a group of words.  The form will then return occurrences of that text in the corpus.  The exact results returned depend on whether you tick the 'Also find clauses with longer words that include the search text' box (see below).
Also find clauses with longer words that include the search text

If the box is not ticked (default), the search will only return clauses that include the search text as an individual word or group of words.  If the box is ticked, it will also include clauses in which the search text forms only part of the word.  For example, if the search text is side and the box is not ticked, only clauses that contain the word side will be returned.  If the box is ticked, clauses that contain any word with side in it (e.g. beside, outside, alongside, considerable) will be returned. 

If the search text contains more than one word and the box is ticked, clauses in which the search text appears are returned, including those in which the first or last word of the search text forms part of a longer word.  For example, the search text side the would also return beside them, beside there, etc.
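The two matching modes described above can be sketched with regular expressions. This is only an illustration, not the Concordancer's actual implementation: the function names are hypothetical, and case-insensitive matching is an assumption that the help text does not confirm.

```python
import re

def build_pattern(search_text, include_longer_words=False):
    """Compile a pattern for the search text (hypothetical helper).

    Case-insensitive matching is an assumption, not documented behaviour.
    """
    # Escape each word and allow any whitespace between words.
    core = r"\s+".join(re.escape(w) for w in search_text.split())
    if include_longer_words:
        # Box ticked: the first and last words may be part of longer words.
        return re.compile(r"\w*" + core + r"\w*", re.IGNORECASE)
    # Box not ticked: the search text must stand alone as whole words.
    return re.compile(r"\b" + core + r"\b", re.IGNORECASE)

def matched_terms(clause, search_text, include_longer_words=False):
    """Return every matched term found in the clause, in order."""
    pattern = build_pattern(search_text, include_longer_words)
    return [m.group(0) for m in pattern.finditer(clause)]

clause = "She stood beside them on the far side of the considerable crowd"
print(matched_terms(clause, "side"))            # whole-word matches only
print(matched_terms(clause, "side", True))      # longer words included
print(matched_terms(clause, "side the", True))  # multi-word search text
```

With the box ticked, the matched term is the longer word or phrase (for example beside them), which is what the results display highlights and what the matched term sort option uses.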

Number of words to display on either side of the search text You can choose to view 10 (default), 20 or 30 words on either side of the search text, or the full clause, depending on your requirements.
Sort by

You can choose to sort the displayed results by:

-identifier (default): the unique identifier for the clause, allocated consecutively when clauses are harvested;

-score: the score calculated to represent the spatial content of the clause (higher scores indicate that the clause is more likely to contain a high number of geospatial words in grammatical constructs that indicate location descriptions);

-(0)matched term: the term that was matched to the search text, which will be identical to the search text if the 'Also find clauses with longer words that include the search text' box is not ticked, otherwise it is the longer term that matched the search text, possibly including longer words at the start or end of the search text.

-1L (etc): the word that is 1 word to the left of the matched term.  The other listed options follow the same pattern: a number n followed by a direction letter (L=left, R=right), meaning the word that is n words from the matched term in that direction.
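The display width and the positional sort options can be illustrated with a small keyword-in-context sketch. It is written under stated assumptions and is not the tool's own code: the names kwic and sort_key are hypothetical, only a single whole-word search term is handled, and words are split on whitespace with basic punctuation stripped for comparison.

```python
def kwic(clause, term, width=10):
    """Return (left, matched, right) for the first whole-word match, or None.

    `width` is the number of words to keep on either side of the match.
    """
    words = clause.split()
    for i, word in enumerate(words):
        # Compare without surrounding punctuation, ignoring case (assumption).
        if word.strip(".,;:!?\"'()").lower() == term.lower():
            return words[max(0, i - width):i], word, words[i + 1:i + 1 + width]
    return None

def sort_key(hit, position):
    """Sort key for a position such as '0' (matched term), '1L' or '2R'."""
    left, matched, right = hit
    if position == "0":
        return matched.lower()
    n, direction = int(position[:-1]), position[-1].upper()
    # Reverse the left context so that index 0 is the word nearest the match.
    context = left[::-1] if direction == "L" else right
    return context[n - 1].lower() if n <= len(context) else ""

clauses = [
    "The shop is on the north side of the road",
    "Park on the left side near the gate",
    "Walk to the far side of the bridge",
]
hits = [kwic(c, "side", width=2) for c in clauses]
for left, matched, right in sorted(hits, key=lambda h: sort_key(h, "1L")):
    print(" ".join(left), "[" + matched + "]", " ".join(right))
```

Sorting on 1L groups results by the word immediately before the matched term, which makes it easy to scan for recurring patterns such as far side, left side and north side.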

The results of the search are presented as follows:


Column Description
Identifier The unique identifier for the clause. Identifiers are allocated in order of clause harvesting.

The text, including the search term and the number of words on either side specified in the 'Number of words to display on either side of the search text' drop down list.  By default, 10 words are displayed on either side of the search term.  The matched item (search term or longer words including the search term) is shown in red.  The sort term is shown in green, unless it is the matched item, in which case it is shown in red.

Full Clause The symbol displayed beside each returned clause has the full clause as a tooltip.  You can hover over the symbol to see the full clause.  Thus there are two ways to view the full clause: (1) by choosing to display full clauses in the 'Number of words to display on either side of the search text' drop down list and (2) by hovering over the full clause symbol.  It is also possible to see the full clause by clicking on the Source URL, but this does not always work, as discussed below under the 'Source' column.

The score calculated during the harvesting process, according to the method described here.  In summary, clauses are scored for their geospatial content based on the number of words that are likely to be geospatial and that appear in particular grammatical structures.  The score indicates how likely the clause is to have geospatial content; it is not a guaranteed quantitative value, since most words can be used in both geospatial and non-geospatial contexts.

Date Harvested The date on which the clause was harvested from its web site.
Source The web site from which the clause was harvested.  In some cases, the clause may still be found on the web site, so the link can be used to view the clause in its full context.  However, in some cases web pages have moved or been changed, and the clause no longer appears on the page, or the page is not available.