Dataset: Gale Cengage. NEL and geoparsing results: Asko Nivala.


Geoparsing NCCO: The Corvey Collection


Most texts are spatial: they imply a network of places. This was especially typical of the Romantic era (1790s–1840s), which was characterised by a growing interest in historical and natural sites. What kind of imagined geography was constructed in Romantic-era fiction? In this example, I explore this question by applying digital humanities methods. This interactive map is based on 7,456 English and 2,006 German digitised volumes originally published between the 1790s and the 1840s, almost 10,000 documents in total.

Territory of the Princely Abbey of Corvey in the 18th century. (Wikimedia Commons)

Fürstliche Bibliothek Corvey

The Fürstliche Bibliothek Corvey is located in the Princely Abbey of Corvey near Höxter (North Rhine-Westphalia, Germany). The Corvey Collection is one of the world's largest collections of Romantic literature and early nineteenth-century English popular fiction, including many rare and unique works. Its discovery in the 1980s caused a sensation among book historians. Gale Cengage has digitised a portion of the Corvey Collection and included it as a module of its Nineteenth Century Collections Online (NCCO). The NCCO: European Literature, the Corvey Collection, 1790–1840 dataset used in this map was licensed from Gale Cengage by the University of Turku.

Named-entity linking

The interactive map shows named entities in NCCO: European Literature, the Corvey Collection. I ran the DBpedia Spotlight NEL (named-entity linking) algorithm on the OCR'ed English and German documents included in the Corvey Collection. NEL can recognise many kinds of entities in unstructured text, but I kept only the toponyms mentioned in the corpus. I then linked these place names to DBpedia and Wikidata. Finally, I geocoded the toponyms by fetching their coordinates from DBpedia and plotted the results on the map.
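The filtering and geocoding steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It assumes annotations shaped like DBpedia Spotlight's JSON output (a list of resources with "@URI", "@surfaceForm" and a comma-separated "@types" field) and a hypothetical lookup table from DBpedia URIs to coordinates. The names PLACE_TYPES, is_toponym and geocode, as well as the sample data, are invented for this example.

```python
from typing import Dict, List, Tuple

# Types treated as "place" in this sketch; an assumption, not the project's filter.
PLACE_TYPES = {"DBpedia:Place", "Schema:Place"}


def is_toponym(resource: Dict[str, str]) -> bool:
    """Return True if the annotation carries at least one place type."""
    raw = resource.get("@types", "")
    types = {t.strip() for t in raw.split(",") if t.strip()}
    return bool(types & PLACE_TYPES)


def geocode(
    resources: List[Dict[str, str]],
    coordinates: Dict[str, Tuple[float, float]],
) -> List[Dict[str, object]]:
    """Keep toponyms and attach (lat, lon) from a URI -> coordinate lookup."""
    points = []
    for res in resources:
        if not is_toponym(res):
            continue  # person, work, organisation, etc.
        uri = res["@URI"]
        if uri not in coordinates:
            continue  # no known coordinates, so it cannot be plotted
        lat, lon = coordinates[uri]
        points.append(
            {"uri": uri, "surface_form": res["@surfaceForm"], "lat": lat, "lon": lon}
        )
    return points


# Hand-written sample input for illustration only (not real project output).
sample_resources = [
    {
        "@URI": "http://dbpedia.org/resource/London",
        "@surfaceForm": "London",
        "@types": "Schema:Place,DBpedia:Place,DBpedia:City",
    },
    {
        "@URI": "http://dbpedia.org/resource/Lord_Byron",
        "@surfaceForm": "Byron",
        "@types": "Schema:Person,DBpedia:Person",
    },
]
# Approximate coordinates, standing in for values fetched from DBpedia.
sample_coordinates = {"http://dbpedia.org/resource/London": (51.5074, -0.1278)}

points = geocode(sample_resources, sample_coordinates)
print(points)
```

In this sketch, entities without a place type and places without known coordinates are both dropped silently. A real pipeline would probably log them so that gaps in the map can be traced back to the data.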

Although I believe that this data provides a reasonably accurate overview of the space imagined in the Corvey Collection, it should be noted that the NEL data includes some errors. The texts in the corpus have been digitised using OCR (optical character recognition), which produces noise (i.e., misrecognised characters). Furthermore, the accuracy of DBpedia Spotlight is far from perfect, which limits the results: some toponyms have been linked to the wrong entity. This is reflected in the "support" value of the hits: a low support value signals a higher probability that the entity has been wrongly recognised.
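One way to act on the support value is to separate the hits into more and less trustworthy groups before interpreting the map. The sketch below is only an illustration under stated assumptions: each hit is assumed to be a dictionary with an "@support" field, as in DBpedia Spotlight's JSON output. The function name split_by_support, the threshold of 20 and the sample values are all hypothetical and are not taken from the project.

```python
from typing import Dict, List, Tuple

Hit = Dict[str, str]


def split_by_support(hits: List[Hit], min_support: int) -> Tuple[List[Hit], List[Hit]]:
    """Split hits into (reliable, doubtful) using a minimum support threshold."""
    reliable: List[Hit] = []
    doubtful: List[Hit] = []
    for hit in hits:
        # Spotlight-style JSON stores numbers as strings; missing values count as 0.
        support = int(hit.get("@support", 0))
        if support >= min_support:
            reliable.append(hit)
        else:
            doubtful.append(hit)
    return reliable, doubtful


# Invented support values for illustration only.
sample_hits = [
    {"@surfaceForm": "London", "@support": "5000"},
    {"@surfaceForm": "Corvey", "@support": "12"},
    {"@surfaceForm": "Eden"},  # no support value recorded
]

reliable, doubtful = split_by_support(sample_hits, min_support=20)
print([h["@surfaceForm"] for h in reliable])
print([h["@surfaceForm"] for h in doubtful])
```

The doubtful group is kept rather than discarded, so that low-support toponyms can still be checked by hand instead of disappearing from the analysis.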

In the future, I plan to add more facets to this map, which will enable filtering the results by date range, entity class, text title or author.

Romantic Cartographies (ROMCAR) is a digital humanities project by Asko Nivala. The project provides new interpretations of English and German Romantic texts by focusing on spatiality.

(C) 2022 Asko Nivala