Polish historical corpus

The corpus uses data made available by the PSNC Digital Libraries Team, namely a set of full-text versions of selected Polish historical documents from four digital libraries in Poland. The texts have been prepared in the framework of the IMPACT project and used as so-called Ground Truth for the evaluation and training of OCR programs.

The Poliqarp search engine provides access to two versions of the IMPACT Polish GT corpus: the so-called one-dimensional and two-dimensional versions. Together with some dictionaries of Polish, they are available on the Poliqarp server at http://poliqarp.wbl.klf.uw.edu.pl/.

PPEA (Piers Plowman Electronic Archive)

The Piers Plowman Electronic Archive is publishing a collective edition and archive of Piers Plowman.

Project Publications

Adams, Robert, Hoyt N. Duggan, Eric Eliason, Ralph Hanna III, John Price-Wilkin, and Thorlac Turville-Petre, eds. 2000. The Piers Plowman Electronic Archive, vol. 1: Corpus Christi College, Oxford MS 201 (F). SEENET, series A.1. Ann Arbor: SEENET and University of Michigan Press.

Duggan, Hoyt N., and Ralph Hanna, eds. 2004. The Piers Plowman Electronic Archive, vol. 4: Oxford, Bodleian Library MS Laud Misc. 581 (S. C. 987) (L). SEENET, series A.6. Ann Arbor: SEENET and University of Michigan Press.

Turville-Petre, Thorlac, and Hoyt N. Duggan, eds. 2000. The Piers Plowman Electronic Archive, vol. 2: Cambridge, Trinity College, MS B.15.17 (W). SEENET, series A.2. Ann Arbor: SEENET and University of Michigan Press.

OTA (Oxford Text Archive)

The Oxford Text Archive hosts AHDS Literature, Languages and Linguistics. The OTA works closely with members of the Arts and Humanities academic community to collect, catalogue, and preserve high-quality electronic texts for research and teaching. We actively support the aims of the Digital Medievalist Project. The OTA is always interested in deposits of electronic resources from medievalists (and from other subject areas). For more information, e-mail info@ota.ox.ac.uk.

Nomen et Gens



Nomen et gens is an interdisciplinary research project in which historians and linguists work together. The centerpiece of the project is a database which contains information concerning the prosopography and the onomastics of continental Europe in the Early Middle Ages.

  • Geographical coverage: continental Europe
  • Date range: 4th century to the 8th century AD
  • Material included: narrative and documentary sources, inscriptions, names on coins

The data has been gathered peripherally since the mid-nineties and is now being made publicly available online. Currently, the publicly accessible areas of the database feature only a portion of the material. Data published so far:

  • Date range: persons and names from the period between 650 and 750 AD
  • Over 10,000 individual references to personal names in ca. 300 sources
  • Ca. 3,800 individually identified persons
  • Ca. 1,700 linguistic lemmata of personal names

A list of the sources processed so far is available on the project homepage ([1]). The internal sections, by contrast, contain considerably more references and provide more detail. The publicly accessible area will grow as we process selected data for online publication. The primary objective of the project is to improve our understanding of the transformation of the Roman world at the transition from Late Antiquity to the Early Middle Ages. To that end we make available personal names that have not yet been considered as cultural-historical or etymological sources. In addition, a prosopography of the continental European gentes from the 4th to the 8th century AD is being developed.


  • Languages: German, English
  • Disciplines: History, Prosopography, Linguistics

Links and references

  • H. Ebling, J. Jarnut, G. Kampers: Nomen et gens. Untersuchungen zu den Führungsschichten des Franken-, Langobarden- und Westgotenreiches, in: Francia 8 (1980), S. 687-745.
  • D. Geuenich, I. Runde (Hgg.): Name und Gesellschaft im Frühmittelalter. Personennamen als Indikatoren für sprachliche, ethnische, soziale und kulturelle Gruppenzugehörigkeiten ihrer Träger (Deutsche Namenforschung auf sprachgeschichtlicher Grundlage, Bd. 2), Hildesheim 2006


  • Philippe Depreux
  • Dieter Geuenich
  • Hans Werner Goetz
  • Wolfgang Haubrichs
  • Jörg Jarnut
  • Gerhard Lubich
  • Steffen Patzold


Nomen et Gens, Seminar für mittelalterliche Geschichte, Eberhard Karls Universität Tübingen, Wilhelmstraße 36, 72074 Tübingen


Monasterium

The virtual archive Monasterium is the largest archive for medieval documents, containing more than 250 000 documents (as of Apr. 2012) as plain text, image, or both.


The Monasterium project began in the Austrian province of Lower Austria, which is rich in monasteries. These monasteries have existed without interruption since their founding in the High Middle Ages, so the region can boast an unbroken archival tradition. Because of their great historical significance, these archives preserve the better part of the country's tradition and history from the Middle Ages and the early modern period. The strong historical ties among the monasteries and with the surrounding country create ideal conditions for a virtual retrieval system for these widely distributed sources. Starting from the St. Pölten episcopal archive, work on the project began with the energetic support of the government and of the monasteries themselves.

From project to institution

The logical consequence of the project with the Lower Austrian monasteries was to reach out to the other Austrian provinces and to the countries neighboring Austria. With the support of the Austrian Federal Ministry for Education, Art and Culture (Bundesministerium für Unterricht, Kunst und Kultur) and the European Union, Monasterium succeeded in securing the funding for a further outreach effort. With this, the many already existing connections between the archives could finally be brought together in June 2006. The Memorandum created for this purpose has since formed the basis for collaboration in the Consortium. However, the Consortium did not intend to rely on this document permanently and has striven to develop further. This led to a basic declaration of intent in November 2007, in which the emerging network and its associated virtual archive established a more enduring union in ICARUS (International Centre for Archival Research).

Content and originality

The virtual archive Monasterium contains more than 250 000 documents (as of Apr. 2012) from more than 98 European archives. The documents are organized in 540 archival fonds and research collections. The content of the virtual archive depends on the decisions of the participants and can vary from archive to archive and from collection to collection. Each document (mostly charters) has at least minimal metadata, such as shelf mark, date, and abstract. Each institution can upload to Monasterium’s servers:

  • digitized images (387 000 images as of Apr. 2012, since more than one image can be related to one document)
  • full text (22 000 charters)

The platform's specific features are:

  • Technical
    • Mutualized infrastructure and development for many institutions
    • Hosting of digital images
    • Long-term preservation
  • Scientific
    • Collaboration & crowdsourcing
    • Scientific moderation through qualified experts
  • Administrative
    • Large scale visibility of local and small archives
    • Free of charge for participating institutions

Development is carried out at the University of Cologne for ICARUS (International Centre for Archival Research), which brings together more than 130 members in 25 countries in Europe and Canada.

Crowdsourcing tool

Users can transcribe the documents and correct the plain text or descriptions. The system is moderated: expert users have to review the transcriptions before publication. The editing tool is currently being migrated from Java to Ajax (as of Apr. 2012).

The software behind the platform (the “Monasterium Collaborative Archive” MOM-CA) is open source. You can find the documentation at [1].
