Raffaela Tabacco, University of Eastern Piedmont, Humanistic Department – Vercelli
The aim of the project is to create an open collection of textual data and to publish it on the Web. The project is designed to remain open to contributions from other scholars, not only during its three-year development but also in the future.
The project digilibLT aims to continue a similar project by the Packard Humanities Institute (PHI): a digital library of Latin literary texts. The so-called PHI CD-ROM 5.3 offers a very important but selective collection of Latin texts and is the most widely used database of Latin texts for the earlier period (from the origins to the first century AD). To complete the database, our project plans to digitize the Latin literary texts written between the second and the fifth century AD. The texts will be tagged according to both a state-of-the-art standard (XML) and the standard adopted in the PHI CD-ROM 5.3. The digital library digilibLT will also include a number of important critical studies and editions that are now out of copyright.
The final product will be a rich, accessible website offering: digitized Latin texts; bio-bibliographical notes about authors and works; digitized historical, critical, and literary studies about late-antique Latin authors and works; full-text search in the digitized texts (also with an interface specifically designed for mobile devices); and selection and download of any text, by content criteria (word search) or by filing criteria (genre, period, and so on).
Technical Scientific Objectives
The project digilibLT aims to continue a similar project by the Packard Humanities Institute (PHI): a digital library of Latin literary texts.
The so-called PHI CD-ROM 5.3 offers a very important but selective collection of Latin texts and is the most widely used database of Latin texts for the earlier period (from the origins to the first century AD). The Packard project has ended, and its digitization of Latin texts stops at the second century AD. For late-antique Latin texts (second to fifth century AD) no comprehensive textual database exists, and the scholarly community must rely on incomplete and unreliable digital resources, some of which were prepared without critical study of the texts. Late-antique culture in particular is understudied, even though it played an especially important role in transmitting the cultural, scientific, political, philosophical, literary, and religious thought of antiquity to modern Europe.
The digilibLT project will therefore digitize the texts of the second to fifth century AD and tag them according to both a state-of-the-art standard (XML) and the standard adopted in the PHI CD-ROM 5.3, so that scholars can continue to use existing PHI-oriented text analysis tools (such as Diogenes, Musaios, and others). The digital library will also include a number of important critical studies and editions that are now out of copyright.
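To make the idea of dual tagging more concrete, here is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`, that wraps a text's chapters and paragraphs in a simplified TEI-style XML tree and then flattens the same content into lines prefixed with a "chapter.paragraph" reference. The function names, the element layout, and the line format are hypothetical illustrations: they are neither the digilibLT schema nor the actual PHI file format, which is not reproduced here.

```python
import xml.etree.ElementTree as ET


def tag_text(author: str, title: str, chapters: list[list[str]]) -> ET.Element:
    """Wrap plain-text chapters and paragraphs in a minimal TEI-like tree.

    Element names follow TEI conventions, but the layout is a simplified,
    hypothetical example, not the digilibLT schema.
    """
    root = ET.Element("TEI")
    # Bibliographic metadata goes in the header.
    file_desc = ET.SubElement(ET.SubElement(root, "teiHeader"), "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = title
    ET.SubElement(title_stmt, "author").text = author
    # The text itself: one <div> per chapter, one numbered <p> per paragraph.
    body = ET.SubElement(ET.SubElement(root, "text"), "body")
    for ch_num, paragraphs in enumerate(chapters, start=1):
        div = ET.SubElement(body, "div", type="chapter", n=str(ch_num))
        for p_num, para in enumerate(paragraphs, start=1):
            ET.SubElement(div, "p", n=str(p_num)).text = para
    return root


def to_citation_lines(root: ET.Element) -> list[str]:
    """Flatten the tree into 'chapter.paragraph text' lines.

    A hypothetical plain-text export showing how the same content can carry a
    reference-based citation; it does not reproduce the PHI file format.
    """
    lines = []
    for div in root.iter("div"):
        for p in div.iter("p"):
            lines.append(f"{div.get('n')}.{p.get('n')} {p.text}")
    return lines
```

For example, `to_citation_lines(tag_text("Auctor ignotus", "Opus exemplum", [["Prima sententia.", "Secunda sententia."], ["Tertia sententia."]]))` returns `["1.1 Prima sententia.", "1.2 Secunda sententia.", "2.1 Tertia sententia."]`. Keeping the XML tree as the master copy and deriving other formats from it is one way to maintain two encodings without editing them separately.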
Scholars who study the classical world, who use digital tools in their research, and who are interested in developing such tools form a large and very active community that is especially attentive to communication and to the exchange of experience and information. This community will regard the continuation and completion of the collection as an essential tool for the study and promotion of Latin culture. Latin civilization originated in the region we now call ‘Italy’, but its cultural, literary, and political impact gives it worldwide relevance.
The project could significantly influence research and teaching at the international level, both because of the importance of the texts it will offer to the scholarly community and because of the ease of use of the technology we plan to adopt.
A very large worldwide community of scholars uses digitized Latin texts every day. This is clear, for instance, from the extensive discussion of Latin texts in the web-based discussion group ‘Humanist’ (http://www.digitalhumanities.org/humanist/). The lively debate on the quality, characteristics, and encoding standards of digitized texts testifies to the importance of digital texts in everyday research and teaching. Conferences on these topics take place every year (Digital Humanities) or every two years (JADT), and journals such as ‘Literary and Linguistic Computing’ demonstrate the centrality of this area of research on literary texts.
The availability of digital texts will open up a number of possibilities for research on late antiquity. A thorough linguistic analysis of late-antique Latin texts is at present lacking, and the absence of a reliable textual database makes it very difficult. Such an analysis is especially important because of the great changes the Latin language underwent in that period. Editors of late-antique Latin texts have often chosen to normalize the language, making late-antique Latin resemble the classical language. While several late-antique authors did aim to reproduce classical style and language, many did not conform to it. A database will make it easier for future editors to recognize linguistic change and to find parallels for non-classical usage. The literary study of late-antique authors will also benefit: scholars will find it much easier to trace literary influences and intertextual relationships.
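As a rough illustration of how a textual database supports the search for parallels, the sketch below lists the word n-grams (runs of n consecutive words) that two passages share. It assumes plain-text input and folds j/v into i/u, a common normalization for Latin; the function names are hypothetical, and serious intertextual work would also need lemmatization and tolerance for inflection and word order.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in `text`.

    Text is lower-cased and j/v are folded into i/u (an assumed Latin
    normalization) so that spelling conventions do not hide matches.
    """
    norm = text.lower().replace("j", "i").replace("v", "u")
    words = re.findall(r"[^\W\d_]+", norm)  # letters only, Unicode-aware
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngrams(text_a: str, text_b: str, n: int = 3) -> set[tuple[str, ...]]:
    """Word n-grams found in both texts: candidate parallels for a scholar to inspect."""
    return word_ngrams(text_a, n) & word_ngrams(text_b, n)
```

Comparing the opening of the Aeneid, "Arma virumque cano, Troiae qui primus ab oris", with an invented line, "Non iam arma virumque cano, sed pacem.", yields the single shared trigram `("arma", "uirumque", "cano")`.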
The availability of digital texts will also be important for master's-level students (Laurea specialistica) and for PhD students. They will be able to write dissertations and theses using the scholarly texts offered online, and to use the textual database to gain a better knowledge of the Latin language and of the peculiarities of late-antique Latin. They will also be able to use the XML texts and modify them according to the results of their research, if needed. They can prepare new editions of the texts, study the style and language of the authors, and compile indexes and concordances. Digital editions will also be very useful for texts of disputed authorship, where statistical linguistic analysis can help advance the debate.
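Indexes and concordances are among the simplest tools to derive from a digital text. The following keyword-in-context (KWIC) sketch lists every occurrence of a word together with a few words of surrounding context; `kwic` and its `width` parameter are hypothetical names, and the sample sentence in the usage note is invented for illustration.

```python
import re


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Keyword-in-context concordance lines for `keyword` in `text`.

    Each hit is shown with up to `width` words on either side. Matching is
    case-insensitive and ignores punctuation attached to a word; the original
    word forms are kept in the output.
    """
    words = text.split()
    key = keyword.lower()
    lines = []
    for i, word in enumerate(words):
        # Strip punctuation only for the comparison, not for display.
        if re.sub(r"\W", "", word).lower() == key:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines
```

For instance, `kwic("Roma aeterna est. Sed Roma non semper fuit, et Roma manet.", "roma", width=2)` returns `["[Roma] aeterna est.", "est. Sed [Roma] non semper", "fuit, et [Roma] manet."]`.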
The diffusion of texts and of the critical studies on them is the best form of promotion. It is also a form of conservation: the worldwide diffusion of digital texts among scholars will help ensure their survival. Finally, the workflow defined in this digitization project is potentially applicable to other types of texts and other literary traditions. The method used for scanning and for performing optical character recognition (OCR) on the scanned texts could be fruitfully applied to other texts. Our research team will perform OCR twice, using two different techniques, starting from a single scanned image of each text. The two versions will then be checked against each other. The equipment and software can be reused in future digitization projects, not only by our research team but also by other researchers.
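The cross-check between the two OCR versions rests on the same principle as double-keying: where two independent readings agree, the word is probably correct, and where they differ, a human checks the scanned image. A minimal sketch, assuming each engine produces plain text for the same page, can use Python's standard `difflib` to align the two readings word by word and report only the disagreements. The function name is hypothetical, and `difflib` stands in for whatever comparison tool the team actually adopts.

```python
from difflib import SequenceMatcher


def ocr_discrepancies(ocr_a: str, ocr_b: str) -> list[tuple[str, str]]:
    """Align two OCR readings of the same page word by word.

    Returns (reading_a, reading_b) pairs wherever the engines disagree; an
    empty string on one side means one engine dropped or added words.
    Words on which both engines agree are not reported.
    """
    words_a, words_b = ocr_a.split(), ocr_b.split()
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            diffs.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return diffs
```

With two invented misreadings of Caesar's "Gallia est omnis divisa in partes tres", `ocr_discrepancies("Gallia est ornnis divisa in partes tres", "Gallia est omnis divisa in partcs tres")` returns `[("ornnis", "omnis"), ("partes", "partcs")]`. The check cannot catch an error that both engines make identically, which is why using two different techniques matters.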
Description of Resources
The team will use two scanners dedicated exclusively to fast and accurate digitization of printed books. State-of-the-art scanners perform excellently and do not damage the physical integrity of printed books. The scanned images will be analysed with two different types of OCR software, and the texts will thus be corrected using a method similar to so-called ‘double-keying’, in which two independent transcriptions are compared and their discrepancies are checked.
The research team will display the results of its work on a rich, accessible website which will offer:
- digitized Latin texts;
- bio-bibliographical notes about authors and works;
- digitized historical, critical, and literary studies (out of copyright) about late-antique Latin authors and works;
- full-text search (also with an interface specifically designed for mobile devices);
- download of any text, selected by content criteria (word search), by filing criteria (genre, period, and so on), or by other criteria.
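The search and download functions listed above combine a content criterion with filing criteria. A minimal sketch of that combination, assuming a small in-memory collection, builds an inverted index (each word mapped to the works that contain it) and then filters the hits by metadata. The `Work` record, its fields, and the function names are hypothetical; a production site would more likely rely on a dedicated search engine.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    """Hypothetical catalogue record; the field names are illustrative only."""
    title: str
    genre: str
    century: int  # century AD, standing in for the "period" filing criterion
    text: str


def tokenize(text: str) -> list[str]:
    """Lower-cased words, letters only."""
    return re.findall(r"[^\W\d_]+", text.lower())


def build_index(works: list[Work]) -> dict[str, set[int]]:
    """Inverted index mapping each word to the positions of the works containing it."""
    index: dict[str, set[int]] = {}
    for doc_id, work in enumerate(works):
        for word in tokenize(work.text):
            index.setdefault(word, set()).add(doc_id)
    return index


def search(works: list[Work], index: dict[str, set[int]],
           word: Optional[str] = None, genre: Optional[str] = None,
           century: Optional[int] = None) -> list[str]:
    """Titles of works matching an optional word AND optional filing criteria."""
    # Content criterion: look the word up in the index (all works if none given).
    ids = index.get(word.lower(), set()) if word else set(range(len(works)))
    # Filing criteria: keep only works whose metadata match.
    return [
        works[i].title
        for i in sorted(ids)
        if (genre is None or works[i].genre == genre)
        and (century is None or works[i].century == century)
    ]
```

Given a few invented records, `search(works, index, word="leges", century=5)` keeps only the works that contain the word 'leges' and date to the fifth century; omitting `word` turns the same call into a pure catalogue filter.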