Corpus Release: Corpus OVI dell’Italiano antico

A new version of the Corpus OVI dell’italiano antico is now available online! After this update, the corpus consists of 1,978 texts with 21,817,929 words, 443,810 different word forms, 116,224 lemmas and 3,615,478 lemmatized occurrences.

Corpus TLIO aggiuntivo

For texts that are not yet lemmatized and are awaiting inclusion in the Corpus OVI, an additional corpus has been created: the Corpus TLIO aggiuntivo, which at present contains 306 texts with 1,189,808 words and 71,900 different word forms.

Archivio Datini

In collaboration with the Archivio di Stato of the Tuscan town of Prato, OVI has developed a lemmatized database containing all published letters in the archive of the great Tuscan merchant Francesco di Marco Datini (1335-1410): 3,000 texts with 1,100,987 words, 50,139 different word forms, 7,591 lemmas and 146,741 lemmatized occurrences.


The Corpus ARTESIA, created by the University of Catania, is hosted on the OVI server. It currently consists of 239 early Sicilian texts with 1,025,367 words.

Further information

Consiglio Nazionale delle Ricerche

Institute Opera del Vocabolario Italiano

Firenze, via di Castello 46


tel. +39 055 452841

fax +39 055 452843


Posted by: Giulio Vaccaro
