Corpus Release: Corpus OVI dell’Italiano antico

A new version of the Corpus OVI dell’Italiano antico is now available online! After this update, the corpus consists of 1,978 texts with 21,817,929 words, 443,810 different word forms, 116,224 lemmas and 3,615,478 lemmatized occurrences.

Corpus TLIO aggiuntivo

An additional corpus, the Corpus TLIO aggiuntivo, has been created for texts that are not yet lemmatized and are awaiting inclusion in the Corpus OVI. At present it contains 306 texts with 1,189,808 words and 71,900 different word forms.

Archivio Datini

In collaboration with the Archivio di Stato of the Tuscan town of Prato, OVI has developed a lemmatized database containing all published letters in the archive of the great Tuscan merchant Francesco di Marco Datini (1335-1410): 3,000 texts with 1,100,987 words, 50,139 different word forms, 7,591 lemmas and 146,741 lemmatized occurrences.

Corpus ARTESIA

Corpus ARTESIA, created by the University of Catania, is hosted on the OVI server. It consists of 239 early Sicilian texts, currently totalling 1,025,367 words.

Further information

http://www.vocabolario.org

Consiglio Nazionale delle Ricerche

Institute Opera del Vocabolario Italiano

Firenze, via di Castello 46

I-50141

tel. +39 055 452841

fax +39 055 452843

e-mail ovi@ovi.cnr.it

Posted by: Giulio Vaccaro (piovanoarlotto@gmail.com).
