Polish historical corpus

The corpus uses data made available by the PSNC Digital Libraries Team, namely a set of full-text versions of selected Polish historical documents from four digital libraries in Poland. The texts were prepared within the framework of the IMPACT project and used as so-called ground truth (GT) for the evaluation and training of OCR programs.

The Poliqarp search engine provides access to two versions of the IMPACT Polish GT corpus: the so-called one-dimensional and two-dimensional versions. Together with some dictionaries of Polish, they are available on the Poliqarp server at http://poliqarp.wbl.klf.uw.edu.pl/.