Poliqarp for DjVu

Poliqarp for DjVu is an open-source search engine for DjVu corpora, available under the GNU GPL license and developed by Janusz S. Bień at the University of Warsaw. It relies on the DjVu format and makes it possible to present end users with the results of advanced language technologies.

Conceived as a modification of the Poliqarp (Polyinterpretation Indexing Query and Retrieval Processor) corpus query tool, it inherits from its parent powerful search facilities based on two-level regular expressions, which can be used in queries to work around OCR errors, as well as the ability to represent low-level ambiguities and other linguistic phenomena. It returns highlighted results and KWIC (keyword in context) search results.
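The idea of using a regular expression to tolerate OCR errors, and of showing each hit as a KWIC line, can be illustrated with a minimal sketch. It uses Python's standard `re` module rather than Poliqarp's own query syntax, and the function name `kwic`, the sample text, and the pattern are all hypothetical, chosen only for illustration. The assumption is that the OCR sometimes read the long s (ſ) as "f".

```python
import re

def kwic(text, pattern, width=20):
    """Return (left context, match, right context) for every regex match."""
    rows = []
    for m in re.finditer(pattern, text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        rows.append((left, m.group(0), right))
    return rows

# Hypothetical OCR output: the same word appears in three spellings.
sample = "The fame king spoke; the same king left; the ſame king came back."

# A character class accepts any of the confusable letters in first position.
pattern = r"\b[sfſ]ame\b"

for left, hit, right in kwic(sample, pattern, width=12):
    # Right-align the left context so the matches line up in one column.
    print(f"{left:>12} [{hit}] {right}")
```

A single query thus retrieves all three spellings, and aligning the matches in one column is what makes a KWIC listing easy to scan.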

Although at present the tool is used mainly to facilitate access to the results of dirty OCR, it is also ready to handle more sophisticated output of linguistic technologies.

Poliqarp for DjVu is used in particular for a non-medieval corpus, a corpus of historical Polish (from 1570 to 1756), which nevertheless raises issues typical of medieval corpora (spelling, abbreviations, etc.).

The software can be used with the scans of “Lexicon Mediae et Infimae Latinitatis Polonorum” (http://rcin.org.pl/publication/15584), prepared with funding from the European Regional Development Fund under the Operational Programme – Innovative Economy, Priority Axis 2: Investment projects relating to development of information infrastructure of science, within sub-action 2.3.2: The Projects in the area of development of information resources of science in a digital form. Unfortunately, although the scans were at first freely available “to all for their own use, for scientific, educational or teaching purposes”, since February 2013 access to them has been severely limited: “Publication accessible in the Institute of Polish Language of the Polish Academy of Sciences network for their [?] own use, for scientific, educational or teaching purposes”.
Source(s): Software solutions

Presentation of the tool