COLLATE Text editing software

Collate was developed by Peter Robinson for the collation, analysis and publication of texts preserved in multiple witnesses. The current version of the software can handle up to 2000 versions of a text. Collate has a regularization tool that can be used to produce a file containing word equivalences without altering the original transcription files. The software uses a light tagging system that can be converted to XML at a later stage. Collate can produce output files for paper-based editions or electronic publications. The following are examples of projects currently using Collate:

  • Canterbury Tales Project, directed by Peter Robinson.
  • Monarchia Project, directed by Prue James.
  • Commedia Project, directed by Prue James.
  • Cancioneros Project, directed by Dorothy Severin.
  • Nestle-Aland 28 (the electronic version of the Nestle-Aland Greek New Testament), based at the Institut für neutestamentliche Textforschung INTF.
  • Parzival Project, directed by Michael Stolz.
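
The idea of regularizing through a separate table of word equivalences, while leaving the transcriptions untouched, can be illustrated with a minimal Python sketch. This is not Collate's file format or algorithm: the table `REGULARIZATIONS`, the functions `regularize` and `variant_positions`, and the sample readings are all hypothetical, and the comparison assumes witnesses of equal length.

```python
# Hypothetical regularization table: variant spelling -> regularized form.
# The format is an assumption for illustration, not Collate's own file format.
REGULARIZATIONS = {
    "whan": "when",
    "shoures": "showers",
    "soote": "sweet",
}


def regularize(tokens, table):
    """Return regularized copies of the tokens; the input list is not modified."""
    return [table.get(token.lower(), token.lower()) for token in tokens]


def variant_positions(witness_a, witness_b, table):
    """Compare two witnesses word by word after regularization.

    Assumes both witnesses have the same number of tokens; real collation
    also has to align omissions and additions.
    """
    reg_a = regularize(witness_a, table)
    reg_b = regularize(witness_b, table)
    return [i for i, (a, b) in enumerate(zip(reg_a, reg_b)) if a != b]


# Sample readings invented for the example.
witness_1 = ["Whan", "that", "Aprill", "with", "his", "shoures", "soote"]
witness_2 = ["When", "that", "Aprille", "with", "his", "showers", "sweet"]

# Only positions that still differ after regularization are reported.
differences = variant_positions(witness_1, witness_2, REGULARIZATIONS)
```

Because the equivalences live in their own table, the transcription lists keep their original spellings, and a different table can be applied later without re-transcribing.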



eLaborate

eLaborate is a content management system (CMS) for collaborative work on digital editions of texts. In addition to the web editing functionality common to content management systems, eLaborate offers specialized content objects for creating transcriptions of uploaded facsimiles and annotations on the transcription text.

eLaborate lets individual users or user groups create collaborative projects around digitized texts. To apply for a new project space, please contact joris_van_zundert@huygensinstituut_knaw_nl. (Note: for anti-spam reasons you will need to replace all underscores (‘_’) in the email address with ‘.’).

eLaborate is an initiative of the Huygens Institute for literary and intellectual history in the Low Countries and is funded by the Dutch Royal Academy for Arts and Sciences (KNAW).

eLaborate was created as a fully online application. This means that there is no need for additional installations or offline components beyond a common web browser (preferably Firefox 1.x, but Internet Explorer 5.0+ is also fully supported).

eLaborate may be used from its originating server at (mind the dash in the web address). Alternatively, the eLaborate software and components may be installed on any server to function as a separately administered instance of the collaboratory.

eLaborate has been created, and will be further developed, using only open source components. Currently eLaborate is a Unix/Java/MySQL/AJAX (Web 2.0) solution. Whenever possible and applicable we adhere to common open standards (XML, OAI etc.). eLaborate is built using an agile development process closely oriented towards eXtreme Programming.


Multidoc SGML Browser

The Multidoc SGML browser was a commercial browser by Citec that was used to display and style SGML documents on the fly. The browser was discontinued in 2000, when the licence for the Synex Viewport SGML/HyTime browser engine upon which it was based expired. Before it was discontinued, the browser was used by several humanities computing projects, including the first two CD-ROMs in SEENET’s Piers Plowman Electronic Archive.

The browser was quite advanced for its time. It had sophisticated searching and styling capabilities: searching could be done by text or SGML context; SGML documents were styled using external stylesheets that were themselves SGML documents (thus anticipating subsequent developments in XSL). Moreover, elements in the SGML document were styled using a template model and could be controlled using variable and contextual expressions, including a primitive precursor to XPath. Although it was not originally designed to do so, the stylesheet language and engine proved adept at effecting SGML to HTML translations (See for a description of this method).

Although it can be considered in some sense an SGML precursor of XSL, the Multidoc stylesheet model differed from the later language in several important respects. Like CSS, it was conceived of primarily as a means of associating style with specific elements. It did not construct a model of the input or output documents and, as a result, could not be used to effect true ‘transformations’: with a few exceptions (mainly for note-type elements), elements could not be moved, copied, or otherwise reordered from their position in the input document. There was also no requirement that the output text be valid SGML, XML, or HTML, nor indeed any direct method of exporting output in any of these formats (in actual practice, SGML, XML, or HTML could be exported by printing to file from a generic/plain text print driver and saving the result with the correct extension).
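
The difference between styling elements in place and transforming a document can be sketched as follows. This is only an illustration under stated assumptions: it uses Python and a small XML sample as stand-ins, not Multidoc's stylesheet language or SGML parser, and the rule table `STYLE_RULES`, the function `style_in_place`, and the sample markup are hypothetical. Each element is mapped to an output tag as it is encountered, so the output necessarily follows the order of the input.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical style rules: source element name -> HTML tag.
STYLE_RULES = {"text": "div", "head": "h1", "l": "p", "hi": "em"}


def style_in_place(element, rules):
    """Emit HTML for an element and its children strictly in document order.

    Nothing is moved, copied, or reordered: each element is simply wrapped
    in the tag its rule names (or <span> when no rule matches).
    """
    tag = rules.get(element.tag, "span")
    parts = [f"<{tag}>", escape(element.text or "")]
    for child in element:
        parts.append(style_in_place(child, rules))
        # Text that follows the child inside the current element.
        parts.append(escape(child.tail or ""))
    parts.append(f"</{tag}>")
    return "".join(parts)


# Sample markup invented for the example (XML standing in for SGML).
source = "<text><head>Passus I</head><l>In a <hi>somer</hi> seson</l></text>"
html_output = style_in_place(ET.fromstring(source), STYLE_RULES)
```

A transformation language in the XSL sense would additionally let a rule select and emit elements from anywhere in the input tree, which this single in-order pass cannot do.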

TUSTEP (TUebingen System of TExt processing Programs)

TUSTEP is a professional toolbox for the scholarly processing of textual data (including data in non-Latin scripts), with a strong focus on humanities applications.

Designed in cooperation with many humanities projects by the Division of Literary and Documentary Data Processing at the Computing Center of the University of Tübingen and first implemented more than 25 years ago, TUSTEP is constantly being improved and expanded in order to facilitate solutions for new problems and to take advantage of new hardware and operating systems. It contains modules for all stages of scholarly text data processing, starting from data capture and including information retrieval, text collation, text analysis, sorting and ordering, rule-based text manipulation, and output in electronic or conventional form (including typesetting in professional quality).
Beyond the University of Tübingen, TUSTEP is currently used at roughly 100 other universities and research institutions (a list is available on the web page of the International TUSTEP User Group, ITUG).

Articles and tutorials

Modularity, Professionality, Integration: Design principles for TUSTEP

Text data processing with TUSTEP: overview, hints



T-PEN (Transcription for Paleographical and Editorial Notation)

T‑PEN (transcription for paleographical and editorial notation) is a web-based tool for working with images of manuscripts. Users attach transcription data (new or uploaded) to the actual lines of the original manuscript in a simple, flexible interface.



T-PEN automatically recognizes columns and lines. Users can modify this automatic layout segmentation before transcribing.
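
One way to picture transcription data attached to detected lines, and a user correction made before transcribing, is the following Python sketch. It is not T-PEN's data model or API: the `LineRegion` class, the `merge_lines` helper, and the pixel coordinates are all hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class LineRegion:
    """A rectangular line on a page image with its transcription (hypothetical model)."""
    x: int
    y: int
    width: int
    height: int
    text: str = ""


def merge_lines(upper: LineRegion, lower: LineRegion) -> LineRegion:
    """Merge two detected regions into one, e.g. when detection split a single line."""
    left = min(upper.x, lower.x)
    top = min(upper.y, lower.y)
    right = max(upper.x + upper.width, lower.x + lower.width)
    bottom = max(upper.y + upper.height, lower.y + lower.height)
    # Keep any text already attached to either region.
    text = " ".join(t for t in (upper.text, lower.text) if t)
    return LineRegion(left, top, right - left, bottom - top, text)


# Invented output of an automatic segmentation pass: the first manuscript
# line was wrongly split into two regions.
detected = [
    LineRegion(40, 100, 500, 14),
    LineRegion(40, 114, 500, 16),
    LineRegion(40, 140, 500, 30),
]

# The user corrects the segmentation, then attaches a transcription to the line.
corrected = [merge_lines(detected[0], detected[1]), detected[2]]
corrected[0].text = "In a somer seson"
```

Keeping the text on the region itself means a transcription always stays tied to a specific area of the page image, whatever adjustments are made to the neighbouring lines.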



T-PEN:

  • is an open and general tool for scholars of any level of technical expertise
  • allows transcriptions to be created, manipulated, and viewed in many ways
  • supports collaboration with others through simple project management
  • exports transcriptions as PDF or XML (plain text) for further processing, or contributes them to a collaborating institution with a click
  • respects existing and emerging standards for text, image, and annotation data storage
  • avoids prejudice in data, allowing users to find new ways to work

As of April 2014, it provides access to more than 4000 manuscripts (e.g. through links with e-codices), either publicly available or on restricted access within specific projects.


T-PEN version 2.0 was launched in May 2012 with new features: (1) users can now upload their own image sets for transcription; (2) T-PEN now fully supports crowd-sourcing projects; (3) T-PEN provides access to support tools for transcribers; (4) an additional, still experimental feature is glyph matching, a paleographical analysis tool integrated into T-PEN.