The ANR/COSMAT project is seeking a developer for a 14-month position, with strong expertise in XML document processing and web-based services. The developer will work on the specification, development and deployment of an XML document workflow for testing automatic translation software on scientific documents available in publication repositories.
COSMAT is a collaborative project between INRIA, the Systran company and the Université du Maine (Le Mans). It aims to improve the quality of translation services for scientific documents.
The successful candidate will work closely with the person responsible for the COSMAT project at INRIA, who is based in Berlin (DE), and with the INRIA IT support group in Lyon-Grenoble (FR).
The position involves the following tasks:
· specifying the COSMAT interchange format by means of a TEI/ODD representation;
· designing a web service for the HAL publication repository that generates metadata and full-text information in the appropriate format (XML/TEI for metadata);
· deploying and adapting an existing PDF-to-XML module so that a) it can interoperate with the publication repository, and b) it can be trained on new data samples (e.g. document collections of a given format or research domain);
· studying integration mechanisms that would allow the PDF-to-XML processor to be used within the publication archive when authors deposit their documents;
· contributing to the maintenance of the technical TEI infrastructure used in the project.
We are looking for a person with a dual background in computer science and semi-structured document processing. The candidate must be fluent in Java and have previous experience with service-oriented architectures. Familiarity with OAI-PMH interfaces and PHP (Send platform) would be helpful. A strong understanding of XML modelling methods and related technologies is essential. Knowledge of the TEI is a plus.
The position is administratively based in Saclay, near Paris, and the successful candidate should expect regular stays in Berlin. Close involvement in international standardisation activities should make the job attractive to anyone wishing to acquire a broad view of document representation and management.
Depending on experience, the net monthly salary will range between 1 881,06 € and 2 484,83 €.
· INRIA: Institut National de Recherche en Informatique et en Automatique, the French national research institute dedicated to computer science and applied mathematics (www.inria.fr)
· ANR: Agence Nationale de la Recherche, the French national research funding agency (www.agence-nationale-recherche.fr/)
· HAL: the main publication repository for the French academic community (hal.archives-ouvertes.fr/)
· Systran: a private company supplying language translation software (www.systran.fr/)
· TEI: Text Encoding Initiative, one of the major standardisation initiatives for the representation of textual documents (www.tei-c.org)
Laurent Romary: email@example.com
Posted by: Roberto Rosselli Del Turco (rosselli at ling dot unipi dot it)