Venue: IST Austria, Am Campus 1, 3400 Klosterneuburg
Time: 4th of June 2014, 10:00 a.m. - 4 p.m. (ballroom)
Participants: 10 places are still available (first come, first served)
Registration: send an email (incl. first name, surname, institution, email) as soon as possible, and by 30/5/2014 at the latest, to falk.reckling(at)

Workshop Description
The workshop is suitable for anyone interested in biological science who is not afraid of installing and running pre-prepared programs and data (following written guidance, with support from those present in the room). The aim is to introduce computational methods for processing scientific papers, so that many papers can be analysed rapidly. The techniques covered include downloading multiple files and extracting concepts and facts from the literature and its figures using Natural Language Processing and Computer Vision.

Technical expertise required
Very little expertise is required beyond general use of a computer. Much more important is a willingness to learn and experiment. However, we will make options available for those who are more confident or technically able, including opportunities to develop their own analysis tools.


  1. Overview of the concept of Open, licensing aspects, and etiquette.

  2. We shall then form small groups, each with a deliberate mixture of skills, and work through the following pre-prepared tasks:
    • Using crawlers for a subset of the literature, creating a daily feed of literature.

    • Using scrapers to retrieve full-text, XML, EPUB and supplemental information files from this feed.

    • Using regular expressions to extract facts and concepts from these files.

    • Extracting data from simple graphs, chemistry and phylogenetic trees, and showing how this may be adapted to other areas of biology.
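To give a flavour of the regular-expression task, here is a minimal sketch in Python. It is not the workshop's own software: the pattern, the short list of genera, and the helper names `find_species` and `wikipedia_url` are all assumptions made for illustration. It counts binomial species names in a piece of text and builds a candidate Wikipedia link for each, assuming the article title matches the binomial name.

```python
import re
from collections import Counter

# Assumed pattern: a capitalised word followed by a lowercase word of 3+ letters.
# On its own this also matches ordinary phrases such as "We compared".
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")

# Hypothetical whitelist of genera, used to discard ordinary phrases.
KNOWN_GENERA = {"Arabidopsis", "Drosophila", "Escherichia", "Homo"}


def find_species(text):
    """Count occurrences of 'Genus species' names whose genus is whitelisted."""
    counts = Counter()
    for genus, epithet in BINOMIAL.findall(text):
        if genus in KNOWN_GENERA:
            counts[f"{genus} {epithet}"] += 1
    return counts


def wikipedia_url(name):
    """Build a candidate link, assuming the article title equals the name."""
    return "https://en.wikipedia.org/wiki/" + name.replace(" ", "_")


if __name__ == "__main__":
    sample = (
        "We compared Drosophila melanogaster with Arabidopsis thaliana. "
        "The Drosophila melanogaster genome was sequenced first."
    )
    for name, n in find_species(sample).most_common():
        print(n, name, wikipedia_url(name))
```

A fixed whitelist is a deliberate simplification; in practice the list of genera would have to come from a taxonomic resource, and abbreviated forms such as "D. melanogaster" would need a second pattern.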

All software, materials and data are completely Open (Apache2, CC-BY, CC0).

We'll assume a basic competence:

  • know how to download programs

  • know how to manage directories (e.g. copy between them and delete and rename)

  • run programs from the command line

  • edit text files (with a text editor, not Word)

Towards the end, if things go well, we'll suggest that small groups try these techniques on extended problems. This could mean working on other papers (in the same journals); knowing how to use packages such as R or IPython will help in analysing the results. A typical task might be to find the number of species listed in a group of papers, and maybe link them to Wikipedia entries.

Please bring your own laptop with you.

Peter Murray-Rust's visit to Vienna starts with a lecture "Open Science. Realising the Value of Published Scientific Research"

