Data + Language Processing Technologies Impact Session


Date: 6th October 2016, 16:45-17:45.

Location: Spain, Madrid, IFEMA NORTE Convention Centre, Room B.


This session brings together relevant views and experiences on Open Data for Machine Translation and Natural Language Processing.

The experts will present their views and use cases of Open Data and Language Technologies in turn, followed by an open discussion. A moderator will introduce the experts and draw some conclusions at the end.

Session schedule

  1. Introduction (Moderator): 5’
  2. Views and use cases (Panel of experts): 32’ (7 minutes per speaker)
  3. Open Discussion: 15’
  4. Conclusions and final messages (Moderator): 5’


Márta Nagy-Rothengass

  • “Data policy & Innovation” Head of Unit, European Commission.
  • Title: Open Data and Language Technologies. European Commission view. CEF Automated Translation platform use case.
  • Abstract: In the framework of the Digital Single Market strategy, the goal is to set in motion a virtuous cycle of data. This includes a Multilingual Open Data Infrastructure. A relevant example of the value of Open Data for Language Technologies is the translation memory of the European Commission Directorate-General for Translation, which is the most downloaded dataset on the European Union Open Data Portal. CEF.AT is an automated translation platform that will make European public online services multilingual, so that public digital services are equally usable by all EU users, irrespective of their working language and language skills, and will facilitate cross-border information exchange in public administration.

Bente Maegaard

  • Centre for Language Technology, University of Copenhagen, Denmark.
  • Title: Common Language Resources and Technology Infrastructure: CLARIN.
  • Abstract: CLARIN provides easy and sustainable access to digital language data (in written, spoken, or multimodal form) for scholars in the social sciences and humanities. CLARIN also offers advanced tools to discover, explore, exploit, annotate, analyse or combine such data sets, wherever they are located. For this, CLARIN is building a networked federation of language data repositories, service centres and knowledge centres, with single sign-on access for all members of the academic community in all participating countries. Tools and data from different centres are interoperable, so that data collections can be combined and tools from different sources can be chained to perform complex operations to support researchers in their work.

Martin Krallinger

  • Spanish National Cancer Research Centre (CNIO) main researcher.
  • Title: OpenMinTeD: an open, service-oriented e-Infrastructure for Text and Data Mining (TDM) of scientific and scholarly content.
  • Abstract: Recent years have witnessed an upsurge in the quantities of digital research data, offering new insights and opportunities for improved understanding. Text and data mining is emerging as a powerful tool for harnessing the power of structured and unstructured content and data, by analysing them at multiple levels and in several dimensions to discover hidden and new knowledge. However, text mining solutions are not easy to discover and use, nor are they easily combinable by end users. OpenMinTeD aspires to enable the creation of an infrastructure that fosters and facilitates the use of text mining technologies in the world of scientific publications, builds on existing text mining tools and platforms, and renders them discoverable and interoperable through appropriate registries and a standards-based interoperability layer, respectively. It supports training of text mining users and developers alike and demonstrates the merits of the approach through several use cases identified by scholars and experts from different scientific areas, ranging from generic scholarly communication to literature related to life sciences, food and agriculture, and social sciences and humanities. Through its infrastructural activities, OpenMinTeD’s vision is to make operational a virtuous cycle in which a) primary content is accessed through standardised interfaces and access rules, b) by well-documented and easily discoverable text mining services that process, analyse, and annotate text, c) to identify patterns and extract new, meaningful, actionable knowledge, which will be used d) for structuring, indexing, and searching content and, in tandem, e) as new knowledge useful for drawing new relations between content items and triggering a new mining cycle.

David Pérez

  • Adviser to the Secretary of State for Telecommunications and Information Society, Spain.
  • Title: Spanish Plan to foster Language Technologies.
  • Abstract: The Spanish Plan to foster Language Technologies aims to develop the language processing and automated translation sector in Spain. It arises from an assessment that can be summarised as follows: high potential for growth and development, a unique opportunity, and resources that are available but scattered. It comprises a broad spectrum of measures. Among them is the generation, standardisation and dissemination of open language resources suited to language technologies, mainly but not exclusively in the framework of the reuse of Public Sector Information.

More information

For any information regarding this event, please contact: