Workshop on open data and language processing technologies: An opportunity not to be missed


Date: 5th October 2016, 15:30-19:30.

Location: Madrid, Spain, IFEMA NORTE convention centre, Room F.


The workshop brings together experts on the different sides of this multifaceted issue to share and discuss, among themselves and with the audience, their differing but revealing views and experiences, in a collective effort to shed new light on it.

The workshop is organised in several sections, each addressing a different facet of the issue. Each section will be followed by an open discussion in which the experts and the audience will have the opportunity to exchange, clarify, expand and build on the ideas already on the floor.

In the last section the experts will summarise their recommendations for the future.

The workshop will open with a short presentation of the issue and the speakers, setting the stage for the upcoming debate, and will end with a wrap-up session that gathers the main conclusions reached during the workshop.

Draft Agenda

  1. Setting the stage (Presentation): 10’
  2. Major challenges: societal, economic, legal and technical (Panel): 40’
  3. Open Discussion: 25’
  4. Best practices (Panel): 50’
  5. (break: 20’)
  6. Open Discussion: 20’
  7. Policies (Panel): 20’
  8. Open Discussion: 20’
  9. What’s next (Panel): 35’
  10. Conclusions and Final messages (Presentation): 5’


Major Challenges

Asunción Gómez

  • Vice-Chancellor for Research at the Technical University of Madrid (UPM), director of the Ontology Engineering Group and leader of ODI Madrid.
  • Title: The conjunction of Open Data and Language Technologies. Reflections and experiences.
  • Abstract: As an outstanding international expert in Open Data and Language Technologies, Asunción Gómez will share well-informed reflections on their challenges and opportunities from different angles (technical, economic, societal, legal), illustrated with numerous use cases in which she has been directly involved.

Fernando Ramos, Félix del Valle and Ignacio Miró-Charbonnier

  • Publidoc-UPM research group members, Complutense University of Madrid.
  • Title: Research on legal challenges of the conjunction of Open Data and Language Technologies.
  • Abstract: The Publidoc-UPM research group is tasked with researching, specifying and systematising the legal framework for implementing the Spanish Plan to foster Language Technologies, addressing its challenges, and helping to devise the best strategy to facilitate the conjunction of Open Data and Language Technologies, together with sustainability models for open resources for Language Technologies.

Gema Ramírez

  • CEO of Prompsit Language Engineering.
  • Title: Economic challenges of the conjunction of Open Data and Language Technologies: Prompsit, a success story.
  • Abstract: Prompsit is a language technology provider with a strong focus on tailored machine translation services and multilingual applications based on open-source software and data. It is a very successful example. Drawing on this experience, Gema Ramírez will address the economic challenges of combining Open Data with Language Technologies.

Blanca Rodríguez

  • LT_Observatory project, Zabala Innovation Consulting.
  • Title: Language Resources and Open Data. LT_Observatory.
  • Abstract: Open Data and Language Resources (LRs): challenges and opportunities from an industrial point of view, based on the stakeholder dialogue and LR collection carried out in the European LT_Observatory project.

Best Practices

Bente Maegaard

  • Centre for Language Technology, University of Copenhagen, Denmark.
  • Title: Common Language Resources and Technology Infrastructure: CLARIN.
  • Abstract: CLARIN provides easy and sustainable access to digital language data (in written, spoken, or multimodal form) for scholars in the social sciences and humanities. CLARIN also offers advanced tools to discover, explore, exploit, annotate, analyse or combine such data sets, wherever they are located. For this, CLARIN is building a networked federation of language data repositories, service centres and knowledge centres, with single sign-on access for all members of the academic community in all participating countries. Tools and data from different centres are interoperable, so that data collections can be combined and tools from different sources can be chained to perform complex operations to support researchers in their work.

Iván Vladimir Meza

  • Researcher at Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas (IIMAS), Universidad Nacional Autónoma de México (UNAM).
  • Title: Spanish speech as open data.
  • Abstract: Only recently in human history have we been able to record sounds and speech. Speech recordings are a window onto linguistic knowledge that we never had before; they give us a glimpse of the purest form of language: how we sound, how we use those sounds, how we construct utterances, and how we affect the meaning of those utterances with inflections in our speech. All these aspects are available for study not only at a specific time but from now on, by future generations. Additionally, the potential of speech to enhance interaction with technology is endless: speech recognition is now a reality, and speech synthesis is a standard in industry. All this progress has its cornerstone in speech as data: collections of speech are used to train models and to advance the technology. These two roles, speech as linguistic knowledge and speech as data, make speech an invaluable resource for the current generation and its technology, and for future generations to get to know us as never before. In this talk we will explore the current state of speech as open data for Spanish, the global efforts to preserve the language, the initiatives that host open speech, and the current platforms that hold the greatest collections of speech and what we can do to open them. Finally, we will present our experience in collecting speech and the related practices.

Martin Krallinger

  • Principal researcher at the Spanish National Cancer Research Centre (CNIO).
  • Title: OpenMinTeD: an open, service-oriented e-Infrastructure for Text and Data Mining (TDM) of scientific and scholarly content.
  • Abstract: Recent years have witnessed an upsurge in the quantity of digital research data, offering new insights and opportunities for improved understanding. Text and data mining is emerging as a powerful tool for harnessing the power of structured and unstructured content and data, by analysing them at multiple levels and in several dimensions to discover hidden and new knowledge. However, text mining solutions are not easy to discover and use, nor are they easily combinable by end users. OpenMinTeD aspires to enable the creation of an infrastructure that fosters and facilitates the use of text mining technologies in the world of scientific publications, builds on existing text mining tools and platforms, and renders them discoverable and interoperable through appropriate registries and a standards-based interoperability layer, respectively. It supports training of text mining users and developers alike and demonstrates the merits of the approach through several use cases identified by scholars and experts from different scientific areas, ranging from generic scholarly communication to literature related to life sciences, food and agriculture, and social sciences and humanities. Through its infrastructural activities, OpenMinTeD’s vision is to make operational a virtuous cycle in which a) primary content is accessed through standardised interfaces and access rules b) by well-documented and easily discoverable text mining services that process, analyse, and annotate text c) to identify patterns and extract new, meaningful, actionable knowledge, which will be used d) for structuring, indexing, and searching content and, in tandem, e) as new knowledge useful for drawing new relations between content items and triggering a new mining cycle.

Antonia Ferrer Sapena and Tony Hernández

  • MAREDATA research group members.
  • Title: Challenges and opportunities to open access to scientific documents and data.
  • Abstract: Since the “Republic of Letters”, science has flourished through the open exchange of ideas and data. However, today this open exchange faces major societal, economic, legal and technical challenges. At the same time, new opportunities arise, such as those offered by Language Technologies.

Jorge Gracia

  • Post-doctoral researcher at the Artificial Intelligence Department at Universidad Politécnica de Madrid (Spain). Member of the Open Knowledge Foundation’s Working Group on Open Data in Linguistics.
  • Title: Towards a cloud of linguistic linked open data.
  • Abstract: The benefits of sharing linguistic information as linked data (LD) on the Web have recently been recognised by the language resources community. As a result of interlinking multilingual and open language resources, the Linguistic Linked Open Data (LLOD) cloud is emerging, that is, a new linguistic ecosystem based on the LD principles that will allow the open exploitation of such data at a global scale. This talk will cover the emergence of the LLOD cloud, as well as the set of guidelines and best practices built by the community to support such an initiative.


Policies

Márta Nagy-Rothengass

  • Head of Unit “Data Value Chain”, European Commission.
  • Title: Open Data and Language Technologies. The European Commission view. The CEF Automated Translation platform use case.
  • Abstract: In the framework of the Digital Single Market strategy, the goal is to set in motion a virtuous cycle of data. This includes a Multilingual Open Data Infrastructure. A relevant example of the value of Open Data for Language Technologies is the translation memory of the European Commission Directorate-General for Translation, which is the most downloaded dataset on the European Union Open Data Portal. CEF.AT is an automated translation platform that will make European public online services multilingual, so that public digital services are equally usable by all EU users, irrespective of their working language and language skills, and will facilitate cross-border information exchange in public administration.

David Pérez

  • Adviser to the Secretary of State for Telecommunications and Information Society, Spain.
  • Title: Spanish Plan to foster Language Technologies.
  • Abstract: The Spanish Plan to foster Language Technologies aims to develop the language processing and automated translation sector in Spain. It arises from an assessment that can be summarised as follows: high potential for growth and development, a unique opportunity, and resources that are available but scattered. It comprises a broad spectrum of measures. Among them is the generation, standardisation and dissemination of open language resources suitable for language technologies, mainly, but not exclusively, in the framework of the reuse of Public Sector Information.