The CLARIN workflow in the EOSC Portal

From language data to insight: the CLARIN demonstrator

 

This use case explains how integrating the CLARIN infrastructure into the EOSC portal can facilitate the study of language data, and how the portal itself can support the Social Sciences and Humanities (SSH) community at large in the future.
 
CLARIN is the European Research Infrastructure that provides access to language resources and tools for researchers who work with language data in the form of text, speech and mixed modalities.

The problem 

Human language is ambiguous and often complex to interpret. One sentence can have multiple meanings, which is a great source of wordplay and also of confusion, for humans and even more so for machines. Textual and spoken data constitute a key source for humanities and social science researchers in Europe. Historians, economists, linguists, philosophers and anthropologists all rely on language material as a fertile substrate for their research.

They require advanced computer systems to assist with the analysis process, to address the following issues:
 
(1)   Every language has its own specificities and thus requires specialised analysis software.
 
(2)   While there are many language data collections and processing tools, it is difficult to find out which tool is best suited for a specific data set and task.
 
(3)   With the increasing computational requirements and complexity of language analysis algorithms, local processing has become very difficult.
 
Transforming language into real, directly usable research data requires deep insight into the linguistic content, e.g. via dictionaries, grammars, and speech and language models.

The research question

How can the EOSC portal support a political scientist who studies the use of nouns by female and male members of parliament, in order to find out whether the two groups bring forward different topics?

The solution 

Through the EOSC portal, the scientist searches for language analysis tools and discovers the CLARIN Language Resource Switchboard. This tool automatically guides users towards an application that can analyse a specific language data set, making the idea of actionable data a reality and avoiding time-consuming manual searches for the right application.

The CLARIN Language Resource Switchboard is fully integrated with B2DROP, a trusted solution to store and exchange data, where the scientist has uploaded a selection of debate transcripts. From there, the scientist can invoke the Switchboard directly. This removes the need to upload the data again separately and allows easy collaborative editing of the data before the analysis.

Once in the Switchboard, he chooses a Stylometry application to perform a comparative analysis. The results of this analysis can be accessed directly, in tabular form, or can be easily visualised in different ways.

Using these results, the researcher can conclude that there is indeed a significant difference in the topics that female MPs address: they talk more than their male colleagues about topics like healthcare and family structures.
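To make the idea of such a comparison concrete, here is a minimal Python sketch. It is not the Stylometry application offered through the Switchboard, and it does not test for statistical significance. It assumes that the nouns have already been extracted from each speech by a language-specific tool. The toy data and the function names `relative_noun_frequencies` and `frequency_gap` are hypothetical and serve only as illustration.

```python
from collections import Counter

# Hypothetical toy input: (speaker group, nouns already extracted from one speech).
# Real transcripts would first need language-specific tagging to find the nouns.
speeches = [
    ("female", ["healthcare", "family", "budget"]),
    ("female", ["healthcare", "school"]),
    ("male", ["budget", "defence", "budget"]),
    ("male", ["defence", "healthcare"]),
]


def relative_noun_frequencies(speeches):
    """Return {group: {noun: share of that group's noun tokens}}."""
    counts = {}
    for group, nouns in speeches:
        counts.setdefault(group, Counter()).update(nouns)
    return {
        group: {noun: n / sum(counter.values()) for noun, n in counter.items()}
        for group, counter in counts.items()
    }


def frequency_gap(freqs, group_a, group_b):
    """Return (noun, gap) pairs, largest gap first.

    A positive gap means group_a uses the noun relatively more often
    than group_b; a negative gap means the opposite.
    """
    nouns = set(freqs[group_a]) | set(freqs[group_b])
    gaps = {
        noun: freqs[group_a].get(noun, 0.0) - freqs[group_b].get(noun, 0.0)
        for noun in nouns
    }
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


freqs = relative_noun_frequencies(speeches)
for noun, gap in frequency_gap(freqs, "female", "male"):
    print(f"{noun:12s} {gap:+.2f}")
```

Relative frequencies are used instead of raw counts so that a group that simply speaks more does not dominate the comparison. Nouns at the top of the list are used relatively more by the first group, and nouns at the bottom relatively more by the second.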

These outcomes can in turn be published with B2SHARE. This makes the results discoverable, allows easy access to and re-use of the data, and stimulates replication.

The EOSC portal can support the more than 550,000 humanities and social science researchers in Europe with similar automated and high-quality analysis of language data, especially by:

  1. Providing easy access. If any authentication is required, federated login can be used.
  2. Bridging the gap between researchers and the data sets they are interested in:
     • Through the data discovery platforms that are integrated into the EOSC portal (e.g. the CLARIN Virtual Language Observatory and EUDAT’s B2FIND), data can be found within seconds.
     • This data is also actionable: matching processing applications can be suggested and invoked right away.
  3. Supplying powerful processing tools:
     • User-friendly interfaces give access to fast analysis.
     • All the necessary language know-how is included in the tools.
     • Insightful visualisations help scholars to better understand, and to share, the outcomes of the research.
     • The processing is well documented and can be repeated easily. This advances the replicability of the research steps in the humanities and social sciences.

Watch the video of the CLARIN demonstrator.

Learn more about the use case in the CLARIN Portal.

The use case 'From language data to insight: the CLARIN demonstrator' was presented by Maciej Ogrodniczuk, assistant professor at the Department of Artificial Intelligence, Institute of Computer Science, Polish Academy of Sciences, at the EOSC Launch Event on 23 November 2018 in Vienna, Austria.


The EOSC portal has been jointly developed and maintained by the eInfraCentral, EOSC-hub, EOSCpilot and OpenAIRE-Advance projects, funded by the European Union’s Horizon 2020 research and innovation programme, with a contribution from the European Commission.