This use case explains how integrating the CLARIN infrastructure into the EOSC portal can facilitate the study of language data, and how the portal itself can support the Social Sciences and Humanities (SSH) community at large in the future.
CLARIN is the European Research Infrastructure providing access to language resources and tools for researchers who work with language data in the form of text, speech and mixed modalities.
Human language is ambiguous and often complex to interpret. One sentence can have multiple meanings, which is a great source of wordplay but also of confusion, both for humans and even more so for machines. Textual and spoken data constitute a key source for humanities and social science researchers in Europe. Historians, economists, linguists, philosophers and anthropologists all rely on language material as a fertile substrate for their research.
They require advanced computer systems to assist with the analysis process, to address the following issues:
(1) Every language has its own specificities and thus requires specialised analysis software.
(2) While there are many language data collections and processing tools, it is difficult to find out which tool is best suited for a specific data set and task.
(3) With the increasing computational requirements and complexity of language analysis algorithms, local processing has become very difficult.
Transforming language into real, directly usable research data requires deep insight into the linguistic content, e.g. via dictionaries, grammars, and speech and language models.
The research question
How can the EOSC portal support a political scientist who studies the use of nouns by female and male members of parliament, to find out whether there is a difference in the topics brought forward by the two groups?
Through the EOSC portal, the scientist searches for language analysis tools and discovers the CLARIN Language Resource Switchboard. This tool automatically guides users towards an application that can help to analyse a specific language data set, making the idea of actionable data a reality and avoiding time-consuming manual searching for the right application.
The CLARIN Language Resource Switchboard is fully integrated with B2DROP, a trusted solution to store and exchange data, where the scientist has uploaded a selection of debate transcripts. From here, he can directly invoke the Switchboard – taking away the need to upload the data again separately and allowing easy collaborative editing of the data before the analysis.
Once in the Switchboard, he chooses a Stylometry application to perform a comparative analysis. The results of this analysis can be accessed directly, in tabular form, or can be easily visualised in different ways.
Using these results, the researcher can conclude that there is indeed a significant difference in the topics the two groups address: female MPs talk more than their male colleagues about topics like healthcare and family structures.
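To make the idea of such a comparison concrete, here is a minimal Python sketch of how noun use in two groups of speakers could be compared and laid out as a table. It is not the Stylometry application offered through the Switchboard. The sample tokens, the `NOUN` tag, and the function names `noun_frequencies` and `compare_groups` are all hypothetical and chosen only for illustration. The sketch also assumes the transcripts have already been part-of-speech tagged, and it does not test whether a difference is statistically significant.

```python
from collections import Counter

# Hypothetical, hand-made sample: (word, part-of-speech) pairs per speaker group.
# Real transcripts would first need a part-of-speech tagger for their language.
TAGGED_SPEECHES = {
    "female": [("healthcare", "NOUN"), ("needs", "VERB"), ("funding", "NOUN"),
               ("family", "NOUN"), ("healthcare", "NOUN")],
    "male": [("budget", "NOUN"), ("grows", "VERB"), ("defence", "NOUN"),
             ("budget", "NOUN"), ("healthcare", "NOUN")],
}


def noun_frequencies(tagged_tokens):
    """Return each noun's share of all nouns in the token list."""
    nouns = [word.lower() for word, tag in tagged_tokens if tag == "NOUN"]
    counts = Counter(nouns)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {noun: count / total for noun, count in counts.items()}


def compare_groups(tagged_by_group, group_a, group_b):
    """Build rows (noun, share in A, share in B, A minus B), largest gap first."""
    freq_a = noun_frequencies(tagged_by_group[group_a])
    freq_b = noun_frequencies(tagged_by_group[group_b])
    rows = []
    for noun in set(freq_a) | set(freq_b):
        share_a = freq_a.get(noun, 0.0)
        share_b = freq_b.get(noun, 0.0)
        rows.append((noun, share_a, share_b, share_a - share_b))
    # Sort by absolute difference (descending), then alphabetically for ties.
    return sorted(rows, key=lambda row: (-abs(row[3]), row[0]))


if __name__ == "__main__":
    print(f"{'noun':<12}{'female':>8}{'male':>8}{'diff':>8}")
    for noun, share_a, share_b, diff in compare_groups(TAGGED_SPEECHES, "female", "male"):
        print(f"{noun:<12}{share_a:>8.2f}{share_b:>8.2f}{diff:>+8.2f}")
```

Relative shares are used instead of raw counts so that groups with different amounts of speaking time remain comparable. Any real conclusion about topics would need a much larger corpus and a proper significance test.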
These outcomes can in turn be published with B2SHARE. This makes the results discoverable, allows easy access to and re-use of the data, and stimulates replication.
The EOSC portal can support the more than 550,000 humanities and social science researchers in Europe with similar automated and high-quality analysis of language data, especially by:
3. Supplying powerful processing tools:
Learn more about the use case in the CLARIN Portal.
The use case 'From language data to insight: the CLARIN demonstrator' was presented by Maciej Ogrodniczuk, assistant professor at the Department of Artificial Intelligence, Institute of Computer Science, Polish Academy of Sciences, at the EOSC Launch Event on 23 November 2018 in Vienna, Austria.