Research Projects

Research Projects at the Department of Corpus and Computational Linguistics, English Philology

Project Duration: 2021 – 2024

Project Partners:

- Sabine Bartsch (Institute for Linguistics and Literary Studies, TU Darmstadt)

- Tobias Hecking (Institute for Software Technology, DLR)

- Wolfgang Stille (hessian.ai)

Project Team Members:

- Debajyoti Paul Chowdhury

- Changxu Duan

- Sherry Tan

- Elena Volkanovska

Development of a meta-methodology and a conceptual framework for the transdisciplinary in-depth exploration and analysis of multimodal digital objects, demonstrated through the use cases of AI and climate change discourses.

Project Funding: BMBF (Federal Ministry of Education and Research) joint project under the guidelines for the funding of research and development projects aimed at the theoretical, methodological, and technical advancement of digital humanities, as announced in the Federal Gazette on July 22, 2019

Project Overview:

The goal of the overall project is the development and testing of a concept for the in-depth exploration of multimodal data collections. The foundation of this project lies in connecting different types of digital objects to enable genuine knowledge generation based on digital collections. It leverages current theories and methods from the humanities and information technology with the aim of transdisciplinary expansion and sharing of knowledge, which has previously been hindered by a lack of interconnection among collections and the absence of possibilities for enrichment through annotation and commentary.

To test the developed concepts, transdisciplinary multimodal corpora (TMK) will be created, manually and automatically annotated, interconnected, and analyzed for two exemplary use cases: discourses on climate change and on artificial intelligence. These corpora will also be discussed and evaluated in expert workshops.

Objectives:

The analysis and provision of interconnected multimodal corpora aims to develop and test corpus-based and computational linguistic methods for the construction, annotation, and analysis of multimodal corpora. To this end, a corpus covering two example domains, climate change and artificial intelligence, is prepared and annotated using a combination of automatic and manual methods. Its subsequent analysis is intended to identify features that can serve as the basis for the semantic interconnection of textual and intertextual linguistic concepts and multimodally encoded concepts. This will broaden access to text and data corpora.

The developed corpus data and analysis scenarios will be tested and iteratively improved in expert workshops as well as in workshops for scientists and the interested public.

The project linguisticsweb.org develops and provides tutorials, how-tos, links, tools, and approaches to working with corpora, focusing on research in Linguistics, Corpus and Computational Linguistics, and other digital philologies.

The aim of linguisticsweb.org is to support students and researchers in corpus- and computer-based research by providing materials and guidance for self-study and teaching, and to further the independent use of technologies and methods of Linguistics and other philological sciences.

The portal linguisticsweb.org is used by researchers and teachers internationally, in research and teaching as well as in workshops.

linguisticsweb.org was created as an independent online project in 2008-09, and it has been developed further ever since.

To the website

The goal of this shared task was to encourage the developers of NLP applications to adapt their tools and resources for the processing of written German discourse in genres of computer-mediated communication (CMC). Examples of CMC genres are chats, forums, wiki talk pages, tweets, blog comments, social networks, SMS and WhatsApp dialogues.

Processing CMC discourse is a desideratum and a relevant task in different research fields and application contexts in the Digital Humanities – e.g.:

- in the context of building, processing and analyzing corpora of computer-mediated communication / social media (chat corpora, news corpora, WhatsApp corpora, …)

- in the context of collecting, processing and analyzing large, genre-heterogeneous web corpora as resources in the field of Language Technology / Data Mining

- in the context of dealing with CMC data in corpus-based analyses on contemporary written language, language variation and language change

- in all research fields beyond linguistics which address social, cultural and educational aspects of social media and CMC technologies using language data from CMC genres

The shared task consisted of two subtasks:

- Tokenization of CMC discourse

- Part-of-speech tagging of CMC discourse

The two subtasks made use of two different data sets:

- CMC data set: a selection of data from different CMC genres (social chat, professional chat, Wikipedia talk pages, blog comments, tweets, WhatsApp dialogues).

- Web corpora data set: a selection of data which represents written discourse from heterogeneous WWW genres. It consists of crawled websites including small portions of CMC discourse (e.g. webpages, blogs, news sites, blog comments, etc.).

Learn more

The LOEWE-Schwerpunkt Digital Humanities is a collaboration of the University of Frankfurt, the Technical University of Darmstadt, and the Freies Deutsches Hochstift / Goethe Museum Frankfurt. Its objective is to connect basic research in the participating humanities disciplines, with a focus on information technology methods.

LOEWE Schwerpunkt Digital Humanities – Integrated editing and evaluation of text-based corpora, co-applicant and PI in the project area “Contemporary Corpora”, January 2011 to December 2013

Partners: Prof. Dr. Iryna Gurevych, Prof. Dr. Gert Webelhuth

Funded by the State of Hesse as part of the LOEWE initiative of excellence.

To the website

Within the scope of PACE (Partners for the Advancement of Collaborative Engineering Education)

Subproject: "Scientific and technical literacy – investigations of natural-language communication in collaborative product development"

Partners: Prof. Dr.-Ing. Reiner Anderl and Prof. Dr. Elke Teich

Funded by the Innovation Fund of the State of Hesse (Innovationsfonds des Landes Hessen), July 2004 to January 2006