Department of German Studies – Digital Linguistics

Digital Linguistics deals with digital or digitized language data. It aims to develop methods for collecting, processing, structuring, annotating, and analyzing language in digital form.

About the Department

Head of Department: Prof. Dr. Marcus Müller

"You shall know a word by the company it keeps" (John R. Firth 1957: 11)

What is Digital Linguistics?

We are mainly concerned with digital corpora, which can represent very different genres, e.g. newspaper texts, political debates, scientific essays, tweets, online forums, blogs, and much more.

To this end, we develop and implement digital infrastructures that allow us to carry out analyses jointly and to compare intermediate results. Since our main area of work is Digital Discourse Analysis, our collaborative infrastructure is called Discourse Lab.

In this way, we analyze language at the level of individual words (lexis, word formation), of phrases and sentences (syntax), and of larger utterances (texts and conversations). Our main interest is the relationship between language, knowledge, and society (as documented, for example, in Felder, Müller & Vogel 2012). Possible questions in this context include:

  • How does a scientific term emerge and how does it change over time?
  • What happens in interdisciplinary debates when researchers from different scientific disciplines use the same word – but linked to different concepts?
  • To what extent and by what linguistic means is a new technology defined as “risky” in different national discourses?
  • To what extent are different areas of knowledge such as economics, the environment, or politics linked to climate change in German and British media?
  • Why do journalists often use expressions of the type “thus also”, “but still” or “nevertheless” in media debates on bioethics?


To address such linguistic questions by means of corpus analysis, we have to derive testable working hypotheses from them. As the corpus linguist Stefan Gries (2009: 1226) aptly put it:

“… there are no meanings, no functions, no concepts in corpora – corpora are (usually text) files and all you can get out of such files is distributional (or quantitative/statistical) information.”

Since we are usually interested not in the distribution of expressions in data populations as such, but in the social meaning of language, we have to derive, for each research question, a verifiable hypothesis about the statistical distribution of linguistic expressions in a corpus. This procedure is called “operationalization”. For this purpose, we set up, for example, annotation categories or schemas that capture various linguistic phenomena in texts so that they can be categorized and analyzed according to specific criteria.
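As an illustration, a hypothesis such as “the word ‘risky’ is relatively more frequent in subcorpus A than in subcorpus B” can be checked by normalizing raw counts and computing a significance statistic. The following Python sketch is a minimal example under stated assumptions: the two toy texts, the simple regular-expression tokenizer, and the choice of the two-cell log-likelihood (G²) keyness statistic are illustrative choices and do not describe the tools actually used in Discourse Lab.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Very simple tokenizer (an assumption): lowercased runs of word characters."""
    return re.findall(r"\w+", text.lower())


def relative_frequency(count: int, corpus_size: int, per: int = 1_000_000) -> float:
    """Normalize a raw count, e.g. to occurrences per million tokens."""
    return count / corpus_size * per


def log_likelihood(a: int, b: int, size_a: int, size_b: int) -> float:
    """Two-cell log-likelihood (G2) for a term seen a times in corpus A and b times in B."""
    if size_a <= 0 or size_b <= 0:
        raise ValueError("corpus sizes must be positive")
    total = size_a + size_b
    # Expected counts if the term were distributed proportionally to corpus size.
    expected_a = size_a * (a + b) / total
    expected_b = size_b * (a + b) / total
    g2 = 0.0
    for observed, expected in ((a, expected_a), (b, expected_b)):
        if observed > 0:  # the term 0 * log(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical toy subcorpora standing in for two national media discourses.
corpus_a = tokenize("The new technology is risky, and the risks are high.")
corpus_b = tokenize("The new technology is promising and safe.")

count_a = Counter(corpus_a)["risky"]
count_b = Counter(corpus_b)["risky"]
print(relative_frequency(count_a, len(corpus_a)), relative_frequency(count_b, len(corpus_b)))
print(round(log_likelihood(count_a, count_b, len(corpus_a), len(corpus_b)), 2))
```

With these ten- and seven-token toy texts, G² is about 1.06, well below 3.84, the usual critical value for p < 0.05 with one degree of freedom. The difference is therefore not significant, which shows why such hypotheses have to be tested on sufficiently large corpora.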

For example, the question of how evaluations are expressed in specific texts can be operationalized in three steps: first, possible expressions of evaluation are systematically sorted and distinguished into categories; second, these categories are assigned to the respective passages in the texts; finally, the enriched texts are analyzed, for example with regard to the distribution and frequency of different forms of evaluation, and interpreted with regard to the initial question.
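The three steps of this operationalization (defining categories, assigning them to passages, analyzing their distribution) can be sketched in Python. The category scheme below (evaluative words, comparisons, concessive constructions such as “but still”) and the example sentences are hypothetical; they only show how categories, annotated passages, and a distribution over them relate to each other. In practice, a scheme would be derived from the research question at hand.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Evaluation(Enum):
    """Hypothetical category scheme for forms of evaluation (illustration only)."""
    LEXICAL = "evaluative word"             # e.g. "dangerous"
    COMPARATIVE = "comparison"              # e.g. "safer than"
    CONCESSIVE = "concessive construction"  # e.g. "but still"


@dataclass(frozen=True)
class Annotation:
    """One annotated passage: a character span in a text plus its category."""
    text_id: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    category: Evaluation


def annotate(text_id: str, text: str, phrase: str, category: Evaluation) -> Annotation:
    """Assign a category to the first occurrence of phrase (raises ValueError if absent)."""
    start = text.index(phrase)
    return Annotation(text_id, start, start + len(phrase), category)


def distribution(annotations: List[Annotation], text_id: Optional[str] = None) -> Dict[Evaluation, float]:
    """Share of each category among the annotations, optionally restricted to one text."""
    selected = [a for a in annotations if text_id is None or a.text_id == text_id]
    counts = Counter(a.category for a in selected)
    total = len(selected)
    return {category: n / total for category, n in counts.items()}


# Hypothetical mini-corpus and manual annotations.
texts = {
    "t1": "The method is dangerous, but still it is safer than the old one.",
    "t2": "Critics call the method dangerous and irresponsible.",
}
annotations = [
    annotate("t1", texts["t1"], "dangerous", Evaluation.LEXICAL),
    annotate("t1", texts["t1"], "but still", Evaluation.CONCESSIVE),
    annotate("t1", texts["t1"], "safer than", Evaluation.COMPARATIVE),
    annotate("t2", texts["t2"], "dangerous", Evaluation.LEXICAL),
    annotate("t2", texts["t2"], "irresponsible", Evaluation.LEXICAL),
]
print(distribution(annotations))        # shares across the whole corpus
print(distribution(annotations, "t2"))  # shares within a single text
```

Storing character offsets rather than copies of the passages keeps each annotation tied to the original text, so the same annotated corpus can later be queried under different criteria.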

Since research projects like this often require a range of skills (for example in linguistics, computer science, statistics, sociology, political science, or philosophy), digital linguistics is a highly interdisciplinary field. For digital linguists, two of the most important qualities are therefore the ability to work in a team and the willingness to keep learning.

Collaboration is therefore essential for us, because mixed teams bring together perspectives from different disciplines. Exchange between researchers from different fields is also important because much of our work consists of assigning linguistic expressions to particular categories: for example, we determine that a word is a noun or that a sentence is an argument. Such assignments are called annotations. Some annotations (such as parts of speech) are usually produced automatically; others are first done manually, with automation as a later goal. In either case, it is important that such assignments rest not on a single person's intuition about language but on consensus within a team. The degree of agreement between annotators is called “inter-annotator agreement”.
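Inter-annotator agreement is usually quantified with a chance-corrected measure. One widely used measure for two annotators is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance from each annotator's label frequencies. The sketch below assumes two annotators who have labelled the same ten sentences as “argument” or “no argument”; the labels are invented, and other measures such as Krippendorff's alpha may be preferable for more than two annotators or for incomplete data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # p_o: proportion of items on which both annotators chose the same label.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # p_e: chance agreement, from each annotator's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        # Both annotators used one and the same label throughout: kappa is 0/0,
        # so we return 1.0 by convention here.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten sentences ("arg" = argument, "no" = no argument).
annotator_1 = ["arg", "arg", "arg", "no", "no", "arg", "no", "arg", "no", "no"]
annotator_2 = ["arg", "arg", "no", "no", "no", "arg", "no", "arg", "arg", "no"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.6
```

Here the annotators agree on 8 of 10 sentences (p_o = 0.8), but since each of them uses both labels half of the time, p_e = 0.5, so κ = 0.6: noticeably lower than the raw agreement once chance is taken into account.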

References

  • Felder, Ekkehard, Marcus Müller & Friedemann Vogel (eds.) (2012): Korpuspragmatik. Thematische Korpora als Basis diskurslinguistischer Analysen von Texten und Gesprächen. Berlin / Boston: De Gruyter.
  • Firth, John R. (1957): Papers in Linguistics (1934–1951). Oxford: University Press.
  • Gries, Stefan Th. (2009): What is Corpus Linguistics? Language and Linguistics Compass 3: 1225–1241. doi:10.1111/j.1749-818X.2009.00149.x.

Selected publications of the department

  • Bender, Michael (2016): Forschungsumgebungen in den Digital Humanities: Nutzerbedarf, Wissenstransfer, Textualität. Sprache und Wissen (SuW) 22. Berlin / Boston: De Gruyter.
  • Lordick, Harald, Rainer Becker, Michael Bender, Luise Borek, Canan Hastik, Thomas Kollatz, Beata Mache, Andrea Rapp, Ruth Reiche & Niels-Oliver Walkowski (2016): Digitale Annotationen in der geisteswissenschaftlichen Praxis. In: Bibliothek Forschung und Praxis 40 (2), July 2016, ed. by Achim Bonte et al. Berlin / Boston: De Gruyter, pp. 186–199.
  • Müller, Marcus (2015): Sprachliches Rollenverhalten: Korpuspragmatische Studien zu divergenten Kontextualisierungen in Mündlichkeit und Schriftlichkeit. Berlin / Boston: De Gruyter (Sprache und Wissen).
  • Lösch, Andreas & Marcus Müller (eds.) (2014): Risikodiskurse/Diskursrisiken – Sprachliche Formierungen von Technologierisiken und ihre Folgen. Special issue (2/2014) of the journal “Technikfolgenabschätzung – Theorie und Praxis”. Karlsruhe: Institut für Technikfolgenabschätzung und Systemanalyse (ITAS).
  • Müller, Marcus (2012): Vom Wort zur Gesellschaft: Kontexte in Korpora: Ein Beitrag zur Methodologie der Korpuspragmatik. In: Ekkehard Felder, Marcus Müller & Friedemann Vogel (eds.): Korpuspragmatik. Thematische Korpora als Basis diskurslinguistischer Analysen von Texten und Gesprächen. Berlin / Boston: De Gruyter, pp. 33–82.