Research Projects at the Department of German Studies - Digital Literary Studies – Institute of Linguistics and Literary Studies

Project lead: Prof. Dr. Thomas Weitin

German classes are supposed to inspire enthusiasm for literature and teach students to make independent judgments as well as to be empathetic. However, empirical evidence for such effects of literature is lacking. This project aims to examine whether there is a level of emotionality that is optimal for understanding literary texts. It combines emotion-oriented methods of text analysis with the measurement of emotional reactions during reading. The aim is to provide concrete guidance for teaching German at the upper secondary level.

For its bold approach, the TU project ”Evidence-based Literary Comprehension in German Classes“ will receive around 300,000 euros from the ”LOEWE-Exploration“ funding program. A total of four research teams at universities were selected by the Hessian Ministry of Science for unconventional innovative research. Funds totaling around one million euros are available for them.

The project combines emotion-oriented methods of text analysis with the measurement of emotional reactions during reading. For example, movements of the eye, activities of the brain and certain reactions of the body are measured. In addition, questionnaires will be used to record self-perceived emotionality during reading. The test subjects are high school students. For the analyses and experiments, contemporary literary prose from three thematic areas will be used: ecological crises and sustainability, human rights and international conflicts, and living contemporary history. Thus, according to the researchers, it is possible to check whether the requirements of teaching German are met using the example of a current reading. The aim is to provide concrete orientation aids for German lessons in the upper level of secondary school.

The Hessian Minister of Science, Angela Dorn, expressed her delight at the funding of four projects with ”LOEWE-Exploration“ The researchers were given ”the freedom to pursue new, highly innovative research ideas,“ she said: ”With up to 300,000 euros per project for up to two years, they can test an unconventional hypothesis, a radically new approach. Such freedom has become rare in research funding."

Project lead: Thomas Weitin

Funding: DFG

The 19th century, in which our current system of scientific disciplinary cultures emerged, was an epoch of collecting and organizing bodies of knowledge, which often lay between these disciplines.

Criminal cases were used by law to learn the individual presentation of proofs for criminal proceedings, and by literature to learn the realistic art of narration for a modern mass audience. As the first globally comparative and also most comprehensive collection of criminal cases in the German-speaking world, the Neue Pitaval (1842-1890) played a decisive role in the emergence of a general legal consciousness. In order to reconstruct the inaugural discourse of this collection, it has to be analyzed in all its diversity as a corpus of 540 case studies. Our project follows an approach based on historical research questions, which adapts the different methods of digital corpus analysis to the respective epistemic goal. Semantic progress analyses examine which topics were in vogue and when, and how the stories were put into perspective in terms of legal policy under changing circumstances. Narratological analyses get to the bottom of thenarrative patterns that are developed in this process. To distinguish between legal and literary modes of representation, digital methods are combined with varying degrees of context sensitivity. In stylometric analyses, which operate at the level of sentences and words, different levels of abstraction can be analyzed algorithmically. Narrative analyses, on the other hand, require collaborative annotations of larger sections of text, making questions of automatability themselves appear as hermeneutic problems, which in turn give insight into the research object. Between law and literature, the legal case study is a popular medium of knowledge in the 19th century, in which the normative orientations and the understanding of law of bourgeois society become observable. The genre poetics of this text type is therefore of particular importance. Thus, we will be employing comparative analyses to contrast the Pitaval with other contemporary corpora that can be described as its direct competitors in the media system: The crime stories in the family journal Die Gartenlaube and the crime novellas of the Deutscher Novellenschatz potentially entertained the same audience. Focusing on global and local thematic conjunctures in relation to the categorization of determined signal strengths in the classification, we will also be able to examine aspects of aesthetic response by including the analysis of affect strengths of the texts in our analyses. On the technical basis of an advancement of the collaborative annotation tool WebAnno, this will also involve possibilities of crowd sourcing to not only achieve further text improvements, but also to motivate the contributors to provide annotations. Experience has shown that a key search criteria are often entities such as persons and place names, but also the coding of grammatical structures. The goal is to develop a high-quality body, both in scope and quality. The collaboration between the library and the institute is intended as a prototype of a collaborative structure that combines specifically scientific and interdisciplinary interests in a productive manner.

Project lead: Prof. Dr. Thomas Weitin (TU Darmstadt), Ulrik Brandes (ETH Zürich)

Funding: Volkswagen Stiftung

The 'Reading at Scale' project is based on the following approach: If hermeneutic and statistical methods each have their own strengths regarding detailed individual analyses and in dealing with large amounts of data, a mixed methods approach is better suited for the middle level than the two methods alone.

Literary texts and text corpora allow analyzes at different levels of resolution, from the level of letters/characters in individual works to entire literature collections, whereby many of the research questions in literary studies and literary history aim at the middle level. The focus of our studies is a historical collection of 86 novellas, published under the title “Der deutsche Novellenschatz” (24 volumes, 1871-1876) by the editors Paul Heyse and Hermann Kurz. We already transferred this realistic anthology into a TEI/XML corpus, and similar collections will follow. Thanks to its manageable size, the novella collection is still within the reach of individual reading, but still has a promising size for statistical analysis. Our text corpus is being investigated in the scope of three dissertation projects at different levels of operationalization: (1) a stylometric corpus analysis aims to investigate the problem of realistic style; (2) a network analysis deals with problems of distinction within popular literature; (3) A comparative study examines the “Deutsche Novellenschatz” as an effective instrument of canonization and as a programmatic attempt at non-narrative literary history. Further, the two project leaders aim to integrate the individual studies from the perspective of methodological basic research: an algorithmic subproject investigates concepts of position in network research, while a literary science subproject focuses on problems of validation in digital analysis.

Publications concerning the project

Brandes, Ulrik, Weitin, Thomas, Päpcke, Simon, Pupynina, Anastasia, Herget, Katharina (2019): Distance measures in a non-authorship context. The effect on the „Deutsche Novellenschatz“ (im Erscheinen).
Weitin, Thomas (2019): Burrows‘s Delta und Z-Score-Differenz im Netzwerkvergleich. Analysen zum Deutschen. Novellenschatz von Paul Heyse und Hermann Kurz (1871-1876), in: Digitale Literaturwissenschaft. Beiträge des DFG-Symposiums, hrsg. v. Fotis Jannidis, Stuttgart (im Erscheinen).
Weitin, Thomas (2017): (Hg.): Scalable Reading. Zeitschrift für Literaturwissenschaft und Linguistik, 47.1.
Weitin, Thomas (2017): Literarische Heuristiken: Die Novelle des Realismus, in: Komplexität und Einfachheit. DFG-Symposion 2015, hrsg. v. Albrecht Koschorke, Stuttgart, S. 422–442.
Weitin, Thomas, Herget, Katharina (2017): Falkentopics: Über einige Probleme beim Topic Modeling literarischer Texte, in: Zeitschrift für Literaturwissenschaft und Linguistik, 47.1, S. 29–48.
Weitin, Thomas (2016): Heuristik des Wartens. Literatur lesen unter dem Eindruck von big data, in: Warten als Kulturmuster, hrsg. v. Julia Kerscher, Xenia Wotschal, Würzburg, S. 180–196.
Weitin, Thomas (2016): Selektion und Distinktion. Paul Heyses und Hermann Kurz ́Deutscher Novellenschatz als Archiv, Literaturgeschichte und Korpus, in: Archiv/Fiktionen. Verfahren des Archivierens in Literatur und Kultur des langen 19. Jahrhunderts, hrsg. v. Daniela Gretz, Nicolas Pethes, Freiburg 2016, S. 385–408.
Weitin, Thomas, Gilli, Thomas, Kunkel, Nico (2016): Auslegen und Ausrechnen: Zum Verhältnis hermeneutischer und quantitativer Verfahren in den Literaturwissenschaften, in: Zeitschrift für Literaturwissenschaft und Linguistik, 46,1, S. 103–115.

Corpora

Weitin, Thomas (2016): Volldigitalisiertes Textkorpus. Der Deutsche Novellenschatz. Herausgegeben von Paul Heyse, Hermann Kurz. 24 Bände, 1871-1876. Darmstadt/Konstanz, http://www.deutschestextarchiv.de/doku/textquellen#novellenschatz.
Weitin, Thomas (2018): Volldigitalisiertes Textkorpus. Der Neue Deutsche Novellenschatz. Herausgegeben von Paul Heyse, Ludwig Laistner. 24 Bände, 1884-1887. Darmstadt, im Erscheinen.

More information

Project lead: Prof. Dr. Thomas Weitin

Literary corpus projects

Sustainable and high-quality digital applications and operationalization of (literary) questions are necessarily based on appropriate and stable corpora. Many canonized classics and works are now freely available on the Internet and can be freely downloaded from websites such as the Gutenberg DE project. From a philosophical and corpus critical perspective, however, these digital texts are often unreliable: sometimes, the underlying text source and editions are unmarked, and the files are often only available in simple txt format – without formatting or in-depth text labeling. The error rates of the OCR software used (i.e. programs for optical character recognition – for example, the machine-readable text from PDF files) vary widely, which in turn has significant influence on the quality of the corpora. Initiatives such as the German Text Archive are trying to counteract this trend by striving for a historical reference corpus based on strict guidelines and high quality standards (based on, among other aspects, the use of first editions).

At the same time, digital literary scholarship also aims to disassociate its analyzes and research subjects from the traditional canon of literature – or to extend it. Accordingly, the constant creation and expansion of literary corpora is an integral part of many research projects.

The corpus workflow using the example of the new “Novellenschatz”

In June 2015 – in the scope of the preparations for the workshop “Scalable Reading. Paul Heyses Deutscher Novellenschatz zwischen Einzeltext und Makroanalyse”, under the auspices of Thomas Weitin – the first TEI-XML-Corpus of the “Novellenschatz” was created: a historical collection of 86 novellas, published by Paul Heyse and Hermann Kurz (24 volumes, 1871-1876). This corpus was continually improved and enriched with metadata in order to allow for further research into the popular novella collection of the 19th century.

Since then, the corpus workflow has been continuously expanded and professionalized. The corpora are created by means of a corrected OCR method:

The digital representation of the text (typically in the form of PDF formats) is first converted into machine-readable text using the Abbyy FineReader software, which is particularly well-suited for reading Gothic typefaces. In a second step, the digitized text is then manually checked, corrected, and stored in TXT-format by specially trained assistants, and some corpora are also transferred into a TEI-compliant XML schema.

Other corpus projects

In addition to the “Deutscher Novellenschatz”, the “Neuer Deutscher Novellenschatz” by Paul Heyse and Ludwig Laistner (70 novellas in 24 volumes, 1884-1887) was digitized and reviewed as well. In addition, we started to work on the last missing novella treasure, the “Novellenschatz des Auslandes”, which consists of 57 translated foreign novellas, also published by Paul Heyse and Hermann Kurz (14 volumes, 1872-1876). Thus, our novella corpus is almost complete and ready for analysis. Parallel to this, other historical sources are prepared and digitized as well, e.g. the extensive letter correspondence between Paul Heyse and Hermann Kurz (1858-1873, over 700 letters), which was created during the publication process of the “Novellenschatz” collection.

With “Der neue Pitaval”, edited by Julius Eduard Hitzig and Willibald Alexis (Wilhelm Häring) (60 volumes, 1842-1890), we also digitized “a collection of the most interesting international crime stories, from the past to the present”.

The resulting digital corpora are published in the “Deutsches Textarchiv”, made available for research in the sense of Open Access.

Project publications

• Weitin, Thomas (2016). Volldigitalisiertes Textkorpus. Der Deutsche Novellenschatz. Paul Heyse, Hermann Kurz (ed.), 24 volumes, 1871-1876. Darmstadt/Konstanz.

http://www.deutschestextarchiv.de/doku/textquellen#novellenschatz

• Weitin, Thomas (2018). Volldigitalisiertes Textkorpus. Der Neue Deutsche Novellenschatz. Paul Heyse, Ludwig Laistner(ed.), 24 volumes, 1884-1887. Darmstadt (forthcoming).

Further links

• Deutsches Textarchiv. Grundlage für ein Referenzkorpus der neuhochdeutschen Sprache. Berlin-Brandenburgische Akademie der Wissenschaften (ed.), Berlin 2019 http://www.deutschestextarchiv.de

• Project Gutenberg. Project Gutenberg Literary Archive Foundation (ed.) www.gutenberg.org

• Projekt Gutenberg-DE. Hille & Partner GbR (ed.) http://gutenberg.spiegel.de

Research Projects