Research at the Institute of Linguistics and Literary Studies

Funding: DFG

The 19th century, in which our current system of scientific disciplinary cultures emerged, was an epoch of collecting and organizing bodies of knowledge, which often lay between these disciplines.

Law used criminal cases to learn how evidence is presented individually in criminal proceedings, and literature used them to learn the realistic art of narration for a modern mass audience. As the first globally comparative and also most comprehensive collection of criminal cases in the German-speaking world, the Neue Pitaval (1842-1890) played a decisive role in the emergence of a general legal consciousness. In order to reconstruct the inaugural discourse of this collection, it has to be analyzed in all its diversity as a corpus of 540 case studies.

Our project follows an approach based on historical research questions, which adapts the different methods of digital corpus analysis to the respective epistemic goal. Semantic progress analyses examine which topics were in vogue and when, and how the stories were put into perspective in terms of legal policy under changing circumstances. Narratological analyses get to the bottom of the narrative patterns that are developed in this process. To distinguish between legal and literary modes of representation, digital methods are combined with varying degrees of context sensitivity.

In stylometric analyses, which operate at the level of sentences and words, different levels of abstraction can be analyzed algorithmically. Narrative analyses, on the other hand, require collaborative annotations of larger sections of text, making questions of automatability themselves appear as hermeneutic problems, which in turn give insight into the research object.
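
As an illustration of a word-level stylometric measure, the following minimal sketch computes Burrows's Delta, a distance that is named in one of the publications listed further down this page. It is not the project's own implementation: the whitespace tokenization, the use of the population standard deviation, and the function name `burrows_delta` are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def burrows_delta(texts: dict[str, str], n_words: int = 100) -> dict[tuple[str, str], float]:
    """Return Burrows's Delta for every pair of texts (smaller = more similar)."""
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    counts = {name: Counter(text.lower().split()) for name, text in texts.items()}
    sizes = {name: sum(c.values()) for name, c in counts.items()}
    if len(texts) < 2 or min(sizes.values()) == 0:
        raise ValueError("need at least two non-empty texts")

    # The n most frequent words of the whole corpus serve as features.
    corpus = Counter()
    for c in counts.values():
        corpus.update(c)
    vocab = [word for word, _ in corpus.most_common(n_words)]

    # z-score each word's relative frequency across all texts.
    z = {name: [] for name in texts}
    for word in vocab:
        rel = {name: counts[name][word] / sizes[name] for name in texts}
        mu, sigma = mean(rel.values()), pstdev(rel.values())
        if sigma == 0:
            continue  # equally frequent everywhere, so the word carries no signal
        for name in texts:
            z[name].append((rel[name] - mu) / sigma)

    # Delta = mean absolute difference between the z-scores of two texts.
    return {
        (a, b): mean(abs(x - y) for x, y in zip(z[a], z[b])) if z[a] else 0.0
        for a, b in combinations(sorted(texts), 2)
    }
```

Called with a dictionary that maps text names to their full text, the function returns one distance per pair of names. In practice the texts would be tokenized more carefully, and `n_words` would be chosen with the corpus size in mind.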

Between law and literature, the legal case study is a popular medium of knowledge in the 19th century, in which the normative orientations and the understanding of law of bourgeois society become observable. The genre poetics of this text type is therefore of particular importance. Thus, we will be employing comparative analyses to contrast the Pitaval with other contemporary corpora that can be described as its direct competitors in the media system: The crime stories in the family journal Die Gartenlaube and the crime novellas of the Deutscher Novellenschatz potentially entertained the same audience. Focusing on global and local thematic conjunctures in relation to the categorization of determined signal strengths in the classification, we will also be able to examine aspects of aesthetic response by including the analysis of affect strengths of the texts in our analyses.

On the technical basis of an advanced version of the collaborative annotation tool WebAnno, this will also involve options for crowdsourcing, not only to achieve further text improvements but also to motivate contributors to provide annotations. Experience has shown that key search criteria are often entities such as persons and place names, but also the coding of grammatical structures. The goal is to develop a corpus that is strong in both scope and quality. The collaboration between the library and the institute is intended as a prototype of a collaborative structure that combines specifically scientific and interdisciplinary interests in a productive manner.

Project lead: Prof. Dr. Thomas Weitin

Funding: “LOEWE-Exploration”

German classes are supposed to inspire enthusiasm for literature and teach students to make independent judgments as well as to be empathetic. However, empirical evidence for such effects of literature is lacking. This project aims to examine whether there is a level of emotionality that is optimal for understanding literary texts. It combines emotion-oriented methods of text analysis with the measurement of emotional reactions during reading. The aim is to provide concrete guidance for teaching German at the upper secondary level.

For its bold approach, the TU project “Evidence-based Literary Comprehension in German Classes” will receive around 300,000 euros from the “LOEWE-Exploration” funding program. A total of four research teams at universities were selected by the Hessian Ministry of Science for unconventional innovative research. Funds totaling around one million euros are available for them.

The project combines emotion-oriented methods of text analysis with the measurement of emotional reactions during reading. For example, eye movements, brain activity and certain bodily reactions are measured. In addition, questionnaires will be used to record self-perceived emotionality during reading. The test subjects are high school students. For the analyses and experiments, contemporary literary prose from three thematic areas will be used: ecological crises and sustainability, human rights and international conflicts, and living contemporary history. According to the researchers, this makes it possible to check, using current reading material as an example, whether the requirements of teaching German are met. The aim is to provide concrete orientation aids for German lessons in the upper level of secondary school.

The Hessian Minister of Science, Angela Dorn, expressed her delight at the funding of four projects with “LOEWE-Exploration”. The researchers were given “the freedom to pursue new, highly innovative research ideas,” she said: “With up to 300,000 euros per project for up to two years, they can test an unconventional hypothesis, a radically new approach. Such freedom has become rare in research funding.”

Project by: Prof. Dr. Thomas Weitin (TU Darmstadt), Ulrik Brandes (ETH Zurich)

Funding: Volkswagen Foundation

The 'Reading at Scale' project is based on the following approach: if hermeneutic and statistical methods each have their own strengths, in detailed individual analysis and in dealing with large amounts of data respectively, a mixed-methods approach is better suited to the middle level than either method alone. Literary texts and text corpora allow analyses at different levels of resolution – from the level of the characters in individual works to entire literatures – whereby Literary Studies and Literary History traditionally address many research questions at the middle level. The focus of our studies is a historical collection of 86 novellas, published under the title “Der deutsche Novellenschatz” (24 volumes, 1871-1876) by the editors Paul Heyse and Hermann Kurz. We have already prepared this realism-oriented anthology as a TEI/XML corpus, and further similar collections will follow. Being of medium size, the novella collection is still suitable for individual reading, yet large enough to be promising for statistical analysis. Our text corpus was examined in two dissertations at different levels of operationalization: (1) a network analysis addresses problems of distinction within popular literature, while (2) a comparative study examines the “Deutscher Novellenschatz” as an effective instrument of canonization and as a programmatic attempt at non-narrative Literary History. Both project managers contribute individual studies from the perspective of basic methodological research, while an algorithmic subproject investigates concepts of position in network research and a literary studies subproject focuses on problems of validation in digital analysis.

Publications in the project context

• Brandes, Ulrik, Weitin, Thomas, Päpcke, Simon, Pupynina, Anastasia, Herget, Katharina (2019): Distance measures in a non-authorship context. The effect on the “Deutsche Novellenschatz” (forthcoming).

• Weitin, Thomas (2019): Burrows's Delta und Z-Score-Differenz im Netzwerkvergleich. Analysen zum Deutschen Novellenschatz von Paul Heyse und Hermann Kurz (1871-1876), in: Digitale Literaturwissenschaft. Beiträge des DFG-Symposiums, Fotis Jannidis (ed.), Stuttgart (forthcoming).

• Weitin, Thomas (ed.) (2017): Scalable Reading. Zeitschrift für Literaturwissenschaft und Linguistik, 47.1.

• Weitin, Thomas (2017): Literarische Heuristiken: Die Novelle des Realismus, in: Komplexität und Einfachheit. DFG-Symposion 2015, Albrecht Koschorke (ed.), Stuttgart, pp. 422-442.

• Weitin, Thomas, Herget, Katharina (2017): Falkentopics: Über einige Probleme beim Topic Modeling literarischer Texte, in: Zeitschrift für Literaturwissenschaft und Linguistik, 47.1, pp. 29–48.

• Weitin, Thomas (2016): Heuristik des Wartens. Literatur lesen unter dem Eindruck von big data, in: Warten als Kulturmuster, Julia Kerscher, Xenia Wotschal (ed.), Würzburg, pp. 180-196.

• Weitin, Thomas (2016): Selektion und Distinktion. Paul Heyses und Hermann Kurz' Deutscher Novellenschatz als Archiv, Literaturgeschichte und Korpus, in: Archiv/Fiktionen. Verfahren des Archivierens in Literatur und Kultur des langen 19. Jahrhunderts, Daniela Gretz, Nicolas Pethes (ed.), Freiburg 2016, pp. 385-408.

• Weitin, Thomas, Gilli, Thomas, Kunkel, Nico (2016): Auslegen und Ausrechnen: Zum Verhältnis hermeneutischer und quantitativer Verfahren in den Literaturwissenschaften, in: Zeitschrift für Literaturwissenschaft und Linguistik, 46.1, pp. 103-115.

Corpora

• Weitin, Thomas (2016): Volldigitalisiertes Textkorpus. Der Deutsche Novellenschatz. Paul Heyse, Hermann Kurz (ed.), 24 volumes, 1871-1876. Darmstadt/Konstanz.

• Weitin, Thomas (2018): Volldigitalisiertes Textkorpus. Der Neue Deutsche Novellenschatz. Paul Heyse, Ludwig Laistner (ed.), 24 volumes, 1884-1887. Darmstadt, forthcoming.

Project lead: Prof. Dr. Thomas Weitin

Sustainable and high-quality digital applications and operationalizations of (literary) questions necessarily depend on appropriate and stable corpora. Many canonized classics and other works are now freely available on the Internet and can be downloaded from websites such as the Gutenberg-DE project. From a philological and corpus-critical perspective, however, these digital texts are often unreliable: sometimes the underlying source text and edition are not identified, and the files are often only available in plain TXT format, without formatting or in-depth text markup. The error rates of the OCR software used (i.e. programs for optical character recognition, for example to extract machine-readable text from PDF files) vary widely, which in turn has a significant influence on the quality of the corpora. Initiatives such as the German Text Archive are trying to counteract this trend by striving for a historical reference corpus based on strict guidelines and high quality standards (based on, among other aspects, the use of first editions).
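
One common way to quantify OCR quality is the character error rate: the edit distance between the OCR output and a manually corrected reference, divided by the length of the reference. The following is a minimal sketch under that assumption. The function names are hypothetical, and nothing here claims to be the measure used by the initiatives mentioned above.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        cur = [i]
        for j, char_b in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                       # delete char_a
                cur[j - 1] + 1,                    # insert char_b
                prev[j - 1] + (char_a != char_b),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


def character_error_rate(ocr_text: str, reference: str) -> float:
    """Edit distance relative to the length of the corrected reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(ocr_text, reference) / len(reference)
```

A value of 0.0 means the OCR output matches the reference exactly. Comparing this value across source files gives a rough indication of where manual correction is most needed.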

At the same time, digital literary scholarship also aims to detach its analyses and research subjects from the traditional literary canon – or to extend that canon. Accordingly, the constant creation and expansion of literary corpora is an integral part of many research projects.

The corpus workflow using the example of the new “Novellenschatz”

In June 2015, as part of the preparations for the workshop “Scalable Reading. Paul Heyses Deutscher Novellenschatz zwischen Einzeltext und Makroanalyse” under the auspices of Thomas Weitin, the first TEI/XML corpus of the “Novellenschatz” was created: a historical collection of 86 novellas, published by Paul Heyse and Hermann Kurz (24 volumes, 1871-1876). This corpus was continually improved and enriched with metadata in order to allow for further research into the popular novella collection of the 19th century.

Since then, the corpus workflow has been continuously expanded and professionalized. The corpora are created by means of a corrected OCR method:

The digital representation of the text (typically a PDF file) is first converted into machine-readable text using the ABBYY FineReader software, which is particularly well suited to reading Gothic typefaces. In a second step, specially trained assistants manually check and correct the digitized text and store it in TXT format; some corpora are also transferred into a TEI-compliant XML schema.
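
The last step, moving corrected plain text into TEI, can be sketched as follows. This is a minimal illustration only: the project's actual schema and metadata are not described here, so the function name `txt_to_tei`, the placeholder header notes, and the rule "one paragraph per block separated by blank lines" are assumptions.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def txt_to_tei(title: str, author: str, text: str) -> str:
    """Wrap corrected plain text in a minimal TEI document and return it as a string."""
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

    def tag(name: str) -> str:
        return f"{{{TEI_NS}}}{name}"

    tei = ET.Element(tag("TEI"))

    # Minimal header: title statement, publication statement, source description.
    file_desc = ET.SubElement(ET.SubElement(tei, tag("teiHeader")), tag("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tag("titleStmt"))
    ET.SubElement(title_stmt, tag("title")).text = title
    ET.SubElement(title_stmt, tag("author")).text = author
    publication = ET.SubElement(file_desc, tag("publicationStmt"))
    ET.SubElement(publication, tag("p")).text = "Illustrative sketch, not published."
    source = ET.SubElement(file_desc, tag("sourceDesc"))
    ET.SubElement(source, tag("p")).text = "Converted from manually corrected OCR text."

    # Body: one <p> per block of text separated by blank lines.
    body = ET.SubElement(ET.SubElement(tei, tag("text")), tag("body"))
    for block in re.split(r"\n\s*\n", text):
        paragraph = " ".join(block.split())  # undo hard line breaks inside a paragraph
        if paragraph:
            ET.SubElement(body, tag("p")).text = paragraph

    return ET.tostring(tei, encoding="unicode")
```

A real corpus would carry much richer header metadata (edition, volume, year) and structural markup such as divisions and page breaks. The sketch only shows the basic mapping from a TXT file to a TEI skeleton.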

Other corpus-projects

In addition to the “Deutscher Novellenschatz”, the “Neuer Deutscher Novellenschatz” by Paul Heyse and Ludwig Laistner (70 novellas in 24 volumes, 1884-1887) has been digitized and reviewed as well. We have also started work on the last missing collection in the series, the “Novellenschatz des Auslandes”, which consists of 57 translated foreign novellas, also published by Paul Heyse and Hermann Kurz (14 volumes, 1872-1876). Our novella corpus is thus almost complete and ready for analysis. In parallel, other historical sources are being prepared and digitized, e.g. the extensive correspondence between Paul Heyse and Hermann Kurz (1858-1873, over 700 letters), which arose during the publication process of the “Novellenschatz” collection.

With “Der neue Pitaval”, edited by Julius Eduard Hitzig and Willibald Alexis (Wilhelm Häring) (60 volumes, 1842-1890), we also digitized “a collection of the most interesting international crime stories, from the past to the present”.

The resulting digital corpora are published in the “Deutsches Textarchiv” and made available for research as Open Access.

Project publications

Weitin, Thomas (2016). Volldigitalisiertes Textkorpus. Der Deutsche Novellenschatz. Paul Heyse, Hermann Kurz (ed.), 24 volumes, 1871-1876. Darmstadt/Konstanz.
Weitin, Thomas (2018). Volldigitalisiertes Textkorpus. Der Neue Deutsche Novellenschatz. Paul Heyse, Ludwig Laistner (ed.), 24 volumes, 1884-1887. Darmstadt (forthcoming).

Further links

Deutsches Textarchiv. Grundlage für ein Referenzkorpus der neuhochdeutschen Sprache. Berlin-Brandenburgische Akademie der Wissenschaften (ed.), Berlin 2019
Project Gutenberg. Project Gutenberg Literary Archive Foundation (ed.)
Projekt Gutenberg-DE. Hille & Partner GbR (ed.)

Research Projects at the Institute of Linguistics and Literary Studies

2023-2026: HERMES

2023-2026: Diskursraum Wald – zu Verständnis und Vermittlung von Waldnaturschutzmaßnahmen im Spannungsfeld von Klimawandel und Biodiversitätsverlust

2023-2026: forTEXT

2023-2026: PLANS

2022-2025: Individuelle Freiheit und soziale Norm – Nachhaltigkeits- und Verantwortungsdiskurse zu Umwelt und Bildung seit 1990

2022-2025: Zwischen Recht und Literatur: Die Kriminalfallsammlung des Neuen Pitaval in der literaturwissenschaftlichen Korpusanalyse

2022-2024: Evidence-based Literary Comprehension in German Classes

2021-2026: Text+

2021-2024: Wissenschaftliche Politikberatung zwischen epistemischer und legitimatorischer Funktion. Textprozeduren der Relevanz-, Zuständigkeits- und Verantwortungszuschreibung

2021-2023: Prinzipiengestützte Kategorienentwicklung für die Digital Humanities (KatKit)

2021-2023: Hessisches Zentrum für alltagsorientierte Sprachförderung (HeZaS)

2021-2022: Zwischen Elfenbeinturm und rauer See – zum prekären Verhältnis zwischen Wissenschaft und Politik und seiner Mediatisierung am Beispiel der “Corona-Krise”

2020-2024: Bücher auf Reisen. Informationstechnologische Erschließung von Wissensbewegungen in vormodernen Kulturen

2020-2024: Biodiversitätskulturen in Stadt und Land – Integrative Forschung zur Förderung der Insektenvielfalt auf Grünflächen (BioDivKultur)

2020-2023: Förderung der Textkompetenz von Nachwuchswissenschaftler_innen in den Naturwissenschaften

2020-2021: Binnenkritik und Dynamisierung der Aufklärung. Sammlung wissenschaftlicher Aufsätze aus mehr als drei Jahrzehnten Aufklärungs-Forschung.

2020-2021: Neuer wissenschaftlicher Kommentar zu Goethes Faust-Projekt.

2020-2021: Kritische Studienausgabe von Schillers Wilhelm Tell für den universitären Lehrbetrieb.

2019-2022: Relating the Unread

2019-2021: “A Darmstadt newspaper in three centuries” – Digitization of the Darmstädter Tagblatt (1740-1986)

2019-2021: CLARIAH-DE

2018-2021: Förderung der Textkompetenz von Nachwuchswissenschaftler_innen in den Naturwissenschaften

2017-2021: Relevanz von Bildungssozialisation sowie von Herkunfts- und Vorfremdsprachen für den Studieneinstiegserfolg bei geflüchteten Studieninteressierten (HMWK)

2017-2020: Reading at Scale: Mixed Methods in der literaturwissenschaftlichen Korpusanalyse

2017-2020: ‘Bye, bye Biene?‘ – Zur Funktionalisierung wissenschaftlichen Nichtwissens und Wissens im Pestizid-Diskurs

2017-2020: Humanist Computer Interaction on the test bench (Humanist)

2016-2019: (renewed funding in 2021) Dhoch3 (DaF-Studienmodule – DAAD)

2016-2020: Cultural history of literature. Final phase, prepress preparation.

2016-2018: (paused due to parental leave) Digitalität in den Fachdidaktiken

2016-2018: GP01 Handschriften in Bewegung: Werkzeuge zur Dokumentation, Auswertung und Visualisierung texttopographischer Dynamiken

2015-now: Literary text corpora

2015-now: EmpiriST 2015: GSCL Shared Task: Automatic Linguistic Annotation of Computer-Mediated Communication / Social Media

2015-2038: Altägyptische Kursivschriften: Digitale Paläographie und systematische Analyse des Hieratischen und der Kursivhieroglyphen

2015-2017: MASI – Metadata Management for Applied Sciences

2013-2017: Linguistic Strategies of Knowledge and Science Mediation in Text Types and Media Formats for Children

2013-2016: ePoetics – Korpuserschließung und Visualisierung deutschsprachiger Poetiken (1770-1960) für den “Algorithmic criticism”

2013-2016: Climate Engineering im Verhältnis von Wissenschaft und Politik: Kontroverse Deutungen wissenschaftlicher und politischer Verantwortung gegenüber der globalen Herausforderung Klimawandel

2013-2016: eCodicology – Algorithmen zum automatischen Tagging mittelalterlicher Handschriften

2012-2036: Digitales Familiennamenwörterbuch Deutschlands (DFD)

2012-2016: Sustainability and spatial constitution in urban discourse.

2011-2019: DARIAH-DE: Building Research Infrastructures for e-Humanities

2011-2013: ‚Was können wir (nicht) wissen? Was sollen wir tun?‘ Vom Umgang der Wissenschaftler und Wissenschaftsjournalisten mit Nichtwissen und unsicherem Wissen in laienadressierten Texten

2011-2013: LOEWE-Schwerpunkt Digital Humanities

2010-2014: Virtuelles Skriptorium St. Matthias

2009-2012: Wechselwirkungen zwischen linguistischen und bioinformatischen Verfahren, Methoden und Algorithmen: Modellierung und Abbildung von Varianz in Sprache und Genomen

2009-2012: Grid für die Wissenschaft: WissGrid

2009-2011: Die diskursive Aushandlung von Transdisziplinarität. Projektkommunikation im Spannungsfeld von transdisziplinärem Anspruch und disziplinären Rahmenbedingungen

2008-now: linguisticsweb

2006-2015: (since 2016 part of DARIAH-DE) TextGrid: Virtuelle Forschungsumgebung in den e-Humanities