Pilot Projects 2021-2022

In autumn 2021, the Centre for Digital Humanities Uppsala (CDHU) released a call for Pilot Project Support, an initiative that provides technical and methodological support, as well as computational space, for research projects in the humanities and social sciences. The initiative is connected to CDHU's digital research infrastructure mission.

Pilot Project Support is a recurring initiative open to researchers and staff in the humanities and social sciences at Uppsala University. Its purpose is to support the digital implementation of research projects in a broad sense. Applications may concern new ideas or be spin-offs of existing projects. Proposals that can demonstrate concrete outcomes, importance or reusability are especially welcome. Pilot projects aimed at larger external applications in the future are also welcome.

Pilot projects that are granted support will have access to: (1) technical support in the form of research engineer time, covering methods, tools, data processing and implementation; and (2) computational space for storing and processing data in the digital research infrastructure that is under construction at CDHU. Before implementation, CDHU staff will also support and facilitate the prototype design of the project.

Automatic identification of paradata

Isto Huvila--Department of ALM

Apart from the lack of information on what research data is about (which is usually described using metadata), there is an increased awareness that a lack of understanding of a) the contexts and processes in which research data was created and b) how it has been manipulated (i.e. data about processes, or paradata) poses a significant hindrance to its effective reuse.

The aim of the project is to explore how different text and image analysis methods can be applied to the automatic identification and extraction of paradata (data about the processes of creating, curating and using data), and of structured information that can be used as paradata, from a selection of archaeological datasets. Instead of merely seeking answers to particular questions regarding specific datasets, the project investigates and develops techniques that can be broadly useful for paradata identification and summarisation across domains and research contexts, to support researchers independent of discipline and research questions.

Among the Wattpad Readers

Tuva Haglund--Department of Literature

With over eighty million monthly users, Wattpad is one of the largest digital reading and writing communities. The way it combines social network structures with reading and storytelling makes it a unique environment for digital social reading.

The pilot “Among the Wattpad Readers” is a collaboration between Tuva Haglund from the Department of Literature and research engineer Marie Dubremetz. It is part of a larger project in which Haglund will examine reading habits, attitudes and social networks among Swedish-speaking users on Wattpad.

In the pilot we are using computational data analysis and network analysis to:

  • collect data on all texts and comments made by users who have written in Swedish somewhere on Wattpad.
  • distinguish user behaviours based on, for example, type of activity, frequency of commenting and of publishing their own stories, language use and length of comments.
  • show connections between different users, as a way to visualize the social network.

The algorithm will be written in R, using Selenium.
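As an illustration of the third goal above (showing connections between users), the following is a minimal sketch of how comment records could be turned into a weighted network. It is written in Python rather than the project's R and Selenium setup, it does not touch Wattpad at all, and the user names, the record layout and the helper functions build_edges and out_degree are all hypothetical.

```python
from collections import Counter

# Hypothetical records: (commenter, story_author) pairs, invented for illustration.
comments = [
    ("anna", "bjorn"),
    ("anna", "bjorn"),
    ("carl", "anna"),
    ("bjorn", "anna"),
]

def build_edges(pairs):
    """Count how often each commenter writes on each author's stories.

    Self-comments are skipped because they do not connect two users.
    """
    return Counter((src, dst) for src, dst in pairs if src != dst)

def out_degree(edges):
    """Number of distinct authors each user has commented on."""
    degree = Counter()
    for (src, _dst) in edges:
        degree[src] += 1
    return degree

edges = build_edges(comments)
degrees = out_degree(edges)
```

The resulting edge weights could be passed to any graph visualization tool; the counts per user are one possible input for distinguishing the behaviour types mentioned in the list.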

This quantitative analysis will be used to identify types of readers, groups of readers or forms of communication and creativity that are representative, and therefore suitable for qualitative analysis in a continuation of the project.

Attribution Study on Frances Brooke’s Eighteenth-Century Libretto Marian

Michaela Vance--Department of English

Networked Accessibility and Discoverability of Digital Cultural Heritage Data

Agnieszka Backman--Department of Scandinavian Languages

This pilot project will address pressing issues that are emerging in the world of cultural heritage data: its compilation, aggregation, display and uses. My entry point will be scraping web links in digital repositories of medieval manuscripts and using graph visualization to analyze the resulting networks. The project will ask how much linking is currently being done in major repositories and investigate the quality of the networks. How is interconnectivity effected between metadata entries in catalogs, with regard to both internal and external links? What kind of information is usually linked in the catalog records? How are the linking practices connected to discoverability?

The project will use two digital humanities methods. Web scraping will gather data on linking in catalog records from three repositories containing medieval Swedish manuscripts: Alvin-portal.org, manuscripta.se and handrit.is. Network analysis will then categorize the information networks the repositories represent, while also showing what kind of information is usually linked.
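A minimal sketch of the link-gathering step might look as follows, assuming a catalog record is available as an HTML string. It uses only the Python standard library; the sample HTML, the placeholder hosts catalog.example and example.org, and the names LinkCollector and classify_links are hypothetical and do not reflect the structure of any of the three repositories.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collect the absolute target of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the record's own URL.
                self.links.append(urljoin(self.base_url, href))

def classify_links(html, base_url):
    """Split a record's links into internal (same host) and external ones."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    internal = [u for u in collector.links if urlparse(u).netloc == host]
    external = [u for u in collector.links if urlparse(u).netloc != host]
    return internal, external

# Invented sample record, for illustration only.
sample_html = (
    '<p><a href="/record/1">Related manuscript</a> '
    '<a href="https://example.org/auth/42">Authority file</a></p>'
)
internal, external = classify_links(sample_html, "https://catalog.example/record/7")
```

Each (record, link target) pair can then serve as an edge in the network, with the internal or external label kept as an edge attribute.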

The pilot is an offshoot of my Wallenberg Foundation funded postdoc project, Materialities of Medieval Manuscripts in Digital Repositories, in which I use social semiotic methods to study the affordances (the potential uses) of manuscripts in digital repositories, in comparison to the use cases available when looking at them in person in a reading room.

Text mining commodification: the geography of the Nordic lingonberry rush, 1860-1910

Matti La Mela--Department of ALM; Department of Business Studies

The pilot project asks: how did Nordic nature become commodified in the late nineteenth century? The project studies the historical process of commodification by combining geographic information with public perceptions of a natural resource, wild berries. The study is part of a larger project on the history of the Nordic allemansrätten, a public access right to nature that is widely used in the Nordic countries today. The pilot project develops a method that automatically classifies messy texts with OCR errors according to their content. The source data are historical newspaper articles about berry picking (in Finnish and Swedish) from 1860-1910. The place names in the articles have been identified previously (La Mela, Tamper & Kettunen 2019: Finding Nineteenth-century Berry Spots). The aim is to apply the classification method to the source material in order to find articles that describe economic-industrial processes related to wild berries and articles that portray berry picking as a non-industrial, common activity. The hypothesis is that the share of economic-industrial articles increased over the period and that the two article groups present different geographic patterns, in which it is possible to distinguish, for example, export destinations and harbor cities.
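The project's actual classification method is not described here. As a deliberately simple stand-in, the sketch below shows one way to tolerate OCR errors when sorting texts by content: tokens are matched approximately against keyword lists using difflib from the Python standard library. The keyword lists, the cutoff of 0.8, the sample sentence and the function names are all assumptions made for illustration.

```python
import difflib

# Hypothetical Swedish keyword lists, invented for illustration only.
ECONOMIC_TERMS = ["export", "handel", "pris", "fabrik"]
COMMON_TERMS = ["skog", "barn", "utflykt", "plocka"]

def count_fuzzy_hits(tokens, vocabulary, cutoff=0.8):
    """Count tokens that approximately match any vocabulary term.

    Approximate matching lets OCR variants such as "exp0rt" still count
    as a hit for "export".
    """
    hits = 0
    for token in tokens:
        if difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff):
            hits += 1
    return hits

def classify(text):
    """Label a text by whichever keyword list it matches more often."""
    tokens = text.lower().split()
    economic = count_fuzzy_hits(tokens, ECONOMIC_TERMS)
    common = count_fuzzy_hits(tokens, COMMON_TERMS)
    if economic > common:
        return "economic-industrial"
    if common > economic:
        return "non-industrial"
    return "undecided"

# Invented sentence with two simulated OCR errors ("exp0rt", "handcl").
label = classify("lingon exp0rt och handcl i hamnen")
```

A keyword approach like this is easy to inspect but brittle; a trained text classifier would be the more likely choice for real material, and nothing here should be read as the project's method.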

Hosting Gramfinder: A search tool for Grammatical Descriptions of the Languages of the World

Harald Hammarström--Department of Linguistics and Philology

Gramfinder is a search and browse tool for exploring a comprehensive collection of digitized descriptions of the 7 000 minority languages of the world. At its minimum, Gramfinder is similar to Google Books (books.google.com), i.e., free-text search with snippet hits from a large set of copyright-protected publications. However, thanks to extensive high-quality metadata, suitable searches can amount to building cross-linguistic databases and can be aided by low-level Machine Learning techniques. For example, since the (meta-)language [the language used for description, such as English] and the (target-)language [the language being described, such as Wambaya] of each document are known from the metadata, a search for a string like "suffix" is half-way towards a database of which languages have suffixes and to what degree. Building such databases (semi-)automatically would be a great help in large-scale quantitative studies of linguistic diversity.
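The "suffix" example can be sketched in a few lines, assuming each document carries its meta-language and target language as metadata. This is not Gramfinder's implementation: the record layout, the placeholder language names, the sample texts and the function count_term_by_language are hypothetical.

```python
from collections import defaultdict

# Hypothetical miniature corpus; the texts are placeholders, not quotations
# from any real grammatical description.
documents = [
    {"meta_language": "English", "target_language": "Language A",
     "text": "The verb takes a tense suffix. A second suffix marks number."},
    {"meta_language": "English", "target_language": "Language B",
     "text": "Grammatical relations are expressed by word order."},
    {"meta_language": "French", "target_language": "Language C",
     "text": "Le verbe prend un suffixe."},
]

def count_term_by_language(docs, term, meta_language):
    """Count occurrences of a term per target language.

    Only documents written in the given meta-language are searched, since
    the search term belongs to that language.
    """
    counts = defaultdict(int)
    for doc in docs:
        if doc["meta_language"] != meta_language:
            continue
        counts[doc["target_language"]] += doc["text"].lower().count(term.lower())
    return dict(counts)

suffix_counts = count_term_by_language(documents, "suffix", "English")
```

Raw counts like these are only a starting point: a description may mention suffixes in order to say that a language lacks them, which is where the metadata and further filtering would come in.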

Metadata for African Ephemeral and Informal Literatures

Ashleigh Harris--Department of English

The production, circulation and reception of literary forms such as poetry, fiction and plays on the African continent occur largely informally. This is because the published book is too expensive, is most likely written in non-African languages, and often lacks relevance for African readers. This has led to an increase in non-book literatures on the continent, some of which build on existing pre-colonial literary and oral traditions (spoken word poetry, storytelling, and performed dramatic arts) and some of which explore new online forms and possibilities (such as twitterature, flash fiction, GIF-novels, Facebook-fiction and online comics). Yet literary metadata (everything from library catalogues to ISBN numbers) is still largely organized around the book object.

This project aims to create a robust and multilingual descriptive metadata standard for informal African literary forms. To do this, the Centre for Digital Humanities will help us design and implement an online interface during the research phase, which will enable ground-up tagging of ephemeral literatures. Towards the end of the research phase, we aim to develop an online tool and mobile app that would enable non-specialists to enter metadata in a pre-structured and multilingual format. Through Linked Open Data, this information will become the first formalized, yet constantly evolving, space of metadata capture for these ephemeral literatures, making them searchable and visible to readers and scholars.

