Research with Heritage Data: Slot 1
During the first slot of Research with Heritage Data, researchers Auke Rijpma & Ruben Schalk, Nicoline van der Sijs and Willemien Sanders spoke. The abstracts, videos and slides of these presentations can be found on this page.
Social and Economic History: examining the Spanish Flu using Linked Data and CLARIAH
Auke Rijpma & Ruben Schalk
Like researchers in many other disciplines, especially within the Humanities, historians tend to produce long tails of research data that live in isolation on laptops and hard drives. Not only is this inefficient, it also hampers reproducibility and replicability and makes it difficult for others to build on earlier work. In this presentation, Auke Rijpma and Ruben Schalk from Utrecht University will demonstrate how they have applied an open science workflow, integrating digital heritage datasets with CLARIAH tooling, to study the effects of the 1918 Spanish Flu pandemic in the Netherlands. Using Linked Data infrastructure, they linked death certificates to historical classification schemes and municipalities, making it possible to examine where the Flu hit hardest. The first results were published interactively online as a data story. Auke and Ruben will discuss the pros, and also some cons, of using these methods for historical research.
Data, tools, and humans. Three challenges for data-driven research with the Clariah Media Suite
Willemien Sanders
Given the enormous amounts of metadata that accompany the content offered through the Clariah Media Suite, we are developing and facilitating data research and the publication of Media Suite Data Stories. Doing data research comes with a plethora of challenges, three of which we will discuss today, informed by our own experience in creating Media Suite Data Stories. Data have an aura of objectivity, straightforwardness and unity, but in reality they are often incomplete, unbalanced and heterogeneous. Tools promise effective analysis but may well not be up to a task previously done by humans. And humans may think they understand results, but without knowledge of the domain, their interpretations fail to provide proper meaning. As a consequence, researchers need to critically scrutinize their data, their tools and the domain they are researching.
Nicoline van der Sijs
Over the past few years, volunteers have transcribed the 17th-century newspapers which the KB, the national library of the Netherlands, makes available at Delpher.nl. Article segmentation and metadata were added and corrected. The Dutch Language Institute took care of the post-processing and linguistic annotation and made the corpus available online. The current CourantenCorpus contains 19 million words. The data from this corpus are reliable and therefore suitable for statistical research. Cleaning up the metadata, a mega-job, allows the data to be sorted and filtered in various ways: for instance, changes in word usage and language use over the course of the 17th century can be made visible, and differences in the language use of various newspaper titles or genres (such as news items, official announcements, advertisements) can be investigated. The first part of this presentation discusses the operations that we have performed on the data; the second part shows the first research results that the CourantenCorpus has produced.