30 November 2023
Utrecht

CLARIAH Annual Conference 2023

During CLARIAH’s annual conference, we want to reflect on our achievements and share the lessons we have learned over the last years of the project. More importantly, we want to look to the future.

Date and time

Thursday 30 November 2023, 09.00 - 18.00 hrs.

Location

Utrecht, Paushuize Kromme Nieuwegracht 49, 3512 HE Utrecht

CLARIAH Conference 2023 Programme overview

MORNING PROGRAMME

Introduction by chairperson Jim Jansen, editor-in-chief of the New Scientist, coordinator of the Parool newspaper science pages and column writer for the Algemeen Dagblad.

CLARIAH: past, present and how we shape our digital humanities future

Opening presentation by Dirk van Miert, Principal Investigator CLARIAH and Director of the KNAW Huygens Institute.

Roadmap for the CLARIAH infrastructure

Presentation of elements of the CLARIAH infrastructure, following the FAIR principles and shared development roadmap, by Roeland Ordelman, CLARIAH Chief Technology Officer and Technology Innovation Advisor at the Dutch Institute for Sound and Vision.

What do we expect from a digital humanities infrastructure?

A critical reflection from the perspective of education

By Susan Aasman, Professor of Digital Humanities and Director of the Centre for Digital Humanities, University of Groningen.

What do we expect from a digital humanities infrastructure?

A critical reflection from the perspective of research and development

By Joris van Eijnatten, General Director of the eScience Center and Professor of Digital History at Utrecht University.

Panel discussion

Question time and open discussion between our first speakers, additional panel member Julia Noordegraaf, Professor of Digital Heritage at University of Amsterdam and Vice President of the international Time Machine Organisation, and… you!

AFTERNOON PROGRAMME

There is no need to register for specific programme sections, but there will be a limit to the number of attendees, depending on the hall size.

A FAIRy Chain of CLARIAH Research Tools: Media Suite Use Cases

Taking scholarly use cases in the context of the CLARIAH Media Suite as examples, this workshop showcases how CLARIAH helps scholars work with data sets and tools through the services available in the CLARIAH infrastructure, such as data registers, virtual research environments, and specialised services for annotation, secure analysis and the publication of data stories.

Type: Workshop

Duration: 90 minutes

Hall: Beelaerts van Bloklandzaal

Presented by: Roeland Ordelman, University of Twente | Beeld & Geluid

Time: 13.00 - 14.30 hrs.

Searching, Bookmarking and Annotating in the Media Suite

The Media Suite infrastructure gives university-level access to a large variety of digital collections – comprising key broadcast, film, paper and oral history collections from NISV, Eye Filmmuseum, the KB and DANS. Moreover, the environment offers advanced tools for browsing, searching and annotating the collections, developed specifically for the environment. This workshop introduces you to searching, bookmarking and annotating in the Media Suite using the infrastructure’s tools.

Type: Workshop

Duration: 90 minutes

Hall: Beelaerts van Bloklandzaal

Presented by: Christian Olesen, University of Amsterdam

Time: 15.00 - 16.30 hrs.

Humanities Data Fit for Reuse

Can data generated in the humanities be reused for new research projects? We will explore this question together. First, we examine how data reuse is supported by the DANS Data Station Social Sciences and Humanities, a domain-specific research data publishing service for the humanities. Then three brief project presentations illustrate how reuse of data in the humanities could be advanced, with an emphasis on data curation and modelling: DigiDuRe (https://research-software-directory.org/projects/digidure), which datafies biographical information on early modern reformed ministers collected over a few decades; FAIR Photos (https://zenodo.org/records/8096991), which turns the descriptions of 2 million press photos dating from the second half of the 20th century into linked data; and ‘The Connector’, a mechanism for creating a distributed network of linked data across data silos, recently developed during the NDE Hackalod (https://netwerkdigitaalerfgoed.nl/activiteiten/hackalod). To conclude, we involve the audience in a discussion, kickstarted by two experienced digital humanists, on how the humanities should stimulate reuse: what needs to be built, organised and/or subsidised, both in infrastructure and in the working methods of humanists themselves, to increase reusability of research data in the humanities?

Type: Presentations and discussion

Duration: 90 minutes

Hall: Balzaal

Presented by: Carmen van den Bergh, Leiden University; Ivo Zandhuis, International Institute of Social History; Jetze Touber, DANS; Leon van Wissen, University of Amsterdam; Maurice de Kleijn, eScience Center; Nico Vriend, Noord-Hollands Archief; Rick Mourits, International Institute of Social History; Steven Claeyssens, KB - National Library of the Netherlands; Wouter Beek, Triply, VU University Amsterdam.

Time: 13.00 - 14.30 hrs.

Navigating ChatGPT: A Primer on Functionality, Safety, and Academic Use

Discover the essentials of AI in education with this introductory workshop. Learn how ChatGPT functions, methods to detect AI-generated text, and explore the recent advancements in local Large Language Models (LLMs). We will also delve into practical academic applications, demonstrating how to integrate AI tools responsibly into scholarly work. This workshop is ideal for those relatively new to AI, providing a solid foundation for using these technologies in an academic or educational setting.

Type: Presentation

Duration: 45 minutes

Hall: Balzaal

Presented by: Aron van de Pol, Leiden University

Time: 15.00 - 15.45 hrs.

ChatGPT Scales Up Computational Literary Analysis

Understanding narratives is a human skill that engages multiple mental processes. Examples are the representations that we form about the personality of characters, the network of their actions, and the geography in which they live. Many of these aspects can also be modelled with the tools of computational linguistics, to uncover textual patterns and stylistic nuances automatically. In my talk, I show that such an endeavour can be boosted by recent language technologies. Specifically, I present a methodology to create literary resources that are fully annotated by ChatGPT and that make it possible to investigate how narratives unfold throughout books.

As part of CLARIAH Work Package 6, I introduce CLAUSE-ATLAS, the first corpus created with this methodology. CLAUSE-ATLAS contains novels annotated at the level of textual clauses, in terms of sequences of eventive, subjective and contextual information. Based on its analysis, I show that ChatGPT constitutes a promising tool to annotate large amounts of data for literary studies, in sizes beyond what can be easily analysed by humans. Further, I demonstrate that ChatGPT captures the novels’ structural information in a reliable manner, as its annotations reflect meaningful narrative patterns within books, as well as qualitative differences between them.

Type: Presentation

Duration: 30 minutes

Hall: Balzaal

Presented by: Enrica Troiano, VU University Amsterdam

Time: 15.45 - 16.30 hrs.

From Data Stories to Data Storytelling: a conceptual perspective on future data representation

Linked open data applications are far from enabling everything we would like, but we would like to give organisations a glimpse of the future. What do the applications of the future look like? We do so using the idea of concept cars - very cool perspectives on how applications could work in the future. We’ll share our concept of Amsterdam Diaries for Amsterdam Time Machine and explain how we move from data stories to data storytelling.

Type: Presentation

Duration: 30 minutes

Hall: Luxembourgzaal

Presented by: Ilse Rombout, Stadsarchief Amsterdam

Time: 13.00 - 13.30 hrs.

The design process of creating a data story

Crafting a data story from the ground up can be challenging. In this session, Stefan (information designer and data visualization specialist) addresses the initial hurdles of where to begin and crucial considerations before the start of such a project. Drawing from his recent collaboration with Het Geheugen van Nederland on post-World War II migration from the Netherlands, Stefan introduces a design process framework. This framework, discussed as a case study, can be used to enhance workflow during the initial stages of the project, but also helps to shape the final outcome of the storytelling process.

Type: Presentation

Duration: 30 minutes

Hall: Luxembourgzaal

Presented by: Stefan Pullen, Vizard Design

Time: 13.30 - 14.00 hrs.

SPARQL'ing Diamonds: a story-driven exploration of the ANDB/ADB datasets

As part of the Research Master/PhD course ‘Data Management for Historians’ at the N.W. Posthumus Institute, we (two ReMa students, Radboud University) had our first experience working with Linked Data. More specifically, we engaged with the ANDB (Algemene Nederlandse Diamantbewerkers Bond) and ADB (Algemene Diamantbewerkersbond van België) datasets. As an explorative study we created an online, interactive ‘data story’, in which – through the perspective of a single person – we aimed to tell a broader story about the early 20th century diamond industry of Amsterdam. ‘Data stories’ such as these allow for an approachable insight into vast and complex datasets, for laypersons and scientists alike. In our session we will elaborate on the creative process behind our own data story, on the possibilities of linked data for projects such as these, and on the challenges that one can encounter along the way.

Type: Workshop

Duration: 30 minutes

Hall: Luxembourgzaal

Presented by: Jochem Kruit, Radboud University; Wieke Metzlar, Radboud University.

Time: 14.00 - 14.30 hrs.

Text-Fabric: how to do text research in a FAIR way

Text is one of the simplest and most common data types in computer science. But there is a lot in text that does not meet the eye, and so people have been annotating texts, century-by-century. When you research texts, you consume and produce such annotations. Suddenly you find yourself in the midst of a big fabric of thoughts, contributed by many authors. Text-Fabric is a tool that helps you to follow the threads that came before you and to weave a few of your own and add them to the scholarly record. I'll show you how that looks for clay tablets of the Uruk period (the oldest writing on earth), the much more recent Hebrew Bible, and the ultramodern General Missives of the VOC time.

Type: Presentation

Duration: 30 minutes

Hall: Luxembourgzaal

Presented by: Dirk Roorda, KNAW Humanities Cluster

Time: 15.00 - 15.30 hrs.

Knowledge Graphs as Art Worlds - Working towards digital-driven large-scale provenance research

The most prominent debate in ethnographic museums today revolves around questions of ownership and provenance: how and why did objects move and end up in museum collections – and should they remain where they are today? The advent of data- and AI-driven forms of analysis enables researchers to study these processes of movement and collecting at unprecedentedly large scales. In this presentation, I’ll speak about how I plan to use Network Analysis and Knowledge Graphs to move the scale of ethnographic object provenance research from the local and incidental, to the global and structural, paying special attention to how DH techniques enable us to ask new questions and how we can embed these new questions in established art historical/anthropological thinking and theory.

Type: Presentation

Duration: 30 minutes

Hall: Luxembourgzaal

Presented by: Martin Berger, Leiden University

Time: 15.30 - 16.00 hrs.

Virtual game: Through the looking glass

Who hasn't dreamt of stepping into a painting and taking a stroll through its surroundings? This dream now comes a little closer. Artificial intelligence and crowd-sourced data from Wikidata were used to explore historical artworks, cityscapes and vanished buildings, such as the Maria Church in Utrecht, and bring them to life in a 3D world. Using AI, the artworks were given depth and landmarks such as well-known buildings were georeferenced, annotated and linked. The result is the virtual game Through the Looking Glass, in which you can explore the works of Saftleven, Saenredam and Jan de Beijer. The game won the jury prize at HackaLOD 2023.

In addition, there will be a short demo of 'The Linker', a tool developed during HackaLOD 2023 by team Per Lod ad Astra. The demo uses an example from the Gouda Time Machine, in which linked material from IISG can be enriched by the volunteers of the Gouda Time Machine.

Type: Presentation

Duration: 30 minutes

Hall: Luxembourgzaal

Presented by: Shannon van Muijden, Zuiderzeemuseum; Rick Companje, Het Utrechts Archief; Bob Coret, KB - National Library.

Time: 16.00 - 16.30 hrs.

Building a collection of open educational resources about digital scholarship in the humanities

The primary goal of the community is to develop a collection and a platform which enables lecturers and students to find relevant learning materials on digital scholarship within the humanities. The aim is additionally to promote the reuse of learning materials among lecturers and across institutions. We will start off from the Edusources platform of SURF.

Type: Workshop

Duration: 45 minutes

Hall: Soetesalon

Presented by: Ferdinand Harmsen, Leiden University; Peter Verhaar, Leiden University

Time: 13.00 - 13.45 and 13.45 - 14.30 hrs. (two identical sessions)

Workshop Responsible XR

Immersive reality or XR (eXtended Reality) is a term that encompasses all immersive technologies that expand our perception of reality, such as Virtual Reality (VR), Augmented Reality (AR) and Mixed Reality (MR). VR creates a completely computer-generated environment that replaces the real world. What are the possibilities, opportunities, applications, challenges and obstacles of eXtended Reality for education? Who are engaged in this? Where are we now and where are we going?

Type: Workshop

Duration: 45 minutes

Hall: Soetesalon

Presented by: John Walker, SURF; Liselore Tissen, CLARIAH - KNAW Humanities Cluster

Time: 15.00 - 15.45 and 15.45 - 16.30 hrs. (two identical sessions)

Meet the Netherlands eScience Center

Join the Netherlands eScience Center for a session where you will meet our Research Software Engineers (RSEs) and all the people in the organization who contribute to our pivotal mission: empowering academic researchers by developing and refining digital tools in collaboration with them. We are organizing two rounds of presentations, showcasing our diverse expertise across different research disciplines, technologies, and programming languages. The first round of presentations will focus on the type of support, the digital tools and knowledge we offer; the second round will focus on a sample of our projects in the Humanities and the research questions and digital solutions we work with.

Following each round of presentations, we will transition into speed-dating sessions. Here, you will have the opportunity to brainstorm your current research projects or future ideas with us, ideate new digital solutions, or simply learn more about our tools and how to apply them to your research. All the code and software we develop, in collaboration with researchers, is open source and published in our Research Software Directory. Come and join us to learn more about it!

Type: Presentations & Speed-dates

Duration: 90 minutes per session

Hall: Cardinal de Bouillonzaal

Presented by: Valentina Azzarra, eScience Center & team

Time: 13.00 - 14.30 and 15.00 - 16.30 hrs. (two different sessions)

POSTER SESSIONS

All posters and stands will be set up in the ground-floor Salons at the start of the conference. Notes on the posters will show at what times their presenters are present between 13.00 and 16.30 hrs.

DANS Data Stations: Using Dataverse to Support the Dutch FAIR Data Landscape

DANS is the Dutch national centre of expertise and repository for research data. In the years 2022-2023 we are migrating a large part of our archiving services from our own custom-built system, EASY, to domain-specific Data Stations based on Dataverse software. This transition facilitates the development of a more articulated, layered portfolio of archiving services. EASY used to be a one-size-fits-all solution for archiving research data of all disciplines, of any size or shape. The new Dataverse installations allow us to offer discipline-specific services, extending the archival function with additional services for processing data. At the core of this layered setup is ‘the Vault’, which provides for long-term storage of the data. Dataverse instances are built on top of the Vault, creating one Data Station for each of the four main domains which DANS caters to: Archaeology; Social Sciences & Humanities; Life, Health & Medical Sciences; and Physical & Technical Sciences. These Data Stations provide functionalities tailored to the needs of their respective communities. Through work in different projects the Data Stations are connected to external tools and services relevant for particular domains. This poster will illustrate the structure and dynamics of our layered data services built using Dataverse software and extended by additional elements, developed in projects focusing on the needs of a specific scientific domain.

Presented by: Ricarda Braukmann, DANS

The Paramaribo Ward Registers: Time Machine of a Colonial City

Between 1828 and 1846, the colonial government of Paramaribo, Suriname, kept a detailed register of the inhabitants of the city. These sources have recently been transcribed, but the transcriptions are not easily accessible. In this project, the data will be harmonized and made available as a FAIR database that can be used both for scholarly research and by a wider public for family history. The data will be deposited at the IISH Dataverse and hosted by the National Archives of Suriname. The registers contain rich information, with names, age, occupation, ethnicity and religion of (free) inhabitants, as well as the number of enslaved people living at an address. Therefore, such a database will function as a metaphorical time machine of a mid-nineteenth-century colonial city, showing how residential patterns changed through time and how people moved through the city. Most importantly, it will provide a powerful tool for researching family history and genealogy.

Presented by: Thunnis van Oort, Radboud University

‘Tracing Wealth’: individual level micro-data on Dutch inherited wealth

The Memories Database contains detailed information on the end-of-life portfolios for a representative sample of people who passed away in the Netherlands in 1921. This project aims to link the Memories Database with the civil registry datasets available in CLARIAH and LINKS. By integrating the Memories Database into the CLARIAH infrastructure, ‘Tracing Wealth’ will address a significant gap in CLARIAH’s data coverage, which otherwise contains a large set of socioeconomic variables but lacks wealth information. Conversely, it enriches the Memories Database by providing more socioeconomic details on the individual level. The integrated Memories Database resulting from the ‘Tracing Wealth’ project can be used by historians, social scientists, and economists, as well as in citizen science, to answer questions related to intergenerational wealth patterns. ‘Tracing Wealth’ will also facilitate further data extensions allowing future researchers to answer other questions that lie at the frontier of current research in economics and the broader social sciences.

Presented by: Amaury de Vicq, University of Groningen; Ruben Peeters, University of Antwerp

Making Photos FAIR: Transforming a collection of two million historical press photos into five star data

The collection of Fotopersbureau De Boer (Noord-Hollands Archief) is particularly notable for its abundance of subjects and size. It contains valuable material for research on current topics such as environment, energy, and social inequality, and provides a glimpse into the everyday lives of people. Although the collection is accessible on the archive's website, it cannot easily be used due to its lack of standardisation and the omission of external references. This project will enrich the metadata of the collection by linking them to thesauri of locations, persons, and keywords in order to further open up the collection for use in research and the cultural heritage sector. This newly added information will be stored back into the collection management system of the archive for sustainable storage. When completed, the project will deliver a download of all the curated and enriched metadata as a spreadsheet (CSV), as well as Linked Open Data (RDF).

Presented by: Leon van Wissen, University of Amsterdam; Nico Vriend, Noord-Hollands Archief

Digitising a Dutch Historical Newspaper Corpus: Discovering the Best Possible Strategies for Datafication and Publication

We report on an improved workflow for the digitisation and datafication of a challenging collection of Dutch 17th-century newspapers carried out by a group of volunteers. The new workflow uses state-of-the-art OCR in the Transkribus platform. The accuracy of different options for the OCR processing is evaluated.

Presented by: Ruud de Jong, Instituut voor de Nederlandse Taal; Jesse de Does, Instituut voor de Nederlandse Taal.

Toward a Dutch public value-driven large language model

TNO, NFI and SURF will take the next step in terms of language model development. The GPT-NL project will strengthen and perpetuate digital sovereignty by facilitating training of an open, Dutch-centric foundation model and making a non-profit deployment platform available. Do you want to join? Please do!

Presented by: Matthieu Laneuville, SURF; Thomas van Osch, SURF; Annette Langedijk, SURF.

CLARIAH WP3 Demos: Natural Language Processing tools

Demo session of various NLP tools developed over the years in CLARIAH WP3 and predecessors. We show tools like Frog, ucto, FoLiA, Colibri Core that have a long history and have been maintained in CLARIAH throughout the years, as well as newer tools like Analiticcl and STAM. The demos are mostly aimed at technical researchers and developers.

Presented by: Maarten van Gompel, CLARIAH - KNAW Humanities Cluster & Radboud University

Thematic DCC Social Sciences & Humanities stand

Presented by: Nils Arlinghaus, SURF
