Research with Heritage Data: Slot 2
The second timeslot consisted of presentations by Norah Karrouche, Vincent Kuitenbrouwer and Henk van den Heuvel. Available abstracts, videos and slides of these presentations can be viewed on this page.
Grassroots heritage as research data
How can we make community oral histories tangible as sustainable and reusable data in large-scale research infrastructures? And why would we want to do so in the first place? As a rule, community archives, including grassroots oral history collections, are not covered by large-scale research infrastructures. At the same time, digitization has significantly impacted the ways in which oral histories are preserved and disseminated by archives, and community archives are by definition intent on sharing their material with new audiences. Moreover, archivists and digital humanists have in recent years heeded the call for the ‘decolonization’ and ‘inclusivity’ of their heritage and research collections and are seeking to engage with historically disenfranchised communities in novel and productive ways. Researchers, archives and grassroots organizations can thus potentially benefit from collaboration to address these mutual concerns. The Stories in Motion project in Rotterdam (part of the Living History Route of the Dutch National Research Agenda) explored such a collaboration and investigated the process of co-creating a protocol for collecting, archiving and reusing oral histories through standards for data interoperability, serving both local grassroots communities and national research infrastructures. In a community-based archiving project on local women’s history, oral historians closely collaborated with a community archiving initiative, a local grassroots heritage organization, the municipal archive, DANS and CLARIAH. In this talk, I problematize co-creation as a method to ‘decolonize’ oral history data that are available in the CLARIAH Media Suite and explore how power dynamics inherent to such collaborations can be effectively addressed.
Media War: Press, Radio and Propaganda in Nazi-Occupied Netherlands, 1940-1945
During the Second World War, the written press and radio were the two dominant news media in the Nazi-occupied Netherlands. These two media were used by supporters of both the National Socialist regime (who could work in the open) and the Allied Forces (who worked clandestinely or from London). Both sides tried to influence Dutch public opinion and employed propaganda strategies in which they targeted the opponent and tried to combat or undermine each other’s claims to ‘reliable information’. Consequently, in addition to the war that was fought on the battlefields, there was a war in the media as well, over the ways people would perceive and interpret the course of the conflict. The project Media War analyzes the ways in which the propaganda strategies of the nazified media and the media of those resisting the occupation regime were entangled, and aims to uncover the discursive dynamics of propaganda narratives in the Netherlands during the Second World War.
Media War builds on the existing literature on the Dutch media during the Nazi occupation, adding new perspectives. Until now, historians using traditional research methods have largely focused on the institutional histories of individual newspapers or radio stations in a national context. For the first time, our project uses digital sources, drawn from both press and radio and from both sides in the conflict, to provide a content analysis of propaganda in Dutch media during the Second World War. To achieve this, we have curated relevant digital collections in the CLARIAH Media Suite, drawn from the newspaper databank of the KB, the national library of the Netherlands, and the audiovisual archive of the Institute of Sound and Vision. In addition, we have developed functionalities to visualize semantic patterns in various sections of the media landscape over time, in order to identify specific moments that give insight into the entanglements in the propaganda narratives.
Towards dialect-specific automatic speech recognizers
Henk van den Heuvel
In the second half of the 20th century, the Meertens Institute made recordings of various dialects throughout the Netherlands, of which about three hundred hours were manually transcribed. This material seems to be the ideal basis for developing dialect-specific speech recognizers, but processing the data also poses significant challenges. The transcripts were not originally linked to the audio files, let alone aligned with the audio. Furthermore, the transcripts are written in a semi-conventional spelling that has been modified to reflect the pronunciation in the dialect. As a result, the manual transcriptions are not always consistent and are harder to match against automatic transcriptions when aligning audio and transcription. Most manual transcriptions could be linked to their audio files through a metadata file, using a Python script. The alignment itself, however, still had to be done.
To make a first approximation of the alignment, we used automatic speech recognition by a Dutch Wav2vec2 model in combination with the Needleman-Wunsch algorithm. This algorithm computes an optimal global alignment between two sequences, in this case between the manual and automatic transcriptions. The resulting alignment is checked with a web-based annotation tool developed for this purpose. With the results of this annotation, the speech recognizer can be improved for specific dialects. We show some initial results.
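As an illustration of the alignment step, the following is a minimal sketch of Needleman-Wunsch applied to two word sequences. It is not the project's actual script: the function name, the scoring values (match, mismatch, gap) and the example sentences are assumptions chosen for illustration, and words are compared by exact equality, which is a simplification given the dialect-specific spelling of the manual transcripts.

```python
def needleman_wunsch(manual, automatic, match=1, mismatch=-1, gap=-1):
    """Globally align two token lists.

    Returns a list of (manual_token, automatic_token) pairs;
    None marks a gap on that side. Scoring values are illustrative.
    """
    n, m = len(manual), len(automatic)

    def sub(i, j):
        # Substitution score for manual[i-1] vs automatic[j-1].
        return match if manual[i - 1] == automatic[j - 1] else mismatch

    # score[i][j] = best score aligning manual[:i] with automatic[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align both tokens
                score[i - 1][j] + gap,            # gap in automatic
                score[i][j - 1] + gap,            # gap in manual
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((manual[i - 1], automatic[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((manual[i - 1], None))
            i -= 1
        else:
            pairs.append((None, automatic[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


# Hypothetical example: the recognizer missed one word.
manual_words = "ik ben naar huis gegaan".split()
asr_words = "ik ben huis gegaan".split()
for manual_word, asr_word in needleman_wunsch(manual_words, asr_words):
    print(manual_word, "<->", asr_word)
```

In practice, a similarity measure that tolerates spelling variation (for example, character-level edit distance between words) would probably replace the exact-match comparison, since the dialect transcriptions rarely match the recognizer output letter for letter.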
View the accompanying slides of this presentation here.