Audiovisual Data And Digital Scholarship: Towards Multimodal Literacy

A summary and a reflection after the workshop at the Digital Humanities conference in Krakow (July 12-15, 2016)

By Liliana Melgar Estrada

The second edition of the workshop “Audiovisual Data And Digital Scholarship: Towards Multimodal Literacy” (AVinDH workshop) took place during the Digital Humanities conference in Krakow, which finished July 16.
Digital Humanities is the annual international conference of the Alliance of Digital Humanities Organizations (ADHO). For its 28th edition, the Jagiellonian University and the Pedagogical University warmly welcomed 902 people from all over the world.

The AVinDH workshop had a total of 55 participants, a keynote, 8 papers, and 10 lightning talks on the use of audio-visual media in the context of digital humanities scholarship.


The AVinDH workshop is a follow-up to the first edition held at the 2014 DH Conference in Lausanne, which laid the basis for creating the Special Interest Group AVinDH (SIG-AVinDH) at the next DH conference, held in Sydney in July 2015. This group was initiated by researchers from the Erasmus Studio, based at the Erasmus University in Rotterdam, and from the Netherlands Institute for Sound and Vision. The aim of the interest group is to create “a venue for exchanging knowledge, expertise, methods and tools by scholars who make use of audiovisual data types that can convey a certain level of narrativity: spoken audio, video and/or (moving) images” (see website).

The workshop

The session opened with an introduction by Stef Scagliola, a historian specializing in opening up audiovisual archives for multidisciplinary research, with an emphasis on oral history collections, and one of the founders of the special interest group. Scagliola introduced the main questions motivating the creation of the SIG-AVinDH and the workshop. A central issue is how audio-visual (AV) sources differ from textual sources, and how the ways of indexing or accessing AV materials, currently mainly via textual representations, affect research practices. Scagliola also summarized the scholarly process and presented the current state of information systems support for each part of that process, highlighting the limited support for the “analysis” part.


The workshop continued with a keynote by Claire Clivaz, head of Digital Enhanced Learning at the Swiss Institute of Bioinformatics of Lausanne, a specialist in the field of the New Testament manuscripts and textual criticism. Drawing on her experience in text-based scholarship and her knowledge of current digital technologies, her presentation, entitled “Images, Sound, Writing in Western: a long hatred-love story?”, discussed the issues related to the validity and acceptance of AV sources in fields that are traditionally based on texts.

Based on several examples from biblical, literary, and art studies, Clivaz explained how scholarship, and our relationship to culture, is being transformed by “the emergence of a multimodal digital culture” in which text, images and sounds are intertwined. She also concluded that the well-known principles of persuasion in rhetoric (logos, pathos and ethos) will become more dominant due to the transition from textual to multimodal communication. She invited the audience to consider how they could apply multimodal approaches to scholarly publications.

Clivaz’ keynote was followed by three paper sessions:

  1. Models for training digital humanists in accessing and analyzing audiovisual collections
  2. Analysis and discovery models for audiovisual materials
  3. Copyright and Sustainability

1. First session

In the first session, chaired by Clara Henderson (Indiana University), two presentations described the use of AV materials and tools in training students. The presentation by Michaël Bourgatte (Catholic University of Paris), “When video annotation supports audiovisual education,” described his experience as a teacher using Lignes de Temps (French for “Timelines”), the open source video annotation software developed with the IRI (a research and innovation lab based in the Centre Pompidou). Bourgatte used this tool in the classroom to introduce children in the Paris suburbs, high-school students, and master's students to the basics of film analysis and media literacy, which would enable them to critically judge the films and media they watch. Next, an educational project with bachelor students in media studies was presented by Jasmijn van Gorp & Rosita Kieweik (Utrecht University).

In their presentation, “What’s Not in the Archive: Teaching Television History in the ‘Digital Humanities’ Era”, they explained different strategies to engage the students of the “Television History Online” course with archival materials. The goal was to let students build their understanding of the implications of using institutional collections and access tools, as well as online video platforms such as YouTube, by reflecting critically on their selection processes and on how canons are built. Students were challenged to make informed decisions and to explain them actively when their selections were influenced or impeded by access problems associated with copyright.

2. Second session

In the second paper session, chaired by Martijn Kleppe (National Library of the Netherlands), four papers described current projects attempting to facilitate access to AV collections by different means. The presentation by Taylor Arnold and Lauren Tilton (Yale University) showed the use of computational and statistical methods for studying a large photographic corpus, the FSA-OWI Photographic Archive, a collection of over 170,000 photographs taken by the United States Government between 1935 and 1945. Tilton presented a demo of “Photogrammar,” a web-based platform for organizing, searching, and visualizing the large FSA-OWI photographic collection, as well as their current data experiments and tools.

Next, Indrek Ibrus’ (Tallinn University) presentation, “Metadata as a ‘cultural modeling system’: A new rationale to study audiovisual heritage metadata systems”, described a four-year research project that studies the evolution of AV heritage metadata in Estonia, and their uses and effects on cultural memory formation. This project takes a critical approach to the archival practices and systems that shape audiovisual heritage, similar to the one described by van Gorp and Kieweik. The next two presentations focused on the processes and models of scholarly annotation of time-based media.

Melgar and Koolen, on behalf of the other authors, introduced “A conceptual model for the annotation of audiovisual heritage in a media studies context,” which is part of the current CLARIAH-media studies work on creating a user space where scholars can access AV collections and manually or semi-automatically annotate and enrich them. The presentation included both a conceptual model of the annotation phenomena (understood in a broader sense), and a process model of scholarly annotation in the framework of research stages in media studies.

To conclude the session, Professor Mark Williams (Dartmouth College) presented “The Media Ecology Project: Developing New Tools for Semantic Annotation of Moving Images”, one of the most important ongoing endeavors in supporting scholarly work in film and media studies. The project fosters collaboration between archives and the scholarly community, and also among scholars, who can collaboratively perform close reading of their sources using the different platforms integrated in the Media Ecology Project (MEP). These platforms include Mediathread, a classroom platform developed at Columbia University; Scalar, a digital publishing platform developed at the University of Southern California; a new online tool, developed for MEP, that will facilitate the creation of controlled vocabularies that can be assigned to online media files; and the Semantic Annotation Tool (SAT), currently in development at MEP.

3. Third session

The third paper session, on copyright and sustainability, chaired by Johan Oomen, included a presentation by Simone Schroff (Institute for Information Law, University of Amsterdam), “Licensing audio-visual archives from a copyright perspective: between assumptions and empirical evidence”. Schroff described in detail the factors that archives have to take into account when they intend to open their collections for online research or educational use. She clearly introduced the basics of the intrinsically complicated landscape of copyright and industry practices, and pointed to interesting, less difficult directions, based on her empirical study of the contractual copyright arrangements of several public service broadcasters in the Netherlands between 1951 and 2010.

Next, Inna Kizhner (Siberian Federal University Krasnoyarsk & University College London), on behalf of the other authors, presented “Licensing Images from Russian Museums for an Academic Project within Russian Legislation”, an empirical study of the actual willingness and possibilities for collaboration between museums and academic projects in online curated environments in Russia, showing the complications of dealing with legislation and museum policies in practice.

Lightning talks

The workshop included a lively session of “lightning talks”, where participants could briefly, and enthusiastically, present an idea or ongoing project to the audience. The pitches covered current projects that support annotation for scholarly and educational purposes in different domains: EVIA (for ethnographic research), Scalar (for digital publishing), and Memorekall (for web videos in education). Projects related to saving sounds (the British Library Save Our Sounds Project), music (Restoring Early Musical Voices of India), YouTube videos (reconstructing abandoned personal YouTube collections), and performing arts in Japan (the Japanese Performing Arts Resource Center project) also had a 5-minute slot in the workshop.

There was also an enthusiastic invitation to use games with a purpose for annotating videos (an approach already explored in previous projects), a presentation of a current scholarly project studying “the expressive body” within the context of the Media Ecology Project, and a report on ongoing work within CLARIAH on visualizing missing data in collections.


The workshop concluded with a summary presentation by Stef Scagliola, who revisited the initial questions. Scagliola concluded that the disciplines concerned with AV media and multimodality are growing, which increases the need for scholars to incorporate other skills and critical perspectives into the production of scholarly knowledge.

The second edition of the AVinDH workshop confirmed its importance and good reception by the scholarly community. Future editions will also be an occasion for bridging the gap between current progress on content-based video retrieval (as described for instance in Huurnink et al., 2012) and scholarly practices that rely on access to and annotation of AV (and time-based) media.

Likewise, this venue offers the opportunity to create links with other communities who are investigating how crowdsourcing and nichesourcing of time-based sources (as shown in the work by Gligorov et al., 2011; Oomen et al., 2014; Melgar et al., 2015) could be used to increase access to audiovisual archives. At the same time, other groups are developing tools for “close reading” of AV sources in scholarly domains (KWALON, organizer of the forthcoming conference on qualitative data analysis software). These efforts seem to be quite isolated from the developments mentioned above, and could find a space for discussion here.

One challenging task for the workshop and interest group will be to strengthen the links with other venues where the disciplines that, by definition, focus on the analysis of AV media (e.g., film/cinema/television studies or art history) are reflecting on the impact of the digital turn on their practices. Here, the workshop presents an opportunity to discuss the issues common to these traditionally AV-oriented disciplines, and the methodological implications for other disciplines that have not traditionally been attached to the audio-visual message. Sharing these perspectives can bring new insights to scholarly work in the context of multimodal research (and education), and can help spread best practices for analyzing and using audiovisual data in digital humanities scholarship.

Workshop’s website
Collaborative minutes



Gligorov, R., Hildebrand, M., van Ossenbruggen, J., Schreiber, G., & Aroyo, L. (2011). On the role of user-generated metadata in audio visual collections (pp. 145–152). Presented at the K-CAP ’11, New York, NY, USA: ACM.

Huurnink, B., Snoek, C. G. M., de Rijke, M., & Smeulders, A. W. M. (2012). Content-Based Analysis Improves Audiovisual Archive Retrieval. IEEE Transactions on Multimedia, 14(4), 1166–1178.

KWALON. Reflecting on the future of QDA Software: Chances and Challenges for Humanities, Social Sciences and beyond.

Melgar Estrada, L., Hildebrand, M., de Boer, V., & van Ossenbruggen, J. (2016). Time-based tags for fiction movies: comparing experts to novices using a video labeling game. Journal of the Association for Information Science and Technology.

Oomen, J., Gligorov, R., & Hildebrand, M. (2014). Waisda?: making videos findable through crowdsourced annotations. In M. Ridge (Ed.), Crowdsourcing our Cultural Heritage (pp. 161–184). Ashgate Publishing, Ltd.