VAINT (Video Annotation Interoperability interest group), second meeting (July 13, 2019)

By Marijn Koolen and Liliana Melgar

The second meeting of the Video Annotation Interoperability interest group (VAINT), supported by the Dutch digital infrastructure project CLARIAH, took place on July 13, 2019 in Amsterdam. In this blog post, we describe what the group is working on to make it possible for researchers to exchange their annotations across video annotation tools.

Film and television scholars, oral historians, and other scholars who use audio-visual media in their research are increasingly integrating annotation tools into their analyses. Previous surveys of AV annotation tools have identified more than fifty such tools. Each tool has its own strengths and limitations. One video analysis tool may be good at automatically segmenting a video by detecting the boundaries between shots, which saves a lot of time compared to manual segmentation, and then lets the user describe what happens in each shot. This may be useful, for example, for a film scholar doing aesthetic analyses based on shots. Another tool might be good at facilitating the manual or semi-automatic transcription of speech in the audio. Yet another tool may allow users to add metadata to selected video segments, based on properties that are relevant to their research, and so build a database that can be queried in the synthesis phase, when the scholar is writing an article or book. This may be useful, for example, for a historian studying television programs that show Dutch migrants moving to Australia, as part of a project on Dutch-Australian migration, or for an oral historian studying immigrants’ life stories who is interested in analyzing what people in a video are saying and which emotions they express when recounting their experiences of a certain topic.

Because using a single tool may influence or limit the analyses, researchers would ideally be able to combine the affordances of different tools and exploit their complementary strengths. To make this possible, those tools (and their data) have to be interoperable. That is why we created the VAINT (Video Annotation INTeroperability) interest group.

What is the VAINT initiative and who is involved?

VAINT is an international interest group of developers and users of video annotation tools which aims to find solutions for making scholarly annotations of time-based media interoperable. This initiative started in 2017, after a CLARIAH symposium on video annotation that took place in Amsterdam. The first meeting of the VAINT interest group was in July 2018. The initiative is supported and funded by the Dutch infrastructure project CLARIAH. More information about the VAINT group can be found in its GitHub repository. The current members of this initiative are listed at the bottom of this post.

The VAINT interest group, second meeting in Amsterdam, July 13, 2019. From left to right: Hugo Huurdeman, Han Sloetjes, Christian Olesen, Marijn Koolen, John Bell, Liliana Melgar, Joscha Jaeger, Gaudenz Halter, Jaap Blom


Which video annotation tools are involved?

The VAINT initiative includes developers and users of five video annotation tools: 

  • The CLARIAH Scholarly Web Annotation Tool is a browser-based tool that is used in e.g. the CLARIAH Media Suite and allows researchers to segment and annotate videos with a range of annotation types, including links, classification codes, tags, comments and metadata cards. 
  • ELAN is a desktop tool used by researchers from many different disciplines, including linguists, conversation analysts, film scholars, communication scholars and sociologists. It allows complex annotation of video and audio files using multiple layers of annotations.
  • FrameTrail is a browser-based tool that supports the annotation of, among other things, parliamentary data. It can link fragments of parliamentary videos to relevant documents (texts, images, other videos), which FrameTrail can then show in a sidebar to contextualise these fragments. FrameTrail can show annotations in different layers, such as a layer of transcription and a layer linking different segments to different related documents. 
  • Semantic Annotation Tool (SAT) is another browser-based annotation tool that is used in e.g. the Media Ecology Project and allows users to select segments of videos and annotate them with tags, comments and links. 
  • VIAN is a visual film annotation system centered on the semantic aspects of film color analysis.

What was the starting point of the meeting?

This was the second meeting of the VAINT initiative. In the first meeting, the tool developers had explored the needs and requirements for exchanging annotations between tools.

In preparation for this second meeting, the tool developers shared examples of their output data. John Bell, the developer of SAT, made an example annotation based on a format that was drafted in the first meeting, so that everyone could start investigating what is needed to implement import and export functionality in their tools. Han Sloetjes, developer of ELAN, implemented a first version of import and export for ELAN based on the example annotation, and listed what information from other tools is lost on import and what ELAN information is lost on export.

What happened in the meeting?

We first revisited and refined the scenarios and use cases in which exchanging data between the tools would make sense for scholars, and discussed which annotations we expected to exchange between the tools.

We decided on a flexible exchange format that stays as close to the W3C Web Annotation standard as possible, and agreed that each tool needs to figure out by itself how to deal with aspects of the annotations that it cannot interpret. This raises the important question of whether, upon import, each tool should store information that it cannot use, so that this information can be added back on export.
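To make this starting point concrete, here is a minimal sketch in Python of what a W3C Web Annotation for a single video segment can look like. The structure (an `Annotation` with a `TextualBody`, and a `FragmentSelector` that conforms to the W3C Media Fragments syntax `t=start,end`) comes from the Web Annotation Data Model; the helper names, the example URLs and the comment text are hypothetical and do not describe the format the group is drafting.

```python
import json

# JSON-LD context defined by the W3C Web Annotation Data Model.
WA_CONTEXT = "http://www.w3.org/ns/anno.jsonld"
# Selectors using the W3C Media Fragments syntax declare conformance to it.
MEDIA_FRAGS = "http://www.w3.org/TR/media-frags/"


def make_segment_annotation(anno_id, video_url, start, end, comment):
    """Build a Web Annotation commenting on the interval [start, end] (seconds).

    Hypothetical helper for illustration only.
    """
    return {
        "@context": WA_CONTEXT,
        "id": anno_id,
        "type": "Annotation",
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": comment, "format": "text/plain"},
        "target": {
            "source": video_url,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": MEDIA_FRAGS,
                "value": f"t={start:g},{end:g}",
            },
        },
    }


def parse_time_fragment(value):
    """Return (start, end) in seconds from a simple 't=start,end' fragment.

    Only plain seconds are handled; 'npt:' prefixes, hh:mm:ss values and
    open-ended ranges are not supported in this sketch.
    """
    if not value.startswith("t="):
        raise ValueError(f"not a temporal fragment: {value!r}")
    start, _, end = value[2:].partition(",")
    return float(start), float(end)


anno = make_segment_annotation(
    "https://example.org/annotations/1",
    "https://example.org/videos/interview.mp4",
    12.5, 18.0, "Speaker describes arrival in Australia",
)
print(json.dumps(anno, indent=2))
print(parse_time_fragment(anno["target"]["selector"]["value"]))  # (12.5, 18.0)
```

A tool that only understands time ranges can read the selector and ignore the rest, which is the kind of partial interpretation the exchange format has to tolerate.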

We decided to make a minimal extension of the Web Annotation standard to allow for the ways of grouping annotations that are implemented in the various annotation tools. For instance, in ELAN, a user can define different layers (so-called 'tiers') and add annotations to specific layers, e.g. a layer for shot boundary segments, a layer for the transcription of spoken dialogue, a layer for describing the movement of people in the video and a layer for coding which person is visible in which shot. Upon export, the exchange data needs to make clear which layer each individual annotation belongs to.
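As an illustration only, one way to make the layer explicit using vocabulary that already exists in the Web Annotation Data Model would be to wrap each ELAN-style tier in an `AnnotationCollection` whose `AnnotationPage` lists that tier's annotations. This is our assumption, not the extension the group agreed on; the function name and the identifier scheme below are hypothetical.

```python
from collections import defaultdict

WA_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def tiers_to_collections(annotations, base_id):
    """Group (tier_name, annotation) pairs into one AnnotationCollection per tier.

    Each tier becomes a collection with a single AnnotationPage listing its
    annotations, so the layer an annotation belongs to is explicit in the
    exchanged data. Tiers appear in order of first occurrence.
    """
    by_tier = defaultdict(list)
    for tier, anno in annotations:
        by_tier[tier].append(anno)

    collections = []
    for i, (tier, items) in enumerate(by_tier.items()):
        coll_id = f"{base_id}/layer/{i}"  # hypothetical identifier scheme
        collections.append({
            "@context": WA_CONTEXT,
            "id": coll_id,
            "type": "AnnotationCollection",
            "label": tier,
            "total": len(items),
            "first": {
                "id": f"{coll_id}/page/0",
                "type": "AnnotationPage",
                "partOf": coll_id,
                "startIndex": 0,
                "items": items,
            },
        })
    return collections


tiers = [
    ("shots", {"id": "a1", "type": "Annotation"}),
    ("dialogue", {"id": "a2", "type": "Annotation"}),
    ("shots", {"id": "a3", "type": "Annotation"}),
]
for c in tiers_to_collections(tiers, "https://example.org/project/1"):
    print(c["label"], [a["id"] for a in c["first"]["items"]])
# shots ['a1', 'a3']
# dialogue ['a2']
```

A design like this keeps each annotation itself unchanged; the grouping lives in the container, so a tool without a notion of layers could still read the annotations inside it.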

What will happen after the meeting?

The meeting closed with three main action points. First, each developer will list the most important elements in their tool output that should not be lost upon export/import.

Second, we will update and extend our draft specification for the exchange format and define the elements that are new with respect to the W3C Web Annotation standard. For instance, we need a way of grouping annotations that belong to the same layer (as used by, e.g., ELAN or FrameTrail), so that other tools that don’t know the concept of a layer can ignore this information but do not throw it away upon import or export.
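The "ignore but do not throw away" behaviour can be sketched as follows, assuming a hypothetical tool that only interprets a fixed set of top-level keys. Everything else, such as the hypothetical `layer` property used here, is kept as opaque extras on import and merged back on export. The key set and function names are illustrative assumptions, not part of any agreed specification.

```python
import copy

# Top-level keys this hypothetical tool can interpret; anything else is
# preserved but not used.
KNOWN_KEYS = {"@context", "id", "type", "motivation", "body", "target"}


def import_annotation(anno):
    """Split an incoming annotation into understood fields and opaque extras."""
    known = {k: v for k, v in anno.items() if k in KNOWN_KEYS}
    extras = {k: copy.deepcopy(v) for k, v in anno.items() if k not in KNOWN_KEYS}
    return known, extras


def export_annotation(known, extras):
    """Re-attach preserved extras so information the tool ignored is not lost."""
    out = dict(extras)
    out.update(known)  # the tool's own (possibly edited) fields take precedence
    return out


incoming = {
    "id": "a1",
    "type": "Annotation",
    "body": {"value": "close-up"},
    "target": "https://example.org/v.mp4#t=3,5",
    "layer": "shots",  # hypothetical layer property the tool does not understand
}
known, extras = import_annotation(incoming)
known["body"] = {"value": "extreme close-up"}  # the user edits the annotation
print(export_annotation(known, extras)["layer"])  # shots
```

The cost of this approach is that the tool must store data it cannot validate; the benefit is that a round trip through it does not silently drop another tool's information.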

Third, we will explore whether an existing standard for image viewer interoperability, the IIIF Presentation API, is suitable for our needs. We will discuss this with Tom Crane, one of the initiators of the Audiovisual extension of this standard, to see if this is a viable approach.
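For comparison, here is a minimal sketch of how a video and time-based comments could be expressed with version 3.0 of the IIIF Presentation API, which supports audiovisual content through a `duration` on Canvases. The manifest structure follows that specification, but the function, identifiers and URLs are hypothetical, and the group has not decided to adopt this approach.

```python
import json

IIIF_CONTEXT = "http://iiif.io/api/presentation/3/context.json"


def video_manifest(base, video_url, duration, width, height, comments):
    """Minimal IIIF Presentation 3.0 Manifest with one time-based Canvas.

    `comments` is a list of (start, end, text) tuples in seconds.
    Hypothetical helper for illustration only.
    """
    canvas_id = f"{base}/canvas/1"
    # The "painting" annotation places the video itself on the Canvas.
    painting = {
        "id": f"{base}/annotation/painting/1",
        "type": "Annotation",
        "motivation": "painting",
        "body": {"id": video_url, "type": "Video", "format": "video/mp4",
                 "duration": duration, "width": width, "height": height},
        "target": canvas_id,
    }
    # Scholarly comments target a time range on the Canvas (media fragment).
    commenting = [
        {
            "id": f"{base}/annotation/comment/{i}",
            "type": "Annotation",
            "motivation": "commenting",
            "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
            "target": f"{canvas_id}#t={start:g},{end:g}",
        }
        for i, (start, end, text) in enumerate(comments)
    ]
    return {
        "@context": IIIF_CONTEXT,
        "id": f"{base}/manifest",
        "type": "Manifest",
        "label": {"en": ["Example video"]},
        "items": [{
            "id": canvas_id,
            "type": "Canvas",
            "duration": duration,
            "width": width,
            "height": height,
            "items": [{"id": f"{canvas_id}/page/painting",
                       "type": "AnnotationPage", "items": [painting]}],
            "annotations": [{"id": f"{canvas_id}/page/comments",
                             "type": "AnnotationPage", "items": commenting}],
        }],
    }


m = video_manifest("https://example.org/iiif", "https://example.org/v.mp4",
                   120.0, 640, 360, [(12.5, 18.0, "Shot change")])
print(json.dumps(m, indent=2))
```

In this model the comments point at the Canvas rather than directly at the video file, which is one of the differences from plain Web Annotations that the discussion with the IIIF community would need to address.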


Marijn Koolen works at the KNAW Humanities Cluster as a researcher and developer. He is involved in several research projects and digital infrastructure projects within the fields of Digital Humanities, Information Retrieval and Recommender Systems. Part of his development work is related to the Dutch research infrastructure project CLARIAH.

Liliana Melgar-Estrada is a user researcher at Utrecht University and The Netherlands Institute for Sound and Vision, investigating how to support scholarly video annotation in CLARIAH, the Dutch infrastructure for the digital humanities, and conducting user testing and evaluation of the CLARIAH Media Suite.

VAINT members (2019):

  • Marijn Koolen (KNAW Humanities cluster), developer of the CLARIAH scholarly annotation client and server.
  • Jaap Blom (The Netherlands Institute for Sound and Vision, CLARIAH), developer of the CLARIAH Media Suite, and of the scholarly annotation client and server.
  • Han Sloetjes (Max Planck Institute for Psycholinguistics), developer of ELAN.
  • John Bell (Dartmouth College), developer of SAT (Semantic Annotation Tool: Waldorf.js & Statler) for the Media Ecology Project (led by Mark Williams)
  • Joscha Jaeger (Filmic Web), developer of FrameTrail
  • Gaudenz Halter (University of Zurich), developer of VIAN, an adaptation of ELAN for the Film Colors project, led by Prof. Barbara Flueckiger
  • Hugo Huurdeman (Timeless Future), who worked on the ReVI (REsource viewer) project for the CLARIAH Media Suite, also co-founder of, an interactive video framework
  • Liliana Melgar (Utrecht University and The Netherlands Institute for Sound and Vision), an information scientist at the CLARIAH project, whose focus is on the user requirements for scholarly annotations
  • Christian Olesen (Utrecht University), film scholar who also participated in the first and second meetings, contributing to the scholarly use cases.