It all seemed rather funny to them, until the very moment they laid eyes upon the prison block. As ‘Team Clariah’, Marieke van Erp (VU, WP3) and Richard Zijdeman (IISG, WP4) participated in the National Library's HackaLOD on 11-12 November. Alongside seven other teams, they faced the challenge of building a cool (prototype) application using Linked Open Data made available especially for this event by the National Library and heritage partners. It had to be done within 24 hours… Inside a former prison… Here’s their account of the event.

We set out on Friday, somewhat dispirited because our third team mate, Melvin Wevers (UU), had come down with a cold. Upon arrival, it turned out we had two cells: one for hacking and one for sleeping (well, more like three hours of tossing and turning). As you'd expect, the cells were not exactly cosy, but the organisers had provided goodie bags whose contents were put to good use, and there was even a midnight Jaw Harp concert.



With that, and our pre-arranged plan to tell stories around buildings, we set out to build our killer app. We found several datasets that contain information about buildings. The BAG, for example, contains addresses, geo-coordinates, information about how a building is used (as a shop or a gathering place, for instance) and 'mutations' (things that happened to the building). What it doesn't contain is building names (for example Rijksmuseum or Wolvenburg); these are in the Rijksmonumenten dataset. The Rijksmonumenten dataset, in turn, doesn't contain addresses, but as both datasets contain geo-coordinates, they can be linked. Yay for Linked Data!
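The coordinate-based link between the two datasets can be sketched as a nearest-neighbour match. This is a minimal sketch, not the team's actual code: it assumes both datasets expose WGS84 latitude/longitude, and the `Point` record, the sample identifiers and coordinates, and the 25-metre threshold are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    id: str     # hypothetical record identifier
    lat: float  # WGS84 latitude in degrees
    lon: float  # WGS84 longitude in degrees


def haversine_m(a, b):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def link_by_location(addresses, monuments, max_dist_m=25.0):
    """Map each monument id to the id of the nearest address within max_dist_m."""
    links = {}
    for m in monuments:
        best = min(addresses, key=lambda a: haversine_m(a, m), default=None)
        if best is not None and haversine_m(best, m) <= max_dist_m:
            links[m.id] = best.id
    return links


# Hypothetical sample records, for illustration only
addresses = [Point("bag:addr-1", 52.3700, 4.8950), Point("bag:addr-2", 52.3600, 4.8850)]
monuments = [Point("rm:mon-1", 52.37005, 4.89505), Point("rm:mon-2", 52.4000, 4.9000)]
print(link_by_location(addresses, monuments))  # {'rm:mon-1': 'bag:addr-1'}
```

The brute-force search compares every monument with every address, which is fine for a demo; for the full BAG a spatial index would be needed. The distance threshold matters because a monument's coordinate need not coincide exactly with the address point, and a monument with no address nearby is simply left unlinked.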

To tell the stories, we wanted to find more information in the National Library's newspaper collection. With some help from other hackers, we managed to efficiently retrieve news articles that mention a particular location. Through some manual analysis we found, for example, that for Kloveniersburgwal 73 there was a steady stream of ads asking for ‘decent’ kitchen maids up until 1890, followed by a sudden spike in ads announcing real estate. It turns out a notary had moved in, and another (not linked) dataset could even provide the notary's marriage license, confirmed by a wedding ad in the newspaper. These sorts of stories can give us more insight into what happened in a particular building at a given time.

We have taken some first steps towards analysing these ads automatically, to detect such changes and generate timelines for locations, but we didn't get that done within 24 hours. However, the audience was sufficiently pleased with our idea for us to win the audience award! (Admittedly to our great surprise, as the other teams' ideas were all really awesome as well.) We’re now looking for funding to complete the prototype.
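The automatic analysis of ads could start from something as simple as tracking which kind of ad dominates each year at an address and flagging the years where that changes. The sketch below assumes the ads have already been classified into categories; that classification step, the category labels and the counts are hypothetical, loosely modelled on the Kloveniersburgwal 73 example. It illustrates the idea and is not the prototype the team built.

```python
from collections import Counter, defaultdict


def dominant_by_year(ads):
    """ads: iterable of (year, category) pairs. Returns {year: most common category}, sorted by year."""
    per_year = defaultdict(Counter)
    for year, category in ads:
        per_year[year][category] += 1
    # Ties go to the category encountered first (Counter.most_common ordering)
    return {year: counts.most_common(1)[0][0] for year, counts in sorted(per_year.items())}


def change_points(ads):
    """Return timeline entries (year, old_category, new_category) where the dominant category shifts."""
    timeline = []
    previous = None
    for year, category in dominant_by_year(ads).items():
        if previous is not None and category != previous:
            timeline.append((year, previous, category))
        previous = category
    return timeline


# Hypothetical, made-up counts for illustration only
ads = [
    (1888, "domestic_staff"), (1888, "domestic_staff"),
    (1889, "domestic_staff"), (1890, "domestic_staff"),
    (1891, "real_estate"), (1891, "real_estate"), (1891, "domestic_staff"),
]
print(change_points(ads))  # [(1891, 'domestic_staff', 'real_estate')]
```

Taking the yearly majority is crude: a single unusual year can flip it, so a real timeline would probably need smoothing or a minimum number of ads per year before a change is reported.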

In summary, it was all great fun, not least thanks to the great organisation by the National Library and the nice ‘bonding’ atmosphere among the teams. So, our lessons learnt:

  • prison food is really not that bad (and there was lots of it)
  • 24 hours of hacking is heaps of fun
  • the data always turn out to behave differently from what you'd expect
  • events like these, isolated from the daily routine, prove crucial for fostering new ideas and relations that keep the field in motion.

(by Marieke van Erp)

This year, the 15th International Semantic Web Conference took place in Kobe, Japan. The main conference lasted three days, with three parallel sessions as well as a three-hour poster and demo session one evening. In the two days prior to the main conference, 5 tutorials and 16 workshops took place.


For NLP aficionados there was the LD4IE (Linked Data for Information Extraction) workshop, which I attended on Tuesday morning, the NLP&DBpedia workshop that I co-organised on Tuesday afternoon, the keynote by Kathleen McKeown (Columbia University) on Wednesday and the NLP session in the main conference on Friday. But there were also other NLP papers dispersed throughout the conference programme.


For the CLARIAH community, some of the work McKeown presented on the computational analysis of novels is probably the most relevant. It was also nice to see that more research is moving towards event extraction, for example the work of Valentina Presutti and Aldo Gangemi (presented at the LD4IE workshop). They presented a new resource called Framester that links up many types of resources, such as FrameNet, VerbNet and DBpedia, to help describe events. New at the conference was the journal papers track, where I got to present our work on building Event-centric Knowledge Graphs [slides] [paper] to a pretty big room.


Sentiment analysis was also a hot topic, with several interesting papers such as “On the Role of Semantics for Detecting pro-ISIS Stances on Social Media” by Hassan Saif, Miriam Fernandez, Matthew Rowe and Harith Alani, and “A Replication Study of the Top Performing Systems in SemEval Twitter Sentiment Analysis” by Efstratios Sygkounas, Giuseppe Rizzo and Raphaël Troncy. Incidentally, the latter was one of only two replication papers at the conference.


There weren’t that many papers this year dealing with humanities research questions. Next year’s conference will take place in Vienna; perhaps CLARIAH can help change that?

 Marieke in Japan

[This post is based on Maartje Kruijt’s Media Studies Bachelor thesis: “Supporting exploratory search with features, visualizations, and interface design: a theoretical framework”.]

In today’s network society there is a growing need to share, integrate and search the collections of various libraries, archives and museums. Tools need to be developed for researchers who interpret these interconnected media collections. In the exploratory phase of research, the media researcher has no clear focus and is uncertain what to look for in an integrated collection. Data visualization technology can be used to support the strategies and tactics researchers employ in exploratory research.

The DIVE tool is an event-based linked media browser that allows researchers to explore interconnected events, media objects, people, places and concepts (see screenshot). Maartje Kruijt’s research project investigated to what extent, and in what way, DIVE can support the construction of narratives that contribute to researchers’ interpretation process. Such narratives can either be generated automatically on the basis of existing event-event relationships or be constructed manually by researchers.

The research proposes an extension of the DIVE tool in which selections made during the exploratory phase can be presented in narrative form. This allows researchers not only to publish their narratives, but also to share them and to reuse other people’s narratives. The interactive presentation of a narrative complements its presentation in text, and it can serve as a starting point for further exploration by other researchers who use the DIVE browser.

Within DIVE and CLARIAH, we are currently extending the user interface based on the recommendations made in this thesis. You can read more about it in Maartje Kruijt’s thesis (Dutch). The user stories describing the needs of media researchers are written in English and can be found in Appendix I.


Linked Data, RDF and the Semantic Web are popular buzzwords in tech-land and within CLARIAH, but not everyone in the CLARIAH community may be familiar with them. On 12 September, CLARIAH therefore organized a workshop at the Vrije Universiteit Amsterdam to discuss the use of Linked Data as technology for connecting data across the different CLARIAH work packages (WP3 linguistics, WP4 structured data and WP5 multimedia).


The goal of the workshop was twofold. First, to give an overview of these concepts from the 'tech' side and show how they are currently employed in the different work packages. Second, we wanted to hear from arts and humanities researchers how these technologies could best suit their research and how CLARIAH can support them in familiarising themselves with Semantic Web tools and data.

The workshop

Monday afternoon, at 13:00 sharp, around 40 people showed up for the workshop at the Boelelaan in Amsterdam. The workshop included plenary presentations that laid the groundwork for discussions in smaller groups centred around the different types of data from the different WPs (raw collective notes can be found on this piratepad).


  • Rinke Hoekstra presented an Introduction to Linked Data: what it is, how it compares to other technologies and what its potential is for CLARIAH. [Slides]
    In the discussion that followed, some concerns were raised about the ability of Linked Data to deal with data provenance and data quality.

  • After this, humanities researchers from each of the three work packages discussed their experiences, opportunities and challenges around Linked Data. Our "Linked Data Champions" of the day were:

    • WP3: Piek Vossen (Vrije Universiteit Amsterdam) [Slides]

    • WP4: Richard Zijdeman (International Institute of Social History)

    • WP5: Kaspar Beelen and Liliana Melgar (University of Amsterdam) [Slides]


Marieke van Erp, Rinke Hoekstra and Victor de Boer then discussed how Linked Data is currently being produced in the different work packages and showed an example of how these datasets could be integrated (see image) [Slides]. If you want to try this out yourself, here are some example SPARQL queries to play with.

Break out sessions

Finally, in the break-out sessions, the implications and challenges for the individual work packages were discussed further.

  • For WP3, the discussion focused on formats. Many natural language annotation formats are in use, some with a long history, and these formats are often very closely tied to text analysis software. One reason it may not be useful for WP3 to convert all tools and data to RDF is that performance cannot be guaranteed; for some text analysis tasks carried out in RDF, it has already been shown to degrade. However, converting certain annotations, i.e. the end results of processing, to RDF could be useful. We also talked about different types of WP3 use cases that involve LOD.

  • The WP4 break-out session consisted of about a dozen researchers, representing all work packages. The discussion focused on expectations regarding the tools and data demonstrated throughout the day. Several people were interested in applying QBer, the tool that turns CSV files into Linked Data. The really exciting bit is that this interest was shared by people outside WP4, who usually work with text or audio-visual sources. This signals not just an interest in interdisciplinary research, but also an interest in research based on various data types. A second issue discussed was the need for vocabularies ((hierarchical) lists of standard terms). For various research fields, such vocabularies do not yet exist. While some vocabularies can be derived relatively easily from existing standards that experts use, this will prove more difficult for a large range of variables. The final issue discussed was the quality of datasets: should tools be able to handle 'messy' data? The audience agreed that data cleaning is the responsibility of the researcher, but that tools should be accompanied by guidelines on the expected format of the data file.

  • The WP5 discussion covered data privacy and copyright, as well as how memory institutions and individual researchers can be persuaded to make their data available as LOD (see image).

[Image: results of the WP5 break-out session]
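QBer, discussed in the WP4 break-out session above, turns CSV files into Linked Data. The sketch below is not QBer; it only illustrates the basic idea under simple assumptions: each row becomes a subject IRI, each column becomes a predicate, and cell values become plain string literals, written out as N-Triples. The `http://example.org/` namespace, the function names and the sample table are hypothetical. A real conversion would map columns and values to terms from shared vocabularies, which is where the missing vocabularies mentioned in that session become a bottleneck.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def literal(value):
    """Serialize a plain string literal, escaping the characters N-Triples requires."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def csv_to_ntriples(text, id_column):
    """Turn each CSV row into triples: one subject per row, one predicate per column."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}row/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the key column, surplus unnamed fields and empty cells
            if column in (None, id_column) or not value:
                continue
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            triples.append(f"{subject} {predicate} {literal(value)} .")
    return triples


# Hypothetical sample table, for illustration only
sample = "id,occupation,year\np1,notary,1890\np2,kitchen maid,1885\n"
for line in csv_to_ntriples(sample, "id"):
    print(line)
```

Every value becomes an untyped string here, so `1890` is neither a date nor a number; typed literals (e.g. `xsd:gYear` for years) and mappings to vocabulary terms would be the obvious next steps.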


The day ended with some final considerations and some well-deserved drinks.


A summary and a reflection after the workshop at the Digital Humanities conference in Krakow (July 12-15, 2016)

By Liliana Melgar Estrada

The second edition of the workshop “Audiovisual Data And Digital Scholarship: Towards Multimodal Literacy” (AVinDH workshop) took place during the Digital Humanities conference in Krakow, which finished on July 16.
Digital Humanities is the annual international conference of the Alliance of Digital Humanities Organizations (ADHO). For its 28th edition, the Jagiellonian University and the Pedagogical University warmly welcomed 902 people from all over the world.

The AVinDH workshop had a total of 55 participants and featured a keynote, 8 papers and 10 lightning talks on the use of audio-visual media in the context of digital humanities scholarship.


The AVinDH workshop is a follow-up to the first edition, held at the 2014 DH Conference in Lausanne, which laid the basis for creating the Special Interest Group AVinDH (SIG-AVinDH) at the next DH conference in Sydney in July 2015. This group was initiated by researchers from the Erasmus Studio at Erasmus University Rotterdam and from the Netherlands Institute for Sound and Vision. The aim of the interest group is to create “a venue for exchanging knowledge, expertise, methods and tools by scholars who make use of audiovisual data types that can convey a certain level of narrativity: spoken audio, video and/or (moving) images” (see website).

The workshop

The session opened with an introduction by Stef Scagliola, a historian specializing in opening up audiovisual archives for multidisciplinary research, with an emphasis on oral history collections, and one of the founders of the special interest group. Scagliola introduced the main questions motivating the creation of the SIG-AVinDH and the workshop. A central issue is how audio-visual (AV) sources differ from textual sources, and how the ways of indexing or accessing AV materials, currently mainly via textual representations, affect research practices. Scagliola also summarized the scholarly process and presented the current state of information system support for each part of that process, highlighting the limitations in supporting its “analysis” part.


The workshop continued with a keynote by Claire Clivaz, head of Digital Enhanced Learning at the Swiss Institute of Bioinformatics in Lausanne and a specialist in New Testament manuscripts and textual criticism. Drawing on her experience in text-based scholarship and her knowledge of current digital technologies, Clivaz discussed, in a presentation entitled “Images, Sound, Writing in Western: a long hatred-love story?”, the issues related to the validity and acceptance of AV sources in fields that are traditionally based on texts.

Based on several examples from biblical, literary and art studies, Clivaz explained how scholarship, and our relationship to culture, is being transformed by “the emergence of a multimodal digital culture” in which text, images and sounds are intertwined. She also concluded that the well-known principles of persuasion in rhetoric (logos, pathos and ethos) will become more dominant due to the transition from textual to multimodal communication. She invited the audience to consider how they could apply multimodal approaches to scholarly publications.

Clivaz’ keynote was followed by three paper sessions:

  1. Models for training digital humanists in accessing and analyzing audiovisual collections
  2. Analysis and discovery models for audiovisual materials
  3. Copyright and Sustainability

1. First session

In the first session, chaired by Clara Henderson (Indiana University), two presentations described the use of AV materials and tools in training students. The presentation by Michaël Bourgatte (Catholic University of Paris), “When video annotation supports audiovisual education”, described his experience as a teacher using Lignes de Temps (French for “Timelines”), the open source video annotation software developed with the IRI (a research and innovation lab based in the Centre Pompidou). Bourgatte used this tool in the classroom to introduce children in the Paris suburbs, high-school students and master's students to the basics of film analysis and media literacy, enabling them to critically judge the films and media they watch. Next, Jasmijn van Gorp & Rosita Kieweik (Utrecht University) presented an educational project with bachelor students in media studies.

In their presentation, “What’s Not in the Archive: Teaching Television History in the ‘Digital Humanities’ Era”, they explained different strategies for engaging the students of the “Television History Online” course with archival materials. The aim was for students to understand the implications of using institutional collections and access tools, as well as online video platforms such as YouTube, by reflecting critically on their own selection processes and on how canons are built. Students were challenged to make informed decisions, and to play an active role in explaining them, when their selections were influenced or impeded by access problems associated with copyright.

2. Second session

In the second paper session, chaired by Martijn Kleppe (National Library of the Netherlands), four papers described current projects that aim to facilitate access to AV collections by different means. Taylor Arnold and Lauren Tilton (Yale University) showed the use of computational and statistical methods for studying a large photographic corpus, the FSA-OWI Photographic Archive, a collection of over 170,000 photographs taken by the United States Government between 1935 and 1945. Tilton presented a demo of “Photogrammar”, a web-based platform for organizing, searching and visualizing the FSA-OWI photographic collection, as well as their current data experiments and tools.

Next, Indrek Ibrus (Tallinn University) presented “Metadata as a ‘cultural modeling system’: A new rationale to study audiovisual heritage metadata systems”, describing a four-year research project that studies the evolution of AV heritage metadata in Estonia and its uses and effects on cultural memory formation. Like the work described by van Gorp and Kieweik, this project takes a critical approach to the archival practices and systems that shape audiovisual heritage. The next two presentations focused on the processes and models of scholarly annotation of time-based media.

Melgar and Koolen, on behalf of the other authors, introduced “A conceptual model for the annotation of audiovisual heritage in a media studies context”, which is part of ongoing CLARIAH work for media studies on creating a user space where scholars can access AV collections and annotate and enrich them manually or semi-automatically. The presentation included both a conceptual model of annotation phenomena (understood in a broad sense) and a process model of scholarly annotation framed within the research stages of media studies.
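To make the annotation of time-based media a little more concrete, here is one way to represent a single scholarly annotation of a video segment. This is not the conceptual model presented by Melgar and Koolen: it is a minimal sketch that borrows the W3C Web Annotation Data Model and the Media Fragments URI temporal syntax (`t=start,end`, in seconds) as an assumed representation, and the function name, video URL, creator IRI and annotation text are hypothetical.

```python
import json


def time_based_annotation(video_url, start_s, end_s, text, creator):
    """Build a W3C Web Annotation (JSON-LD) that targets a temporal fragment of a video."""
    if not 0 <= start_s < end_s:
        raise ValueError("expected 0 <= start_s < end_s")
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "creator": creator,
        "body": {"type": "TextualBody", "value": text, "purpose": "describing"},
        "target": {
            "source": video_url,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"t={start_s},{end_s}",  # temporal range in seconds
            },
        },
    }


# Hypothetical example values, for illustration only
anno = time_based_annotation(
    "http://example.org/video/123", 12.5, 30,
    "Interviewee describes leaving the village",
    "http://example.org/user/researcher-1",
)
print(json.dumps(anno, indent=2))
```

Keeping the time range in the target's selector rather than in the body means the same segment can carry several independent annotations (a tag, a transcript excerpt, a link to a vocabulary term) from different scholars.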

To conclude the session, Professor Mark Williams (Dartmouth College) presented “The Media Ecology Project: Developing New Tools for Semantic Annotation of Moving Images”. The Media Ecology Project (MEP) is one of the most important ongoing endeavors supporting scholarly work in film and media studies through collaboration between archives and the scholarly community, and also among scholars, who can collaboratively perform close reading of their sources using the different platforms integrated in the project. These platforms include Mediathread, a classroom platform developed at Columbia University; Scalar, a digital publishing platform developed at the University of Southern California; a new online tool, developed for MEP, that facilitates the creation of controlled vocabularies that can be assigned to online media files; and the Semantic Annotation Tool (SAT), currently in development at MEP.

3. Third session

The third paper session, on copyright and sustainability, was chaired by Johan Oomen. It included a presentation by Simone Schroff (Institute for Information Law, University of Amsterdam), “Licensing audio-visual archives from a copyright perspective: between assumptions and empirical evidence”, which described in detail the factors that archives have to take into account when they intend to open their archives for online research or educational use. Schroff clearly introduced the basics of the intrinsically complicated landscape of copyright and industry practices, and pointed to interesting, less difficult directions, based on her empirical study of the contractual copyright arrangements of several public service broadcasters in the Netherlands between 1951 and 2010.

Next, Inna Kizhner (Siberian Federal University Krasnoyarsk & University College London), on behalf of the other authors, presented “Licensing Images from Russian Museums for an Academic Project within Russian Legislation”, an empirical study of the actual willingness and possibilities for collaboration between museums and academic projects in online curated environments in Russia, showing the complications of dealing with legislation and museum policies in practice.

Lightning talks

The workshop included a lively session of “lightning talks”, where participants could briefly, and enthusiastically, present an idea or ongoing project to the audience. Topics included current projects that support annotation for scholarly and educational purposes in different domains: EVIA (for ethnographic research), Scalar (for digital publishing) and Memorekall (for web videos in education). Projects related to saving sounds (the British Library Save Our Sounds project), music (Restoring Early Musical Voices of India), YouTube videos (reconstructing abandoned personal YouTube collections) and performing arts in Japan (the Japanese Performing Arts Resource Center project) also had a 5-minute slot in the workshop.

There was also an enthusiastic invitation to use games with a purpose for annotating videos (an approach already explored in previous projects), a presentation of a current scholarly project studying “the expressive body” within the context of the Media Ecology Project, and a report on ongoing work within CLARIAH on visualizing missing data in collections.


The workshop concluded with a summary presentation by Stef Scagliola, who revisited the initial questions. Scagliola concluded that the disciplines mostly concerned with AV media and multimodality are growing, which means scholars increasingly need to incorporate other skills and critical perspectives into the production of scholarly knowledge.

The second edition of the AVinDH workshop confirmed its importance and its good reception by the scholarly community. Future editions will also be an occasion for bridging the gap between current progress in content-based video retrieval (as described, for instance, in Huurnink et al., 2012) and scholarly practices that rely on access to and annotation of AV (and time-based) media.

Likewise, this venue offers the opportunity to create links with other communities investigating how crowdsourcing and nichesourcing of time-based sources (as shown in the work by Gligorov et al., 2011; Oomen et al., 2014; Melgar et al., 2015) could be used to increase access to audiovisual archives. At the same time, other groups are developing tools for “close reading” of AV sources in scholarly domains (e.g. KWALON, organizer of the forthcoming conference on qualitative data analysis software); these efforts seem quite isolated from the developments above and could find a space for discussion here.

One challenging task for the workshop and interest group will be to strengthen links with other venues where disciplines that are, by definition, focused on the analysis of AV media (e.g., film/cinema/television studies or art history) reflect on the impact of the digital turn on their practices. In this respect, the workshop offers an opportunity to discuss issues common to these traditionally AV-oriented disciplines, as well as the methodological implications for disciplines that have not traditionally engaged with the audio-visual message. Sharing these perspectives can bring new insights to scholarly work in multimodal research (and education), and help spread best practices related to the challenges of analyzing and using audiovisual data in the context of digital humanities scholarship.

Workshop’s website
Collaborative minutes



Gligorov, R., Hildebrand, M., van Ossenbruggen, J., Schreiber, G., & Aroyo, L. (2011). On the role of user-generated metadata in audio visual collections. In K-CAP ’11 (pp. 145–152). New York, NY, USA: ACM.

Huurnink, B., Snoek, C. G. M., de Rijke, M., & Smeulders, A. W. M. (2012). Content-Based Analysis Improves Audiovisual Archive Retrieval. IEEE Transactions on Multimedia, 14(4), 1166–1178.

KWALON. Reflecting on the future of QDA Software: Chances and Challenges for Humanities, Social Sciences and beyond.

Melgar Estrada, L., Hildebrand, M., de Boer, V., & van Ossenbruggen, J. (2016). Time-based tags for fiction movies: comparing experts to novices using a video labeling game. Journal of the Association for Information Science and Technology.

Oomen, J., Gligorov, R., & Hildebrand, M. (2014). Waisda?: making videos findable through crowdsourced annotations. In M. Ridge (Ed.), Crowdsourcing our Cultural Heritage (pp. 161–184). Ashgate Publishing, Ltd.