CLARIAH WP5 Computer Vision Expert Meeting
By: Tom Slootweg
On November 18, 2019, the CLARIAH Audiovisual Data and Media Studies work package (WP5) organized a one-day meeting on computer vision and audiovisual data at Utrecht University. The main organizers, Jasmijn Van Gorp, Christian Olesen and Tom Slootweg, invited several Dutch media scholars, as well as computer and data scientists, to explore the potential of computer vision for the analysis of audiovisual data held by cultural heritage institutions (e.g. film, television, photography). The organizers also invited two special guests, Taylor Arnold and Lauren Tilton (University of Richmond). Arnold and Tilton kindly agreed to participate in the expert meeting and then brought the day to a close with an engaging public lecture on their Distant Viewing Lab.
The expert meeting functioned as a first, exploratory step towards a more developed strategy for the upcoming computer vision processing pipeline in the CLARIAH Media Suite. In addition, the organizers aimed to provide an informal platform on which the invited participants could share and discuss some of their preliminary expectations of and requirements for computer vision. To benefit from the expertise of Arnold and Tilton, as well as that of computer vision experts Nanne van Noord (NISV) and Melvin Wevers (DHLab KNAW), scholars who work (or plan to work) with audiovisual data gave several short pitches, each highlighting what they expect from computer vision in light of their research interests.
Thomas Poell (University of Amsterdam) kicked off the meeting with a presentation entitled “Cross-Media Research & Computer Vision.” Poell and his team study reactions to the refugee crisis in Europe across several media: Facebook, Twitter, YouTube, newspapers and television broadcasts. Besides distant reading strategies grounded in textual analysis, he is also interested in the potential of computer vision for analyzing the re-use of “symbolic events,” including their audiovisual “framing.” Christian Olesen and Nanne van Noord, by contrast, expounded on the insights gained by using computer vision algorithms as a basis for exploratory, serendipitous search in audiovisual archives. The foundations for this approach were laid in the SEMIA project, and in the coming years Olesen and van Noord will further investigate how the algorithms developed for that project might be integrated into the CLARIAH Media Suite.
Susan Aasman and Rob Wegter (University of Groningen) continued with an overview of their first steps in ‘the land of computer vision.’ Their research project, “Intimate Histories: Finding Traces in the Early History of YouTube,” focuses on the early days of vlogging. Aasman and Wegter applied computer vision as a “(pre-)analytical method” to ascertain whether scene, frame and object recognition might raise new questions about continuity and change in the visual dimensions of their data. Finally, Ruben Ros, a recent master's graduate associated with the ReAct project headed by Ann Rigney, elaborated on the deployment of computer vision on large datasets of historical protest photography. The distant viewing strategies discussed by Ros are currently being investigated further by postdoctoral researcher and historian Thomas Smits, who seeks to develop them as a means of revealing a “visual grammar” of activism.
Discussion and next steps
Throughout the expert meeting, the pitches provided ample ground for discussion. Melvin Wevers, for example, responded by underlining that scholars who plan to use computer vision should first and foremost adopt a data-driven approach. He also stressed the importance of indexing the dataset before developing any further research questions. This recommendation was endorsed by Nanne van Noord, who added that ‘any meaningful analysis requires indexing beforehand, because only then can one opt for a robust application of computer vision methods.’ A data-driven approach, as the experts further explained, also entails an awareness of the distinction between systems and variables: what information is required, and how can it then be turned into relevant output for further analysis?
Other relevant topics for discussion were also raised. Taylor Arnold, for instance, emphasized that computer vision should not be used as an end in itself. Lauren Tilton built on this remark by arguing that we should eventually be working towards modes of analysis in which computer vision is combined with other, complementary methods. Arnold and Tilton also flagged documentation and openness as relevant points for discussion. Many experiments with computer vision and audiovisual data are currently taking place, yet a more or less standardized model for reporting on the important steps taken, or on how to avoid certain pitfalls, is still lacking. In response to this undesirable situation, Arnold and Tilton make their documentation, toolkits and code freely available on their website, and they rightly encourage others to follow suit.
Jasmijn Van Gorp brought the expert meeting to a close with some additional remarks. Based on the issues raised during the meeting, Van Gorp stressed the importance of further debate on how to meaningfully integrate computer vision algorithms and processing pipelines into the Media Suite. Of particular relevance will be the question of whether, given the copyright-protected status of many archival materials, the implementation and development of computer vision should occur solely within a “closed system,” or whether we should instead strive to make our code available and develop toolkits for those who will not have access to the CLARIAH research infrastructure. Many options are still open, and important decisions will need to be made in this regard in the coming years. Whatever the outcome, Van Gorp concluded by emphasizing the importance of maintaining the trading zone between humanities scholars and computer scientists that has now been established.
The public lecture rounded off the day and offered valuable insight into the pioneering work done by Arnold and Tilton at the intersection of media studies and digital humanities. Their earlier work on Friends (NBC, 1994-2004), as well as on the television series Bewitched (ABC, 1964-1972) and I Dream of Jeannie (NBC, 1965-1970), sought to explore how distant viewing can ‘help identify, at scale, the cultural assumptions expressed and reflected in these TV-series.’ The results of this project can be found here. Currently, however, their research also extends to other media, such as photography and film. This broadening of scope is necessary, they explained in the lecture, in order to develop new, complementary computational methods for the distant viewing of visual culture at large.
This broader, more inclusive ambition not only necessitates the rapid prototyping of new toolkits, but also requires a formal framework in which ‘new ontologies can help in predicting features of media.’ A first step to that end is discussed at length in their excellent article “Distant Viewing: Analyzing Large Visual Corpora.” The next task is to delve deeper into visual style and formal complexity, which they plan to address in a forthcoming paper. To wrap up the lecture, Taylor Arnold again stressed the importance of making tools reusable for other interested parties, including those beyond one’s own circle. This remark left the audience with the most important takeaways of this special computer vision day: cherish openness and create a fruitful trading zone for the exchange of ideas and tools.