Last week, the 16th International Semantic Web Conference (ISWC 2017) took place in Vienna, Austria. Around 600 researchers from all over the world came together to exchange knowledge and ideas in 7 tutorials, 18 workshops, and 3 full days of keynotes, conference talks, and a big poster & demo session. Needless to say, I only saw a small part of it, but all the papers and many of the tutorial materials are available through the conference website.

First of all, kudos to the organising committee for putting together a fantastic programme and great overall surroundings. The WU Campus, which hosted the workshops, posters & demos, and jam session, is really gorgeous, with a marvellous spaceship-like library.

The main conference took place next door at the Messe, where the Wi-Fi worked excellently (quite a feat at a CS conference where most participants carry more than one device). The bar for next year is set high!

But back to the conference: 

On Sunday, I got to present the SERPENS CLARIAH research pilot during the Second Workshop on Humanities in the Semantic Web (WHISE II). There were about 30 participants in the workshop, and a variety of projects and topics were presented. I particularly liked the presentation by Mattia Egloff on his and Davide Picca's work on DHTK: The Digital Humanities ToolKit. They are working on a Python module that supports the analysis of books, and they are developing and testing it in an undergraduate course for humanities students. I really think that by providing (humanities) students with tools to start doing their own analyses, we can get them enthusiastic about programming, as well as get them thinking about the limitations of such tools, which can lead to better projects in the long run.

In the WHISE workshop, as well as in the main conference, there were several presentations on multimedia datasets for the Semantic Web. The multimedia domain is not new to the Semantic Web, but some of the work (such as Rick Meerwaldt, Albert Meroño-Peñuela and Stefan Schlobach: Mixing Music as Linked Data: SPARQL-based MIDI Mashups) doesn't just focus on the metadata but actually encodes the MIDI signal as RDF and then uses it for a mashup.
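To give a flavour of what querying MIDI-as-RDF looks like, here is a minimal SPARQL sketch. The midi: prefix and the class and property names (midi:NoteOnEvent, midi:hasEvent, and so on) are assumptions modelled on the MIDI Linked Data vocabulary, not necessarily the authors' exact schema.

```sparql
PREFIX midi: <http://purl.org/midi-ld/midi#>

# Sketch: select note-on events from the tracks of a piece, ordered
# by time, so they could be recombined into a new mashup. Vocabulary
# names are assumptions based on the MIDI Linked Data schema.
SELECT ?track ?note ?velocity ?tick
WHERE {
  ?piece  midi:hasTrack ?track .
  ?track  midi:hasEvent ?event .
  ?event  a midi:NoteOnEvent ;
          midi:note     ?note ;
          midi:velocity ?velocity ;
          midi:tick     ?tick .
  FILTER (?velocity > 0)   # velocity 0 is conventionally a note-off
}
ORDER BY ?tick
LIMIT 100
```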

Another very interesting resource is IMGpedia, created by Sebastián Ferrada, Benjamin Bustos and Aidan Hogan, which was presented in a regular session (winner of the best student resource paper award) as well as during the poster session (winner of the best poster award). The interesting thing about this resource is that it allows you to query not only metadata elements but also visual characteristics.
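As a rough illustration of the kind of query this enables, here is a hedged SPARQL sketch; the imo: vocabulary and the property names (imo:similar, imo:appearsIn) are my assumptions about the IMGpedia schema rather than verified terms.

```sparql
PREFIX imo: <http://imgpedia.dcc.uchile.cl/ontology#>
PREFIX dbr: <http://dbpedia.org/resource/>

# Sketch: starting from images that appear in the Wikipedia article
# on Vienna, find visually similar images and the articles those
# neighbours appear in. Property names are assumptions.
SELECT ?img ?neighbour ?otherArticle
WHERE {
  ?img       imo:appearsIn dbr:Vienna ;
             imo:similar   ?neighbour .
  ?neighbour imo:appearsIn ?otherArticle .
}
LIMIT 50
```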


Metadata and content features are also combined in The MIDI Linked Data Cloud by Albert Meroño-Peñuela, Rinke Hoekstra, Victor de Boer, Stefan Schlobach, Berit Janssen, Aldo Gangemi, Alo Allik, Reinier de Valk, Peter Bloem, Bas Stringer and Kevin Page, which would, for example, make studies in ethnomusicology possible. I think such combinations of modalities are super exciting for humanities research, where we work with extremely rich information sources and often need or want to combine sources to answer our research questions.

Enriching and making available cultural heritage data is also a topic that keeps popping up at ISWC. This year there was, for example, "Lessons Learned in Building Linked Data for the American Art Collaborative" by Craig Knoblock, Pedro Szekely, Eleanor Fink, Duane Degler, David Newbury, Robert Sanderson, Kate Blanch, Sara Snyder, Nilay Chheda, Nimesh Jain, Ravi Raju Krishna, Nikhila Begur Sreekanth and Yixiang Yao. This project was a pretty big undertaking in terms of aligning and mapping museum collections. I really like that the first lesson learnt is to create reproducible workflows.


This doesn't only hold for the conversion of museum collections, but for all research. Still, it's nice to see it mentioned here. Reproducibility is also a motivator in "Reliable Granular References to Changing Linked Data" by Tobias Kuhn, Egon Willighagen, Chris Evelo, Núria Queralt Rosinach, Emilio Centeno and Laura Furlong, which investigates the use of nanopublications to refer to items or subsets within data collections, enabling fine-grained referencing of previous work.
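For readers unfamiliar with nanopublications: each one bundles a small assertion graph with its own provenance and publication info, which is what makes such granular citation possible. A minimal SPARQL sketch over a nanopublication store (the np: schema URI is the standard nanopublication vocabulary; the store and data are hypothetical):

```sparql
PREFIX np: <http://www.nanopub.org/nschema#>

# Sketch: list nanopublications with their assertion and provenance
# graphs, i.e. the units one would cite for a fine-grained reference.
SELECT ?nanopub ?assertion ?provenance
WHERE {
  ?nanopub a np:Nanopublication ;
           np:hasAssertion  ?assertion ;
           np:hasProvenance ?provenance .
}
LIMIT 10
```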

My favourite keynote at this conference (and they had three excellent ones) was by Jamie Taylor, formerly of Freebase, now Google. He argued for more commonsense knowledge in our knowledge graphs. I do think that is a great vision, as many of our resources lack this kind of knowledge, leading to all sorts of weird outcomes in, for instance, named entity linking (you can ask Filip Ilievski for the funniest examples). However, it was unclear how to go about this and whether it would be possible at all. The examples he gave in the keynote, toasters and kettles, would work out just fine (kettles heat up water, toasters heat up baked goods), but for complex concepts such as murders (Sherlock Holmes anyone?) I'm not sure how this would work. Plenty of food for thought. See also Pascal Hitzler's take on this keynote.

For other highlights of the conference, check out the trip reports by Juan Sequeda and Paul Groth.

 

See you in Monterey, California next year? 


 

Submitted by Karolina Badzmierowska on 23 October 2017

Tour de CLARIN

“Tour de CLARIN” is a new CLARIN ERIC initiative that aims to periodically highlight prominent User Involvement (UI) activities of a particular CLARIN national consortium. The highlights include an interview with one or more prominent researchers who are using the national consortium’s infrastructure and can tell us more about their experience with CLARIN in general; one or more use cases that the consortium is particularly proud of; and any relevant user involvement activities carried out. “Tour de CLARIN” helps to increase the visibility of the national consortia, to reveal the richness of the CLARIN landscape, and to display the full range of activities throughout the network. The content is disseminated via the CLARIN Newsflash and blog posts, and linked to on our social media: Twitter and Facebook.

The Netherlands

CLARIAH-NL is a project in the Netherlands that is setting up a distributed research infrastructure providing humanities researchers with access to large collections of digital data and user-friendly processing tools. The Netherlands is a member of both CLARIN ERIC and DARIAH ERIC, so CLARIAH-NL contributes not only to CLARIN but also to DARIAH. CLARIAH-NL covers not only humanities disciplines that work with natural language (the defining characteristic of CLARIN) but also disciplines that work with structured quantitative data. Though CLARIAH aims to cover the humanities as a whole in the long run, it currently focusses on three core disciplines: linguistics, social-economic history, and media studies.

CLARIAH-NL is a partnership that involves around 50 partners from universities, knowledge institutions, cultural heritage organizations and several SAB-companies, the full list of which can be found here. Currently, the data and applications of CLARIAH-NL are managed and sustained at eight centres in the Netherlands: Huygens ING, the Meertens Institute, DANS, the International Institute for Social History, the Max Planck Institute for Psycholinguistics, the Netherlands Institute for Sound and Vision, the National Library of the Netherlands, and the Institute of Dutch Language. Huygens ING, the Meertens Institute, the Max Planck Institute for Psycholinguistics, and the Institute of Dutch Language are certified CLARIN Type B centres. The consortium is led by an eight-member board, and its director and national coordinator for CLARIN ERIC is Jan Odijk.

The research, development and outreach activities at CLARIAH-NL are distributed among five work packages: Dissemination and Education (WP1) and Technology (WP2) deal respectively with User Involvement and with the technical design and construction of the infrastructure, whereas the remaining three work packages focus on three selected research areas: Linguistics (WP3), Social and Economic History (WP4) and Media Studies (WP5).

 

The full blog post can be read here: https://www.clarin.eu/blog/tour-de-clarin-netherlands

 


17 October 2017, Christian Olesen

In early September, Liliana Melgar and I (Christian Olesen) received an invitation from Barbara Flückiger, Professor in Film Studies at the University of Zürich, to participate in the “Colloquium Visualization Strategies for the Digital Humanities”. The aim of the day was to bring together experts to discuss film data visualization opportunities in relation to Professor Flückiger’s current research projects on the history of film colors. Currently, Flückiger leads two large-scale projects on this topic: the ERC Advanced Grant FilmColors (2015-2020) and the Filmfarben project funded by the Swiss National Science Foundation (2016-2020). A presentation of the projects’ team members can be found here.

As a scholar, Barbara Flückiger has in-depth expertise on the interrelation between film technology, aesthetics and culture, covering in particular film sound, special effects, film digitization and film colors in her research. In recent years, her research has increasingly focussed on film colors, especially since the launch of the online database of film colors, the Timeline of Historical Film Colors, in 2012 after a successful crowdfunding campaign. The Timeline of Historical Film Colors has since grown to become one of the leading authoritative resources on the history and aesthetics of film colors – it is presented as “a comprehensive resource for the investigation of film color technology and aesthetics, analysis and restoration”. It is now consolidating this position as it is being followed up by the two large-scale research projects mentioned above, which merge perspectives from film digitization, restoration, and aesthetic and cultural history.

These projects are entering a phase in which the involved researchers are beginning to conceive ways of visualizing the data they have created so far and need to consider the potential value which data visualization may have for historical research on film color aesthetics, technology and reception.

The full report, with a lot of impressions from the visit, can be read here.

On Friday, October 6th 2017, an enthusiastic group of engineers and digital humanities scholars gathered for the third annual CLARIAH Tech Day. There was an activist mood: this time we would do things differently!

Many developers in the project wanted a meeting in which building stuff would be the focus instead of listening to presentations on how other people had built stuff. The weeks before had seen a flurry of emails on the contents of such a day and the agenda, but also on doubts and concerns. And the truth was: none of us actually had the foggiest idea of how to do this.

No one had the foggiest idea of how to do this Tech Day!

I was asked to take the lead, and together with Roeland Ordelman, Richard Zijdeman and Marieke van Erp we sat down during the CLARIN Meeting in Budapest to kick around some ideas. We settled on a hackathon/unconference-style format: the agenda would be open to suggestions from the community and not be set until the meeting itself. And I’ll confess - I had some hesitations about this open format: what if nobody came up with anything? Wouldn’t people want to know what the meeting was about before making time in busy schedules? But this was what the community itself had repeatedly asked for, so damn the torpedoes - full steam ahead.

And we were not disappointed! The ideas, suggestions and questions poured in and were eventually gathered into four main topics:

  1. Integration and modelling of shared data between the various domains and the generic CLARIAH infrastructure;
  2. Continued development of GRLC;
  3. A discussion on workflows, and how tool selection based on data mime-type can provide guidance for users;
  4. TEI/exist-db/TEIPublisher and Oxygen as the basis for digital editions and linguistic querying.

The enthusiastic response continued into the event itself. It became immediately obvious that the restyled Tech Day would also be a lot of fun. The smiles, enthusiasm and flexibility were fantastic. The developers, who had come from all over CLARIAH, had brought many guests, turning this into a truly international day that generated a very positive vibe of its own.

After a five-minute pitch for each topic, the community basically took over the pantry, restaurant and meeting rooms at the IISH building. You could find groups of engineers working, discussing and building stuff everywhere. And these groups were extremely varied: people from Media Studies discussing GRLC with engineers working in the field of Social Economic History, and linguists and lexicographers getting stuff done with developers working on generic infrastructure. Many new ideas were born that day.

Lunch at the IISH

A lot of progress was made on the four main topics. Both Open Dutch Wordnet and the first version of the diachronic lexical corpus Diamant (INT, Katrien Depuydt and Jesse de Does) were connected to the generic infrastructure, as were catalogues provided by NISV and the Gemeente Geschiedenis dataset on Dutch municipalities (by Hic Sunt Leones). Carlos Martinez and a group of engineers added to GRLC the automatic inclusion of SPARQL queries stored on GitHub. And there were plenty of discussions on planned and unplanned subjects. Jan Odijk and Jesse de Does ran a very interesting meeting on workflow systems, and Eduard Drenth (Fryske Akademy) presented his ideas on digital editions, followed by a very detailed open discussion on the pros and cons of the software stack he proposed.
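For those who haven't used GRLC: it turns SPARQL queries stored in a GitHub repository into a web API, one endpoint per query file. A minimal sketch of such a query file is shown below; the #+ decorator syntax is GRLC's own, while the endpoint URL and vocabulary are placeholders.

```sparql
#+ summary: List Dutch municipalities with their labels
#+ method: GET
#+ endpoint: http://example.org/sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# GRLC serves this query as a GET endpoint named after the file;
# the class URI below is a placeholder, not a real vocabulary term.
SELECT ?municipality ?label
WHERE {
  ?municipality a <http://example.org/ontology/Municipality> ;
                rdfs:label ?label .
}
LIMIT 100
```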

Completely spontaneously, Richard Zijdeman showed us a new way of implementing HDMI for the improvement of health in CLARIAH, and Roeland Ordelman and Liliana Melgar came up with very interesting ideas on a user workspace that may eventually become part of the generic infrastructure. Although interest in the former was quite short-lived, the latter we are definitely going to test.

In short: the CLARIAH tech community rallied around the open format! During the final meeting I was happy to announce that, given the excitement and energy, the board had decided right then and there that we could run another Tech meeting in late winter or early spring 2018. And, illustrating that enthusiasm, the first ideas for this meeting are already coming in.

 

Gertjan Filarski

 

The 2017 CLARIN Annual Conference was held from September 18 through September 21 in Budapest.

The pre-conference part of the first day was dedicated to committee and task force meetings, such as the national coordinators' forum (Jan Odijk participating on behalf of the Netherlands), the user involvement group (NL represented by Patricia Alkhoven), the Standards committee (with Daan Broeder and Jan Odijk on behalf of NL), and others.

There were about 170 participants from the 19 CLARIN ERIC members and 2 observer states, as well as from organisations and countries with whom cooperation discussions are on-going. The Netherlands delegation was relatively small in comparison to other years (only 7 delegates this year), but both keynote speakers were from the Netherlands: Karina van Dalen-Oskam (Huygens ING) spoke about her stylometric research, and Piek Vossen (Vrije Universiteit Amsterdam) about the principles and research questions behind extracting information from natural language texts and representing it as linked data.

I found a number of things noteworthy: first, many metadata records from Europeana have now been included in the Virtual Language Observatory, bringing the total number of metadata records from some 900K to over 1.6 million.
Second, improvements and extensions of the CLARIN Language Resource Switchboard (CLRS), created in the CLARIN-PLUS project, were reported on. The CLRS makes it possible to automatically associate data with applications that can process them: a user can then apply such an application to the data with a single click. This significantly lowers the barrier to using these applications, and it is worthwhile to investigate whether more Dutch applications can be included in the CLRS (currently only some of the Nijmegen applications are included). The concept could be applied in the other CLARIAH core disciplines (social economic history and media studies) as well.
Third, Poland has a very active community and is providing an increasing number of data sets, applications and web services. And finally, all member and observer countries are now connected through federated login, connecting more than 20 countries and thousands of organisations, an impressive achievement indeed!

This year, Paul Meurer from Uni Research Computing, Norway, was awarded the Steven Krauwer Award for CLARIN Achievements. As in past years, the Bazaar was again an informal and very lively event for sharing the latest ideas and developments. Very interesting, for instance, was Ramble On by the Italian DH group from Trento. Ramble On allows you to analyse, for example, the mobility of famous individuals from the past by applying Natural Language Processing modules to unstructured texts.

The social programme was very attractive as well. On the first day we had a reception at the Academy Building, a quite impressive building, and the trip towards it offered views of the parts of Budapest around the Danube. On Tuesday there was a dinner on a boat that travelled up and down the Danube, with spectacular views of the beauty of Budapest.

Budapest, by the way, is considered the birthplace of CLARIN: it was pointed out by Tamás Váradi, the local organizer, and confirmed by Steven Krauwer, that CLARIN originated here in a workshop in 2006.

I enjoyed the conference very much and I am looking forward to the 2018 CLARIN Conference, for which the dates and location are not yet known.

Jan Odijk