LDK Trip report
(by Marieke van Erp)
On 19 and 20 June, the First International Conference on Language, Data and Knowledge (LDK2017) took place in Galway, Ireland. The conference wasn’t too big (~80 participants) and featured a broad and interesting single-track programme. It had been a while since I had attended a single-track conference, and I had kind of forgotten how much I liked it, so I hope the organisers keep that format for the next edition (Leipzig 2019).
CLARIAH collaborator Antal van den Bosch kicked off the conference with the first keynote, titled “Processing Text as Socio-Economic and Cultural Data”, in which he featured several text analytics use cases from the social sciences and digital humanities. I really liked his call for a holistic approach to language (which I interpreted as lying at the heart of the conference theme), namely combining whatever information and approaches are available to answer the deeper questions.
After his talk, an audience member mentioned that he found that approaches presented at digital humanities conferences are often still fairly coarse-grained, which may be a result of researchers expecting 100% accuracy. This is something that I have noticed before, and it was also a big theme in the second keynote of the conference, by Zoltán Slávik (IBM), who argued that technology developers have a huge responsibility to manage expectations. I think Antal’s answer to the audience question reflected this, and he included a remark on keeping the human in the loop, which is also the direction IBM seems to be taking with porting Watson to the medical domain.
Of particular interest
Most of the talks were really interesting; for the full programme and proceedings, see the conference website. Here are a few papers that I think are particularly interesting to the CLARIAH community.
On the creation of resources: There was an interesting paper on the creation of an ontology for linguistic terminology (OnLiT: An Ontology for Linguistic Terminology, Bettina Klimek, John P. McCrae, Christian Chiarcos and Sebastian Hellmann), which aims to provide an interoperable model and dataset for linguistic terminology. One of the things we have run into in WP3 is that there are different glossaries and similar resources around for describing different linguistic concepts, so perhaps OnLiT is an interesting option to look at to start integrating them. One issue that may arise came up in Maria Keet’s presentation on Representing and aligning similar relations: parts and wholes in isiZulu vs English: certain concepts that are present in one language may not exist in another, or only partly. I am not sure yet whether OnLiT can represent all of this, but it is still a work in progress.
Another issue in resource creation is the fact that the resource will always be a snapshot of the language at the time the resource was created. One of our most commonly used resources in language technology is WordNet, but it hasn’t been updated for 10 years. Back then, “to tweet” was a verb that applied to birds; now it refers to posting on a microblog. John P. McCrae and Ian Wood presented a paper they wrote together with Amanda Hicks, in which they aimed to extend WordNet with neologisms by gathering terms from Twitter and Reddit and filtering them through various sieves.
During the very nice poster session, some interesting digital humanities use cases were presented. The first two are by the group of Hyvönen in Finland: Named Entity Linking in a Complex Domain: Case Second World War History by Erkki Heino, Minna Tamper, Eetu Mäkelä, Petri Leskinen, Esko Ikkala, Jouni Tuominen, Mikko Koho and Eero Hyvönen, and Reassembling and Enriching the Life Stories in Printed Biographical Registers: High School Alumni on the Semantic Web by Eero Hyvönen, Petri Leskinen, Erkki Heino, Jouni Tuominen and Laura Sirola.
What I liked about these is that they deal with real, dirty data and provide interesting examples of the things we can do with data from, for example, NIOD and biographical resources.
Another highlight of the poster session for me was Exploring the Role of Gender in 19th Century Fiction Through the Lens of Word Embeddings by Siobhán Grayson, Maria Mulvany, Karen Wade, Gerardine Meaney and Derek Greene. One reason I am interested in this is that I’m supervising an MSc thesis that deals with automatic analysis of novels. The other is that I liked how they visualised their results, which I think is very important, especially when working in interdisciplinary settings.
- Gaelic is super interesting, but also super complex (as Graham Isaac’s keynote made clear)
- The crowd was still quite technical; having more humanities researchers in attendance may spark even more interesting cross-disciplinary conversations
- The weather in Ireland is really not that bad (but don’t forget your waterproof jacket)
- Kathleen McKeown (Columbia University, the third keynote speaker) is definitely someone whose work is worth looking into, as I already mentioned in this blog post.
- Why aren’t more conferences doing BBQs?