23 February 2023

An interview with Jan Odijk: professor emeritus & CLARIAH director

After an impressive scientific career, Jan Odijk, CLARIAH director and professor of language and speech technology at Utrecht University, is retiring.

Officially, Jan Odijk retired at the end of September, but in practice he is still regularly active for CLARIAH. "Those are mainly projects that were already running, mind you. Other than that, I'm mainly pursuing my hobbies. That is actually very similar to what I was doing in my job, but without the tedious tasks that sometimes came with it," he says from his home office in Bilthoven.

Odijk can look back on a respectable career, both in language and speech technology and more broadly in Digital Humanities (DH), although he doesn't really like that term. More about that later; first, a bird's-eye view of his résumé. After graduating in 1981, Odijk worked for nearly 23 years (1985-2008) as a language technologist for companies such as Philips, Lernout & Hauspie, ScanSoft and Nuance. In 1993 he also received his doctorate from Tilburg University, with a dissertation on a formal description of grammatical constructions for machine translation. In 2001, he was appointed professor of language and speech technology at Utrecht University. From that position, he was involved in several projects that eventually helped found CLARIAH.

For example, he was on the steering committee of the Corpus Gesproken Nederlands (CGN), an annotated collection of 900 hours of contemporary Dutch speech, and of IMIX, a project that developed an interactive demonstrator that answers medical questions. He also chaired the STEVIN program committee, a stimulus program for language and speech technology. From 2009, Odijk was program director of CLARIN-NL, the digital infrastructure for humanities scholars working with linguistic data. In 2013, CLARIN-NL joined forces with DARIAH, the other humanities infrastructure project on NWO's national roadmap, and CLARIAH became a reality. Odijk was its director from the start.


While these are all linguistic projects, this does not necessarily mean that linguistics is at the root of digital humanities, Odijk argues. "When I taught computational linguistics in the 1980s, there were already computational courses for other disciplines. Those were taken by historians and literary scholars, so it was much broader back then. What is true: almost everyone in the humanities studies text. And in order to study text, you have to do linguistic things. Usually you then need more than just a search engine. So in that sense, linguistics may play more of a primary role than other fields. But by now, linguistics is even a minority within CLARIAH. And in the future, when CLARIAH collaborates in SSHOC-NL with ODISSEI, the infrastructure for the social sciences, the share will become even smaller."

Anyway, back to that term for a moment: digital humanities. "I do not like the term at all," Odijk explains. "The term suggests that it's a discipline, but it's not. You just practice a discipline from the humanities, using computational techniques when necessary." And 'computational' is already better than 'digital', he adds, because "nowadays everyone does everything digitally, with computers. The term Digital Humanities is simply too big and too vague. However, I don't have a good alternative either, and this term has just taken hold."

Thus, digital humanities are methodological rather than curricular in nature. And in that sense, they have been around much longer than the term. "That term came up sometime around 2008, but the activity itself is very old."


Over the last 10 or 15 years, perhaps still as a result of the introduction of the term, he has seen the humanities change. "There were always a few pioneers trying out new techniques, but the majority still worked in the traditional way. Meanwhile, the number of people working with DH has grown considerably and it is also becoming more and more normal to include the methods in teaching, so students are much more familiar with them as well."

This change did not come about by itself. "In the initial phase of CLARIN-NL, Arjan van Hessen went to all the universities to talk to humanities scholars about the digital possibilities. Because if you don't know what is realistically possible, you can't formulate your wishes in such a way that a technical person can use them. That's how we taught technicians and humanities scholars to speak the same language."

Also, more and more humanities scholars are now able to do at least some programming. "Actually, I think every humanities scholar should take a programming course. Not to become a programmer yourself, but because it makes talking to technical people a lot easier. Which language you choose doesn't matter so much - Python and R are popular now - it's all about a structured and exact way of thinking."

Looking back, Odijk is most proud of the collaboration between all the different disciplines. "The fact that since 2009, so almost thirteen years now, we have been working together in harmony on infrastructure for the humanities. Of course, this is not solely attributable to me, it is a collective result. But it is a great result."

The future

That said, he does see areas for improvement for CLARIAH in the future. "Differences in formats for data and metadata often make combining functionalities difficult. An example: even though, partly on my initiative, a standard for parliamentary data, Parla-CLARIN, was successfully created and applied in recent years, there is another standard for parliamentary data that proved too difficult to integrate with it," Odijk says. "That will probably always remain the case, because in the end it is not about the formats, but about the research questions and what data are important for them. Still, there needs to be continuous work to see if those standards can work together."

The integration of the many search engines is also high on his wish list. "There have been dozens of projects that each developed a search engine with one specific function added. All of those have to be maintained, which of course is not going to work. It would make much more sense to have far fewer search engines, which can be easily expanded with new data or with advanced ways of searching, such as sentiment mining or topic search."

For such advanced ways of searching, users sometimes do need some additional guidance. "A lack of understanding of how the underlying data are structured quickly plays tricks on them," he observes. "Especially with complex data structures, such as treebanks or a triple store. These are also difficult to explain, so then you have to find a way to still offer the tool without requiring that knowledge. Unfortunately, there is no ready-made solution for that. But with GrETEL, a search engine for sentence structures, you can, for example, enter a sample sentence with the structure you are interested in, and a query is automatically formulated for you. That helps users tremendously."

"Digital humanities will never be 'finished' anyway, because research is never finished and digital humanities provides the tools for research. New techniques or new adaptations of existing techniques will always be needed to answer new research questions," Odijk says.

His own main focus in the coming period will be on SASTA, a project he has now been involved with for several years. "We are developing software to analyze the language use of children or patients with aphasia. That used to be done manually, but we have now been able to automate part of it with good results. I will continue to enjoy doing that even after I retire."

Interview by Erica Renckens, science journalist.