With the start of CLARIAH PLUS, a new work package has been added to the CLARIAH family: Work Package 6 "Text". The aim of the work package is to support researchers who are specifically interested in text, including literary scholars, historians, philologists, and scholarly editors. These target groups often need support for the entire digitisation chain: from primary digitisation, enrichment and publication of digital sources, to instruction in the use of computational analysis tools.

Work Package 6 provides an online environment in which these researchers can consult existing digital text files and analyse them with various statistical and machine-learning-based analysis tools. The analysis results can be stored as enriched resources in the CLARIAH service structure, establishing a cycle of data creation, reuse and enrichment.

Work Package 6 relies on CLAAS (CLARIAH As A Service), the digital infrastructure developed and delivered in the other work packages. A number of use cases addressing concrete research questions will be implemented in order to make an inventory of how component-based workflows can be built on this infrastructure. The infrastructural-technical challenge for Work Package 6 is to offer existing resources (such as those available via Nederlab) in a way that is suitable for reuse by arbitrary text analysis tools.

The content-related technical challenge for this work package is to make the existing (and newly digitised) resources suitable for computational analysis. This is particularly problematic for historical texts as digital resources. Tools for Named Entity Recognition, syntactic parsing, event identification, and so forth are often trained on and available for modern Dutch, but do not yield adequate results for older Dutch. Older Dutch, with its large semantic differences and wildly varying spelling, cannot be properly researched with the aid of a computer without the development of specialist tools. A central challenge is therefore to design robust parsers that are suitable for these historical textual sources and thus make such resources easier to find and query.

Within the work package, explicit attention is paid to dissemination through documentation and instruction. The digital data and tools that are delivered will come with sufficient documentation and training material, so that training opportunities are available for researchers at every level of "digital literacy". Making clear what is and is not possible with the available data, and what computational tools can and cannot do, is a third important challenge for Work Package 6 "Text".

This project focuses on three areas in the Humanities that act as forerunners for other disciplines and together represent all forms of data: text, images, audiovisual material and structured data (databases).
The three focus areas are:

  • Linguistics
  • Socio-Economic History
  • Media Studies

In each area, multidisciplinary teams of humanities scholars, computer scientists and data providers work together to curate existing data and applications and link them together. The three focus areas are supported by technology that is useful for all of them.

Although CLARIAH currently focuses on three areas, other disciplines from both the Humanities and the Social Sciences can also submit their data and applications and take part in curation projects.