The Couranten Corpus: thousands of 17th-century Dutch newspapers
Do you want to know first-hand what happened during the Disaster Year 1672? Are you curious about the years in which there were plagues of wolves? Or are you looking for the oldest advertisements for a medicine for all 'Qualen op de Borst, Benaeutheden, Hoest en Sinckingen'?
Then you can look in the Couranten Corpus, which was recently put online by the Instituut voor de Nederlandse Taal (INT). CLARIAH Work Package 6: Text contributed to the data curation.
The Couranten Corpus comprises the seventeenth-century Dutch newspapers found on Delpher: 13 newspapers with 109,532 articles and almost 19 million words. The newspapers have been carefully transcribed and corrected by about 300 volunteers, under the direction of Nicoline van der Sijs, using a crowdsourcing environment developed at the Meertens Institute. In 2020 the dataset received the Dutch Data Prize.
The metadata of the articles have been corrected by interns and INT staff. All articles have been given their own metadata and all duplicate newspapers have been removed from the dataset. New metadata have also been added, for instance on genre and news type (advertisements, domestic news, foreign news, etc.). The data have been automatically enriched linguistically, making them more searchable. The historical lexicon of the INT also helps in finding spelling variants. The information in these papers is relevant to researchers in various disciplines, ranging from historians and historical linguists to literary scholars and art historians.
Nicoline van der Sijs
Professor of Historical Linguistics of Dutch in the digital world, Radboud University