SERPENS: SEaRch PEst and Nuisance Species
Contextual search and analysis of pest and nuisance species through time in the KB newspaper collection, particularly focused on the perception of Mustelid species like polecats, martens and stoats.
Historical newspapers are a fascinating source of information for historical ecologists who study interactions between humans and animals through time and space. Digitized newspaper archives are particularly interesting to analyze because of their breadth, depth and easy access. However, the size and occasional noisiness of such archives also bring difficulties: manual analysis remains cumbersome and laborious.
In SERPENS, we performed experiments to automate query expansion and categorization for the perception of alleged pest and nuisance animal species mentioned in digitized newspapers from a subset of the KB newspaper collection (1800-1940). We particularly focused on the perception of Mustelid species like polecats, martens and stoats. For animal taxonomy we made use of ATHENA; for query expansion we used lexicons; for categorization of newspaper articles we trained a Support Vector Machine model.
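As an illustration of lexicon-based query expansion, the sketch below maps a species name to additional search terms and combines them into a single OR query. It is a minimal sketch, not the project's actual code: the `LEXICON` entries, the function names `expand_query` and `to_or_query`, and the OR query syntax are all assumptions made for this example.

```python
# Hypothetical lexicon: English species name -> illustrative Dutch search terms.
# The entries are examples only, not the lexicons used in SERPENS.
LEXICON = {
    "polecat": ["bunzing", "bunsing"],
    "marten": ["marter", "steenmarter", "boommarter"],
    "stoat": ["hermelijn"],
}


def expand_query(terms, lexicon):
    """Return the input terms plus their lexicon variants.

    Terms are lowercased, duplicates are dropped, and the original
    order is preserved.
    """
    expanded = []
    seen = set()
    for term in terms:
        key = term.lower()
        for candidate in [key, *lexicon.get(key, [])]:
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


def to_or_query(terms):
    """Join terms into a quoted OR query (assumed syntax, for illustration)."""
    return " OR ".join(f'"{term}"' for term in terms)


if __name__ == "__main__":
    print(to_or_query(expand_query(["polecat", "stoat"], LEXICON)))
```

Keeping the lexicon as plain data separates the domain knowledge (names and spelling variants) from the search logic, so terms can be added without touching the code.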
Our results indicate that, with a rather limited number of training examples, we can distinguish newspaper articles that are about animal species from those that are not (~92% accuracy), and can distinguish between subcategories of articles (e.g., articles about material damage caused by pest species, non-material damage, pest control and hunting; ~84% accuracy). Automated procedures like this can greatly enhance the usability of large digitized collections, not only for historical ecology but also for other fields in the natural sciences and humanities.
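The two levels of categorization described above could be sketched as a two-stage classifier: the first stage separates articles about animals from other articles, and the second assigns a subcategory. This is one possible reading, not the SERPENS implementation. The use of scikit-learn, TF-IDF features and a linear kernel (`LinearSVC`) are assumptions, the names `train_two_stage` and `categorize` are hypothetical, and the training snippets are invented and not taken from the KB collection.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

OTHER = "not about animals"

# Invented toy snippets in English for readability; NOT from the KB collection.
TRAIN = [
    ("A polecat killed twelve hens in the farmer's coop last night.", "material damage"),
    ("Martens destroyed the eggs and chickens of several farms.", "material damage"),
    ("The municipality pays a bounty for every stoat that is trapped.", "pest control"),
    ("Traps and poison were set out to exterminate the polecats.", "pest control"),
    ("The hunting party shot two martens in the forest on Sunday.", "hunting"),
    ("Hunters with dogs pursued a stoat across the estate.", "hunting"),
    ("The city council discussed the new tram line yesterday.", OTHER),
    ("Grain prices rose again at the Amsterdam market.", OTHER),
]


def train_two_stage(examples):
    """Fit stage 1 (animal vs. other) and stage 2 (subcategory) classifiers."""
    texts = [text for text, _ in examples]
    coarse = ["other" if label == OTHER else "animal" for _, label in examples]
    stage1 = make_pipeline(TfidfVectorizer(), LinearSVC())
    stage1.fit(texts, coarse)

    # Stage 2 is trained only on the articles that are about animals.
    animal = [(text, label) for text, label in examples if label != OTHER]
    stage2 = make_pipeline(TfidfVectorizer(), LinearSVC())
    stage2.fit([text for text, _ in animal], [label for _, label in animal])
    return stage1, stage2


def categorize(text, stage1, stage2):
    """Return OTHER, or the subcategory predicted for an animal article."""
    if stage1.predict([text])[0] == "other":
        return OTHER
    return str(stage2.predict([text])[0])


if __name__ == "__main__":
    s1, s2 = train_two_stage(TRAIN)
    print(categorize("A marten was shot by hunters near the village.", s1, s2))
```

With so few examples the predictions are not meaningful. The sketch only shows how the two decisions can be chained.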