DUID: Uniform Information Density for Dutch word order variation
This project investigates whether the UID hypothesis holds for Dutch, and thus whether this hypothesis can account for preferences for specific word order choices in Dutch.
About the project
The Uniform Information Density (UID) hypothesis states that language is at its most efficient when information is spread evenly throughout an utterance. UID has been shown to be a strong predictor of various kinds of grammatical variation, but studies testing it have mostly been limited to word omission phenomena. However, UID also has the potential to explain why certain orders are preferred in word order alternations. When multiple orders are possible, speakers may choose the order that keeps the information density of the utterance more uniform. This might explain why one order is preferred even when the choice does not appear to contribute any meaning to the utterance. In Bloem (2016), I showed that a simple measure of UID is a significant predictor of Dutch verbal cluster order, as in:
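1. ... dat zij dat heeft gezegd. (‘... that she has said that’, auxiliary before participle)
2. ... dat zij dat gezegd heeft. (‘... that she has said that’, participle before auxiliary)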
However, there are many other word order alternations in Dutch, such as PP extraposition, object scrambling, and the dative alternation. For most of these, there is debate about which order is the default (unscrambled) one. This prompts my more general research question: do word orders that are considered ‘default’ in the Dutch linguistics literature have a more uniform information density than those that are considered ‘marked’?
For this project, Colibri Core will be used. This is a CLARIAH component for efficiently computing n-gram, skipgram and flexgram frequencies, which form the basis of the language models used to estimate information density. It will be used to compute the uniformity of information density of constructions with alternate word orders. These results will then be compared to findings from the Dutch linguistics literature, to see whether the orders that are considered unmarked or ‘default’ also have a more uniform information density.
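To make the idea of uniformity concrete, here is a minimal sketch in plain Python of one way such a comparison could be set up. It is not the measure from Bloem (2016) and it does not use the Colibri Core API. It assumes an add-one smoothed bigram model for per-word surprisal and takes the variance of surprisal as the uniformity score, where a lower variance means a more uniform information density. The function names and the three-sentence toy corpus are hypothetical and only serve as illustration.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Count bigram contexts and bigrams, using "<s>" as a sentence-start symbol."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence
        vocab.update(sentence)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def surprisals(sentence, context_counts, bigram_counts, vocab_size):
    """Per-word surprisal in bits under an add-one smoothed bigram model."""
    tokens = ["<s>"] + sentence
    values = []
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)
        values.append(-math.log2(p))
    return values


def uid_variance(values):
    """Population variance of surprisal; lower means more uniform (assumed measure)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Toy corpus, invented for illustration only; real estimates need a large corpus.
corpus = [
    "dat zij dat heeft gezegd".split(),
    "dat hij het heeft gezien".split(),
    "dat zij het gezegd heeft".split(),
]
context_counts, bigram_counts, vocab = train_bigram_counts(corpus)
vocab_size = len(vocab) + 1  # one extra slot reserved for unseen words

order_1 = "dat zij dat heeft gezegd".split()
order_2 = "dat zij dat gezegd heeft".split()
for name, order in [("order 1", order_1), ("order 2", order_2)]:
    s = surprisals(order, context_counts, bigram_counts, vocab_size)
    print(name, [round(v, 2) for v in s], "variance:", round(uid_variance(s), 3))
```

With a corpus this small the scores say nothing about Dutch. The point is only the shape of the comparison: estimate per-word surprisal for each variant of an alternation from n-gram frequencies, then compare how evenly that surprisal is spread across the two orders.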