Dataset: Discourse Management Constructions in Wikipedia Talk Pages

This dataset forms the basis of the paper Gillmann, Melitta (to appear): Allostructions and Stancetaking: A Corpus Study of the German Discourse Management Constructions Wo/wenn wir gerade/schon dabei sind. In: Cognitive Linguistics.

Drawing on a corpus study of Wikipedia Talk pages, the paper presents a case study of German discourse management markers such as wo wir gerade dabei sind ‘speaking of which’ or wenn wir schon dabei sind ‘while we’re at it’. Based on the dataset, the observed frequencies of the filler items were compared to the statistically expected frequencies, using Hierarchical Configural Frequency Analysis and Distinctive Collexeme Analysis. These measures revealed two different collocational types, namely wo wir/ich gerade bei NP sind/bin ‘as we are/I am just at NP’ and wenn wir/du schon bei NP sind/bist ‘as we/you are already at NP’.
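The comparison of observed and expected frequencies can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: it only shows the expected cell counts under independence that configural frequency analysis starts from, and a one-sided Fisher exact test of the kind commonly used in distinctive collexeme analysis. The function names and the counts are invented for illustration and do not come from the dataset.

```python
from math import comb


def expected_counts(table):
    """Expected cell counts under independence for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def fisher_one_sided(a, b, c, d):
    """P(top-left cell >= a) for the 2x2 table [[a, b], [c, d]] with fixed margins.

    Uses the hypergeometric distribution, i.e. a one-sided Fisher exact test.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return tail / comb(n, col1)


# Invented toy counts (NOT taken from the dataset):
# rows = connector (wo, wenn), columns = adverb (gerade, schon).
observed = [[8, 2], [1, 9]]
expected = expected_counts(observed)
p_value = fisher_one_sided(8, 2, 1, 9)

print(expected)
print(p_value)
```

A cell whose observed count clearly exceeds its expected count, with a small p-value, points to an attraction between the two fillers. The hierarchical variant of configural frequency analysis repeats this kind of comparison for combinations of more than two variables.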

Both serve as discourse management markers, more specifically as topic orientation markers, whose purpose is to shift the topic. They involve the same fixed pattern, combining the same categorical slots. However, they diverge in their collocational preferences, which reflect functional differences.

The raw dataset consists of a table in which each row contains one corpus occurrence together with the lexical filler items of the categorical slots that recur in both patterns. The coded variables comprise
a)    the connector slot with the connectors wo or wenn,
b)    the subject slot that in the vast majority of the cases contains a personal pronoun,
c)    the adverb slot,
d)    the preposition slot,
e)    the lemmas occurring in the noun slot, which is embedded in a prepositional phrase,
f)    punctuation marks.
These variables are the basis for the collocation measures presented in the paper.
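
As a rough illustration of how such a table can be tallied, the following Python sketch reads a few rows and counts slot configurations. The column names (connector, subject, adverb, preposition, noun_lemma, punctuation) and the sample rows are assumptions made for this example; the actual file may use different headers and a different format.

```python
import csv
import io
from collections import Counter

# Invented toy rows; the column names are assumptions, not the dataset's headers.
SAMPLE = '''connector,subject,adverb,preposition,noun_lemma,punctuation
wo,wir,gerade,bei,Thema,","
wenn,wir,schon,bei,Artikel,:
wo,ich,gerade,bei,Thema,","
'''


def count_configurations(rows, slots):
    """Count how often each combination of filler items occurs in the given slots."""
    return Counter(tuple(row[slot] for slot in slots) for row in rows)


# Each row becomes a dict mapping a column name to its filler item.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts = count_configurations(rows, ("connector", "adverb"))

for configuration, frequency in counts.most_common():
    print(configuration, frequency)
```

Counts of this kind are the observed frequencies that enter a collocation measure; passing other slot names to count_configurations yields the tallies for other variable combinations.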


