Comparison of Different Modeling Techniques for Robust Prototype Matching of Speech Pitch-Contours
In verbal interactions between humans (HHI) and between humans and computers (HCI), a multitude of information is transmitted. Besides the purely textual information, speech conveys additional details about the speaker’s feelings, beliefs, and social relations. Intonation reveals functional details about the speakers’ communicative relation and their attitude towards the ongoing dialogue, such as affirmation, disagreement, or the wish to take the turn. Since the intonation of full words is influenced by semantic and grammatical information, it is advisable to investigate instead the intonation and corresponding functional meaning of so-called discourse particles (DPs) such as “hm” or “uhm”. They cannot be inflected but can be emphasized, and interlocutors are able to differentiate the functional meanings of DPs solely from their intonation. To take advantage of this relation in automatic dialogue processing, the goal of this investigation is to enable automatic classification of the functional meaning of the DP “hm” from its intonation. The acoustic intonation curve can be represented by the pitch values extracted from the raw speech material. Three different classification methods are presented and compared to identify the best-performing one. Furthermore, to ensure the reliability of the classifier for both HHI and HCI, and to gain more training data, cross-validation tests were carried out on two publicly available datasets containing HCI and HHI.