Acoustical Society of America - 161st Meeting Lay Language Papers


Understanding casual speech: “Well he was like, 'what's wrong?!?'”

Authors:

Dan Brenner (University of Arizona)
Natasha Warner (University of Arizona; and MPI for Psycholinguistics Nijmegen, the Netherlands)
Mirjam Ernestus (Radboud University, Nijmegen, the Netherlands; and MPI for Psycholinguistics Nijmegen, the Netherlands)
Benjamin V. Tucker (University of Alberta, Edmonton, Canada)

Popular version of paper 5pSC17

This paper examines how listeners combine information about words, sentences, and sounds to understand very casual, even "sloppy" speech, in which sounds are not pronounced clearly.  Hearing speech is one of our most ordinary everyday activities, and much of the speech we hear is casual conversation rather than careful speech like newscasting.  Casual speech often has sounds or whole syllables missing, and words may not be pronounced at all the way one would expect. For example, if we listen to

[iso_chillin.wav]

it is difficult to make out what the person is saying.  Even though this is a string of three normal English words, it is difficult to understand it as having any meaning at all, or to say what sounds it includes.  However, when one hears this same recording in context,

[full_chillin.wav]

one can easily understand the whole sentence ("Er, uh, Tuesday night, uh, when we were chillin' in the spa…"), and the part that was unintelligible out of context sounds perfectly natural, although still very casual.  (More examples at http://www.u.arizona.edu/~nwarner/reduction_examples.html.)  Everyone knows that “cannot” is usually pronounced “can’t,” and “want to” can sound like “wanna,” but “reduced speech” in casual conversation goes much further than that, and is far more common than we realize. For example, speakers in our recordings made “do you have time” sound like “dyutem,” and “she’d be like” sound like “shelie.” It is tempting to dismiss this kind of speech as “lazy,” “sloppy,” or “slurred,” but researchers have found that it is very common in our everyday talk, and listeners are expert at deciphering this “reduced speech.”  In fact, reduced speech may be efficient rather than sloppy, conveying the meaning more quickly while also signaling to the listener that this is a casual conversation. Since we hear so much casual speech, we get a great deal of practice in combining information across parts of a sentence to sort out the meaning despite the missing sounds.

The current paper examines what types of information listeners use to do that.

Some very common reductions can make one word sound very much like another.  For example, heard out of context, this recording may sound like “we’re” or “we are”:

[iso_bookstore.wav]

But if you hear the same recording in context, you may be completely sure it is "we were" instead:

[full_bookstore.wav]

In our present study, we cut sections of speech like these out of everyday conversations to find out how listeners combine various kinds of information to decide between the two possible meanings: did she say “we’re”, or “we were”?

There are several sources of information listeners could use in combination to understand the meaning of speech. Of course, we listen for specific acoustic features in the sounds, like the hissing quality of the sound “s,” or the vowel sound “oo” in “bookstore.”  But while we listen, we also have strategies and linguistic knowledge that help us decode the sounds as we hear them. For example, we know that when a speaker is talking fast, we should expect sounds to be deleted or merged together, and we can use that knowledge to infer that some sounds are missing.  Thus, if the rest of the sentence is fast speech, we may infer that a stretch that sounds like “we’re” on its own is too long to be just “we’re” at that rate, so it must have had some sounds deleted, and therefore it might well be “we were” instead. We also know a lot about how words are put together in sentences, and about what kinds of words are likely to come up, so if the word “yesterday” occurs in the sentence, we might infer that the phrase is more likely to be “we were,” because the event happened in the past. That is, we can use the sounds of the words themselves, the speech rate of the surrounding speech, and the meaning of the rest of the sentence to understand reduced speech.  Our current work investigates how each of these sources of information contributes to our perception of phrases that can sound the same, like “we’re” vs. “we were” and “he’s” vs. “he was.”

To do this, we cut members of these pairs (“he’s” ↔ “he was”; “we’re” ↔ “we were”; etc.) out of recordings of college students’ telephone conversations. We then varied the amount of context presented with those target words, comparing how listeners respond to the word in isolation; to the word with a small bit of surrounding acoustic context, enough to tell them how fast the speaker is talking but not enough to give the meaning of the surrounding words; or to the entire sentence.

“And we were outside the bookstore.”

[iso_bookstore.wav]  (word in isolation)

[lim_bookstore.wav]  (with a bit of surrounding acoustic context)

[full_bookstore.wav]  (entire sentence)

We found evidence that listeners can extract a lot of information from the sounds of the words themselves: they are relatively accurate even when they hear only “we were,” “he’s,” etc. in isolation.  They perceive the words more accurately if they also hear the speech rate of the surrounding sounds, and more accurately still if they hear the whole sentence.  Thus, they use all three types of information.  However, when the words were actually the past-tense versions, “was” or “were,” listeners made use of the speech-rate information to infer that sounds had been deleted, but they did not make much use of the words of the surrounding sentence.  That is, listeners do not primarily rely on the meaning of the context, but rather on realizing that the context is fast, casual speech.  Knowing how fast the speaker is talking helps them predict when sounds may be missing.

The most surprising discovery we made concerns the so-called quotative “like,” as in, “well he was like, ‘what’s wrong?’”

[full_what’s wrong.wav]

Quotative “like” was very common in our recordings, as is typical of college-aged speakers. We compared “he was” before “like” (for example, “he was like ‘what’s wrong?’”) with “he was” without “like” (for example, “he was in the wedding”).  We found that listeners heard the “he was” differently, even when it was cut out so that they could not hear the “like.”  Because some young speakers use the phrase “he was like” so often, they reduce the “he was” more than usual even before they get to the “like.”  Making “he was” so short makes it sound more like “he’s.”  We found that listeners favor acoustic information from the sounds they hear over information about the meaning of the rest of the sentence, even when that acoustic information is misleading because it is so reduced.

To summarize, this work shows that listeners are very skilled at combining information from the acoustics of the word itself, the rate of the surrounding speech, and the meanings of other words in the sentence to determine the meaning of very reduced, casual speech.  However, they do not do this by relying primarily on the meaning of the context, as people sometimes think.  Rather, they favor the information in the sounds they actually hear, and they make inferences about what sounds the speaker may have left out in fast speech.