HCS 7367 Speech Perception
Dr. Peter Assmann, Fall 2010

EARLY LANGUAGE ACQUISITION: CRACKING THE SPEECH CODE
P. K. Kuhl, Nature Reviews Neuroscience, 5, Nov 2004, 831-844

Mapping sounds
- Ladefoged (2004) estimated that the world's languages can be described by a set of basic sound units: about 600 consonants and 200 vowels.
- Each language contains only about 40 sound segments (phonemes) used to differentiate all of the words in that language.

Mapping sounds
- Phonemes are abstract units that represent a class of sounds sharing a set of features, e.g., the voiceless labiodental fricative /f/.
- The infant's task is to analyze and learn the sound structure of the phonemic categories of his/her native language before trying to learn to recognize and produce whole words.

Evidence that phonetic learning precedes word learning
- Eimas et al. (1971) presented evidence that 1-4 month old infants, like adults, are more sensitive to sound contrasts that straddle phoneme boundaries (categorical perception).
Eimas et al. (1971)
- High-amplitude sucking procedure.
- Discriminability was measured by an increase in sucking rate to a second speech sound after habituation to the first sound.
- Recovery from habituation was greater when the two stimuli were from different adult phonemic categories than when they were from the same category.

Eimas et al. (1971)
- Infants could discriminate differences between synthesized sounds along a /ba/ to /pa/ continuum, but only when the sounds were from different sides of the phoneme boundary (e.g., VOTs of 20 and 40 ms).
- They could not reliably discriminate within-category differences (e.g., VOTs of 20 and 0 ms, or 60 and 80 ms).

Phonetic feature detectors?
- Eimas et al. (1971) reported that infants, like adults, demonstrate categorical perception for speech sounds.
- To explain this they proposed that infants are equipped with innate, speech-specific, phonetic feature detectors tuned to the range of phonetic contrasts used in the world's languages.

Phonetic feature detectors?
- Feature detectors stimulated by language input are activated; other contrasts are lost.
- Initial state: infants discriminate all phonetic contrasts, but at a later age only those used in the native language are retained. Non-native contrasts (such as /r-l/ for Japanese infants) become harder to discriminate.
- However, comparable tests on animals produced similar perceptual boundaries for VOT.
- Monkeys trained to discriminate syllables on a /b-d-g/ continuum show peaks in discriminability near the human phoneme boundaries.

Facts inconsistent with the idea of specialized speech-specific feature detectors:
- categorical perception of speech sounds by animals
- categorical perception of non-speech sounds
These findings support domain-general auditory analysis.

Kuhl (2004)
- Infants do not discriminate all physically equal acoustic differences equally well; they show heightened sensitivity to those that are useful in language acquisition.
- Languages choose phonetic contrasts that capitalize on these heightened sensitivities.

Werker and Tees (1984)
- English-learning infants can discriminate Hindi and Salish sounds at 6 months of age, but this discrimination declines by 12 months of age.
- As infants approach their first birthday, their perceptual systems become attuned to the fine phonetic distinctions that serve to distinguish words in their native language.

Infant speech discrimination
http://www.youtube.com/watch?v=gsiwu_mhl4a

Statistical learning
- What determines the change in phonetic perception between 9 and 12 months of age?
- Hypothesis: infants analyze the statistical distribution of sounds they hear in ambient language.
Maye, Werker & Gerken (2002)
- 6-8 month old infants were exposed to an 8-step [da]-[ta] continuum.
- The frequency distribution of the steps was manipulated to simulate languages with either a one- or a two-category voicing distinction (unimodal vs. bimodal).
- The bimodal group showed enhanced discrimination of the continuum endpoints.

Head Turn Procedure (Saffran et al.)
http://www.waisman.wisc.edu/infantlearning/participation.html
- Infants heard 2-min strings of computer-generated sounds without breaks, pauses or prosodic patterns to signal word boundaries.
- Example: tibudopabikugolatudaropi
- The strings contained 4 pseudo-words ("tibudo", "pabiku", "golatu", "daropi") whose internal syllable transitions had a probability of 1.0; other syllable sequences occurred with lower probability.
- When infants were later exposed to two of the pseudo-words (e.g., "golatu" and "daropi") and two part-words formed by combining syllables that crossed word boundaries (e.g., "tudaro"), they showed a novelty preference for the latter, suggesting they had detected the statistical regularities in the original set.
- A possible mechanism for segmentation and word learning?

Statistical learning of transition probabilities
http://www.waisman.wisc.edu/infantlearning/current_research.html
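The unimodal vs. bimodal manipulation in Maye, Werker & Gerken (2002) can be sketched with toy numbers. This is a hypothetical illustration: the presentation frequencies below are invented for the sketch, not the actual stimulus counts from the study.

```python
# Illustrative sketch of the Maye, Werker & Gerken (2002) design: the same
# 8 continuum steps are presented with different frequency distributions.
# The frequency values here are assumptions, not the study's actual counts.
steps = list(range(1, 9))  # 8-step [da]-[ta] continuum

# Relative presentation frequency of each step:
unimodal = [1, 2, 3, 4, 4, 3, 2, 1]  # one peak -> simulates a one-category language
bimodal  = [1, 4, 3, 1, 1, 3, 4, 1]  # two peaks -> simulates a two-category language

def modes(freqs):
    """Count local maxima (peaks) in a frequency distribution."""
    return sum(1 for i in range(1, len(freqs) - 1)
               if freqs[i] > freqs[i - 1] and freqs[i] >= freqs[i + 1])

print(modes(unimodal))  # 1 peak: no evidence for a voicing contrast
print(modes(bimodal))   # 2 peaks: evidence for two voicing categories
```

The point of the design is that the stimuli themselves are identical across groups; only their distribution differs, so any difference in discrimination must reflect distributional (statistical) learning.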
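The transitional-probability statistic behind the Saffran-style result can be sketched as follows. The stream below is a randomly shuffled toy sequence built from the slide's four pseudo-words (50 copies of each), not the actual experimental stimuli.

```python
# Sketch of the statistic infants are proposed to track: the transitional
# probability P(s2 | s1) = count(s1 followed by s2) / count(s1), computed
# over adjacent syllables in a continuous stream with no pauses.
import random
from collections import Counter

words = ["tibudo", "pabiku", "golatu", "daropi"]  # pseudo-words from the slide

random.seed(0)
stream = words * 50          # 200 word tokens, 50 of each
random.shuffle(stream)

# Split each 6-letter pseudo-word into three 2-letter syllables, concatenated
# into one unbroken syllable sequence (no word-boundary markers).
syllables = [w[i:i + 2] for w in stream for i in (0, 2, 4)]

pair_counts = Counter(zip(syllables, syllables[1:]))
first_counts = Counter(syllables[:-1])

def transition_prob(s1, s2):
    """P(s2 | s1) estimated from the syllable stream."""
    return pair_counts[(s1, s2)] / first_counts[s1]

# Within-word transitions have probability 1.0; transitions that cross a
# word boundary are much lower (about 1/4 with four equiprobable words).
print(transition_prob("ti", "bu"))  # within "tibudo" -> 1.0
print(transition_prob("tu", "da"))  # across a word boundary -> well below 1.0
```

Dips in transitional probability mark candidate word boundaries, which is why tracking this statistic could support segmentation without any acoustic cues to where words begin and end.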
Milestones of speech development
(P. Kuhl, Nature Reviews Neuroscience, Nov 2004)

Infant-directed Speech
- Limited vocabulary, concrete reference: "doggie", "potty", "choo-choo train", "boo-boo"
- Simple syntax: short sentences; more imperatives and questions
- Precise articulation
- Repetition
- Exaggerated pitch contours: "what a good boy!"

Infant-directed Speech
Prosody: the "melody" of speech
1. approval: heavily modulated, extensive rise/fall in fundamental frequency (pitch)
2. prohibition ("NO!")
3. attention: shorter than approval, more rapid rise to maximum
4. comfort: relatively unmodulated, downsweep in fundamental frequency

Infant-directed speech
P. K. Kuhl (2004). Early language acquisition: cracking the speech code. Nature Reviews Neuroscience, 5, 831-844, Nov 2004.

Athena Vouloumanos and Janet F. Werker. Listening to language at birth: evidence for a bias for speech in neonates. Developmental Science, 10(2): 159-164, 2007.

Stuart Rosen and Paul Iverson. Constructing adequate non-speech analogues: what is special about speech anyway? Developmental Science, 10(2): 154-169, 2007.

Abstract (Vouloumanos & Werker, 2007): "The nature and origin of the human capacity for acquiring language is not yet fully understood. Here we uncover early roots of this capacity by demonstrating that humans are born with a preference for listening to speech. Human neonates adjusted their high amplitude sucking to preferentially listen to speech, compared with complex non-speech analogues that controlled for critical spectral and temporal parameters of speech. These results support the hypothesis that human infants begin language acquisition with a bias for listening to speech. The implications of these results for language and communication development are discussed."

Vouloumanos and Werker (2007) claim that human neonates have a (possibly innate) bias to listen to speech, based on a preference for natural speech utterances over sine-wave analogues.
Rosen and Iverson (2007) reply: "We argue that this bias more likely arises from the strikingly different saliency of voice melody in the two kinds of sounds, a bias that has already been shown to be learned pre-natally. Possible avenues of research to address this crucial issue are proposed, based on a consideration of the distinctive acoustic properties of speech."
Rosen and Iverson (2007)
- Rosen and Iverson acknowledge that human infants prefer to listen to natural speech in comparison to sine-wave analogues.
- But they disagree over the interpretation, and caution against concluding that human neonates are biased to listen to speech. Why?

Sine-wave speech
- Three time-varying sinusoids track the frequencies of the lowest 3 formants.
- Standard sine-wave speech (SWS) omits voice pitch information.
- The lowest tone (corresponding to F1) is perceived as giving rise to a "weird" intonation pattern (Remez and Rubin, 1984).
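A minimal sketch of sine-wave synthesis along these lines, assuming made-up formant trajectories and a 16 kHz sample rate (illustrative values, not measurements from real speech):

```python
# Sketch of sine-wave speech: three sinusoids track the first three formant
# frequencies. The formant tracks below are hypothetical placeholders chosen
# to resemble a rising /ba/-like transition; they are not from real speech.
import numpy as np

SR = 16000                      # sample rate in Hz (assumed)
dur = 0.5                       # duration in seconds
t = np.linspace(0.0, dur, int(SR * dur), endpoint=False)

# Hypothetical formant frequency tracks (Hz): F1 and F2 rise, F3 is flat.
f1 = np.linspace(400.0, 700.0, t.size)
f2 = np.linspace(900.0, 1200.0, t.size)
f3 = np.full(t.size, 2500.0)

def tone(freq_track):
    """Sinusoid whose instantaneous frequency follows freq_track.
    Integrating frequency (cumulative sum / SR) gives the phase,
    so the tone glides smoothly as the formant moves."""
    phase = 2.0 * np.pi * np.cumsum(freq_track) / SR
    return np.sin(phase)

# Sum the three tones. Because there are only three pure tones, the result
# has no harmonic structure and therefore no voice pitch (f0) -- exactly the
# property Rosen and Iverson point to in their critique.
sws = (tone(f1) + tone(f2) + tone(f3)) / 3.0
```

Note how the design choice falls out of the code: only the formant tracks survive, so the lowest tone's frequency movement (f1) is the only "melody" left, which is why it is heard as an odd intonation pattern rather than a natural voice pitch.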