ISCA Archive (http://www.isca-speech.org/archive)
FAAVSP - The 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing, Vienna, Austria, September 11-13, 2015

You can raise your eyebrows, I don't mind: are monolingual and bilingual infants equally good at learning from the eyes region of a talking face?

Mathilde Fort, Anira Escrichs, Alba Ayneto-Gimeno, Núria Sebastián-Gallés
Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
mathilde.fort@upf.edu, anira.es@gmail.com, alba.ayneto@upf.edu, nuria.sebastian@upf.edu

Abstract

In this study we investigate whether paying attention to a speaker's mouth affects 15- and 18-month-old infants' ability to process visual information displayed in the talker's eyes or mouth region. Our results showed that both monolingual and bilingual 15-month-olds could detect the appearance of visual information in the eyes/mouth region, but only 15-month-old monolinguals and 18-month-old bilinguals could learn to anticipate its appearance in the eyes region. Overall, we demonstrate that specific language constraints (i.e., bilingualism) not only influence how infants selectively deploy their attention to different regions of human faces, but also affect their ability to learn from them.

Index Terms: attention, eyes, mouth, talking faces, early language acquisition, bilingualism, infancy

1. Introduction

In most situations, speech perception occurs in a bimodal fashion: we both see the face of the speaker moving (i.e., visual speech) and hear the corresponding acoustic signal (i.e., auditory speech). A large body of evidence indicates that adults benefit from the presence of visual speech, especially under adverse conditions of speech perception. For instance, seeing the face of a person talking improves speech processing when the acoustic signal is noisy [1-9] or when it is produced in a second language [10-12]. The goal of this study is to investigate how much human infants, who are in the process of learning their native language, rely on audiovisual speech cues to decode talking faces.

Seeing the moving face of the speaker enhances speech perception mostly because it carries information that is highly redundant with the corresponding auditory speech signal [13, 14]. When the acoustic stream is somewhat deteriorated, having access to the articulatory gestures of the speaker provides complementary information that has been masked or distorted in the acoustic signal, enhancing its global saliency [7-9]. In line with this claim, studies show that adults in adverse conditions of speech perception (second or foreign language [15]; noise [16-19]) focus their visual attention on the mouth region of a speaker, which gives them direct access to these redundant audiovisual speech cues. Conversely, they prefer to focus on the upper part of the face when the speech signal is easy to process (e.g., native language, clear acoustic signal [15, 20]), allowing them to process socio-emotional information coming from the eyes region [15, 21]. From a language learning perspective, focusing on the mouth of the speaker is also a good strategy [22]: having access to redundant audiovisual speech cues makes it easier to encode their sensori-motor components, improving the ability to process speech (speech perception) and to imitate it (speech production). But when does the ability to process speech audiovisually arise in human development? Do infants rely on this bimodal redundancy to acquire their native language?
In their first months of life, infants already show rudimentary skills in associating visual speech with its corresponding acoustic signal [23-31]. This capacity greatly improves over the course of infancy ([31-34]; see [35, 36] for reviews) and childhood [6, 37-44]. More specifically, recent studies have shown that infants also shift their attention from the eyes to the mouth of a speaker during their first year of life [15, 45-48]. For instance, [15] showed that when watching a face talking in their native language, monolingual infants between 6 and 8 months of age prefer looking at the mouth region rather than at the eyes region of a speaker. At 12 months, monolinguals' preference for the mouth over the eyes region starts decreasing, while it remains constant when they look at a face talking in a foreign language. This suggests that, like adults, infants can selectively deploy their attention to the mouth of the speaker to improve their ability to process the speech signal.

From a larger socio-communicative perspective, focusing on the mouth of a social partner can make it more difficult to detect and/or interpret essential socio-emotional information coming from the eyes region (e.g., eye gaze, eyebrow movements). Several studies show that infants are sensitive to these cues from birth [49-52], but little is known about infants' ability to process information coming from the eyes region of a face that is actually talking at the same time (but see [53]). This question is crucial because in everyday face-to-face situations, both of these sources of information are available in the signal.

In the present study, we investigated whether this preference for the talker's mouth makes it more difficult for infants to detect and/or learn to anticipate the appearance of a visual event displayed in the talker's eyes/mouth region. In Experiment 1, we recorded Spanish and Catalan 15-month-old monolinguals' eye gaze while they watched and listened to a speaker reciting short sentences. At the end of each sentence, she produced a non-speech movement by either raising both of her eyebrows (Eyebrow-raise condition) or protruding her lips (Lip-protrusion condition). We predict that if infants in the Eyebrow-raise condition can detect the eyebrow movement, they should look more at the eyes region during the non-speech movement than infants in the Lip-protrusion condition. If they can anticipate its appearance, we should observe the same difference before the appearance of the non-speech movement, namely during the sentence.

2. Experiment 1

2.1. Methods

2.1.1. Participants

Thirty-six 15-month-old (15M) healthy, full-term monolingual infants participated (range: 14;15-16;0; mean: 15;12; 16 girls). Parents were administered a language questionnaire [54], indicating that all infants were raised in a monolingual environment and were exposed to Catalan or Spanish at least 85% of the time. Eighteen of them (8 girls) participated in the Eyebrow-raise condition, while the others were in the Lip-protrusion condition (8 girls). The data from 20 more infants were excluded from the final analysis due to total looking time to the screen being less than 50% (6), an insufficient number of trials (8), or failure to calibrate (6).

2.1.2. Stimuli and recordings

Sentences. The stimuli consisted of six-syllable-long sentences produced in adult-directed speech either in Spanish (N = 19) or in Catalan (N = 19) by a native bilingual Spanish/Catalan-speaking female speaker (e.g., in Spanish: "Cada día canto", "Every day I sing"). Each sentence lasted between 1180 and 2200 ms; the durations of the Spanish and Catalan sentences did not statistically differ (mean Spanish = 1899 ms, mean Catalan = 1962 ms, t < 1).

Non-speech movements. We used two different types of non-speech movements: either a movement in the eyes region, where the speaker raised her eyebrows (Eyebrow-raise condition), or a movement in the mouth region, where the speaker protruded her lips (Lip-protrusion condition). The timing (e.g., duration before and after the eyebrow-raise/lip-protrusion peak) of each non-speech movement was similar across languages (Catalan vs. Spanish) and conditions (Eyebrow-raise/Lip-protrusion).

Recordings. Both sentences and non-speech movements were recorded by a bilingual Spanish-Catalan female speaker. Sentences and non-speech movements were recorded separately.

2.1.3. Procedure

The participants were tested in a quiet room while sitting on their parent's lap, about 60 cm away from a 1080x1920 screen. The stimuli were generated using MATLAB and the Tobii Analytics Software Development Kit (Tobii Analytics SDK). The visual component of the video clip was displayed at the same resolution as the screen, at a frequency of 25 images/second, whereas the auditory component was played at a sampling rate of 44,100 Hz. Infants' eye movements were recorded by a Tobii TX300 (Tobii Technology AB, Danderyd, Sweden) stand-alone eye tracker at a sampling rate of 300 Hz.

Once the calibration phase was completed, the test started. Each infant was randomly assigned to the Eyebrow-raise or the Lip-protrusion condition. Each trial began with a central attention-getter stimulus in order to keep the infants' attention on the screen. The experimenter pressed a key when the infant looked at the screen to launch the stimuli. Immediately after, infants saw the speaker produce a sentence. In each condition, the first sentence was always a dummy trial in which the speaker introduced herself and smiled at the infant. At the end of each of the remaining 19 sentences, she systematically either raised her eyebrows (Eyebrow-raise condition) or protruded her lips (Lip-protrusion condition). The next trial could only start after the experimenter pressed the key again. Each trial lasted approximately 2 s. The order of sentences was counterbalanced across 10 experimental lists so that each participant perceived each sentence only once. The order of presentation of the stimuli was pseudo-randomized so that, across participants, each sentence was displayed at a different position (either in the first, second, or last third of the study).
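The list construction just described (10 lists, each containing every test sentence exactly once, with each sentence appearing at different positions, and hence in different thirds of the session, across lists) can be sketched as follows. This is only an illustrative reconstruction: the authors' stimulus-presentation scripts were written in MATLAB with the Tobii SDK, and the rotation-based scheme and all names below are assumptions, not taken from the paper.

```python
"""Hypothetical sketch of one way to build the 10 counterbalanced lists
described in the Procedure; not the authors' original MATLAB code."""

import random

N_SENTENCES = 19   # 19 test sentences per language (from the text)
N_LISTS = 10       # 10 experimental lists (from the text)

def make_lists(n_sentences=N_SENTENCES, n_lists=N_LISTS, seed=0):
    rng = random.Random(seed)
    base = list(range(n_sentences))
    rng.shuffle(base)                       # one random base ordering
    step = max(1, n_sentences // 3)         # shift by roughly a third each list
    lists = []
    for k in range(n_lists):
        offset = (k * step) % n_sentences
        # Rotating the base order moves every sentence to a new serial
        # position, so across lists it falls in different thirds of the study
        # while still occurring exactly once per list.
        lists.append(base[offset:] + base[:offset])
    return lists

if __name__ == "__main__":
    for i, order in enumerate(make_lists()):
        print(f"list {i:2d}: {order}")
```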
2.2. Results and discussion

2.2.1. Definition of the Areas of Interest (AOIs)

To determine which part of the talker's face infants were looking at, we divided each video into two areas of interest (AOIs): one around the eyes and one around the mouth. Because the speaker's position was almost constant across all the videos, we could use a single eyes AOI and a single mouth AOI for all the videos. To do so, we first defined each AOI for each of the video stimuli and then took the minimum and maximum values of these coordinates (in pixels on the screen). Using MATLAB, we then transformed the raw data collected by the eye tracker (coordinates in pixels) by computing whether infants looked at the defined AOIs (eyes, mouth). We then aggregated these data to obtain one proportion of looking time to each defined AOI for every 10% of the elapsed time of the video stimuli. This operation was performed separately for the sentences and the non-speech movements.

2.2.2. Statistical analyses

We computed the proportion of total looking time (PTLT) for each AOI by dividing the total amount of time infants looked at that AOI by the total time they spent looking at the face. As in [15], we then subtracted the PTLT obtained for the mouth AOI from the one obtained for the eyes AOI for each participant: a positive score indicates a preference for the eyes region, a negative one a preference for the mouth region. These difference scores were then averaged over the duration of the sentence and of the non-speech movement separately. The resulting mean difference scores were then submitted to two-tailed Student's t-tests with Condition (Eyebrow-raise vs. Lip-protrusion) as a between-participants factor.
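The analysis pipeline described in Sections 2.2.1 and 2.2.2 (assigning gaze samples to the eyes and mouth AOIs, binning by 10% of elapsed time, forming eyes-minus-mouth difference scores, and comparing conditions with an independent-samples t-test) can be sketched as below. The authors implemented it in MATLAB with the Tobii Analytics SDK; this Python version only illustrates the logic. The AOI rectangle coordinates, the normalisation by total face-looking time, and all function and variable names are assumptions for illustration, not values from the paper.

```python
"""Illustrative (hypothetical) re-expression of the gaze analysis described
above: AOI assignment, 10%-of-elapsed-time bins, eyes-minus-mouth PTLT
difference scores, and a between-participants Student's t-test."""

import numpy as np
from scipy import stats

# Hypothetical AOI rectangles in screen pixels: (x_min, x_max, y_min, y_max).
AOIS = {
    "eyes":  (700, 1220, 300, 560),
    "mouth": (700, 1220, 700, 960),
    "face":  (650, 1270, 200, 1000),
}

def in_rect(x, y, rect):
    x0, x1, y0, y1 = rect
    return (x >= x0) & (x <= x1) & (y >= y0) & (y <= y1)

def ptlt_difference(gaze_x, gaze_y, t):
    """Eyes-minus-mouth PTLT difference score for one trial.
    gaze_x, gaze_y: gaze coordinates in pixels (one sample per eye-tracker
    frame, e.g. 300 Hz); t: sample timestamps in seconds.
    Positive scores indicate a preference for the eyes region."""
    gaze_x, gaze_y = np.asarray(gaze_x), np.asarray(gaze_y)
    on_face = in_rect(gaze_x, gaze_y, AOIS["face"])
    if not on_face.any():
        return np.nan
    # One proportion of looking time per AOI for every 10% of elapsed time.
    bins = np.linspace(t[0], t[-1], 11)                 # ten equal bins
    bin_idx = np.clip(np.digitize(t, bins) - 1, 0, 9)
    diffs = []
    for b in range(10):
        sel = (bin_idx == b) & on_face
        if sel.sum() == 0:
            continue
        p_eyes = in_rect(gaze_x[sel], gaze_y[sel], AOIS["eyes"]).mean()
        p_mouth = in_rect(gaze_x[sel], gaze_y[sel], AOIS["mouth"]).mean()
        diffs.append(p_eyes - p_mouth)
    return float(np.mean(diffs)) if diffs else np.nan

def compare_conditions(scores_eyebrow, scores_lip):
    """Two-tailed independent-samples t-test on the per-infant mean
    difference scores, with Condition as a between-participants factor."""
    return stats.ttest_ind(scores_eyebrow, scores_lip)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake per-infant difference scores (18 per condition) for a smoke test.
    eyebrow = rng.normal(0.1, 0.3, 18)
    lip = rng.normal(-0.4, 0.3, 18)
    print(compare_conditions(eyebrow, lip))
```

Averaging the per-bin difference scores separately over the sentence window and the non-speech-movement window, and then entering one score per infant into the t-test, mirrors the between-participants comparison reported in the text.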

As shown in Table 1 (left panels) and Figure 1, the results show a significant effect of Condition, indicating that 15Ms looked longer at the eyes region in the Eyebrow-raise condition and at the mouth region in the Lip-protrusion condition, both during the presentation of the sentence (t(34) = 4.21, p < .05) and during the non-speech movement (t(34) = 20.9, p < .001).

2.2.3. Conclusions

Thus, 15Ms in the Eyebrow-raise condition were able to detect and learn to anticipate the appearance of the eyebrow-raise movement, compared to those in the Lip-protrusion condition. In other words, they were able to disengage from the mouth region to get information and learn from the eyes region. Interestingly, very recent findings indicate that from 4 to 12 months of age, infants growing up in a bilingual environment exhibit a stronger preference for the mouth region of a speaker's talking face than their monolingual peers [45]. This result suggests that bilingual infants rely even more strongly on redundant audiovisual speech cues to learn both of their native languages. In Experiment 2, we thus tested whether bilinguals at 15 months of age (15B) similarly detect and anticipate the appearance of information in the eyes/mouth region of a speaker.

Table 1. Mean PTLT difference scores (from -1 to 1, with 1 indicating a preference for the eyes region) as a function of Condition (Eyebrow-raise, Lip-protrusion), for the sentences and the non-speech movements in Experiment 1 (15 months, monolinguals), Experiment 2 (15 months, bilinguals) and Experiment 3 (18 months, bilinguals). Standard errors of the mean are shown in parentheses.

                                  Sentences      Non-speech movements
Exp. 1, 15M (N=18/cond.)
    Eyebrow-raise                   .04            .18 (.10)
    Lip-protrusion                 -.26           -.50
Exp. 2, 15B (N=18/cond.)
    Eyebrow-raise                  -.25           -.02 (.12)
    Lip-protrusion                 -.17           -.52
Exp. 3, 18B (N=12)
    Eyebrow-raise                   .13 (.08)      .27 (.10)

Figure 1. Mean PTLT difference scores as a function of Condition (Eyebrow-raise: blue line, Lip-protrusion: pink line) for Experiment 1 (15M, dotted lines, green circles), Experiment 2 (15B, continuous lines, orange diamonds) and Experiment 3 (18B, continuous line, brown squares), over the duration of the sentence (left panel) and of the non-speech movement (right panel). Positive scores indicate a preference for the eyes region over the mouth region. Error bars represent standard errors of the mean.

3. Experiment 2

3.1. Methods

3.1.1. Participants

Thirty-six 15-month-old (15B) healthy, full-term infants participated (range: 14;15-16;0; mean: 15;17; 17 girls). These infants were raised in a Spanish-Catalan bilingual environment, meaning that they were exposed to two languages: one dominant, and a second one for at least 20% of the time. Eighteen of them (8 girls) participated in the Eyebrow-raise condition, while the others were in the Lip-protrusion condition (9 girls). The data from 17 more infants were excluded from the final analysis due to total looking time to the screen being less than 50% (3), an insufficient number of trials (6), failure to calibrate (5), or experimental error (3).

3.1.2. Stimuli, recordings and procedure

They were identical to the ones used in Experiment 1.

3.2. Results and discussion

We performed the same computations and statistical analyses as in Experiment 1. As shown in Table 1 and Figure 1 (central columns), the effect of Condition was significant only during the presentation of the non-speech movement (t(35) = 10.5, p < .005) but not during the sentence (t < 1). Just like the monolinguals in Experiment 1, 15Bs could systematically detect the eyebrow movement at the end of each sentence in the Eyebrow-raise condition. However, as opposed to 15Ms (Experiment 1), 15Bs did not show any sign of anticipation of the eyebrow movement in the Eyebrow-raise condition as compared to the Lip-protrusion condition. In Experiment 3, we tested whether older (18-month-old) bilingual infants manage to anticipate the appearance of the eyebrow-raise in the Eyebrow-raise condition.

4. Experiment 3

4.1. Methods

4.1.1. Participants

Twelve 18-month-old (18B) healthy, full-term bilingual infants participated (range: 17;15-19;0; mean: 17;21; 6 girls). The data from 3 more infants were excluded from the final analysis due to total looking time to the screen being less than 50% (2) or failure to calibrate (1).

4.1.2. Stimuli and procedure

The stimuli and procedure were the same as in Experiments 1 and 2, except that participants were only tested in the Eyebrow-raise condition.

4.2. Results and discussion

We performed the same computations on the raw data collected by the eye tracker as in Experiments 1 and 2.
The results are displayed in Table 1 (right columns) and Figure 1. To test whether 18Bs behave like 15Ms or like 15Bs in the Eyebrow-raise condition, we ran separate t-tests, both for the sentence and for the non-speech movement. Analyses revealed that 18Bs behaved differently from the 15Bs during the sentence (t(28) = 7.7, p < .01) but not during the non-speech movement (t(28) = 3.09, p = .10). Interestingly, they did not significantly differ from the 15Ms' results for either the sentence or the non-speech movement (both t < 1). These results clearly show that 18Bs could, like 15Ms, both detect and learn to anticipate the eyebrow-raise of the speaker.

5. General discussion

In this study, we first found that all infants could detect the non-speech movement when it was displayed in the eyes or the mouth region of a talking face.

However, we observed a different pattern of results at 15 months of age during the presentation of the sentence. For the 15Ms, we found that, before the appearance of the non-speech movement, they increased their looking time to the eyes region or to the mouth region (Eyebrow-raise and Lip-protrusion conditions, respectively) as a function of its location. For the 15Bs, however, no effect of the location of the non-speech movement was found. In both conditions, they remained focused on the mouth while the speaker was talking and only increased their looking time to the eyes region at the end of the sentence (Fig. 1). In other words, these results showed that only 15Ms, but not 15Bs, could predict the appearance of the eyebrow movement in the Eyebrow-raise condition as opposed to the Lip-protrusion condition. Only at 18 months did bilingual infants show some evidence that they had actually learned that the speaker recurrently raised her eyebrows at the end of each sentence. Of course, given that 15-month-olds had a general preference for the mouth region of the speaker, we cannot state, on the basis of these results, whether or not 15Bs could learn to anticipate the lip-protrusion movement in the Lip-protrusion condition. That being said, the data clearly demonstrate that, by selectively choosing to focus on the mouth region of the speaker (probably to improve language processing, see for instance [33, 36]), 15Bs were unable to predict the appearance of the eyebrow-raise movement in the Eyebrow-raise condition.

To summarize, we demonstrated that infants, depending on their linguistic background, do not equally process information coming from different locations of a talking face. Consequently, this study provides new insight into early bilingualism and its impact on cognitive and emotional development (see [55-57] for reviews). First, it provides data in line with recent work showing that, when seeing talking faces, Catalan-Spanish bilingual infants exhibit a greater mouth preference than Catalan or Spanish monolinguals [45]. This increased preference for the mouth is probably due to the fact that bilingual infants rely more on redundant audiovisual speech cues than their monolingual peers. To learn two languages simultaneously, as opposed to one, bilinguals have to detect and remember the distinct language-specific features that belong to each of them. Given that intersensory redundancy has been shown to facilitate learning processes in infancy [55, 57-59], having access to the redundant audiovisual speech signal might facilitate the building and memorizing of bilinguals' dual-language system. This claim is consistent with other findings indicating that bilingual infants are better at detecting and memorizing differences between two languages on the basis of visual speech information alone [60], even when both of these languages are unknown [61]. In the present paper, we go one step further by showing that, at 15 months of age, bilingual infants do not process information coming from the eyes of the speaker to the same extent as monolinguals. It is thus possible that, at least at this age, bilinguals take less account of information coming from the eyes region of a talking face. It is worth mentioning here that independent evidence from our laboratory indicates that early bilingualism also modulates attention to the eyes and mouth areas of non-linguistic communicative faces (Ayneto-Gimeno & Sebastián-Gallés, in prep.).
Further research is thus needed to understand whether early bilingualism impacts not only socio-emotional development but also the ability to decode prosodic information [62].

To conclude, this study demonstrates the impact of selective attention and language-specific experience (e.g., early bilingualism) on infants' ability to learn from social entities they encounter on a daily basis (i.e., audiovisual talking faces). We are now running the same study with younger infants in order to better understand the developmental trajectory of this phenomenon.

6. Acknowledgements

Research for this article was funded by a European Research Council grant (ERC-2012-ADG 323961). We thank Anna Basora for recording the stimuli and Luca Bonatti for his useful comments.

7. References

[1] W. H. Sumby and I. Pollack, "Visual contribution to speech intelligibility in noise," Journal of the Acoustical Society of America, vol. 26, pp. 212-215, 1954.
[2] C. Benoît, et al., "Effects of phonetic context on audio-visual intelligibility of French," Journal of Speech and Hearing Research, vol. 37, pp. 1195-1203, 1994.
[3] C. A. Binnie, et al., "Auditory and visual contributions to the perception of consonants," Journal of Speech and Hearing Research, vol. 17, pp. 619-630, 1974.
[4] N. P. Erber, "Interaction of audition and vision in the recognition of oral speech stimuli," Journal of Speech and Hearing Research, vol. 12, pp. 423-425, 1969.
[5] M. Fort, E. Spinelli, C. Savariaux, and S. Kandel, "The word superiority effect in audiovisual speech perception," Speech Communication, vol. 52, pp. 525-532, 2010.
[6] M. Fort, E. Spinelli, C. Savariaux, and S. Kandel, "Audiovisual vowel monitoring and the word superiority effect in children," International Journal of Behavioral Development, in press, 2012.
[7] A. MacLeod and Q. Summerfield, "Quantifying the contribution of vision to speech perception in noise," British Journal of Audiology, vol. 21, pp. 131-141, 1987.
[8] L. A. Ross, et al., "Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments," Cerebral Cortex, vol. 17, pp. 1147-1153, 2007.
[9] J. L. Schwartz, et al., "Seeing to hear better: evidence for early audio-visual interactions in speech identification," Cognition, vol. 93, pp. B69-78, 2004.
[10] D. Reisberg, et al., "Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli," in Hearing by Eye: The Psychology of Lip-Reading, B. Dodd and R. Campbell, Eds. London: Erlbaum Associates, 1987, pp. 97-113.
[11] J. Navarra and S. Soto-Faraco, "Hearing lips in a second language: visual articulatory information enables the perception of second language sounds," Psychological Research, vol. 71, pp. 4-12, 2007.
[12] P. Arnold and F. Hill, "Bisensory augmentation: A speechreading advantage when speech is clearly audible and intact," British Journal of Psychology, vol. 92, pp. 339-355, 2001.
[13] J. L. Schwartz, et al., "Ten years after Summerfield: A taxonomy of models for audio-visual fusion in speech perception," in Hearing by Eye II: Advances in the Psychology of Speechreading and Audiovisual Speech, R. Campbell, et al., Eds. Hove: Psychology Press, 1998, pp. 85-108.
[14] Q. A. Summerfield, "Some preliminaries to a comprehensive account of audio-visual speech perception," in Hearing by Eye: The Psychology of Lip-Reading. London: Erlbaum Associates, 1987, pp. 3-51.
[15] D. J. Lewkowicz and A. M. Hansen-Tift, "Infants deploy selective attention to the mouth of a talking face when learning speech," Proc Natl Acad Sci U S A, vol. 109, pp. 1431-1436, 2012.

[16] E. Vatikiotis-Bateson, et al., "Eye movement of perceivers during audiovisual speech perception," Perception & Psychophysics, vol. 60, pp. 926-940, 1998.
[17] C. Benoît, et al., "Which components of the face do humans and machines best speechread?," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin: Springer (NATO-ASI Series 150), 1996.
[18] C. R. Lansing and G. W. McConkie, "Word identification and eye fixation locations in visual and visual-plus-auditory presentations of spoken sentences," Percept Psychophys, vol. 65, pp. 536-552, 2003.
[19] J. N. Buchan, et al., "The effect of varying talker identity and listening conditions on gaze behavior during audiovisual speech perception," Brain Res, vol. 1242, pp. 162-171, 2008.
[20] I. T. Everdell, et al., "Gaze behaviour in audiovisual speech perception: asymmetrical distribution of face-directed fixations," Perception, vol. 36, pp. 1535-1545, 2007.
[21] J. N. Buchan, et al., "Spatial statistics of gaze fixations during dynamic face processing," Soc Neurosci, vol. 2, pp. 1-13, 2007.
[22] K. G. Munhall and E. K. Johnson, "Speech perception: when to put your money where the mouth is," Curr Biol, vol. 22, pp. R190-192, 2012.
[23] M. Patterson, "Matching phonetic information in lips and voice is robust in 4.5-month-old infants," Infant Behavior and Development, vol. 22, pp. 237-247, 1999.
[24] M. L. Patterson and J. F. Werker, "Infants' ability to match dynamic phonetic and gender information in the face and voice," Journal of Experimental Child Psychology, vol. 81, pp. 93-115, 2002.
[25] M. L. Patterson and J. F. Werker, "Two-month-old infants match phonetic information in lips and voice," Developmental Science, vol. 6, pp. 191-196, 2003.
[26] P. K. Kuhl and A. N. Meltzoff, "The bimodal perception of speech in infancy," Science, vol. 218, pp. 1138-1141, 1982.
[27] D. Burnham and B. Dodd, "Auditory-visual speech perception as a direct process: The McGurk effect in infants and across languages," in Speechreading by Humans and Machines, vol. 150, D. G. Stork and M. E. Hennecke, Eds. Springer-Verlag, 1996, pp. 103-114.
[28] D. Burnham and B. Dodd, "Auditory-visual speech integration by prelinguistic infants: perception of an emergent consonant in the McGurk effect," Developmental Psychobiology, vol. 45, pp. 204-220, 2004.
[29] H. H. Yeung and J. F. Werker, "Lip movements affect infants' audiovisual speech perception," Psychol Sci, vol. 24, pp. 603-612, 2013.
[30] E. Kushnerenko, et al., "Electrophysiological evidence of illusory audiovisual speech percept in human infants," Proc Natl Acad Sci U S A, vol. 105, pp. 11442-11445, 2008.
[31] F. Pons, et al., "Narrowing of intersensory speech perception in infancy," Proc Natl Acad Sci U S A, vol. 106, pp. 10598-10602, 2009.
[32] L. D. Rosenblum, et al., "The McGurk effect in infants," Perception & Psychophysics, vol. 59, pp. 347-357, 1997.
[33] T. Teinonen, et al., "Visual speech contributes to phonetic learning in 6-month-old infants," Cognition, vol. 108, pp. 850-855, 2008.
[34] D. J. Lewkowicz, et al., "Perception of the multisensory coherence of fluent audiovisual speech in infancy: its emergence and the role of experience," J Exp Child Psychol, vol. 130, pp. 147-162, 2015.
[35] D. J. Lewkowicz, "Early experience and multisensory perceptual narrowing," Dev Psychobiol, vol. 56, pp. 292-315, 2014.
[36] D. J. Lewkowicz and A. A. Ghazanfar, "The decline of cross-species intersensory perception in human infants," Proc Natl Acad Sci U S A, vol. 103, pp. 6771-6774, 2006.
[37] K. Sekiyama and D. Burnham, "Impact of language on development of auditory-visual speech perception," Developmental Science, vol. 11, pp. 306-320, 2008.
[38] S. Dupont, et al., "A study of the McGurk effect in 4- and 5-year-old French Canadian children," ZAS Papers in Linguistics, vol. 40, pp. 1-17, 2005.
[39] D. J. Lewkowicz and R. Flom, "The audiovisual temporal binding window narrows in early childhood," Child Dev, vol. 85, pp. 685-694, 2014.
[40] D. W. Massaro, "Children's perception of visual and auditory speech," Child Development, vol. 55, pp. 1777-1788, 1984.
[41] D. W. Massaro, et al., "Developmental changes in visual and auditory contributions to speech perception," Journal of Experimental Child Psychology, vol. 41, pp. 93-113, 1986.
[42] N. S. Hockley and L. Polka, "A developmental study of audiovisual speech perception using the McGurk paradigm," Journal of the Acoustical Society of America, vol. 96, p. 3309, 1994.
[43] S. Jerger, et al., "Developmental shifts in children's sensitivity to visual speech: a new multimodal picture-word task," Journal of Experimental Child Psychology, vol. 102, pp. 40-59, 2009.
[44] H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, vol. 264, pp. 746-748, 1976.
[45] F. Pons, et al., "Bilingualism modulates infants' selective attention to the mouth of a talking face," Psychol Sci, vol. 26, pp. 490-498, 2015.
[46] E. J. Tenenbaum, et al., "Increased focus on the mouth among infants in the first year of life: A longitudinal eye-tracking study," Infancy, vol. 18, pp. 534-553, 2013.
[47] E. Kushnerenko, et al., "Brain responses to audiovisual speech mismatch in infants are associated with individual differences in looking behaviour," Eur J Neurosci, vol. 38, pp. 3363-3369, 2013.
[48] C. Kubicek, et al., "Cross-modal matching of audio-visual German and French fluent speech in infancy," PLoS One, vol. 9, p. e89275, 2014.
[49] T. Farroni, et al., "Eye contact detection in humans from birth," Proc Natl Acad Sci U S A, vol. 99, pp. 9602-9605, 2002.
[50] T. Farroni, et al., "The perception of facial expressions in newborns," Eur J Dev Psychol, vol. 4, pp. 2-13, 2007.
[51] A. Senju and G. Csibra, "Gaze following in human infants depends on communicative signals," Curr Biol, vol. 18, pp. 668-671, 2008.
[52] A. Senju, et al., "Understanding the referential nature of looking: infants' preference for object-directed gaze," Cognition, vol. 108, pp. 303-319, 2008.
[53] D. Stahl, et al., "Eye contact and emotional face processing in 6-month-old infants: advanced statistical methods applied to event-related potentials," Brain Dev, vol. 32, pp. 305-317, 2010.
[54] L. Bosch and N. Sebastian-Galles, "Early language differentiation in bilingual infants," in Trends in Bilingual Acquisition, J. Cenoz and F. Genesee, Eds. Amsterdam, The Netherlands: John Benjamins, 2001, pp. 71-93.
[55] A. Costa and N. Sebastian-Galles, "How does the bilingual experience sculpt the brain?," Nature Reviews Neuroscience, vol. 15, pp. 336-345, 2014.
[56] J. F. Werker, "Perceptual foundations of bilingual acquisition in infancy," Annals of the New York Academy of Sciences, pp. 50-61, 2014.
[57] E. Bialystok and F. I. Craik, "Cognitive and linguistic processing in the bilingual mind," Current Directions in Psychological Science, vol. 70, pp. 636-644, 2010.
[58] L. E. Bahrick, et al., "Intersensory redundancy facilitates discrimination of tempo in 3-month-old infants," Dev Psychobiol, vol. 41, pp. 352-363, 2002.
[59] L. E. Bahrick and R. Lickliter, "Intersensory redundancy guides attentional selectivity and perceptual learning in infancy," Dev Psychol, vol. 36, pp. 190-201, 2000.
[60] W. M. Weikum, et al., "Visual language discrimination in infancy," Science, vol. 316, p. 1159, 2007.
[61] N. Sebastian-Galles, et al., "A bilingual advantage in visual language discrimination in infancy," Psychol Sci, vol. 23, pp. 994-999, 2012.
[62] N. Esteve-Gibert, et al., "Nine-month-old infants are sensitive to the temporal alignment of prosodic and gesture prominences," Infant Behav Dev, vol. 38, pp. 126-129, 2015.