Discourse-mediation of the mapping between language and the visual world: Eye movements and mental representation

Altmann & Kamide (2009). Presented by Ambika Kirkland

The Visual World Paradigm: Non-linguistic information, such as visual information, can interact with language processing. The visual world can shape the interpretation of language, and vice versa. Observing this interaction can give clues about prediction.

The Visual World Paradigm: The man/girl will ride/taste the motorbike/carousel/beer/sweets.

The Visual World Paradigm: Aim: Investigate when and how language intersects with the visual context. When: almost immediately, in some cases (e.g., Kamide, Altmann & Haywood, 2003), which suggests an incremental language processor. How: eye movements are mediated by the cognitive mechanisms of both language processing and visual processing, and they also reflect how these processes interact.

The Visual World Paradigm: Representations of events are internally complex, comprising an initial state, a final state, and some change between the two states. Example: Denise celebrated her birthday with her friends.

The Visual World Paradigm: Comprehending a sentence together with a static visual context requires keeping track of multiple representations: the state (e.g., initial, final, intermediate) represented by the static visual scene; changes to the participants as the language unfolds; and representations of the participants in the static scene vs. in the unfolding event described by the language.

The Visual World Paradigm: Many studies used static scenes and primarily looked at the likelihood of fixating on various objects as words or sentences were presented. Recent studies used animations or multiple frames, with eye movements measured in reference to a static scene seen after the animations. Pro: evidence that eye movements can map onto dynamically updated visual situations. Con: in these studies, the situation was updated based on visual information from the scenes/animations themselves. Can linguistic information alone update the representation of the visual situation?

The Blank Screen Paradigm: A version of the Visual World Paradigm in which visual stimuli are presented and then removed before the auditory presentation of words or sentences. Purpose: without the visual stimulus directly present, any eye movements must be related to internal representations.

The Blank Screen Paradigm: The man will eat the cake / The woman will read the newspaper.

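Predictions: Linguistic information alone can update the representation of objects/participants in an event. (1) Paul will give Jeanne a bouquet of flowers. (2) She will admire the flowers for a moment, before putting them in a vase. After hearing (1), "the flowers" in (2) must be understood as now being in Jeanne's possession rather than Paul's.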

Predictions: When participants need to retrieve the representation of an object, they should look at the location suggested by the linguistic context, even if it conflicts with the actual location of the object in the scene.

Experiment 1: Purpose: Determine whether eye movements are mediated by the visual scene or by the content of mental representations that can be at least partially dissociated from the perceptual properties of the objects in the scene. General setup: a visual scene, plus a spoken description of how an object will be moved by a protagonist in the scene.

Experiment 1: 1. The woman will put the glass on the table. Then, she will pick up the bottle and pour the wine carefully into the glass. (Moved) 2. The woman is too lazy to put the glass on the table. Instead, she will pick up the bottle and pour the wine carefully into the glass. (Unmoved)

Experiment 1: Questions: Where do participants' eyes move when they anticipate where the pouring will occur (i.e., after "pour")? Where do participants look after hearing "glass"? If this varies according to the position of the glass suggested by the sentence, then the mental representation of the visual situation was updated by linguistic information.

Methods: Subjects: 32 native speakers of English from the University of York. Stimuli: 16 pictures like the previous example, each paired with two sets of sentences: a moved condition (e.g., "The woman will put the glass on the table") and an unmoved condition (e.g., "The woman is too lazy to put the glass on the table"). The same target sentence was used in both conditions (e.g., "...she will pick up the bottle and pour the wine carefully into the glass"). 32 fillers.

Methods: "The woman is too lazy to put the glass on the table. Instead, she will pick up the bottle and pour the wine carefully into the glass." Position of the glass based on the linguistic context in the unmoved condition.

Methods: Procedure: Participants are seated in front of a 17-inch display wearing a head-mounted eye tracker. Passive listening task: participants simply look at the pictures and then listen to an auditory stimulus. The visual scene is presented 1000 ms before the onset of the auditory stimulus and remains on screen during the presentation of the auditory stimulus.

Methods: [Figure: trial procedure, panels 1 to 4]

Results: Fig. 2. Percentage of trials in Experiment 1 with fixations on the regions of interest corresponding to the table and the glass in the moved and unmoved conditions during "she will pick up the bottle and pour the wine carefully into the glass" (or its equivalent across trials). The percentages reflect the proportion of trials on which each region of interest was fixated at each moment in time, calculated at each successive 25 ms from the synchronization point. See the main text for a description of the resynchronization process. The region of the graph corresponding to the target noun phrase "the glass" is highlighted.
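The caption describes a simple computation: at every 25 ms sample from the synchronization point, count the trials on which a region was being fixated and divide by the total number of trials. The Python sketch below illustrates that arithmetic only. It is not the authors' analysis code; the fixation format (region label, start and end time in ms relative to the synchronization point) and the function name fixation_percentages are hypothetical, and the resynchronization step described in the paper is assumed to have already been applied.

```python
from typing import Dict, List, Tuple

# Hypothetical fixation record: (region_label, start_ms, end_ms),
# with times already aligned to the synchronization point.
Fixation = Tuple[str, int, int]


def fixation_percentages(
    trials: List[List[Fixation]],
    regions: List[str],
    window_ms: int,
    step_ms: int = 25,
) -> Dict[str, List[float]]:
    """Percentage of trials fixating each region at every step_ms sample from 0 to window_ms."""
    n_trials = len(trials)
    result: Dict[str, List[float]] = {region: [] for region in regions}
    for t in range(0, window_ms + 1, step_ms):
        for region in regions:
            # A trial counts once if any of its fixations on this region spans time t.
            hits = sum(
                1
                for fixations in trials
                if any(r == region and start <= t < end for r, start, end in fixations)
            )
            result[region].append(100.0 * hits / n_trials if n_trials else 0.0)
    return result


if __name__ == "__main__":
    # Two made-up trials, for illustration only.
    demo_trials = [
        [("glass", 0, 60), ("table", 60, 200)],
        [("table", 10, 40)],
    ]
    print(fixation_percentages(demo_trials, ["glass", "table"], window_ms=50))
```

With the two made-up trials above, the glass is fixated on one of two trials at 0, 25 and 50 ms (50% each), while the table is fixated only at the 25 ms sample (by the second trial). Counting each trial at most once per sample is what makes the values proportions of trials rather than counts of fixations.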

Results: Anticipatory eye movements (at "the wine"): marginally higher probability of fixating on the table (or the corresponding location in other scenes) in the moved condition. The probability of fixating on the glass (or the corresponding location in other scenes) did not vary by condition. Main effect of object: participants looked at the glass more overall. Eye movements at "the glass": a similar pattern, but the difference between the probability of looking at the table in the moved vs. unmoved conditions was greater.

Discussion: Linguistic information does seem to mediate eye movements, which suggests that the representation of the event is updated based on the linguistic context. But why do participants look at the glass so much more frequently overall? Perhaps there is a conflict between the mental representation updated by the unfolding language and the static visual scene, in which the position of the glass may be more salient. How can this issue be addressed?

Experiment 2: Purpose: Determine whether removing the conflicting visual information during the presentation of the auditory stimulus attenuates the bias toward looking at the glass (or corresponding object). Predictions: the bias toward the glass should disappear, but the effect of linguistic context should still be observed: looks to the glass should be more frequent in the unmoved condition, and looks to the table should be more frequent in the moved condition.

Experiment 2: 1. The woman will put the glass on the table. Then, she will pick up the bottle and pour the wine carefully into the glass. (Moved) 2. The woman is too lazy to put the glass on the table. Instead, she will pick up the bottle and pour the wine carefully into the glass. (Unmoved)

Methods: Subjects: 34 native speakers of English from the University of York. Stimuli: as in Experiment 1, but with different filler material. Procedure: as in Experiment 1, except that the scene was replaced with a light gray screen during the presentation of the auditory stimulus.

Methods: [Figure: trial procedure, panels 1 to 5]

Results: Anticipatory movements at "the wine": no significant differences. Anticipatory movements at "the wine carefully into": higher probability of looking at where the glass had been in the unmoved condition, and at where the table had been in the moved condition; no overall bias toward looking at the glass or the table. Eye movements during and at the offset of "the glass": the same pattern as for "the wine carefully into".

Discussion: Eye movements seem to have been driven by a representation that did not rely on the physical position of objects; they appear to have been mediated by linguistic information. However, what about prediction? This time there were no anticipatory eye movements until later in the sentence. Why would removing the visual scene attenuate anticipatory glances if they are mediated by the language-dependent mental representation?

Questions/Criticisms: Is this really evidence of prediction? Participants do seem to anticipate the locations where aspects of the unfolding event will occur. The blank screen addresses the issue of the visual scene driving anticipation, yet removing the scene reduces anticipatory eye movements. What if we introduce objects that were never in the scene? How exactly is information about an object's location represented, and how does language update this representation? Candidate accounts: situated vision (spatial pointers), embodied cognition (simulations), and affordances of objects. Ultimately, there is no clear account.

Future Directions: Investigate the underlying mechanisms: there is plenty of speculation but no clear explanation. E.g., would manipulating the plausibility of the locations to which the object could be moved have any effect? Take better advantage of this setup to look at prediction more explicitly.

Discussion Questions: To what extent do these results actually suggest prediction? Are there alternative ways to explain the data? Questions/concerns about the methods? Other ideas for follow-up studies? Which account of how location is represented/updated seems most plausible (e.g., spatial pointer vs. simulation vs. affordances)?