INF5820/INF9820: Examples of exam questions

1 Spoken dialogue

1. Turn-taking is a crucial part of conversational competence. What linguistic and extra-linguistic factors can influence how people take and release turns, and where the boundaries of these turns are likely to lie?

2. Take the following utterance:

      robot please look at the ball no sorry the box

   Analyse the disfluent part based on Shriberg's disfluency model, and briefly describe how an NLU system could handle such disfluencies.

2 Speech recognition

Assume a user utters the following utterance:

      Could you please take the red box and put it on the other end of the table?

but your speech recognition hypothesis turns out to be:

      Could you place vague red box and put it this on another end of that?

Calculate the Word Error Rate (WER) between the ASR hypothesis and the actual utterance. Detail your calculations. (A sketch of how such an alignment can be computed is given after section 3.)

3 Natural language understanding

In our lectures, we have reviewed three distinct strategies for parsing spoken utterances. Describe these three strategies, and compare their respective advantages and shortcomings.
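For reference, the WER calculation in question 2 amounts to a word-level Levenshtein alignment between the reference utterance and the hypothesis. Below is a minimal sketch in Python; the function name and layout are illustrative, not part of the course material.

def wer(reference, hypothesis):
    """Word Error Rate = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimal number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution (or match)
    return d[len(ref)][len(hyp)] / len(ref)

For the exam question itself, the alignment should also be made explicit, i.e. which words count as substitutions, insertions and deletions.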

4 Dialogue management

Imagine a kitchen robot whose task is to ask the user what kind of cereal he/she wants for breakfast, wait for the user's answer, and then hand out the appropriate cereal box once it knows the desired cereal. We want to design a simple dialogue system to handle the interaction with the user.

A simple way to model it is via an MDP with only two states: state s = UnknownCereal, where the robot doesn't know which cereal to give, and state s = KnownCereal, where the robot knows the cereal to hand out. There are only two possible actions in the model:

- Action a = AskCerealType corresponds to the robot asking the user for the cereal box he wishes to have. The action is only available in state UnknownCereal, and has a reward R = 1 in that state.

- Action a = GiveCereal corresponds to the robot physically giving the cereal to the user. The action is only available in state KnownCereal, and has a reward R = +5 in that state.

When the robot executes the action AskCerealType in state UnknownCereal, it has a probability 0.8 of reaching state s = KnownCereal (if the user answers the robot's question), and a probability 0.2 of remaining in state UnknownCereal (if the user ignores the question or provides an unclear answer). When the robot executes action GiveCereal in state KnownCereal, the MDP reaches a final state and finishes.

You are asked to calculate the expected cumulative reward of asking the cereal type while in the UnknownCereal state, i.e. Q(s = UnknownCereal, a = AskCerealType). You can assume a discount factor of 0.9. (Hint: use the Bellman equation to calculate the Q values; a computational sketch is given after section 5.)

5 Speech synthesis

Unit selection synthesis operates by searching for speech segments in a database that correspond to parts of the utterance to synthesise, and then gluing them together. Describe how this search is performed, and how the synthesiser ultimately decides which segments should be glued together.
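For question 4, one way to organise the calculation is to iterate the Bellman equation until the Q-value stabilises, using the rewards and transition probabilities stated in the question. A minimal sketch; the variable names are illustrative, and the loop simply converges to the fixed point of the equation.

# Bellman equation: Q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * max_a' Q(s', a')
gamma = 0.9

# GiveCereal ends the episode, so its expected cumulative reward in KnownCereal
# is just its immediate reward.
q_known_give = 5.0

# Iterate the Bellman equation for Q(UnknownCereal, AskCerealType).
q_unknown_ask = 0.0
for _ in range(100):
    q_unknown_ask = 1.0 + gamma * (0.8 * q_known_give + 0.2 * q_unknown_ask)

print(q_unknown_ask)

The same fixed point can be obtained by solving the resulting one-variable linear equation by hand, which is how it would be done on paper.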
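The search described in section 5 is commonly formulated as a dynamic programming (Viterbi) search over candidate units, trading off a target cost (how well a database unit matches the target specification) against a join cost (how well two adjacent units concatenate). The sketch below illustrates that general formulation; the cost functions are placeholders and no specific synthesiser is assumed.

def select_units(targets, candidates, target_cost, join_cost):
    """targets[i] is the i-th target specification (e.g. a diphone in context);
    candidates[i] is the list of database units that could realise targets[i].
    Returns the unit sequence with the lowest total target + join cost."""
    # best[i][j] = (cost of best path ending in candidates[i][j], backpointer into row i-1)
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for unit in candidates[i]:
            costs = [best[i - 1][k][0] + join_cost(prev, unit)
                     for k, prev in enumerate(candidates[i - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append((costs[k_best] + target_cost(targets[i], unit), k_best))
        best.append(row)
    # Backtrace from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path))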

6 Probabilistic modelling

You want to develop a spoken dialogue system for a human-robot interaction domain, where the user can tell the robot to move forward, backward, turn left and right, as well as take and release an object. You have already integrated a speech recogniser, but you quickly realise that it tends to make systematic mistakes for particular words.

To reduce the number of speech recognition errors, you therefore decide to implement a simple post-processing tool to correct the N-Best lists provided by the speech recogniser. You decide to implement this post-processing tool using the noisy-channel model. This tool will take an N-Best list as input, and output another (hopefully improved) N-Best list.

As you remember from your dialogue system course, a noisy-channel model includes both a language model and a channel model. To estimate the language model, you collect a few sample sentences for your domain, and get the following tiny corpus of 20 sentences:

      Move forward, Turn right, Take object, Turn left, Release object,
      Move backward, Turn left, Take object, Move forward, Turn left,
      Take object, Move forward, Take object, Move forward, Release object,
      Move forward, Release object, Turn left, Turn right, Move forward

You also need to find a channel model for your domain.[1] To this end, you analyse the outputs of the speech recogniser, and notice that most words have a probability 0.8 of being correctly recognised, and a probability 0.2 of being incorrectly recognised as another word. But there is an exception: the words "forward" and "backward" are frequently confused with one another by the ASR. These two words have a probability 0.5 of being correctly recognised, a probability 0.3 of being mistaken for the other one ("forward" instead of "backward" and vice versa), and a probability 0.2 of being yet another word.

Based on this information, answer the following questions:

1. Derive a simple bigram model (without smoothing) for the small corpus shown above, and detail its probability distribution;

2. Construct the channel model corresponding to the word confusions informally described above, and detail its probability distribution;

3. Briefly explain how these two models are combined in a noisy-channel model;

[1] To keep things simple, we consider here only 1-to-1 word confusions. A more general model would have to take into account more complex, n-to-m confusion matrices.
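As a rough illustration of questions 1-3: the bigram model can be estimated by counting over the corpus, the channel model can be written down from the confusion probabilities above, and the two are combined by multiplying channel and language model scores for each candidate sentence. A minimal sketch, assuming sentence-boundary markers <s> and </s> and word-by-word independence in the channel; all names are illustrative, and how the 0.2 error mass is spread over the remaining vocabulary is left open, as in the question.

from collections import Counter

corpus = [
    "Move forward", "Turn right", "Take object", "Turn left", "Release object",
    "Move backward", "Turn left", "Take object", "Move forward", "Turn left",
    "Take object", "Move forward", "Take object", "Move forward", "Release object",
    "Move forward", "Release object", "Turn left", "Turn right", "Move forward",
]

# Bigram language model (no smoothing): P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1})
bigram_counts, unigram_counts = Counter(), Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    for prev, cur in zip(words, words[1:]):
        bigram_counts[(prev, cur)] += 1
        unigram_counts[prev] += 1

def lm_prob(sentence):
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        if unigram_counts[prev] == 0:
            return 0.0
        p *= bigram_counts[(prev, cur)] / unigram_counts[prev]
    return p

# Channel model P(observed word | intended word), following the description above.
def channel_prob(observed, intended):
    if intended in ("forward", "backward"):
        other = "backward" if intended == "forward" else "forward"
        if observed == intended:
            return 0.5
        return 0.3 if observed == other else 0.2
    return 0.8 if observed == intended else 0.2

# Noisy channel: score of an intended sentence u given an observed hypothesis o,
# P(u | o) proportional to P(o | u) * P(u).
def noisy_channel_score(observed_sentence, intended_sentence):
    obs, intent = observed_sentence.split(), intended_sentence.split()
    if len(obs) != len(intent):
        return 0.0
    p = lm_prob(intended_sentence)
    for o, u in zip(obs, intent):
        p *= channel_prob(o, u)
    return p

A post-processing tool along these lines would then rescore candidate intended sentences for each hypothesis in the incoming N-Best list, which is the setting of question 4 below.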

4. Finally, apply your post-processing tool to correct the following N-Best list:

      N-Best list:
         Move backward   (P = 0.6)
         Move forward    (P = 0.3)
         Turn right      (P = 0.1)

   You can discard recognition hypotheses with a probability lower than 0.01.

7 Probabilistic modelling

Part 1

You want to build a new voice-controlled image repository for Android phones. Instead of using buttons to navigate through the images, the user will use voice controls to navigate through the pictures, using three distinct commands: "previous" (to move to the previous picture), "next" (to move to the next picture), and "delete" (to delete the current picture).

Instead of using a full-fledged speech recogniser, you decide to use a simpler system based on the detection of stop consonants at the beginning and end of each command, since these consonants nicely discriminate between the three commands ("previous" only has a stop at the beginning, "next" only at the end, and "delete" both at the beginning and the end). We therefore have a variable command with three distinct values {previous, next, delete}, as well as two observation variables stopdetectedatbeginning and stopdetectedatend with binary values.

Of course, the relation between command and these two observation variables is probabilistic, so you want to estimate a probabilistic model between them. You start collecting data from users experimenting with your system, and you end up with a sample of 1000 commands. Out of these 1000 commands, 100 were "previous" commands, 200 were "delete" commands, and 700 were "next" commands. Furthermore:

- out of the 100 "previous" commands, 90 had a detected stop at the beginning, and 10 a detected stop at the end;
- out of the 200 "delete" commands, 180 had a detected stop at the beginning, and 180 had a detected stop at the end;
- out of the 700 "next" commands, 50 had a detected stop at the beginning, and 600 had a detected stop at the end.

Given this information:

1. Construct a Bayesian network representing the three random variables and their probability distributions;

2. Based on this Bayesian network, calculate the probability that the user uttered "delete" if the system detected both a stop at the beginning and at the end, that is, P(command = delete | stopdetectedatbeginning = true, stopdetectedatend = true).

Part 2

The Bayesian network you constructed allows us to determine the probability of each user command given the observations of stop consonants. We haven't, however, determined yet how our application will make a decision about its actions based on it. The system has four actions at its disposal: systemaction = {GoToPrevious, GoToNext, Delete, DoNothing}. Each of these actions has a different utility; for instance, deleting a picture if the user intended something else should have a high negative utility. Assume the following utility distribution:

      command     systemaction    Utility
      previous    GoToPrevious    +1
      previous    GoToNext        -2
      previous    Delete          -5
      next        GoToPrevious    -2
      next        GoToNext        +1
      next        Delete          -5
      delete      GoToPrevious    -2
      delete      GoToNext        -2
      delete      Delete          +2

The DoNothing action always has a utility of 0.0.

Given this utility function, answer the following questions:

3. Assume the system detects a stop consonant at the beginning but no stop at the end (i.e. stopdetectedatbeginning = true, stopdetectedatend = false). What would then be the best action for the system to select, based on the utility distribution shown above? Show your calculations.

4. Draw the Bayesian network augmented with utility nodes (diamonds) and decision nodes (squares) that represents the full problem.
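As a rough illustration of how the numbers in Part 1 and Part 2 fit together: the posterior over commands can be computed with Bayes' rule, assuming the two stop detections are conditionally independent given the command (a naive Bayesian network structure), and each action can then be scored by its expected utility under that posterior. A minimal sketch; all names are illustrative and the output is not an official answer key.

# Priors and conditional probabilities estimated from the 1000 collected commands.
prior        = {"previous": 100 / 1000, "delete": 200 / 1000, "next": 700 / 1000}
p_stop_begin = {"previous": 90 / 100,   "delete": 180 / 200,  "next": 50 / 700}
p_stop_end   = {"previous": 10 / 100,   "delete": 180 / 200,  "next": 600 / 700}

def posterior(stop_begin, stop_end):
    """P(command | stopdetectedatbeginning, stopdetectedatend), assuming the two
    observations are conditionally independent given the command."""
    joint = {}
    for c in prior:
        pb = p_stop_begin[c] if stop_begin else 1 - p_stop_begin[c]
        pe = p_stop_end[c] if stop_end else 1 - p_stop_end[c]
        joint[c] = prior[c] * pb * pe
    z = sum(joint.values())
    return {c: p / z for c, p in joint.items()}

# Utility table from Part 2; DoNothing always has utility 0.
utility = {
    ("previous", "GoToPrevious"): +1, ("previous", "GoToNext"): -2, ("previous", "Delete"): -5,
    ("next", "GoToPrevious"): -2,     ("next", "GoToNext"): +1,     ("next", "Delete"): -5,
    ("delete", "GoToPrevious"): -2,   ("delete", "GoToNext"): -2,   ("delete", "Delete"): +2,
}

def best_action(stop_begin, stop_end):
    post = posterior(stop_begin, stop_end)
    expected = {"DoNothing": 0.0}
    for action in ("GoToPrevious", "GoToNext", "Delete"):
        expected[action] = sum(post[c] * utility[(c, action)] for c in post)
    return max(expected, key=expected.get), expected

# Situation of question 3: stop detected at the beginning, none at the end.
print(best_action(True, False))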