Increased Acceptance of Controller Assistance by Automatic Speech Recognition


www.dlr.de/fl Chart 1: Increased Acceptance of Controller Assistance by Automatic Speech Recognition
Project partners:
Hartmut Helmke, Heiko Ehr, Matthias Kleinert: German Aerospace Center (DLR), Institute of Flight Guidance, Braunschweig, Germany
Friedrich Faubel, Dietrich Klakow: Saarland University (UdS), Department of Computational Linguistics and Phonetics (LSV), Saarbrücken, Germany
Supported by DLR Technology Marketing and Helmholtz Validation Fund

Chart 2: Contents
1. Benefits of an additional sensor for Automatic Speech Recognition (ASR) and controller assistance
2. Development status and validation exercises
   a. Validation of the hypothesis "ASR improves AMAN"
   b. Validation of the hypothesis "AMAN improves ASR by dynamic context"
3. Conclusions and next steps
Abbreviations: ASR = Automatic Speech Recognition; AMAN = Arrival Manager; AcListant = Active Listening Assistant; 4D-CARMA = DLR's AMAN

Chart 3: Arrival Management: The View of a Scientist (1)
Controller task: sequencing
[Figure labels: 2 runways, FFM, DEP rwy]

Chart 4: Arrival Management: The View of a Scientist (2)
Controller task: separation
[Figure labels: cleared FL, cleared IAS, actual FL, actual ground speed]

Chart 5: Arrival Management: The View of a Scientist (3)
Controller task: trajectory prediction
- Green: shortest possible way
- Yellow: planned trajectory, to keep minimal distances

Chart 6: Arrival Management: The View of a Scientist (4)
Controller task: give commands to pilot
[Figure labels: advisory stack, "Start turn in 25 s"]
Controller assistance could be so easy if we did not have to include
- the weather,
- the controllers,
- the pilots,
- ...
i.e. the real world.

Chart 7: Challenges
- Currently the main use case of AMAN applications is flow control.
- This saves coordination via telephone and thus reduces controller workload.
- More support is possible.
- The controllers are responsible.
- They may and shall deviate.
- Therefore, the suggestions are not always useful.
- Metaphor "car navigation system": recognize my intended deviations and warn me if I accidentally deviate.
- A trade-off between stability and adaptivity is necessary.

Chart 8: Ideas of controller-1
[Figure labels: "This is English"; loss of runway capacity; gap between No. 2 and No. 3]

Chart 9: Ideas of controller-2
[Figure labels: good usage of available capacity; adaptation of AMAN late]

Chart 10: Ideas of controller-3
[Figure labels: sequence change too late; AMAN should produce a warning]

Chart 11: We need an additional sensor
[Diagram labels: AMAN, mouse, EFS, radar data, assistance, controller, pilots, voice commands]
- Using only radar data results in unstable sequences.
- A mouse as 2nd sensor helps the AMAN, but not the controllers. Their workload even increases.
We need better solutions!

Chart 12: We need an additional sensor
[Diagram labels: AMAN, assistance, radar data, ASR, controller, pilots, voice commands]
- The additional sensor requires no additional controller workload.
- The controller already uses a microphone for voice communication.
- We just have to evaluate the controller-pilot party line by ASR.
- The additional sensor enables alignment of the two knowledge worlds:
  - mental model/plan of the operators,
  - mental model/plan of the system.

Chart 13: An additional sensor speeds up the control loop. AMAN + ASR improve each other (1)
[Diagram labels: AMAN, assistant, radar data, ASR, controller, pilots, voice commands]
Possible controller intents:
1. Keep sequence
2. Increase flow; 5 before 3
3. Increase flow; 4 before 3
Possible commands:
1. QFA764 Turn right, DLH645 Reduce 180 kt
2. ADR6887 Turn right, ADR6887 Descend 3000 ft
3. DLH645 Turn left, QFA764 Reduce 190 kt
The additional sensor requires no further workload compared to e.g. intent input via mouse, keyboard or touch screen.
A recognized command reduces the number of possible controller intents.

Chart 14: An additional sensor speeds up the control loop. AMAN + ASR improve each other (2)
[Diagram labels: radar data, AMAN, assistant, ASR, pilots, controller]
Possible controller intents:
1. Keep sequence
2. Increase flow; 5 before 3
3. Increase flow; 4 before 3
Possible commands:
1. QFA764 Turn right, DLH645 Reduce 180 kt
2. ADR6887 Turn right, ADR6887 Descend 3000 ft
3. DLH645 Turn left, QFA764 Reduce 190 kt
As a by-product, AMAN helps ASR, because AMAN permanently generates context information, resulting in a set of possible voice commands.
The additional sensor requires no further workload compared to e.g. intent input via mouse, keyboard or touch screen.
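One way to picture how planner context could narrow down recognition results is a filter over an n-best list. The following is only a minimal sketch under stated assumptions: the function name `filter_by_context`, the scores and the shape of the hypothesis list are hypothetical illustrations and not the interface of 4D-CARMA or of the recognizer used in the project.

```python
def filter_by_context(hypotheses, allowed_callsigns):
    """Keep only hypotheses whose callsign is plausible in the current context.

    hypotheses: list of (command_string, score) pairs, best first.
    allowed_callsigns: set of callsigns currently known to the planner.
    """
    kept = [(cmd, score) for cmd, score in hypotheses
            if cmd.split()[0] in allowed_callsigns]
    # Assumption: if the context rules out everything, fall back to the raw list.
    return kept if kept else list(hypotheses)


# Hypothetical n-best list; the scores are made-up illustration values.
nbest = [
    ("BWA123 DESCEND FL 60", 0.41),
    ("AF123 DESCEND FL 60", 0.38),
    ("AF123 DESCEND FL 50", 0.21),
]
context = {"AF123", "DLH645", "QFA764"}
best_command = filter_by_context(nbest, context)[0][0]
print(best_command)
```

Here the top raw hypothesis carries a callsign that is not in the planner's context, so the filter promotes the second hypothesis.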


Chart 16: Development Status
- A concept demonstrator was shown at ATC Global 2011 in Amsterdam.
- H1: ASR improves AMAN.
- H2: AMAN improves ASR.
- We have to quantify both hypotheses.
- And we must show that both improvements have benefits for the controller.
We want a validated air traffic application with lighthouse (flagship) characteristics.
Other application areas, e.g.:
- Control room
- Online gaming / PlayStations

Chart 17: High-Level Validation Hypotheses
1. AMAN improves approach control (e.g. predictability, optimal landing sequence).
2. An additional sensor speeds up AMAN plan adaptation.
3. ASR enhances both AMAN performance and the quality of controller work (performance, workload).
4. Dynamic context information created by AMAN improves ASR.
5. Adaptation of ASR components to the ATC context improves ASR performance (e.g. model for non-native speakers, digit recognizer, robustness concerning out-of-grammar utterances).

Chart 18: Hypothesis "ASR helps AMAN": Simulation Setup
[Diagram labels: Radar data file; Simulator; Database; 4D-CARMA; Evaluation; data flows "Radar data" and "seq, traj, adv"]
- Historical radar data of Frankfurt and Cologne/Bonn
- 4D-CARMA updates its plan every 5 seconds.
- Passive shadow mode: the planning output had no effect on the radar data.
- There are many deviations between the internal picture of 4D-CARMA and that of the controller.
- How many deviations occur, and how fast do both pictures match again?
- Baseline: no speech information available

Chart 19: Hypothesis "ASR helps AMAN": Simulation Setup (2)
[Diagram labels: Radar data file; Simulator; Database; Command Extractor (perfect ASR); 4D-CARMA; data flows "Radar data", "Given commands" and "seq, traj, adv"]
- Estimation of the possible benefits of a perfect ASR
- Preprocessing of the radar data in advance to derive track, altitude and speed change commands

Chart 20: Derived Measurements
The time until subsequences were stable (SS-3, SS-4, ...)
Example SS-4 (with four elements):
Real landing sequence: (E, F, G, H)
Planned sequence:
- E, G, H, F, I, J: wait for better sequence
- E, A, F, G, H, I, J: wait for better sequence
- E, F, I, G, H, J: wait for better sequence
- E, F, G, H, J, K: use it, if not changed until landing
For each of these subsequences, the time is determined at which the order of the M elements of the planned subsequence matches the landing subsequence and is not changed again until touchdown of the whole subsequence.
We measure the time in seconds until the landing of the last airplane in the subsequence.
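The subsequence stability measure can be sketched in a few lines. This is an illustration under assumptions, not the evaluation code of the study: it assumes that a plan "matches" when the landing subsequence appears in it as a contiguous run (which is consistent with the four example plans above), and the timestamps and the touchdown time are hypothetical values.

```python
def contains_run(planned, target):
    """True if `target` occurs in `planned` as a contiguous run, in order."""
    m = len(target)
    return any(planned[i:i + m] == target
               for i in range(len(planned) - m + 1))


def stable_lead_time(plan_history, landing_subsequence, last_touchdown):
    """Seconds before the last touchdown from which the plan stays correct.

    plan_history: list of (timestamp_s, planned_sequence), in time order.
    Returns 0.0 if the latest plan does not match the landing order.
    """
    stable_since = None
    for timestamp, planned in plan_history:
        if contains_run(planned, landing_subsequence):
            if stable_since is None:
                stable_since = timestamp
        else:
            stable_since = None  # order changed again: restart the clock
    return 0.0 if stable_since is None else last_touchdown - stable_since


# The four example plans of the SS-4 case, with hypothetical timestamps
# spaced by the 5 second plan update interval.
history = [
    (0, list("EGHFIJ")),
    (5, list("EAFGHIJ")),
    (10, list("EFIGHJ")),
    (15, list("EFGHJK")),
]
lead_time = stable_lead_time(history, list("EFGH"), last_touchdown=600)
print(lead_time)
```

Only the fourth plan contains E, F, G, H as an unbroken run, so the stable period starts at the hypothetical timestamp 15 s.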

Chart 21: Derived Measurements (2)
4D-CARMA determines for each aircraft whether the radar data conforms to the currently planned trajectory. An aircraft is non-conform if one of the following holds:
- Vertical deviation > 5 FL
- Lateral deviation > 0.5 NM
- Speed/time deviation > 10 s
Two measurements are derived:
- Non-Conformance Time (NConfT)
- Non-Conformance Counter (NConfCnt)
These measurements indicate how long and how often the internal pictures of controller and machine differ from each other.
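A minimal sketch of such a conformance check is given below. The non-conform thresholds are those listed on this chart and the half-conform thresholds are the ones quoted on Chart 25; everything else is an assumption made for illustration, in particular the dictionary layout, the 5 second sampling step and the choice to count one NConfCnt increment per non-conform episode.

```python
NONCONFORM = {"lateral_nm": 0.5, "vertical_fl": 5.0, "time_s": 10.0}
HALFCONFORM = {"lateral_nm": 0.25, "vertical_fl": 2.5, "time_s": 5.0}


def exceeds(deviation, limits):
    """True if any deviation component is larger than its limit."""
    return any(abs(deviation[key]) > limit for key, limit in limits.items())


def classify(deviation):
    """Map one deviation sample to a conformance state."""
    if exceeds(deviation, NONCONFORM):
        return "NONCONFORM"
    if exceeds(deviation, HALFCONFORM):
        return "HALFCONFORM"
    return "CONFORM"


def nonconformance_stats(samples, step_s=5.0):
    """Return (NConfCnt, NConfT) for one aircraft.

    Assumption: each sample covers `step_s` seconds, and the counter is
    incremented once per uninterrupted non-conform episode.
    """
    count, time_s, previous = 0, 0.0, "CONFORM"
    for deviation in samples:
        state = classify(deviation)
        if state == "NONCONFORM":
            time_s += step_s
            if previous != "NONCONFORM":
                count += 1
        previous = state
    return count, time_s


# Hypothetical deviation samples for a single aircraft.
samples = [
    {"lateral_nm": 0.1, "vertical_fl": 0.0, "time_s": 1.0},
    {"lateral_nm": 0.7, "vertical_fl": 0.0, "time_s": 1.0},
    {"lateral_nm": 0.6, "vertical_fl": 6.0, "time_s": 1.0},
    {"lateral_nm": 0.3, "vertical_fl": 0.0, "time_s": 1.0},
    {"lateral_nm": 0.0, "vertical_fl": 0.0, "time_s": 12.0},
]
stats = nonconformance_stats(samples)
print(stats)
```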


Chart 23: Results Frankfurt: SS-N (subsequence stability)
Without ASR, AMAN knows the correct sequence approx. 11 minutes before touchdown (SS-6, subsequence with 6 elements).
With the support of ASR, this time increases to approx. 15 minutes.

Chart 24: Results Frankfurt: SS-N (subsequence stability) (2)
Subsequent processes get stable information earlier, which improves A-CDM.
[Figure labels: No. 8, 11 minutes; No. 11, 15 minutes before touchdown]

Chart 25: Results Frankfurt: Conformance
Conformance monitoring   NConfCnt   NConfT    NONCONFORM time   HALFCONFORM time   CONFORM time
without ASR              586        12157 s   33%               5%                 62%
with ASR                 456        5250 s    14%               10%                76%
Without ASR we do not know the advised target values. Aircraft are non-conform one third of the time.
Non-conform deviations: 0.5 NM, 5 FL, 10 s
Half-conform deviations: 0.25 NM, 2.5 FL, 5 s

Chart 26: Results Frankfurt: Conformance (2)
Even with ASR, the aircraft do not conform to their trajectory 14% of the time. This high value results from several effects:
- Unknown wind in the historical data causes IAS uncertainty.
- Controllers heavily used vectoring for aircraft separation, which causes many lateral deviations.
- The controller had no chance to implement the plan of 4D-CARMA (passive shadow mode).
Conformance monitoring (with ASR): NConfCnt = 456, NConfT = 5250 s; 76% conform, 14% non-conform, 10% half-conform time.

Chart 27: Human-in-the-Loop Simulation (HITL)

Chart 28: Results HITL: Conformance
Conformance monitoring   NConfCnt   NConfT   NONCONFORM time   HALFCONFORM time   CONFORM time
without ASR              170        7277 s   18%               1%                 81%
with ASR                 70         1174 s   3%                2%                 95%
The match of the controller model to the system model increases by a factor of 6 (non-conform time drops from 18% to 3%).
Deviations still occur when controllers implement advisories earlier or later, or use different target values.

Chart 29: Results HITL: SS-N (subsequence stability)
The controller deviates from the AMAN sequence only once, which results in only small average improvements.

Chart 30: Results HITL: SS-N (subsequence stability) (2)
- Gap between QFA764 (no. 3) and DLH645 (no. 4).
- The controller decided to change the sequence.
- With ASR, the sequence update happened 15 seconds earlier.


Chart 32: How does speech recognition work?
(Automatic) speech recognition is an application of statistics.

Chart 33: Definitions
The process which transforms a speech signal (wav file) into a sequence of spoken single words is the transcription.
The transcription represents the recognized tokens, e.g. produced words, pauses and filled pauses ("eh" etc.).
Example: "Lufthansa one two nina eh descend level six correction flight level seven zero, reduce speed to one eight zero knots"

Chart 34: Definitions (2)
Example: "Lufthansa one two nina eh descend to level six correction flight level seven zero, reduce speed to one eight zero knots"
The process which extracts the recognized concepts, described by XML tags, from the sentence/transcription is the annotation.
    <s>
      <callsign><airline>lufthansa</airline> <flightnumber>one two nine</flightnumber></callsign>
      _pause_
      <command type="descend">descend level six correction flight level <flightlevel>seven zero</flightlevel></command>
      <command type="reduce"> ... </command>
    </s>

Chart 35: Definitions (3)
Example: "Lufthansa one two nina eh descend to level six correction flight level seven zero, reduce speed to one eight zero knots"
The process creating the recognized concepts is the concept extraction:
- DLH129 Descend FL 70 Reduce 180
The process creating the recognized commands is the command extraction:
- DLH129 Descend FL 70
- DLH129 Reduce 180
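To make the step from transcription to commands concrete, here is a toy sketch for the example utterance. It is not the extraction method of the project: the word tables, the handling of "correction" (everything before it inside a command is discarded) and the function name `extract_commands` are assumptions chosen only to reproduce the two commands listed above.

```python
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8",
          "nine": "9", "nina": "9", "niner": "9"}
AIRLINES = {"lufthansa": "DLH"}          # hypothetical one-entry table
COMMANDS = {"descend": "Descend FL", "reduce": "Reduce"}


def extract_commands(transcription):
    """Turn a word-level transcription into a list of command strings."""
    tokens = transcription.lower().replace(",", " ").split()
    callsign = AIRLINES[tokens[0]]
    i = 1
    while i < len(tokens) and tokens[i] in DIGITS:
        callsign += DIGITS[tokens[i]]
        i += 1

    commands, label, segment = [], None, []

    def flush():
        if label is None:
            return
        words = segment
        if "correction" in words:
            # Keep only what was said after the last "correction".
            last = len(words) - 1 - words[::-1].index("correction")
            words = words[last + 1:]
        value = "".join(DIGITS[w] for w in words if w in DIGITS)
        commands.append(f"{callsign} {label} {value}")

    for token in tokens[i:]:
        if token in COMMANDS:
            flush()
            label, segment = COMMANDS[token], []
        else:
            segment.append(token)
    flush()
    return commands


utterance = ("Lufthansa one two nina eh descend to level six correction "
             "flight level seven zero, reduce speed to one eight zero knots")
extracted = extract_commands(utterance)
print(extracted)
```

A real system would need a grammar or a statistical model instead of keyword spotting; the sketch only shows why the corrected value (flight level 70) and not the first value (level six) ends up in the command.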

Chart 36: Definitions (4)
The word error rate (WER) is defined as:
$$\mathrm{WER}(s) = \frac{\mathrm{ins}(s) + \mathrm{del}(s) + \mathrm{sub}(s)}{W(s)}$$
- ins(s): number of word insertions (words never spoken),
- del(s): number of deletions (words missed by ASR),
- sub(s): number of substitutions needed to align the two sequences,
- W(s): number of words actually said.
Example:
Controller utterance:
- Lufthansa one two nina reduce speed one eight zero (W(s) = 9)
Recognized:
- Lufthansa one two reduce speed to three eight zero knots
- ins(s) = #{to, knots} = 2; del(s) = #{nina} = 1; sub(s) = #{one} = 1
- WER = 4/9 = 44%
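The WER example can be checked with a standard word-level edit distance. The sketch below is a generic dynamic programming implementation written for this illustration (the function name `word_error_rate` is not from the slides); it returns the total of insertions, deletions and substitutions of a minimal alignment divided by the number of words said.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j]: minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,               # deletion
                             dist[i][j - 1] + 1,               # insertion
                             dist[i - 1][j - 1] + substitution)
    return dist[-1][-1] / len(ref)


said = "lufthansa one two nina reduce speed one eight zero"
recognized = "lufthansa one two reduce speed to three eight zero knots"
wer = word_error_rate(said, recognized)
print(f"{wer:.0%}")
```

For the example, the minimal alignment needs four edits for nine spoken words, which matches the 44% stated on the chart.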

Chart 37: Definitions (5)
Sentence error rate (SER): rate of sentences having at least one error (i.e. the rate of not perfectly recognized sentences). Although WER and SER are often related, this is not always the case. Generally, the SER increases with the WER, but one cannot be inferred from the other.
Concept error rate (CER): rate of concepts having at least one error. Example concept sequence: DLH12 DESCEND FL 120
Command error rate (CoER): rate of commands not extracted correctly. It is not important that ASR correctly recognizes "Good morning Lufthansa one two descend level one two zero", but that the command DLH12 DESCEND FL 120 is extracted.
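The difference between SER and CoER can be illustrated with one generic helper. This is a sketch under assumptions: the helper name `error_rate` and the misrecognized greeting are hypothetical, and the command strings are assumed to come from some separate extraction step.

```python
def error_rate(expected, actual):
    """Fraction of items that are not reproduced exactly."""
    errors = sum(1 for e, a in zip(expected, actual) if e != a)
    return errors / len(expected)


# Sentence level: the greeting is misrecognized (hypothetical ASR output).
said = ["good morning lufthansa one two descend level one two zero"]
recognized = ["morning lufthansa one two descend level one two zero"]
ser = error_rate(said, recognized)

# Command level: the extracted command is still the intended one.
commands_said = ["DLH12 DESCEND FL 120"]
commands_extracted = ["DLH12 DESCEND FL 120"]
coer = error_rate(commands_said, commands_extracted)
print(ser, coer)
```

The sentence counts as an error for the SER, while the command error rate stays at zero, which is the point made on the chart.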

Chart 38: Definitions (6)
Mean reciprocal rank (MRR): useful to measure the benefits of using context information.
$$\mathrm{MRR}(Y) = \frac{1}{|Y|} \sum_{y \in Y} \frac{1}{\mathrm{rank}(y)}$$
Here, Y denotes the complete set of utterances (i.e. the set of given commands). The rank of each utterance y is determined as follows: if a command y1 is recognized correctly, i.e. y1 is the highest-scoring hypothesis in the word lattice, then rank(y1) is 1. If a command y2 is not recognized correctly and the hypothesis that this command was given is only the third best hypothesis in the lattice, then rank(y2) is 3, and so on.
Example:
Said: AF123 DESCEND FL 60
Recognized:
- rank 1: AF123 DESCEND FL 50
- rank 2: AF123 DESCEND FL 60
- rank 3: BWA123 DESCEND FL 60
rank("AF123 DESCEND FL 60") = 2
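A short sketch of the MRR computation follows. It assumes that each utterance comes with a ranked list of hypotheses, best first, and that a command missing from the list contributes zero; both the data layout and the second test utterance are assumptions made for illustration.

```python
def mean_reciprocal_rank(utterances):
    """utterances: list of (said_command, ranked_hypotheses) pairs."""
    total = 0.0
    for said, ranked in utterances:
        if said in ranked:
            total += 1.0 / (ranked.index(said) + 1)
        # Assumption: a command that is not in the list adds 0.
    return total / len(utterances)


utterances = [
    # Example from the chart: the said command is only the second hypothesis.
    ("AF123 DESCEND FL 60",
     ["AF123 DESCEND FL 50", "AF123 DESCEND FL 60", "BWA123 DESCEND FL 60"]),
    # Hypothetical second utterance that is recognized correctly (rank 1).
    ("DLH645 REDUCE 180",
     ["DLH645 REDUCE 180", "DLH645 REDUCE 190"]),
]
mrr = mean_reciprocal_rank(utterances)
print(mrr)
```

With ranks 2 and 1 the reciprocal ranks are 0.5 and 1.0, so the mean is 0.75.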


Chart 40: Hypothesis "AMAN helps ASR": Experiment
16 people, none of them ATC experts:
- 8 German speakers,
- 3 North American English speakers,
- 2 Greek speakers,
- 1 Malayalam, 1 Romanian and 1 Russian speaker,
- 12 male, 4 female.
Setup:
- Approach scenario with 31 inbounds for Frankfurt airport
- 4D-CARMA was used (in passive shadow mode) to create sequences and ATC commands, which were displayed to the participants (in English).
- Voice commands were recorded using a headset.
- 1,107 ATC commands were recorded.
- Average length of 9.5 words per sentence

Chart 41: Hypothesis "AMAN helps ASR": Results
Constraints used                        WER      SER       MRR
none (baseline)                         2.81%    22.58%    0.849
constraint callsign                     0.55%    4.61%     0.966
constraint callsign, speed, altitude    0.52%    4.52%     0.967
oracle (best possible)                  0.31%    2.07%     0.979
- The callsign constraint alone already improves WER and SER by a factor of 5.
- We only considered simple commands.
- Combined reduce and descend commands, which also contain a heading or frequency change command, were not considered.
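The "factor of 5" can be reproduced directly from the table. The snippet below is plain arithmetic on the reported values; the dictionary keys are shortened labels introduced here for readability.

```python
# (WER in %, SER in %) as reported in the results table.
results = {
    "baseline": (2.81, 22.58),
    "callsign": (0.55, 4.61),
    "callsign+speed+altitude": (0.52, 4.52),
    "oracle": (0.31, 2.07),
}

baseline_wer, baseline_ser = results["baseline"]
factors = {
    name: (baseline_wer / wer, baseline_ser / ser)
    for name, (wer, ser) in results.items()
}
for name, (wer_factor, ser_factor) in factors.items():
    print(f"{name}: WER x{wer_factor:.1f}, SER x{ser_factor:.1f}")
```

For the callsign constraint this gives roughly 5.1 for WER and 4.9 for SER, in line with the rounded factor quoted on the chart.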


Chart 43: Conclusions
- Dynamic context information provided by an AMAN can reduce error rates by a factor of 5.
- Speech recognition provides an additional sensor which reduces downtime (plan adaptation time) by 35 seconds. We detect controller deviations in the presented videos very early.
- ASR and assistant systems improve each other.
- ASR could even be an enabler for the introduction of higher levels of automation (provided the recognition rate is acceptable).
Parallelism of the worlds of situational knowledge:
- between the operators (created by direct communication and listening),
- between the operators and the systems (sensor based, without knowledge of operator intentions).

Chart 44: Next Steps
- Funding for the AcListant (= Active Listening Assistant) idea is available within a 2-year commercialization and validation project (started in Feb. 2013).
- Using a real airport (Düsseldorf)

Chart 45: Next Steps (2)
- Non-native speakers
- Digit recognizer
- Gender models
- Use of context
- Out-of-grammar utterances

Chart 46: Next Steps (3)
- Creation of dynamic context and exchange with ASR
- Validation trials (in Nov. 2013, March 2014, Nov. 2014)
- Integration of ASR into an AMAN is one application. Others:
  - DMAN, SMAN, TMAN, ...
  - Electronic flight strips
  - Keyword recognition (e.g. go around, ...)
  - Check of pilot read-backs
  - Check whether the target altitude sent by the aircraft corresponds to the cleared altitude
  - ...
- Stakeholder Workshop on 11-12 Sep. 2013 in Braunschweig (see www.aclistant.de)
- Second Stakeholder Workshop in June 2014

Chart 47
Supported by DLR Technology Marketing and Helmholtz Validation Fund
More information on: www.aclistant.de
Thank you very much for your attention:
- Listening
- Participating in discussion and decision making