Increased Acceptance of Controller Assistance by Automatic Speech Recognition

www.dlr.de/fl Chart 1 Increased Acceptance of Controller Assistance by Automatic Speech Recognition Project partners Supported by DLR Technology Marketing and Helmholtz Validation Fund Hartmut Helmke Heiko Ehr Matthias Kleinert German Aerospace Center (DLR) Institute of Flight Guidance Braunschweig, Germany Friedrich Faubel Dietrich Klakow Saarland University (UdS) Department of Computational Linguistics and Phonetics (LSV), Saarbrücken, Germany

www.dlr.de/fl Chart 2 Contents
1. Benefits of an additional sensor for Automatic Speech Recognition (ASR) and controller assistance.
2. Development status and validation exercises
   a. Validation of hypothesis "ASR improves AMAN"
   b. Validation of hypothesis "AMAN improves ASR by dynamic context"
3. Conclusions and next steps
ASR = Automatic Speech Recognition
AcListant = Active Listening Assistant
4D-CARMA = DLR's AMAN

www.dlr.de/fl Chart 3 Arrival Management: The View of a Scientist (1) Controller Task: Sequencing 2 runways FFM DEP rwy

www.dlr.de/fl Chart 4 Arrival Management: The View of a Scientist (2) Controller Task: Separation Cleared FL Cleared IAS Actual FL Actual ground speed

www.dlr.de/fl Chart 5 Arrival Management: The View of a Scientist (3) Controller Task: Trajectory Prediction Green: shortest possible way Yellow: planned trajectory, to keep minimal distances

www.dlr.de/fl Chart 6 Arrival Management: The View of a Scientist (4) Controller Task: Give commands to pilot
Advisory Stack: Start turn in 25 s
Controller assistance could be so easy if we did not have to include
- the weather,
- the controllers,
- the pilots,
- ...
i.e. the real world.

www.dlr.de/fl Chart 7 Challenges
- Currently the main use case of AMAN is flow control.
- It saves coordination via telephone, which reduces controller workload.
- More support is possible.
- The controllers are responsible.
- They may and shall deviate.
- Therefore, the suggestions are not always useful.
- Metaphor "car navigation system":
- Recognize my intended deviations and warn me if I accidentally deviate.
- A trade-off between stability and adaptivity is necessary.

www.dlr.de/fl Chart 8 Ideas of controller-1 This is English Loss of runway capacity Gap between No. 2 and No. 3

www.dlr.de/fl Chart 9 Ideas of controller-2 Good usage of available capacity Adaptation of AMAN late

www.dlr.de/fl Chart 10 Ideas of controller-3 Sequence change too late AMAN should produce a warning

www.dlr.de/fl Chart 11 We need an additional sensor
[Diagram: AMAN, mouse, EFS, radar data, assistance, controller, pilots, voice commands]
Using only radar data results in unstable sequences. The mouse as a 2nd sensor helps AMAN, but not the controllers. Their workload even increases. We need better solutions!

www.dlr.de/fl Chart 12 We need an additional sensor
[Diagram: AMAN, assistance, radar data, ASR, controller, pilots, voice commands]
The additional sensor requires no additional controller workload. The controller already uses a microphone for voice communication. We just have to evaluate the controller-pilot party line by ASR.
The additional sensor enables alignment of the two knowledge worlds:
- mental model/plan of the operators,
- mental model/plan of the system.

www.dlr.de/fl Chart 13 An additional sensor speeds up the control loop. AMAN + ASR improve each other (1)
[Diagram: radar data, AMAN, assistant, ASR, controller, pilots, voice commands]
Possible controller intents:
1. Keep sequence
2. Increase flow; 5 before 3.
3. Increase flow; 4 before 3.
Possible commands:
1. QFA764 Turn right, DLH645 Reduce 180 kt
2. ADR6887 Turn right, ADR6887 Descend 3000 ft
3. DLH645 Turn left, QFA764 Reduce 190 kt
The additional sensor requires no further workload compared to e.g. intent input via mouse, keyboard or touch screen. A recognized command reduces the number of possible controller intents.

www.dlr.de/fl Chart 14 An additional sensor speeds up the control loop. AMAN + ASR improve each other (2)
[Diagram: radar data, AMAN, assistant, ASR, controller, pilots, possible commands]
Possible controller intents:
1. Keep sequence
2. Increase flow; 5 before 3.
3. Increase flow; 4 before 3.
Possible commands:
1. QFA764 Turn right, DLH645 Reduce 180 kt
2. ADR6887 Turn right, ADR6887 Descend 3000 ft
3. DLH645 Turn left, QFA764 Reduce 190 kt
As a by-product, AMAN helps ASR, because AMAN permanently generates context information resulting in a set of possible voice commands. The additional sensor requires no further workload compared to e.g. intent input via mouse, keyboard or touch screen.

www.dlr.de/fl Chart 15 Contents 1. Benefits of an additional sensor for ASR and controller assistance. 2. Development status and validation exercises a. Validation of hypotheses ASR improves AMAN b. Validation of AMAN improves ASR by dynamic context 3. Conclusions and next steps

www.dlr.de/fl Chart 16 Development Status
- A concept demonstrator was shown at ATC Global 2011 in Amsterdam.
- H1: ASR improves AMAN.
- H2: AMAN improves ASR.
- We have to quantify both hypotheses.
- And we must show that both improvements have benefits for the controller.
We want a validated air traffic application with lighthouse (flagship) characteristics.
Other application areas, e.g.:
- Control room
- Online gaming / playstations

www.dlr.de/fl Chart 17 High-Level Validation Hypotheses
1. AMAN improves approach control (e.g. predictability, optimal landing sequence).
2. An additional sensor speeds up AMAN plan adaptation.
3. ASR enhances both AMAN performance and the quality of controller work (performance, workload).
4. Dynamic context information created by AMAN improves ASR.
5. Adaptation of ASR components to the ATC context improves ASR performance (e.g. model for non-native speakers, digit recognizer, robustness concerning out-of-grammar utterances).

www.dlr.de/fl Chart 18 Hypothesis: ASR helps AMAN: Simulation Setup
[Diagram: radar data file, radar data simulator, radar data database, 4D-CARMA (sequences, trajectories, advisories), evaluation]
- Historical radar data of Frankfurt and Cologne/Bonn
- 4D-CARMA updates its plan every 5 seconds
- Passive shadow mode: the planning output had no effect on the radar data
- Many deviations between the internal pictures of 4D-CARMA and of the controller
- How many deviations occur, and how fast do both pictures match again?
- Baseline: no speech information available

www.dlr.de/fl Chart 19 Hypothesis: ASR helps AMAN: Simulation Setup (2)
[Diagram: radar data file, radar data simulator, radar data database, command extractor (perfect ASR), given commands, 4D-CARMA (sequences, trajectories, advisories)]
- Estimation of the possible benefits of a perfect ASR
- Preprocessing of the radar data in advance: track, altitude and speed change commands

www.dlr.de/fl Chart 20 Derived Measurements
The time until subsequences were stable (SS-3, SS-4, ...)
Example SS-4 (with four elements):
Real landing sequence: (E, F, G, H)
Planned sequences over time:
- E, G, H, F, I, J: wait for better sequence
- E, A, F, G, H, I, J: wait for better sequence
- E, F, I, G, H, J: wait for better sequence
- E, F, G, H, J, K: use it, if not changed until landing
For each of these subsequences we determine the moment from which the order of the M elements of the planned subsequence matches the landing subsequence and no longer changes until touchdown of the whole subsequence. We measure the time in seconds from that moment until the landing of the last aircraft in the subsequence.
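The SS-N measurement above can be sketched in a few lines. This is only an illustration under stated assumptions, not the 4D-CARMA implementation: the function names, the snapshot format (time in seconds, planned sequence) and the rule that the landing subsequence must appear as a contiguous run in the plan are all assumptions chosen to reproduce the SS-4 example on the slide. It also assumes that all aircraft of the subsequence are still in the plan for every snapshot passed in.

```python
def contains_run(plan, subsequence):
    """True if `subsequence` appears in `plan` as a contiguous, ordered run."""
    n = len(subsequence)
    return any(plan[i:i + n] == subsequence for i in range(len(plan) - n + 1))


def stability_time(snapshots, landing_subsequence, last_touchdown_s):
    """Seconds before the last touchdown from which the plan stayed correct.

    snapshots: iterable of (time_s, planned_sequence) pairs (hypothetical format).
    Returns None if the plan never became stable before the last touchdown.
    """
    target = list(landing_subsequence)
    stable_since = None
    for time_s, plan in sorted(snapshots, key=lambda snap: snap[0]):
        if time_s > last_touchdown_s:
            break
        if contains_run(list(plan), target):
            if stable_since is None:
                stable_since = time_s  # plan matches for the first time (again)
        else:
            stable_since = None  # plan changed: stability has to start over
    return None if stable_since is None else last_touchdown_s - stable_since


# Planned sequences from the slide, with assumed timestamps 5 s apart.
example = [
    (0, list("EGHFIJ")),
    (5, list("EAFGHIJ")),
    (10, list("EFIGHJ")),
    (15, list("EFGHJK")),
]
print(stability_time(example, list("EFGH"), last_touchdown_s=900))
```

With the assumed timestamps, only the fourth plan contains E, F, G, H as a run, so the stability time is counted from its timestamp.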

www.dlr.de/fl Chart 21 Derived Measurements (2)
4D-CARMA determines for each aircraft whether the radar data conforms to the currently planned trajectory. Deviation thresholds:
- Vertical deviation > 5 FL
- Lateral deviation > 0.5 NM
- Speed/time deviation > 10 s
Derived measurements:
- Non-Conformance Time (NConfT)
- Non-Conformance Counter (NConfCnt)
These measurements indicate how long and how often the internal pictures of controller and machine differ from each other.
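A minimal sketch of how NConfT and NConfCnt could be accumulated from a series of conformance checks. The slide does not say how the counter is incremented; counting each transition from conform to non-conform as one event is an assumption, as are the function name and the sample format.

```python
def non_conformance(samples):
    """Return (NConfCnt, NConfT) for time-ordered (time_s, is_conform) samples.

    Assumption: a non-conformance event starts whenever an aircraft switches
    from conform to non-conform; its duration lasts until the next conform sample.
    """
    count = 0
    total_s = 0.0
    prev_time, prev_conform = None, True
    for time_s, is_conform in samples:
        if prev_time is not None and not prev_conform:
            total_s += time_s - prev_time  # previous interval was non-conform
        if prev_conform and not is_conform:
            count += 1  # a new non-conformance episode begins
        prev_time, prev_conform = time_s, is_conform
    return count, total_s


# Hypothetical samples for one aircraft, checked every 5 seconds.
samples = [(0, True), (5, False), (10, False), (15, True), (20, False), (25, True)]
print(non_conformance(samples))
```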

www.dlr.de/fl Chart 22 Contents
1. Benefits of an additional sensor for ASR and controller assistance.
2. Development status and validation exercises
   a. Validation of hypothesis "ASR improves AMAN": Results
   b. Validation of hypothesis "AMAN improves ASR by dynamic context"
3. Conclusions and next steps

www.dlr.de/fl Chart 23 Results Frankfurt: SS-N: subsequence stability
Without ASR, AMAN knows the correct sequence approx. 11 minutes before touchdown (SS-6, subsequence with 6 elements). With the support of ASR this time increases to approx. 15 minutes.

www.dlr.de/fl Chart 24 Results Frankfurt: SS-N: subsequence stability (2) Subsequent processes get stable information earlier, which improves A-CDM. No. 8 11 minutes No. 11. 15 minutes before touchdown

www.dlr.de/fl Chart 25 Results Frankfurt: Conformance
Conformance Monitoring    Without ASR    With ASR
NConfCnt                  586            456
NConfT                    12157 s        5250 s
NONCONFORM time           33%            14%
HALFCONFORM time          5%             10%
CONFORM time              62%            76%
Without ASR we do not know the advised target values. The aircraft do not conform one third of the time.
Non-conform deviations: 0.5 NM, 5 FL, 10 s
Half-conform deviations: 0.25 NM, 2.5 FL, 5 s
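The two threshold sets can be read as a three-level classification. The sketch below is illustrative only: the thresholds are taken from the slide, but the rule that a single exceeded threshold already determines the class, the dictionary keys and the function name are assumptions.

```python
# Thresholds from the slide; the key names are hypothetical.
NON_CONFORM = {"lateral_nm": 0.5, "vertical_fl": 5.0, "time_s": 10.0}
HALF_CONFORM = {"lateral_nm": 0.25, "vertical_fl": 2.5, "time_s": 5.0}


def classify(deviation):
    """Classify one aircraft state from its deviations to the planned trajectory.

    Assumption: exceeding any single threshold is enough to fall into a class.
    """
    def exceeds(limits):
        return any(abs(deviation[key]) > limit for key, limit in limits.items())

    if exceeds(NON_CONFORM):
        return "NONCONFORM"
    if exceeds(HALF_CONFORM):
        return "HALFCONFORM"
    return "CONFORM"


print(classify({"lateral_nm": 0.3, "vertical_fl": 1.0, "time_s": 2.0}))
```

In this example only the lateral deviation of 0.3 NM exceeds a half-conform limit, so the state is classified as HALFCONFORM.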

www.dlr.de/fl Chart 26 Results Frankfurt: Conformance (2)
Even with ASR, the aircraft do not conform to their trajectory 14% of the time. This high value results from several effects:
- Unknown wind in the historical data leads to IAS uncertainty.
- Controllers heavily used vectoring for aircraft separation, which causes many lateral deviations.
- Controllers had no chance to implement the plan of 4D-CARMA (passive shadow mode).
Conformance Monitoring (with ASR): NConfCnt=456, NConfT=5250 s; CONFORM 76%, NONCONFORM 14%, HALFCONFORM 10%

www.dlr.de/fl Chart 27 Human-in-the-Loop Simulation (HITL)

www.dlr.de/fl Chart 28 Results HITL: Conformance
Conformance Monitoring    Without ASR    With ASR
NConfCnt                  170            70
NConfT                    7277 s         1174 s
NONCONFORM time           18%            3%
HALFCONFORM time          1%             2%
CONFORM time              81%            95%
The non-conformance time drops by a factor of about 6 (7277 s to 1174 s), i.e. the controller's model and the system's model match far better. Deviations still occur when controllers implement advisories earlier or later, or use different target values.

www.dlr.de/fl Chart 29 Results HITL: SS-N: subsequence stability Controller deviates from AMAN sequence only once which results in only low average improvements.

www.dlr.de/fl Chart 30 Results HITL: SS-N: subsequence stability (2) Gap between QFA764 (no. 3) and DLH645 (no. 4). The controller decided to change the sequence. With ASR the sequence update happened 15 seconds earlier.

www.dlr.de/fl Chart 31 Contents 1. Benefits of an additional sensor for ASR and controller assistance. 2. Development status and validation exercises a. Validation of hypotheses ASR improves AMAN b. Validation of AMAN improves ASR by dynamic context 3. Conclusions and next steps

www.dlr.de/fl Chart 32 How does speech recognition work? (Automatic) speech recognition is applied statistics.

www.dlr.de/fl Chart 33 Definitions
The process which transforms the speech signal (wav file) into a sequence of spoken single words is the transcription. The transcription represents the recognized tokens, e.g. produced words, pauses and filled pauses ("eh" etc.).
Example: Lufthansa one two nina eh descend level six correction flight level seven zero, reduce speed to one eight zero knots

www.dlr.de/fl Chart 34 Definitions (2)
Lufthansa one two nina eh descend to level six correction flight level seven zero. Reduce speed to one eight zero knots.
The process which extracts the recognized concepts, described by XML tags, from the sentence/transcription is the annotation.
    <s>
      <callsign><airline>lufthansa</airline> <flightnumber>one two nine</flightnumber></callsign>
      _pause_
      <command type="descend">descend level six correction flight level <flightlevel>seven zero</flightlevel></command>
      <command type="reduce"> ... </command>
    </s>

www.dlr.de/fl Chart 35 Definitions (3)
Lufthansa one two nina eh descend to level six correction flight level seven zero. Reduce speed to one eight zero knots.
The process creating the recognized concepts is the concept extraction:
- DLH129 Descend FL 70 Reduce 180
The process creating the recognized commands is the command extraction:
- DLH129 Descend FL 70
- DLH129 Reduce 180

www.dlr.de/fl Chart 36 Definitions (4)
The word error rate (WER) is defined as:
$$\mathrm{WER}(s) = \frac{\mathrm{ins}(s) + \mathrm{del}(s) + \mathrm{sub}(s)}{W(s)}$$
- ins(s): number of word insertions (words never spoken),
- del(s): number of deletions (words missed by ASR),
- sub(s): number of substitutions needed to align the two sequences,
- W(s): number of words actually said.
Example:
Controller utterance: Lufthansa one two nina reduce speed one eight zero (W(s) = 9)
Recognized: Lufthansa one two reduce speed to three eight zero knots
- ins(s) = #{to, knots} = 2; del(s) = #{nina} = 1; sub(s) = #{one} = 1
- WER = (2 + 1 + 1) / 9 = 44%
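The WER example can be checked with a small word-level edit-distance computation. This is a generic textbook alignment (Levenshtein distance over words), offered as a sketch and not as the scoring tool used in the project; the function names are hypothetical.

```python
def word_errors(reference, hypothesis):
    """Return (insertions, deletions, substitutions) of a minimal word alignment."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # table[i][j] = (total, ins, dels, subs) for aligning ref[:i] with hyp[:j]
    table = [[(j, j, 0, 0) for j in range(len(hyp) + 1)]]
    for i in range(1, len(ref) + 1):
        row = [(i, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            total, ins, dels, subs = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (total, ins, dels, subs)          # match
            else:
                diagonal = (total + 1, ins, dels, subs + 1)  # substitution
            total, ins, dels, subs = table[i - 1][j]
            deletion = (total + 1, ins, dels + 1, subs)      # word missed by ASR
            total, ins, dels, subs = row[j - 1]
            insertion = (total + 1, ins + 1, dels, subs)     # word never spoken
            row.append(min(diagonal, deletion, insertion))
        table.append(row)
    _, ins, dels, subs = table[-1][-1]
    return ins, dels, subs


def wer(reference, hypothesis):
    """Word error rate: (ins + del + sub) / number of words actually said."""
    return sum(word_errors(reference, hypothesis)) / len(reference.split())


said = "Lufthansa one two nina reduce speed one eight zero"
recognized = "Lufthansa one two reduce speed to three eight zero knots"
print(word_errors(said, recognized), round(100 * wer(said, recognized)))
```

For the slide's example the alignment yields 2 insertions, 1 deletion and 1 substitution, i.e. 4 errors on 9 spoken words.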

www.dlr.de/fl Chart 37 Definitions (5)
Sentence error rate (SER): rate of sentences having at least one error (i.e. the rate of not perfectly recognized sentences). Although WER and SER are often related, this is not always the case. Generally, the SER increases with the WER, but one cannot be inferred from the other.
Concept error rate (CER): rate of concepts having at least one error, e.g. for DLH12 DESCEND FL 120.
Command error rate (CoER): it is not important that ASR correctly recognizes "Good morning Lufthansa one two descend level one two zero", but that the command DLH12 DESCEND FL 120 is extracted.
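The difference between SER and CoER can be illustrated with a toy computation. The recognized sentences and extracted commands below are invented for illustration, and the helper name is hypothetical; the point is only that a sentence with a word error can still yield the correct command.

```python
def error_rate(pairs):
    """Fraction of (expected, recognized) pairs that are not identical."""
    return sum(1 for expected, recognized in pairs if expected != recognized) / len(pairs)


# Illustrative data: the first utterance loses the word "good" in recognition,
# but the extracted command is still the expected one.
sentences = [
    ("good morning lufthansa one two descend level one two zero",
     "morning lufthansa one two descend level one two zero"),
    ("lufthansa one two reduce speed one eight zero",
     "lufthansa one two reduce speed one eight zero"),
]
commands = [
    ("DLH12 DESCEND FL 120", "DLH12 DESCEND FL 120"),
    ("DLH12 REDUCE 180", "DLH12 REDUCE 180"),
]

print("SER:", error_rate(sentences))
print("CoER:", error_rate(commands))
```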

www.dlr.de/fl Chart 38 Definitions (6)
Mean reciprocal rank (MRR): useful to measure the benefits of using context information.
$$\mathrm{MRR}(Y) = \frac{1}{|Y|} \sum_{y \in Y} \frac{1}{\mathrm{rank}(y)}$$
Here, Y denotes the complete set of utterances (i.e. the set of given commands). The rank of each utterance y is determined as follows: if a command y1 is recognized correctly, i.e. y1 is the highest-scoring hypothesis in the word lattice, then rank(y1) is 1. If a command y2 is not recognized correctly and the hypothesis that this command was given is only the third-best hypothesis in the lattice, then rank(y2) is 3, and so on.
Example:
Said: AF123 DESCEND FL 60
Recognized:
- rank 1: AF123 DESCEND FL 50
- rank 2: AF123 DESCEND FL 60
- rank 3: BWA123 DESCEND FL 60
rank("AF123 DESCEND FL 60") = 2
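A short sketch of the MRR computation over n-best lists. The function names are hypothetical, and scoring a command that does not appear in the hypothesis list at all with a reciprocal rank of 0 is an assumption the slide does not state.

```python
def reciprocal_rank(said, hypotheses):
    """1 / rank of the spoken command in the ranked hypothesis list."""
    for rank, hypothesis in enumerate(hypotheses, start=1):
        if hypothesis == said:
            return 1.0 / rank
    return 0.0  # assumption: command not among the hypotheses at all


def mean_reciprocal_rank(utterances):
    """MRR over (said, ranked_hypotheses) pairs."""
    return sum(reciprocal_rank(said, hyps) for said, hyps in utterances) / len(utterances)


slide_example = (
    "AF123 DESCEND FL 60",
    ["AF123 DESCEND FL 50", "AF123 DESCEND FL 60", "BWA123 DESCEND FL 60"],
)
print(reciprocal_rank(*slide_example))
```

For the slide's example the spoken command sits at rank 2, so its reciprocal rank is 0.5; averaged with a second, correctly recognized utterance (rank 1) the MRR would be 0.75.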

www.dlr.de/fl Chart 39 Contents 1. Benefits of an additional sensor for ASR and controller assistance. 2. Development status and validation exercises a. Validation of hypotheses ASR improves AMAN b. Validation of AMAN improves ASR by dynamic context Results 3. Conclusions and next steps

www.dlr.de/fl Chart 40 Hypothesis: AMAN helps ASR: Experiment
16 participants, none of them ATC experts:
- 8 German speakers,
- 3 North American English speakers,
- 2 Greek speakers,
- 1 Malayalam, 1 Romanian and 1 Russian speaker,
- 12 male, 4 female.
Setup:
- Approach scenario with 31 inbounds for Frankfurt airport
- 4D-CARMA was used (in passive shadow mode) to create sequences and ATC commands, which were displayed to the participants (in English).
- Voice commands were recorded using a headset.
- 1,107 ATC commands were recorded.
- Average length of 9.5 words per sentence

www.dlr.de/fl Chart 41 Hypothesis: AMAN helps ASR: Results
Constraints used             WER      SER       MRR
none (baseline)              2.81%    22.58%    0.849
callsign                     0.55%    4.61%     0.966
callsign, speed, altitude    0.52%    4.52%     0.967
oracle (best possible)       0.31%    2.07%     0.979
- The callsign constraint alone already improves WER and SER by a factor of about 5.
- We only considered simple commands.
- Combined reduce and descend commands which also contain a heading or frequency change command were not considered.
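The "factor of 5" claim follows directly from the baseline and callsign-constraint rows of the results. A quick check, using only the numbers reported on the slide (the variable and function names are hypothetical):

```python
# Error rates in percent, taken from the results slide.
BASELINE = {"WER": 2.81, "SER": 22.58}
CALLSIGN = {"WER": 0.55, "SER": 4.61}


def improvement_factors(before, after):
    """Ratio of the error rate without constraint to the one with constraint."""
    return {metric: before[metric] / after[metric] for metric in before}


for metric, factor in improvement_factors(BASELINE, CALLSIGN).items():
    print(f"{metric}: factor {factor:.2f}")
```

Both ratios come out close to 5 (about 5.1 for WER and about 4.9 for SER).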

www.dlr.de/fl Chart 42 Contents 1. Motivation for an additional sensor 2. Development status and validation exercises a. Validation of hypotheses ASR improves AMAN b. Validation of AMAN improves ASR by dynamic context 3. Conclusions and next steps

www.dlr.de/fl Chart 43 Conclusions
- Dynamic context information provided by an AMAN can reduce error rates by a factor of 5!
- Speech recognition provides an additional sensor which reduces downtime (plan adaptation time) by 35 seconds! We detect controller deviations in the presented videos very early.
- ASR and assistant systems improve each other.
- ASR could even be an enabler for the introduction of higher levels of automation (provided the recognition rate is sufficient).
Parallelism of the worlds of situational knowledge
- between the operators and (created by direct communication and listening)
- between the operators and the systems (sensor-based, without knowledge of operator intentions)

www.dlr.de/fl Chart 44 Next Steps
- Funding of the AcListant (= Active Listening Assistant) idea is available within a 2-year commercialization and validation project (started in Feb. 2013).
- Using a real airport (Düsseldorf)

www.dlr.de/fl Chart 45 Next Steps (2)
- Non-Native Speakers
- Digit Recognizer
- Gender Models
- Use of Context
- Out-of-Grammar

www.dlr.de/fl Chart 46 Next Steps (3)
- Creation of dynamic context and exchange with ASR
- Validation trials (in Nov. 2013, March 2014, Nov. 2014)
- Integration of ASR into an AMAN is one application; others:
- DMAN, SMAN, TMAN, ...
- Electronic flight strips
- Keyword recognition (e.g. go around, ...)
- Check of pilot read-backs
- Check whether the target altitude sent by the aircraft corresponds to the cleared altitude
- ...
- Stakeholder Workshop on 11-12 Sep. 2013 in Braunschweig (see www.aclistant.de)
- Second Stakeholder Workshop in June 2014

www.dlr.de/fl Chart 47 Supported by DLR Technology Marketing and Helmholtz Validation Fund. More information on: www.aclistant.de. Thank you very much for your attention. Listening; participating in discussion and decision making.