The Closed Runway Operation Prevention Device: Applying Automatic Speech Recognition Technology for Aviation Safety

MITRE CAASD
Shuo Chen, Hunter Kopald
June 25, 2015
Approved for Public Release; Distribution Unlimited. 15-1756
© 2015 The MITRE Corporation. All rights reserved.

Outline
- Overview of closed runway operations
- Closed Runway Operation Prevention Device (CROPD)
- Role of speech recognition
- Automatic speech recognition, in general and in ATC
- Techniques to tune performance for the ATC domain
  - Language modeling
  - Acoustic model adaptation
  - Semantic interpretation and confidence thresholding
- Evaluating the CROPD speech recognition performance through field demonstration
- Implications and future work

Overview of Closed Runway Operations
Local Controller Transmission: "American one twenty-three, wind three five zero at eight, runway one left, cleared for takeoff"
Pilot Readback: "Runway one left, cleared for takeoff, American one twenty-three"
Operations no longer allowed on runways designated as closed
Prevention Mechanisms
- Standard Operating Procedures
- Basic memory aids: flight strip placards
- Additional automation systems that provide alerts

Closed Runway Operation Prevention Device (CROPD)
FAA-proposed concept to prevent operations on closed runways by using automatic speech recognition to monitor controller clearances
CROPD alerts if both:
1. Clearance detected
   - cleared to land
   - cleared for takeoff
   - line up and wait
   - Other clearances to be added, e.g., cleared touch and go
2. Runway associated with clearance is designated as closed
Additional details
- Only listens to local controller transmissions
- Not integrated with other tower systems (e.g., ASDE-X)
- Intended for use at airports both with and without advanced runway safety systems
[Figure: CROPD system diagram. Air Traffic Control Tower: controller, graphical user interface, closed runway status, alert. CROPD in equipment room. Signals exchanged: transmission audio, closed runway status, alert trigger.]
CROPD processing
- Speech recognition and understanding: identify spoken intent (1. Runway, 2. Clearance)
- Alert logic: compare (1) runway and clearance detected with (2) closed runway status, then trigger alert
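
The two-part alert condition above can be sketched in a few lines. This is a minimal illustration, not the CROPD implementation: the `Intent` type, the `should_alert` function, and the string forms of runways and clearances are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Clearance types named on the slide; others (e.g. touch and go) could be added.
MONITORED_CLEARANCES = {"cleared to land", "cleared for takeoff", "line up and wait"}


@dataclass(frozen=True)
class Intent:
    """Hypothetical container for the intent detected in one transmission."""
    runway: str     # e.g. "19L"
    clearance: str  # e.g. "cleared to land"


def should_alert(intent: Optional[Intent], closed_runways: Set[str]) -> bool:
    """Alert only if a monitored clearance was detected AND its runway is closed."""
    if intent is None:
        return False
    return intent.clearance in MONITORED_CLEARANCES and intent.runway in closed_runways


# Assumed closure entry: both ends of one physical runway are marked closed.
closed = {"19L", "1R"}
print(should_alert(Intent("19L", "cleared to land"), closed))     # True
print(should_alert(Intent("1C", "cleared for takeoff"), closed))  # False
print(should_alert(None, closed))                                 # False
```

Keeping the closure status as a plain set mirrors the slide's description of the device as isolated from other tower systems: the only inputs are the detected intent and the closures entered by the controller.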

The Role of Speech Recognition
Objective of the speech recognition is to detect intent to use a runway
Example transmission: "United one twenty-three, wind three three zero at six, runway one center cleared to land, traffic departs runway three zero."
Recognized numbers   | Logical concept
One twenty-three     | ACID
Three three zero     | Wind direction
Six                  | Wind speed
One (center)         | Runway clearance
Three zero           | Runway
For CROPD, intent can be considered the presence of a clearance and the runway associated with that clearance
- Simplifies performance measure to specific phrase-level accuracy
- Requires disambiguation of phrases that could be confused with runways
- Requires accurate association of the correct runway with the clearance post-recognition
Not all words in the transmission are equally important to the detection of intent
- Word Error Rate (WER) is not appropriate
- Need an application-specific metric
  - Composite measure of both runway and clearance
  - Considers correct association of runway with clearance
Use intent as metric, with 5 outcome types
Truth (actual transmission) | CROPD result: RWY 30, CFT | CROPD result: RWY 19L, CTL | CROPD result: No Intent
Intent: RWY 30, CFT         | Correct Intent            | Incorrect Intent           | Missed Intent
No Intent                   | False Intent              | False Intent               | Correct Rejection
Speech recognition performance is different from system performance
- Incorrect speech recognition results do not always translate to incorrect system performance
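
The five outcome types follow directly from comparing the true intent with the detected intent. The sketch below assumes an intent is a (runway, clearance) pair and uses `None` for "no intent"; the `classify` function name and the tuple encoding are illustrative choices, not part of the source.

```python
from typing import Optional, Tuple

Intent = Tuple[str, str]  # (runway, clearance), e.g. ("30", "CFT")


def classify(truth: Optional[Intent], detected: Optional[Intent]) -> str:
    """Map one (truth, detection) pair to one of the five intent outcome types."""
    if truth is None:
        # Nothing was said: any detection is false, no detection is a correct rejection.
        return "correct rejection" if detected is None else "false intent"
    if detected is None:
        return "missed intent"
    # Both runway and clearance must match for the intent to count as correct.
    return "correct intent" if detected == truth else "incorrect intent"


# The cells of the slide's table:
print(classify(("30", "CFT"), ("30", "CFT")))   # correct intent
print(classify(("30", "CFT"), ("19L", "CTL")))  # incorrect intent
print(classify(("30", "CFT"), None))            # missed intent
print(classify(None, ("30", "CFT")))            # false intent
print(classify(None, None))                     # correct rejection
```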

Automatic Speech Recognition
In General
- Automatic speech recognition performance depends not only on the speech input at the time of application, but also on what is already known about the application prior to use
- In some cases, meaning deduction is more important than word recognition
- Need to tailor system to the application
- Need to define what recognition is needed for success
With Air Traffic Control
- ATC characteristics that make automatic speech recognition challenging
  - Acoustic characteristics of audio equipment
  - Acoustic characteristics of the speaker set: rate of speech, pronunciation, accents
  - Deviations from standard phraseology
- ATC characteristics that facilitate automatic speech recognition
  - Standard phraseology, when followed
  - Acoustic modeling, custom pronunciation adaptation
  - Context information: what the system should expect to hear
[Figure: KIAD airport diagram showing runways 12/30, 1L/19R, 1C/19C, and 1R/19L]

Tuning Recognition Performance: Language Model Creation and Adaptation
Language models define the universe of words and word sequences that the speech recognition system is designed to recognize
1. Finite-state grammars
   - Manually defined vocabulary and word sequences
   - Can yield near-perfect recognition in applications where speakers adhere to defined sequences
   - Poor tolerance for phrase deviations
2. Statistical language models (SLMs)
   - Machine-generated vocabulary and probability model of word sequences based on transcription data from target environment
   - Robust to speech variations and disfluencies
   - Can yield improbable word sequences
CROPD used both types of models
- A grammar containing only runway and clearance phrases, with common substitutions, additions, and omissions
- A statistical language model trained on 10,000+ transcriptions from the local controller position
Initial benchmark of speech recognition performance on local controller audio using the two CROPD language models yielded
- 70% true intent detection
- 50% false intent detection
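
To illustrate why a finite-state grammar is precise but intolerant of deviations, the sketch below encodes a tiny runway-plus-clearance grammar as a regular expression over recognized text. It is an assumption-laden toy: the vocabulary, the `parse` function, and the use of a regular expression in place of a recognizer grammar format are all choices made for the example, and the actual CROPD grammar is not given in the source.

```python
import re
from typing import Optional, Tuple

# Toy vocabulary (assumed): spoken digits, an optional side, three clearances.
DIGIT = r"(?:zero|one|two|three|four|five|six|seven|eight|niner|nine)"
RUNWAY = rf"runway (?P<runway>{DIGIT}(?: {DIGIT})?(?: (?:left|right|center))?)"
CLEARANCE = r"(?P<clearance>cleared to land|cleared for takeoff|line up and wait)"
GRAMMAR = re.compile(rf"{RUNWAY} {CLEARANCE}")


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z\s]", " ", text.lower()).split())


def parse(text: str) -> Optional[Tuple[str, str]]:
    """Return (runway, clearance) if the text contains an in-grammar phrase."""
    match = GRAMMAR.search(normalize(text))
    return (match.group("runway"), match.group("clearance")) if match else None


# Standard phraseology is matched exactly.
print(parse("American one twenty-three, wind three five zero at eight, "
            "runway one left, cleared for takeoff"))
# ('one left', 'cleared for takeoff')

# A small deviation ("you're") falls outside the grammar and is rejected.
print(parse("runway one left, you're cleared for takeoff"))  # None
```

A statistical language model would assign the second utterance a lower probability rather than rejecting it outright, which is the robustness trade-off the slide describes.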

Tuning Recognition Performance: Acoustic Model Adaptation
Acoustic models define the statistical signatures of speech sounds, i.e., phonemes, and specify the sound sequences that form words
- Default acoustic model usually provided out-of-the-box
Methods of adaptation:
1. Pronunciation dictionaries
   - Manually defined word pronunciations
   - Accounts for common pronunciation variation and non-standard words, e.g., airline names, fix names
2. Automatic acoustic model adaptation
   - Automatic supervised training of existing acoustic model with audio and transcriptions from target speaker set and environment
   - Accounts for channel characteristics and consistent speech patterns, i.e., accents, within a speaker set
CROPD used both types of acoustic tuning techniques
- Custom ATC dictionary
  - Non-standard words, e.g., niner
  - Composite words where coarticulation and assimilation were frequently observed, e.g., cleared-to-land
- Custom acoustic model adapted with audio and transcriptions from local controller position
[Figure: example pronunciation dictionary entries for "niner", "one", and "cleared-to-land"]
Second benchmark of speech recognition performance showed improvement
- Increased true intent detection to 89%
- Reduced false intent detection to 18%
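
One way to picture a custom dictionary with composite words is sketched below. Everything here is hypothetical: the ARPAbet-style phoneme strings are illustrative guesses rather than the CROPD entries, and rewriting transcripts so that "cleared to land" becomes a single token is an assumed preprocessing step, not one the source describes.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary: each word maps to one or more pronunciations.
# Phoneme strings are illustrative only, written in ARPAbet-style notation.
PRONUNCIATIONS: Dict[str, List[str]] = {
    "niner": ["N AY N ER"],
    "cleared-to-land": [
        "K L IH R D T AH L AE N D",  # careful pronunciation
        "K L IH R D AH L AE N",      # reduced form with sounds run together
    ],
}

# Word sequences treated as a single composite token (assumed).
COMPOSITES: Dict[Tuple[str, ...], str] = {
    ("cleared", "to", "land"): "cleared-to-land",
}


def merge_composites(words: List[str]) -> List[str]:
    """Replace each known word sequence with its composite token."""
    out: List[str] = []
    i = 0
    while i < len(words):
        for seq, token in COMPOSITES.items():
            if tuple(words[i:i + len(seq)]) == seq:
                out.append(token)
                i += len(seq)
                break
        else:
            # No composite starts at this position; keep the word as is.
            out.append(words[i])
            i += 1
    return out


print(merge_composites("runway one center cleared to land".split()))
# ['runway', 'one', 'center', 'cleared-to-land']
```

Treating the phrase as one dictionary entry lets a reduced pronunciation be listed for the whole phrase, which is harder to express when each word carries its own pronunciation.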

Tuning Recognition Performance: Post-Recognition Processing
Semantic interpretation performs text processing to derive logical concepts from potentially error-filled raw text
- Apply application-specific reasoning to disambiguate confusable phrases
- Introduce robustness to recognition error by selectively using only parts of the recognized text
For CROPD, a semantic interpretation algorithm was developed
- Rescores hypotheses based on system-generated confidence scores
- Deduces the most probable combination of clearance and runway
Third benchmark of speech recognition performance showed improvement
- Increased true intent detection to 95%
- Reduced false intent detection to 17%
Confidence thresholding uses system-generated scores to bias final result to a specific balance of missed and false detections
- Does not affect speech recognition system
- Advises the application using the speech recognition whether to accept a recognition result based on its own certainty of accuracy
In CROPD, confidence thresholding is implemented
- System users must determine what confidence threshold is operationally acceptable
- Can vary by facility
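
The source does not describe the CROPD rescoring algorithm in detail, so the following is only one plausible reading of "rescore hypotheses, pick the most probable combination, then apply a threshold". It assumes the recognizer returns an n-best list, that an intent has already been extracted from each hypothesis, and that confidences lie in [0, 1]; the `best_intent` function and its normalization scheme are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Intent = Tuple[str, str]  # (runway, clearance)


def best_intent(hypotheses: List[Tuple[Optional[Intent], float]],
                threshold: float) -> Optional[Intent]:
    """Pool confidence across n-best hypotheses and accept the top intent
    only if its share of the total confidence reaches the threshold.

    hypotheses: (intent extracted from one hypothesis or None, confidence).
    """
    total = sum(conf for _, conf in hypotheses)
    if total == 0:
        return None
    scores: Dict[Optional[Intent], float] = defaultdict(float)
    for intent, conf in hypotheses:
        # Hypotheses that agree on the same intent reinforce each other.
        scores[intent] += conf / total
    intent, score = max(scores.items(), key=lambda kv: kv[1])
    return intent if intent is not None and score >= threshold else None


n_best = [
    (("1C", "CTL"), 0.60),
    (("1C", "CTL"), 0.20),
    (("30", "CTL"), 0.15),
    (None, 0.05),
]
print(best_intent(n_best, threshold=0.7))  # ('1C', 'CTL')
print(best_intent(n_best, threshold=0.9))  # None
```

Raising the threshold turns some detections into non-detections, trading false intents for missed intents, which is why the acceptable value is left to each facility.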

Evaluating Performance through Field Demonstration
In summer of 2014, the CROPD was field tested at KIAD as a part of the NAS
- Passively monitored local controller channel by connecting to the facility voice switch
- A subset of field audio was selected for performance evaluation
- The effect of adjusting the confidence threshold was also studied
Transmissions with Intent
  True Intent          92.70%
  Incorrect Intent      4.77%
  Missed Intent         2.54%
Transmissions without Intent
  Correct Non-Intent   90.33%
  False Intent          9.67%
Analysis of some of the false and incorrect intent errors showed that the semantic interpretation algorithm could be further improved
- Erroneously discarding correctly recognized runway phrases
- Some ambiguous phrases still erroneously tagged as runways

Implications and Future Work
- A combination of various tuning techniques can yield significant performance improvements
- CROPD is a simple, isolated application that demonstrates the feasibility of applying automatic speech recognition on live ATC transmissions
- Other applications could make use of the same detected clearances for other purposes or detect new instructions for other controller positions
- More complex applications could also be feasible with the addition of context information from tower automation systems
- MITRE is currently investigating the benefits of integrating speech recognition with Tower/Surface surveillance data
  - Leverage speech-derived intent to improve safety alert performance
  - Use aircraft and airport state information to improve speech recognition performance

A Closer Look at Recognition vs. System Performance
Missed Intent Detection (transmission with intent)
Correct alert behavior: spoken runway not closed, no missed or false alert
  Spoken Intent: 19L, cleared to land | Closure: none | Detected: none | Result: Correct non-alert
Incorrect alert behavior: spoken runway closed, missed alert
  Spoken Intent: 19L, cleared to land | Closure: 19L/1R | Detected: none | Result: Missed Alert

A Closer Look at Recognition vs. System Performance
False Intent Detection (transmission without intent)
Correct alert behavior: detected runway not closed, no false alert
  Spoken Intent: none | Closure: none | Detected: 19L, cleared to land | Result: Correct non-alert
Incorrect alert behavior: detected runway closed, false alert
  Spoken Intent: none | Closure: 19L/1R | Detected: 19L, cleared to land | Result: False Alert

A Closer Look at Recognition vs. System Performance
Incorrect Intent Detection (transmission with intent)
Correct alert behavior
- Spoken runway not closed, detected runway not closed: no missed or false alert
  Spoken Intent: 19L, cleared to land | Closure: none | Detected: 1L, cleared to land | Result: Correct non-alert
- Spoken runway closed, detected runway is opposite end of spoken runway: alert correctly generated
  Spoken Intent: 19C, cleared to land | Closure: 19C/1C | Detected: 1C, cleared to land | Result: Correct alert
Incorrect alert behavior
- Spoken runway closed, detected runway not closed: missed alert
  Spoken Intent: 19L, cleared to land | Closure: 19L/1R | Detected: 1L, cleared to land | Result: Missed Alert
- Spoken runway not closed, detected runway closed: false alert
  Spoken Intent: 19L, cleared to land | Closure: 1L/19R | Detected: 1L, cleared to land | Result: False Alert
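
The difference between recognition errors and system errors can be made concrete by comparing the alert that should have been raised (from the spoken runway) with the alert that was raised (from the detected runway). The sketch below is an illustration under assumptions: the `alert_outcome` function is hypothetical, runways are plain strings, and a closure is modeled as a set containing both ends of the closed runway.

```python
from typing import Optional, Set


def alert_outcome(spoken: Optional[str], detected: Optional[str],
                  closed: Set[str]) -> str:
    """Classify system-level behavior from spoken vs. detected runway."""
    should_alert = spoken is not None and spoken in closed
    did_alert = detected is not None and detected in closed
    if should_alert and did_alert:
        return "correct alert"
    if should_alert:
        return "missed alert"
    if did_alert:
        return "false alert"
    return "correct non-alert"


# Missed intent: harmless unless the spoken runway is closed.
print(alert_outcome("19L", None, set()))           # correct non-alert
print(alert_outcome("19L", None, {"19L", "1R"}))   # missed alert
# False intent: harmless unless the detected runway is closed.
print(alert_outcome(None, "19L", {"19L", "1R"}))   # false alert
# Incorrect intent: opposite end of the same closed runway still alerts.
print(alert_outcome("19C", "1C", {"19C", "1C"}))   # correct alert
print(alert_outcome("19L", "1L", {"19L", "1R"}))   # missed alert
print(alert_outcome("19L", "1L", {"1L", "19R"}))   # false alert
```

Because only membership in the closure set matters, a recognition error changes system behavior only when it moves the runway across the open/closed boundary.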