Machine Learning Yearning-Draft, Andrew Ng
Machine Learning Yearning is a deeplearning.ai project. © 2018 Andrew Ng. All Rights Reserved.

End-to-end deep learning

47 The rise of end-to-end learning

Suppose you want to build a system to examine online product reviews and automatically tell you if the writer liked or disliked that product. For example, you hope to recognize the following review as highly positive:

    This is a great mop!

and the following as highly negative:

    This mop is low quality--I regret buying it.

The problem of recognizing positive vs. negative opinions is called "sentiment classification." To build this system, you might build a "pipeline" of two components:

1. Parser: A system that annotates the text with information identifying the most important words [1]. For example, you might use the parser to label all the adjectives and nouns. You would therefore get the following annotated text:

    This is a great (Adjective) mop (Noun)!

2. Sentiment classifier: A learning algorithm that takes as input the annotated text and predicts the overall sentiment. The parser's annotation could help this learning algorithm greatly: By giving adjectives a higher weight, your algorithm will be able to quickly hone in on the important words such as "great," and ignore less important words such as "this."

We can visualize your "pipeline" of two components as follows:

[Figure: text -> parser -> sentiment classifier -> sentiment]

There has been a recent trend toward replacing pipeline systems with a single learning algorithm. An end-to-end learning algorithm for this task would simply take as input the raw, original text "This is a great mop!", and try to directly recognize the sentiment:

[Figure: text -> learning algorithm -> sentiment]

[1] A parser gives a much richer annotation of the text than this, but this simplified description will suffice for explaining end-to-end deep learning.

Neural networks are commonly used in end-to-end learning systems. The term "end-to-end" refers to the fact that we are asking the learning algorithm to go directly from the input to the desired output. I.e., the learning algorithm directly connects the "input end" of the system to the "output end."

In problems where data is abundant, end-to-end systems have been remarkably successful. But they are not always a good choice. The next few chapters will give more examples of end-to-end systems as well as give advice on when you should and should not use them.
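The contrast between the two designs in this chapter can be sketched in a few lines of Python. This is a minimal illustration, not a real system: the word lists, the adjective weight of 3.0, and all function names are hypothetical, and a tiny bag-of-words perceptron stands in for the neural network that an end-to-end system would more likely use. The two training reviews are the ones quoted above, which is far less data than an end-to-end learner needs in practice.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical toy lexicons for illustration; a real parser is far richer.
ADJECTIVES = {"great", "low"}
NOUNS = {"mop", "quality"}
WORD_SCORES = {"great": 1.0, "low": -1.0, "regret": -1.0}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep only alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


# Pipeline approach: parser followed by sentiment classifier.

def parser(text: str) -> List[Tuple[str, str]]:
    """Component 1: annotate each word with a coarse tag."""
    annotated = []
    for word in tokenize(text):
        if word in ADJECTIVES:
            tag = "Adjective"
        elif word in NOUNS:
            tag = "Noun"
        else:
            tag = "Other"
        annotated.append((word, tag))
    return annotated


def sentiment_classifier(annotated: List[Tuple[str, str]]) -> str:
    """Component 2: score annotated text, weighting adjectives more heavily."""
    score = 0.0
    for word, tag in annotated:
        weight = 3.0 if tag == "Adjective" else 1.0  # assumed weighting
        score += weight * WORD_SCORES.get(word, 0.0)
    return "positive" if score > 0 else "negative"


def pipeline_sentiment(text: str) -> str:
    return sentiment_classifier(parser(text))


# End-to-end approach: learn the mapping raw text -> sentiment directly.

def train_end_to_end(examples: List[Tuple[str, str]],
                     epochs: int = 10) -> Callable[[str], str]:
    """Train a bag-of-words perceptron on (text, label) pairs."""
    weights: Dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            target = 1.0 if label == "positive" else -1.0
            words = tokenize(text)
            score = bias + sum(weights.get(w, 0.0) for w in words)
            predicted = 1.0 if score > 0 else -1.0
            if predicted != target:  # perceptron update on mistakes only
                for w in words:
                    weights[w] = weights.get(w, 0.0) + target
                bias += target

    def predict(text: str) -> str:
        score = bias + sum(weights.get(w, 0.0) for w in tokenize(text))
        return "positive" if score > 0 else "negative"

    return predict


REVIEWS = [
    ("This is a great mop!", "positive"),
    ("This mop is low quality--I regret buying it.", "negative"),
]
end_to_end_sentiment = train_end_to_end(REVIEWS)
for review, _ in REVIEWS:
    print(review, "|", pipeline_sentiment(review), "|", end_to_end_sentiment(review))
```

Both `pipeline_sentiment` and `end_to_end_sentiment` have the same interface (text in, sentiment out). The difference is where the knowledge comes from: the pipeline relies on hand-written lexicons and weights, while the end-to-end function gets everything from the labeled pairs.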

48 More end-to-end learning examples

Suppose you want to build a speech recognition system. You might build a system with three components:

[Figure: audio clip -> compute features -> phoneme recognizer -> final recognizer -> transcript]

The components work as follows:

1. Compute features: Extract hand-designed features, such as MFCC (Mel-frequency cepstral coefficients) features, which try to capture the content of an utterance while disregarding less relevant properties, such as the speaker's pitch.

2. Phoneme recognizer: Some linguists believe that there are basic units of sound called "phonemes." For example, the initial "k" sound in "keep" is the same phoneme as the "c" sound in "cake." This system tries to recognize the phonemes in the audio clip.

3. Final recognizer: Take the sequence of recognized phonemes, and try to string them together into an output transcript.

In contrast, an end-to-end system might input an audio clip, and try to directly output the transcript:

[Figure: audio clip -> learning algorithm -> transcript]

So far, we have only described machine learning "pipelines" that are completely linear: the output is sequentially passed from one stage to the next. Pipelines can be more complex. For example, here is a simple architecture for an autonomous car:

[Figure: camera images feed a car detector and a pedestrian detector, whose outputs feed a path planner]

It has three components: One detects other cars using the camera images; one detects pedestrians; then a final component plans a path for our own car that avoids the cars and pedestrians.

Not every component in a pipeline has to be learned. For example, the literature on "robot motion planning" has numerous algorithms for the final path planning step for the car. Many of these algorithms do not involve learning.

In contrast, an end-to-end approach might try to take in the sensor inputs and directly output the steering direction:

[Figure: sensor inputs -> learning algorithm -> steering direction]

Even though end-to-end learning has seen many successes, it is not always the best approach. For example, end-to-end speech recognition works well. But I'm skeptical about end-to-end learning for autonomous driving. The next few chapters explain why.
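The non-linear car pipeline described in this chapter, with a final step that involves no learning, might be sketched as follows. Everything here is a toy assumption: the "image" is a small character grid, the two detector functions simply read marked cells and stand in for learned detectors, and breadth-first search is used as one example of a classical path planning algorithm. It is not taken from the robot motion planning literature the text mentions.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]  # (row, column) in a toy grid "image"


def detect_cars(image: List[str]) -> Set[Cell]:
    """Stand-in for a learned car detector: reads cells marked 'C'."""
    return {(r, c) for r, row in enumerate(image)
            for c, ch in enumerate(row) if ch == "C"}


def detect_pedestrians(image: List[str]) -> Set[Cell]:
    """Stand-in for a learned pedestrian detector: reads cells marked 'P'."""
    return {(r, c) for r, row in enumerate(image)
            for c, ch in enumerate(row) if ch == "P"}


def plan_path(rows: int, cols: int, start: Cell, goal: Cell,
              obstacles: Set[Cell]) -> Optional[List[Cell]]:
    """Non-learned component: breadth-first search for a shortest grid path."""
    came_from: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the predecessor links back to the start.
            path = []
            current: Optional[Cell] = cell
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and nxt not in obstacles and nxt not in came_from:
                came_from[nxt] = cell
                queue.append(nxt)
    return None  # no obstacle-free path exists


def drive_pipeline(image: List[str], start: Cell,
                   goal: Cell) -> Optional[List[Cell]]:
    """Two detectors run on the same image; their outputs feed the planner."""
    obstacles = detect_cars(image) | detect_pedestrians(image)
    return plan_path(len(image), len(image[0]), start, goal, obstacles)


image = [
    "..C.",
    ".P..",
    "....",
]
print(drive_pipeline(image, start=(2, 0), goal=(0, 3)))
```

The two detectors do not depend on each other, so the pipeline is a small graph rather than a chain. Only the detectors would need training data; the planner works from their outputs alone.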

49 Pros and cons of end-to-end learning

Consider the same speech pipeline from our earlier example:

[Figure: audio clip -> compute features -> phoneme recognizer -> final recognizer -> transcript]

Many parts of this pipeline were "hand-engineered":

- MFCCs are a set of hand-designed audio features. Although they provide a reasonable summary of the audio input, they also simplify the input signal by throwing some information away.

- Phonemes are an invention of linguists. They are an imperfect representation of speech sounds. To the extent that phonemes are a poor approximation of reality, forcing an algorithm to use a phoneme representation will limit the speech system's performance.

These hand-engineered components limit the potential performance of the speech system. However, allowing hand-engineered components also has some advantages:

- The MFCC features are robust to some properties of speech that do not affect the content, such as speaker pitch. Thus, they help simplify the problem for the learning algorithm.

- To the extent that phonemes are a reasonable representation of speech, they can also help the learning algorithm understand basic sound components and therefore improve its performance.

Having more hand-engineered components generally allows a speech system to learn with less data. The hand-engineered knowledge captured by MFCCs and phonemes "supplements" the knowledge our algorithm acquires from data. When we don't have much data, this knowledge is useful.

Now, consider the end-to-end system:

[Figure: audio clip -> learning algorithm -> transcript]

This system lacks the hand-engineered knowledge. Thus, when the training set is small, it might do worse than the hand-engineered pipeline. However, when the training set is large, then it is not hampered by the limitations of an MFCC or phoneme-based representation. If the learning algorithm is a large-enough neural network and if it is trained with enough training data, it has the potential to do very well, and perhaps even approach the optimal error rate.

End-to-end learning systems tend to do well when there is a lot of labeled data for "both ends": the input end and the output end. In this example, we require a large dataset of (audio, transcript) pairs. When this type of data is not available, approach end-to-end learning with great caution.

If you are working on a machine learning problem where the training set is very small, most of your algorithm's knowledge will have to come from your human insight, i.e., from your "hand-engineered" components.

If you choose not to use an end-to-end system, you will have to decide what the steps in your pipeline are and how they should plug together. In the next few chapters, we'll give some suggestions for designing such pipelines.
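The trade-off in this chapter, that a hand-designed feature can help when data is scarce while also throwing information away, can be illustrated with a deliberately artificial example. The sketch below does not compute MFCCs and does not use real audio. Each "clip" is an invented list of three numbers, the hypothetical `pitch_relative_features` function divides a clip by its first value so that overall pitch no longer matters, and a one-nearest-neighbor rule stands in for the learning algorithm.

```python
from typing import Callable, List, Sequence, Tuple

Clip = Sequence[float]  # toy stand-in for an audio clip


def raw_features(clip: Clip) -> List[float]:
    """No hand engineering: the learner sees the input unchanged."""
    return list(clip)


def pitch_relative_features(clip: Clip) -> List[float]:
    """Toy hand-designed feature: each value relative to the first value."""
    base = clip[0]
    return [value / base for value in clip]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_label(train: List[Tuple[Clip, str]], clip: Clip,
                  featurize: Callable[[Clip], List[float]]) -> str:
    """One-nearest-neighbor classification in the chosen feature space."""
    query = featurize(clip)
    best_clip, best_label = min(
        train, key=lambda pair: squared_distance(featurize(pair[0]), query))
    return best_label


# A very small training set: one invented clip per word, different speakers.
train = [
    ([100.0, 150.0, 200.0], "word_a"),  # low-pitched speaker
    ([200.0, 160.0, 240.0], "word_b"),  # high-pitched speaker
]

# The high-pitched speaker now says word_a (same shape, doubled pitch).
test_clip = [200.0, 300.0, 400.0]

print("raw features:          ", nearest_label(train, test_clip, raw_features))
print("hand-designed features:", nearest_label(train, test_clip, pitch_relative_features))

# The cost: both clips map to the same features, so the speaker's pitch
# can no longer be recovered from them.
print(pitch_relative_features(train[0][0]) == pitch_relative_features(test_clip))
```

With only one example per word, the raw-input learner is misled by the speaker's pitch, while the hand-designed feature removes that nuisance. With many (clip, word) pairs covering many speakers, a learner working on the raw input could in principle pick up the same invariance from the data, and would not lose the information the feature discards.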