Lecture 6: Course Project Introduction and Deep Learning Preliminaries
1 CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 6: Course Project Introduction and Deep Learning Preliminaries
2 Outline for Today Course projects What makes for a successful project Leveraging existing tools Project archetypes and considerations Discussion Deep learning preliminaries
3 Silence models for HMM-GMM. SIL is just another phoneme to the recognizer, always inserted at the start and end of each utterance. Corrupting silence with bad forced alignments can break recognizer training (silence eats everything). The sound of silence turns out to be difficult to model! Silence GMMs must capture lots of noise artifacts, breathing, and laughing (depending on data transcription standards). Microphones in the wild with background noise make SIL/non-speech even more difficult. Special transition models for silence, since we often stay there a long time.
4 Course project goals. A substantial piece of work related to topics specific to this course. A successful project: results in most of a conference paper submission if academically oriented; is a portfolio item / work sample for job interviews related to ML, NLP, or SLP; reflects deeper understanding of SLP technology than simply applying existing APIs for ASR, voice commands, etc. No midterm or final exam, to allow more focus on projects.
5 A successful project. Course-relevant topic: proposed experiments or system address a challenging, unsolved SLP problem. Proposes and executes a sensible approach informed by previous related work. Performs error analysis to understand what aspects of the system are good/bad. Adapts the system or introduces new hypotheses/components based on initial error analysis. Goes beyond simply combining existing components / tools to solve a standard problem.
6 Complexity and focus. SLP systems are some of the most complex in AI. Example: a simple voice command system contains a speech recognizer (language model, pronunciation lexicon, acoustic model, decoder, lots of training options) and intent/command slot filling (some combination of lexicon, rules, and ML to handle variation). Get a complete baseline system working by the milestone. Focus on a subset of all areas to make a bigger contribution there. APIs/tools are a great choice for areas not directly relevant to your focus.
7 Balancing scale and depth. Working on real-scale datasets/problems is a plus, but don't let scale distract from getting to the meat of your technical contribution. Example: comparing some neural architectures for end-to-end speech recognition. Case 1: use WSJ, a medium-sized corpus of read speech; SOTA error rates ~3%. Case 2: use Switchboard, a large conversational corpus; SOTA error rates ~15%. Case 2 is stronger overall if you run the same experiments / error analysis. Don't let scale prevent thoughtful loops.
8 Thoughtful loops. A single loop: try something reasonable; perform relatively detailed error analysis using what we know from the course; propose a modification / new experiment based on what you find; try it! Repeat. A successful project does this at least once. Scale introduces risk of overly slow loops. Ablative analysis or oracle experiments are a great way to guide which system component to work on.
9 Oracle experiments. Slide from Andrew Ng's CS229 lecture on applying ML
10 Ablation experiments. Slide from Andrew Ng's CS229 lecture on applying ML
11 Ablation experiments. Slide from Andrew Ng's CS229 lecture on applying ML
12 Pitfalls in project planning. Data! What dataset will you use for your task? If you need to collect data, why? Understand that a project with a lot of required data collection creates high risk of not being able to execute enough loops. Do you really need to collect data? Really? Overly complex baseline system: relying on external tools to the point that connecting them becomes the entire effort and makes innovation hard. Off-topic: could this be a CS 229 project instead?
13 Deliverables. All projects. Proposal: what task, dataset, evaluation metrics, and approach outline? Milestone: have you gotten your data and built a baseline for your task? Final paper: methods, results, related work, conclusions; should read like a conference paper. Audio/visual material: include links to audio samples for TTS, screen-capture videos for dialog interactions (spoken dialog especially). Much easier to understand your contribution this way than to leave us to guess, even if it doesn't quite work. Available on laptop at poster session (live demo!)
14 Leveraging existing tools. Free to use any tool, but realize that using the Google speech API does not constitute building a recognizer. Ensure the tool does not prevent trying the algorithmic modifications of interest (e.g. you can't do acoustic model research on speech APIs). Projects that combine existing tools in a straightforward way should be avoided. Conversely, almost every project can and should use some form of tool: TensorFlow, a speech API, a language model toolkit, Kaldi, etc. Use tools to focus on your project hypotheses.
15 Error analysis with tools. Project writeup / presentation should be able to explain: What goal does this tool achieve for our system? Is the tool a source of errors? (e.g. oracle error rate for a speech API) How could this tool be modified / replaced to improve the system? (maybe it is perfect, and that's okay) As with any component, it is important to isolate sources of errors. Work with tools in a way that reflects your deeper understanding of what they do internally (e.g. n-best lists).
16 Sample of tools and APIs. Speech APIs: Google, IBM, and Microsoft all have options, with varying levels of customization and of conveying n-best lists. Speech synthesis APIs: same as speech, plus Festival. Slack or Facebook for text dialog interfaces; Slack allows downloading of historical data, which could help train systems; Howdy.ai / botkit for integration. Intent recognition APIs: Wit.ai, API.ai, Amazon Alexa.
17 Sample project archetypes
18 Speech recognition research. Benchmark corpus (WSJ, Switchboard, noisy ASR on CHiME). Baseline system in Kaldi; state of the art known. Template very amenable to publication in speech or machine learning conferences. Can be very difficult to improve on state of the art: the best systems have a lot of heuristics that might not be in Kaldi. Systems can be cumbersome to train. Lots of algorithmic variations to try. Successful projects do not need to improve on best existing results.
19 Speech synthesis. The Blizzard Challenge provides training data and systems for comparison. Evaluation is difficult: no single metric. Matching state of the art can involve very tedious signal processing. Open realm of experiments to try, especially working to be expressive or to improve prosody. Relatively large systems without the convenience of a tool like Kaldi.
20 Extracting affect from speech. Beyond transcription: understanding emotion, accent, or mental state (intoxication, depression, Parkinson's, etc.). Very dataset dependent: how will you access labeled data to train a system? Can't be just a classifier; need to use insights from this course or combine with speech recognition. Should be spoken rather than just written text.
21 Dialog systems. Build a dialog system for a task that interests you (bartender, medical guidance, chess). Must be multi-turn, not just voice commands or single-slot intent recognizers. Evaluation is difficult; you will likely have to collect any training data yourself. Don't over-invest in knowledge engineering. Lots of room to be creative and design interactions to hide system limitations. More difficult to publish smaller-scale systems, but they make for great demos / portfolio items.
22 Deep learning approaches. Active area of research for every area of SLP. Beware: do you have enough training data compared to the paper most similar to your approach? Do you have enough compute power? How long will a single model take to train? Think about your time to complete one loop. Ensure you are doing SLP experiments, not just tuning neural nets for a dataset. Hot area for academic publications at the moment.
23 Summary. Have fun. Build something you're proud of. Project ideas posted to Piazza by Friday, and more through next week.
24 Discussion/Questions
25 Outline for Today Course projects What makes for a successful project Leveraging existing tools Project archetypes and considerations Discussion Deep learning preliminaries
26 Neural Network Basics: Single Unit. Logistic regression as a neuron [diagram: inputs x1, x2, x3 with weights w1, w2, w3 and a bias b (+1 unit) feeding a summation Σ that produces the output]. Slides from Awni Hannun (CS221 Autumn 2013)
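To make the single-unit picture concrete, here is a minimal NumPy sketch of logistic regression as a neuron; the function name and example values are illustrative, not from the slides:

```python
import numpy as np

def logistic_unit(x, w, b):
    """One 'neuron': logistic regression on the inputs.

    Computes sigmoid(w . x + b), matching the diagram's inputs
    x1..x3, weights w1..w3, bias b, and summation unit.
    """
    z = np.dot(w, x) + b              # the weighted sum (the Σ node)
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid squashes z into (0, 1)

# Three inputs, as in the diagram
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.3])
print(logistic_unit(x, w, b=0.2))
```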
27 Single Hidden Layer Neural Network. Stack many logistic units to create a Neural Network [diagram: inputs x1, x2, x3 plus a +1 bias unit (Layer 1 / input) feed hidden units a1, a2 plus a +1 bias unit (Layer 2 / hidden layer), which feed the output (Layer 3 / output); weights such as w11, w21 label the connections]. Slides from Awni Hannun (CS221 Autumn 2013)
28 Notation. Slides from Awni Hannun (CS221 Autumn 2013)
29 Forward Propagation [diagram: computing a hidden activation from inputs x1, x2 with weights w11, w21]. Slides from Awni Hannun (CS221 Autumn 2013)
30 Forward Propagation [diagram: the full network from the input layer (x1, x2, x3, +1) through the hidden layer (+1) to the output layer]. Slides from Awni Hannun (CS221 Autumn 2013)
31 Forward Propagation with Many Hidden Layers [diagram: activations propagate from layer l to layer l+1]. Slides from Awni Hannun (CS221 Autumn 2013)
32 Forward Propagation as a Single Function. Gives us a single non-linear function of the input. But what about multi-class outputs? Replace the output unit for your needs: a softmax output unit instead of sigmoid. Slides from Awni Hannun (CS221 Autumn 2013)
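As a rough sketch of this forward pass with a softmax output (sizes, names, and initialization are made up for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

def forward(x, W1, b1, W2, b2):
    """One hidden layer of sigmoid units, softmax over output classes."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))  # hidden activations
    return softmax(W2 @ h + b2)               # class probabilities

rng = np.random.default_rng(0)
x = rng.normal(size=3)                           # 3 input features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # 4 hidden units
W2, b2 = rng.normal(size=(5, 4)), np.zeros(5)    # 5 output classes
print(forward(x, W1, b1, W2, b2))                # probabilities sum to 1
```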
33 Objective Function for Learning. Supervised learning: minimize our classification errors. Standard choice: the cross-entropy loss function, a straightforward extension of logistic loss from the binary case. This is a frame-wise loss: we use a label for each frame from a forced alignment. Other loss functions are possible; they can get deeper integration with the HMM or with word error rate.
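A sketch of the frame-wise cross-entropy loss described here, assuming each frame's label comes from a forced alignment (the helper name is illustrative):

```python
import numpy as np

def cross_entropy(probs, label):
    """Cross-entropy for one frame: -log p(correct class).

    probs: softmax output over classes for this frame
    label: integer class index from the forced alignment
    """
    return -np.log(probs[label] + 1e-12)  # epsilon guards against log(0)

# The utterance-level loss averages over frames, e.g.:
# loss = np.mean([cross_entropy(p, y)
#                 for p, y in zip(frame_probs, frame_labels)])
```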
34 The Learning Problem. Find the optimal network weights. How do we do this in practice? Non-convex. Gradient-based optimization: the simplest is stochastic gradient descent (SGD). Many choices exist; an area of active research.
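The SGD update itself is a one-liner per parameter; a minimal sketch (learning rate and the list structure are illustrative):

```python
def sgd_step(params, grads, lr=0.01):
    """One stochastic gradient descent update.

    params and grads are parallel lists of NumPy arrays,
    e.g. [W1, b1, W2, b2] and their gradients for one minibatch.
    """
    for p, g in zip(params, grads):
        p -= lr * g   # step each parameter against its gradient
```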
35 Computing Gradients: Backpropagation. Backpropagation is the algorithm for computing the derivative of the loss function with respect to the parameters of the network. Slides from Awni Hannun (CS221 Autumn 2013)
36 Chain Rule. Recall our NN as a single function [diagram: x → g → f]. Slides from Awni Hannun (CS221 Autumn 2013)
37 Chain Rule [diagram: x feeds two intermediate functions g1 and g2, which both feed f]. Slides from Awni Hannun (CS221 Autumn 2013)
38 Chain Rule [diagram: x feeds intermediate functions g1 ... gn, which all feed f]. Slides from Awni Hannun (CS221 Autumn 2013)
39 Backpropagation. Idea: apply the chain rule recursively [diagram: a chain x → f1 → f2 → f3 with weights w1, w2, w3 on the forward edges and error terms δ(3), δ(2) passed backward]. Slides from Awni Hannun (CS221 Autumn 2013)
40 Backpropagation [diagram: a network with inputs x1, x2 and a loss at the output; the error term δ(3) propagates back from the loss toward the inputs]. Slides from Awni Hannun (CS221 Autumn 2013)
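Putting the chain rule to work, here is a sketch of backpropagation for the single-hidden-layer network above, with sigmoid hidden units and a softmax + cross-entropy output; the delta variables correspond to the δ terms on the slides (shapes and names are illustrative):

```python
import numpy as np

def backprop(x, label, W1, b1, W2, b2):
    """Gradients of the cross-entropy loss w.r.t. all parameters."""
    # Forward pass, keeping intermediate values for reuse
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))   # hidden activations
    z = W2 @ x if False else W2 @ h + b2       # output pre-activations
    e = np.exp(z - z.max())
    y = e / e.sum()                            # softmax output

    # Backward pass: apply the chain rule layer by layer
    delta_out = y.copy()
    delta_out[label] -= 1.0                    # δ at output: y - onehot
    dW2 = np.outer(delta_out, h)
    db2 = delta_out
    delta_hid = (W2.T @ delta_out) * h * (1 - h)  # through the sigmoid
    dW1 = np.outer(delta_hid, x)
    db1 = delta_hid
    return dW1, db1, dW2, db2
```

These gradients plug directly into the SGD step sketched earlier.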
41 Neural network with regression loss. Minimize the regression (e.g. squared-error) loss between the network output and the target [diagram: Noisy Input → Hidden Layer → Output Layer]
42 Recurrent Network [diagram: Noisy Input → Hidden Layer → Output Layer, with a recurrent connection on the hidden layer]
43 Deep Recurrent Network [diagram: Noisy Input → Hidden Layer → Hidden Layer → Output Layer, with recurrent connections on the hidden layers]
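A minimal sketch of the recurrent forward pass pictured here, as a simple Elman-style network (dimensions and names are illustrative):

```python
import numpy as np

def rnn_forward(xs, W_in, W_rec, W_out, b_h, b_out):
    """Simple recurrent network over a sequence of input frames.

    xs: list of input vectors (e.g. noisy acoustic features).
    The hidden state h carries information across time steps;
    stacking more hidden layers gives a deep recurrent network.
    """
    h = np.zeros(W_rec.shape[0])
    outputs = []
    for x in xs:
        h = np.tanh(W_in @ x + W_rec @ h + b_h)  # recurrence in time
        outputs.append(W_out @ h + b_out)        # per-frame output
    return outputs
```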
44 Compute graphs