Supervised Learning with Neural Networks and Machine Translation with LSTMs
1 Supervised Learning with Neural Networks and Machine Translation with LSTMs
Ilya Sutskever, in collaboration with: Minh-Thang Luong, Quoc Le, Oriol Vinyals, Wojciech Zaremba
Google Brain
2-7 [Title/introductory slides with no transcribed text]
8 Deep Neural Networks
1. Can perform an astonishingly wide range of computations
2. Can be learned automatically
[Diagram: powerful models ∩ learnable models = deep neural networks]
9 Powerful models are necessary
A weak model will never get good performance. Examples of weak models:
- Single-layer logistic regression
- Linear SVM
- Small neural nets
- Small conv nets
A neural network needs to be large and deep to be powerful.
[Diagram: powerful models ∩ learnable models = deep neural networks]
10 Why are deep nets powerful?
A single neuron can implement boolean logic (OR, AND, NOT), and thus arbitrary computation.
11 Why are deep nets powerful?
A single neuron can implement boolean logic, and thus general computation, like computers.
A mid-sized 2-hidden-layer neural network can sort N N-bit numbers.
Intuitively, sorting requires log N parallel steps.
Backpropagation can find this circuit.
12 Why are deep nets powerful?
A single neuron can implement boolean logic, and thus general computation, like computers.
A mid-sized 2-hidden-layer neural network can sort N N-bit numbers.
Intuitively, sorting requires log N parallel steps.
Backpropagation can find this circuit.
Neurons are more economical than boolean logic.
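To make the boolean-logic claim concrete, here is a minimal sketch (not from the talk) of a single threshold neuron implementing OR, AND, and NOT with hand-picked weights:

```python
import numpy as np

def neuron(x, w, b):
    # A single threshold neuron: fires iff the weighted input sum exceeds 0.
    return int(np.dot(w, x) + b > 0)

# Hand-picked weights implementing boolean gates.
AND = lambda a, b: neuron([a, b], w=[1, 1], b=-1.5)   # fires only if both inputs fire
OR  = lambda a, b: neuron([a, b], w=[1, 1], b=-0.5)   # fires if at least one input fires
NOT = lambda a:    neuron([a],    w=[-1],  b=0.5)     # fires if the input does not

assert [AND(a, b) for a in (0, 1) for b in (0, 1)] == [0, 0, 0, 1]
assert [OR(a, b)  for a in (0, 1) for b in (0, 1)] == [0, 1, 1, 1]
assert [NOT(a) for a in (0, 1)] == [1, 0]
```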
13 The Deep Learning Hypothesis
Human perception is fast: neurons fire at most 100 times a second, and humans solve perception in 0.1 seconds, so our neurons fire at most 10 times.
Anything a human can do in 0.1 seconds, a big 10-layer neural network can do, too!
20-layer neural networks can be trained well in practice. Two years ago we could only train 10-layer networks.
14 Implication
DNNs, once trained, should do well on all perception problems: vision, speech, emotion, face recognition, instinct.
If there exists a human expert that can solve hard problems in a fraction of a second, then large deep neural networks could do so, too:
- Instantaneous translation
- Speed reading
- Identifying the obvious thing to do in a complicated situation or a game
15 Learning
Powerful models are useless unless we can train them.
Supervised backpropagation works! It is not clear why 20-layer neural nets are easily trainable with backprop.
[Diagram: powerful models ∩ learnable models = deep neural networks]
16 Learning Algorithm
While not done:
- Pick an example (x, y)
- Run the network on x to get a prediction p
- Use the gradient to bring p slightly closer to y
Theory must make nontrivial assumptions about the dataset. (A runnable sketch of this loop follows below.)
[Diagram: powerful models ∩ learnable models = deep neural networks]
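A minimal runnable version of the loop, using a single logistic neuron learning OR as a toy stand-in for the "network" (not one of the talk's models):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: learn OR. Each example is (x, y).
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

w, b, lr = rng.normal(size=2), 0.0, 0.5

for step in range(2000):
    x, y = data[rng.integers(len(data))]            # pick an example (x, y)
    p = 1 / (1 + np.exp(-(np.dot(w, x) + b)))       # run the network on x -> prediction p
    grad = p - y                                    # gradient of log loss w.r.t. pre-activation
    w -= lr * grad * np.asarray(x, dtype=float)     # bring p slightly closer to y
    b -= lr * grad

print([round(1 / (1 + np.exp(-(np.dot(w, x) + b)))) for x, _ in data])  # -> [0, 1, 1, 1]
```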
17 How to solve hard problems?
- Use a lot of good AND labelled training data
- Use a big deep neural network
[Diagram: powerful models ∩ learnable models = deep neural networks]
18 How to solve hard problems?
- Use a lot of good AND labelled training data
- Use a big deep neural network
Success is the only possible outcome.
[Diagram: powerful models ∩ learnable models = deep neural networks]
19 The deep learning hypothesis is true!
Not a controversial statement. Big deep nets get the best results ever on:
- Speech recognition
- Object recognition
- Language modelling
20 Summary
Big deep nets with 20 layers can do great things.
Supervised backpropagation can train 20-layer nets.
Ergo: we can do a whole lot if we have a large, good supervised training set.
21 Deep nets can't solve all problems???
22 Deep nets can't solve all problems
DNNs couldn't solve problems where the input and the output are very structured:
- Sequence to sequence
- Graph to graph
So far, we've addressed only the most basic form of supervised learning.
23 Key limitation
Inputs and outputs must be fixed-sized vectors.
Great for images:
- input is a big image of a fixed size
- output is a 1-of-N encoding of the category
24 Key limitation
Inputs and outputs must be fixed-sized vectors.
Great for images: input is a big image of a fixed size; output is a 1-of-N encoding of the category.
Bad news for machine translation, question answering, squiggle recognition.
The enemy: unit-specific connections.
[Diagram: a fixed-size network with unit-specific connections from input to output]
25 Our contribution: solving the sequence-to-sequence problem
It's a fundamental capability. Applications: MT, Q&A, ASR, squiggle recognition.
Nice feature: our approach has minimal innovation.
We demonstrate that the approach is viable by matching state-of-the-art results on machine translation.
State-of-the-art in MT is strong, so the approach should do well on many other tasks too.
26 Recurrent Neural Networks (RNNs)
RNNs can work with sequences.
[Diagram: an RNN unrolled over timesteps t=1..6, with input, hidden, and output layers at each step]
Key idea: each timestep has a layer with the same weights. Problem is solved.
27 Recurrent Neural Networks (RNNs)
Are neural networks that can process sequences well:
- Very expressive models
- Backpropagation is applicable
- Fun fact: recurrent neural networks were trained in the original backpropagation paper in 1986
But they have trouble learning long-term dependencies: the vanishing gradient problem (Hochreiter, 1991; Bengio et al., 1994).
There are ways to learn RNNs, but they are complicated.
28 Long Short-Term Memory (LSTM)
Modify / hack the RNN architecture so that the vanishing gradient problem goes away.
Do so without sacrificing expressive power.
A model that achieves this purpose will be useful.
29 Long Short-Term Memory (LSTM)
RNNs overwrite the hidden state; LSTMs add to the hidden state.
Addition has nice gradients: all terms in a sum get a nice gradient.
The LSTM is good at noticing long-range correlations because it uses sums instead of overwriting.
Main advantage: requires little tuning. Hugely important.
30 RNN
[Diagram: an unrolled RNN over timesteps t=1..6; at each step the input and previous hidden state feed a new hidden state, which produces the output]
31 LSTM
[Diagram: an unrolled LSTM over timesteps t=1..6; same layout as the RNN, but each hidden state is updated additively (+) rather than overwritten]
32 The heart of the LSTM: the memory cell
[Diagram: the LSTM memory cell M, with gates (I1, I2, F, O) controlling multiplicative interactions around an additive update to the cell state]
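For reference, the standard LSTM equations behind this diagram (the usual formulation; the slide itself does not spell them out). The key line is the additive cell update, which is what gives "all terms in a sum a nice gradient":

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate update} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{additive memory update} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state / output}
\end{aligned}
```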
33 Sequence to sequence
Requiring the length of the input sequence to equal the length of the output sequence is bad: not good for either ASR or MT.
Existing strategies for mapping sequences to sequences have an HMM-like component:
- Normal ASR approaches have a big, complicated transducer
- Connectionist Temporal Classification (CTC) assumes monotonic alignments and uses an HMM
But we want something simpler and more generic, applicable to any sequence-to-sequence problem, including MT, where words can be reordered in many ways.
34 Main idea
Neural nets are excellent at learning very complicated functions.
Coerce a neural network / LSTM to read one sequence and produce another.
Learning will take care of the rest.
35 Main idea
[Diagram: the LSTM reads the input sequence A B C D, then produces the target sequence X Y Z Q]
36 That's it!
The LSTM needs to read the entire input sequence, and then produce the target sequence from memory.
The input sequence is stored in a single LSTM hidden state.
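A sketch of how a single training pair might be laid out as one token stream (hypothetical tokens; the <EOS> end-of-sequence marker is an assumption of this sketch):

```python
# One (source, target) pair flattened into a single sequence the LSTM steps through.
EOS = "<EOS>"
source = ["A", "B", "C", "D"]
target = ["X", "Y", "Z", "Q"]

inputs  = source + [EOS] + target                 # tokens the LSTM reads, one per step
outputs = [None] * len(source) + target + [EOS]   # tokens it must predict at each step

for x, y in zip(inputs, outputs):
    print(f"read {x!r:>7} -> predict {y!r}")      # loss applies only where y is not None
```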
37 Relevant Related Work
Independently and simultaneously, Kyunghyun Cho et al. (Bengio lab) invented basically the same approach, with a model that's related to the LSTM.
More interestingly and impressively, Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio developed a model that could learn to attend to different parts of the input sentence:
- No need to remember the entire input sequence
- Maximal benefit with smaller hidden states
38 Step 1: can the LSTM reconstruct the input sentence?
Can this scheme learn the identity function?
[Diagram: the LSTM reads A B C D and is trained to reproduce A B C D]
Answer: it can, and it can do so very easily, effortlessly, and perfectly.
39 Step 2: small dataset experiments: EuroParl French to English
- Low-entropy parliament language
- 20M words in total
- Small vocabulary
- Sentence length no longer than 25
The net was doing something non-trivial. We were inspired.
40 Digression: decoding
Formally, given an input sentence, the LSTM defines a distribution over output sentences.
Therefore, we should produce the sentence with the highest probability.
But there are exponentially many sentences; how do we find it?
We use a simple greedy strategy.
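Concretely, the distribution factorizes over next-word predictions (standard notation, not spelled out on the slide):

```latex
p(y_1, \dots, y_T \mid x) = \prod_{t=1}^{T} p(y_t \mid y_1, \dots, y_{t-1}, x),
\qquad \hat{y} = \arg\max_y \, p(y \mid x)
```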
41 Decoding in a nutshell
- Proceed left to right
- Maintain N partial translations
- Expand each translation with possible next words
- Discard all but the top N new partial translations
[Diagram: with a beam of 2, the partial hypotheses "I" and "My" expand to "I decided", "I thought", "I tried", "My decision", "My thinking", "My direction"; sorting and pruning keeps the 2 best new partial hypotheses, "I decided" and "My decision"]
42 Why does simple beam search work?
The LSTM is trained to predict the next word given the previous words.
Maintain a list of partial translations. Extend each partial translation, evaluate each extension, and discard all but the top k.
Most of the search improvement is obtained with a beam of size 2: a full 1 BLEU point.
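A minimal sketch of that beam search. The function next_word_logprobs, standing in for the trained LSTM, is a hypothetical interface returning a dict of next-word log-probabilities given a prefix:

```python
import heapq

def beam_search(next_word_logprobs, beam_size=2, max_len=20, eos="<EOS>"):
    beam = [(0.0, [])]                                   # (log-prob, partial translation)
    for _ in range(max_len):
        candidates = []
        for logp, prefix in beam:
            if prefix and prefix[-1] == eos:             # finished hypotheses pass through
                candidates.append((logp, prefix))
                continue
            for word, lp in next_word_logprobs(prefix).items():
                candidates.append((logp + lp, prefix + [word]))   # expand
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])  # sort and prune
        if all(p and p[-1] == eos for _, p in beam):
            break
    return max(beam, key=lambda c: c[0])                 # best complete hypothesis
```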
43 Model for big experiments
- 160K input vocabulary
- 80K output vocabulary
- 4 layers of 1000-dimensional LSTMs, with different LSTMs for the input and output languages
- 384M parameters
44 The model
[Diagram: the full model. Output side: 80K softmax by 1000 dims (this is very big!). 1000 LSTM cells, 2000 dims per timestep, 2000 × 4 = 8K dims per sentence. Input side: 160K vocab in the input language.]
45 Parallelization
Parallelization is important; more parallelization is better (ongoing work).
8 GPUs. Speed: 6,700 words per second.
Idea: one layer per GPU.
46 Learning parameters
Learning parameters are very simple and straightforward:
- learning rate = 0.7 / batch_size
- init scale =
- norm of gradient is clipped to 5
- learning rate is halved every 0.5 epochs after 5 epochs
- no momentum (may be a mistake)
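A sketch of the gradient clipping and learning-rate schedule described above (toy code; the function and parameter names are ours):

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    # Rescale the whole gradient so its global L2 norm is at most max_norm.
    norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    return [g * (max_norm / norm) for g in grads] if norm > max_norm else grads

def learning_rate(epoch, batch_size, base=0.7, decay_start=5, half_every=0.5):
    # lr = 0.7 / batch_size, halved every 0.5 epochs once 5 epochs have passed.
    halvings = max(0, int((epoch - decay_start) / half_every))
    return (base / batch_size) * 0.5 ** halvings

print(learning_rate(epoch=6.0, batch_size=128))  # two halvings after epoch 5
```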
47-53 [Diagram, built up as an animation over these slides: the same model with its computation split across the 8 GPUs. The 80K softmax by 1000 dims ("this is very big!") is split into 4 GPUs (GPU3-GPU6 in the figure); the LSTM layers (1000 cells, 2000 dims per timestep, 2000 × 4 = 8K dims per sentence) each occupy another GPU; 160K vocab in the input language.]
54 Representations
[Diagram: after reading the input sentence A B C D, the LSTM's hidden state ("this thing") is a fixed-dimensional representation, which is then unfolded into the output X Y Z]
55-56 [Figures: visualizations of the learned sentence representations]
57 Reversing source sentences
It is natural to train the LSTM on sentences in the source language followed by the target language: A, B, C → α, β, γ.
It turns out that reversing the words in the source sentence substantially improves performance. Why?
Because we introduce many short-term dependencies to the dataset that make the learning problem easier.
58 Reversing source sentences
C, B, A → α, β, γ:
- A is now very close to α, and B is fairly close to β (close = close in time)
- Backpropagation can notice that A is connected to α and establish this connection
- That makes it easier to learn that B is connected to β
Natural bootstrapping: backprop can notice the short-term dependencies first, and slowly extend them to long-range dependencies. (The data transformation itself is one line; see the sketch below.)
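A sketch of the transformation with placeholder tokens (nothing else in the pipeline changes):

```python
# Reverse only the source side of each training pair; targets are untouched.
source = ["a", "b", "c"]
target = ["alpha", "beta", "gamma"]

pair = (source[::-1], target)   # "c b a" -> "alpha beta gamma"
print(pair)                     # "a" now sits right next to "alpha" in the stream
```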
59 Results on a big dataset
Corpus: WMT'14 English→French; 384M French words, 303M English words; 70K test words.
BLEU score: larger is better.
[Chart: BLEU scores for Bahdanau et al., one LSTM, the phrase-based SMT baseline, an ensemble of 5 LSTMs, and the state of the art]
We are doing OK, but we're still far from state of the art.
60 [Figure: performance vs sentence length]
61 [Figure: performance on rare words]
62 Big Problem
Performance deteriorates on sentences that have many rare words.
The vocabulary is limited, so many words cannot be translated for that reason.
This is the most obvious flaw, and worth fixing.
63 Solving the rare word problem
Example: input: "I trained a veryrareword" → output: "I trained a <unk>".
Here veryrareword is a very rare word, but the model could know where it came from.
64 Our solution
Use a traditional word alignment algorithm.
Replace each <unk> in the target translation with an <unk-d>, where d is an integer: d indicates the position of the aligned word in the source sentence, if it exists.
65 Our solution
The model no longer needs to have every word in its vocabulary.
It only needs to know where the word originated from!
66 Procedure
Train time:
- Annotate the unknown tokens with an alignment algorithm
- Train the LSTM to emit the annotations
Test time:
- Use the LSTM to produce the annotated translations
- Translate each word with the word dictionary
(Both phases are sketched in code below.)
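A sketch of both phases; the alignment map, word dictionary, and exact <unk-d> token format are assumptions of this sketch:

```python
def annotate_unknowns(target_tokens, vocab, alignment):
    # Train time: replace out-of-vocabulary target words with positional <unk-d> tokens,
    # where `alignment` maps a target position to its aligned source position.
    out = []
    for t, word in enumerate(target_tokens):
        if word in vocab:
            out.append(word)
        elif t in alignment:
            out.append(f"<unk-{alignment[t]}>")   # d = position of the aligned source word
        else:
            out.append("<unk>")                   # no alignment found
    return out

def postprocess(output_tokens, source_tokens, dictionary):
    # Test time: translate each <unk-d> with a word-dictionary lookup of source word d.
    result = []
    for word in output_tokens:
        if word.startswith("<unk-") and word.endswith(">"):
            src = source_tokens[int(word[5:-1])]
            result.append(dictionary.get(src, src))   # fall back to copying the source word
        else:
            result.append(word)
    return result
```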
67-69 [Examples: sample translations showing <unk-d> annotations and their post-processed forms]
70 It works really well! We match state-of-the-art
71 Using only 1/3rd of the training data!
The model takes many days to train, so we trained on 1/3rd of the data.
Additional models are being trained on the entire dataset; results are expected to get better.
Much larger datasets exist, and much larger neural nets should get much better.
72 Depth helps
73 Depth helps
Most gains are due to the larger hidden state.
Better models benefit more from the post-processing. Why? Because better models compute the annotations more accurately.
74 Conclusions
We match state-of-the-art in MT and will likely exceed it.
Near future: train a huge, giant model on much more translation data; apply these models to other sequence-to-sequence problems.
This work brings us closer to the complete solution to the problem of supervised learning.
75 Conclusions If you have a large big dataset
76 Conclusions If you have a large big dataset And you train a very big neural network
77 Conclusions If you have a large big dataset And you train a very big neural network Then success is guaranteed!
78 Questions?
79 Thank you!