Large Scale Data Analysis Using Deep Learning

Introduction to Deep Learning
U Kang, Seoul National University

In This Lecture
- Overview of deep learning
- History of deep learning and its recent advances

Outline
- Overview of Deep Learning
- Historical Trends in Deep Learning

Deep Learning
- A branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data
- A key technology in the recent AI revolution

Artificial Intelligence (AI)
- A quickly growing field with many practical applications and active research topics
- Goal: intelligent software to automate routine labor, understand speech or images, make diagnoses in medicine, and support basic scientific research

Approaches to AI
- Knowledge-base approach
  - Hard-code knowledge about the world in a formal language
  - A computer can reason about statements in these formal languages using logical inference rules
  - Problem: not flexible, and it is hard to capture exact knowledge

Machine Learning (ML)
- An ML algorithm acquires its own knowledge by extracting patterns from raw data
  - E.g., naïve Bayes can separate legitimate e-mail from spam e-mail by training on e-mails and their labels (see the sketch below)
- ML depends heavily on the representation of the data
  - E.g., in the e-mail example above, each e-mail is represented by the set of words it contains
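As an illustrative sketch, not from the slides, the naïve Bayes e-mail example can be written in a few lines of Python; the use of scikit-learn and the tiny corpus are assumptions for demonstration only.

```python
# A minimal sketch (assuming scikit-learn; corpus invented for illustration):
# each e-mail is represented by the words it contains (bag of words), and
# naive Bayes learns to separate spam from legitimate mail from the labels.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win a free prize now",                # spam
    "limited offer, claim your prize",     # spam
    "meeting agenda for tomorrow",         # legitimate
    "please review the attached report",   # legitimate
]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()             # word-count representation
X = vectorizer.fit_transform(emails)
clf = MultinomialNB().fit(X, labels)

test = vectorizer.transform(["free prize offer"])
print(clf.predict(test))                   # expected: ['spam']
```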

Importance of Representations

Representation Learning
- It is difficult to know which features should be extracted
  - E.g., what features detect cars in photographs?
- Representation learning: discover not only the mapping from representation to output, but also the representation itself
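A classic way to discover a representation is an autoencoder. The following is a minimal sketch assuming PyTorch (not named in the slides); the layer sizes and random batch are illustrative assumptions.

```python
# A minimal autoencoder sketch (assumption: PyTorch; sizes are arbitrary).
# The 32-d hidden code is a representation the network discovers itself,
# rather than a hand-designed feature set.
import torch
import torch.nn as nn

autoencoder = nn.Sequential(
    nn.Linear(784, 32), nn.ReLU(),   # encoder: compress input to a code
    nn.Linear(32, 784),              # decoder: reconstruct the input
)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

x = torch.rand(64, 784)              # stand-in for a batch of images
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(autoencoder(x), x)
    loss.backward()
    opt.step()
```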

Challenges in Representation Learning
- How do we separate the factors of variation that explain the observed data?
- A factor is a separate source of influence
  - E.g., image: a red car may look black at night
  - E.g., speech: a word may sound different depending on the speaker's age, sex, and accent

Deep Learning Representation
- Deep learning addresses this problem in representation learning by introducing representations that are expressed in terms of other, simpler representations
- Deep learning builds complex concepts out of simpler concepts

Deep Learning Representation
- Example: multi-layer perceptron
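The multi-layer perceptron below is a minimal sketch, again assuming PyTorch; the layer sizes and feature comments are illustrative assumptions, not from the slides.

```python
# A minimal MLP sketch (assumption: PyTorch; sizes are illustrative).
# Each layer re-expresses the previous one, building complex concepts
# (class scores) out of progressively simpler representations.
import torch
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),  # simple features
    nn.Linear(256, 64), nn.ReLU(),   # compositions of simple features
    nn.Linear(64, 10),               # task-level concepts (class scores)
)
scores = mlp(torch.rand(1, 784))     # forward pass on a dummy input
print(scores.shape)                  # torch.Size([1, 10])
```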

Perspectives on Deep Learning
1. Learns the right representation
2. Depth allows the computer to learn a multi-step computer program
  - Each layer can be thought of as the state of the computer's memory after executing another set of instructions
  - Networks with greater depth can execute more instructions in sequence
  - Sequential instructions offer great power, since later instructions can refer back to the results of earlier instructions

Measuring the Depth of a Model
- Computational graph

AI Hierarchy

Learning Multiple Components

Plan of Study

Outline
- Overview of Deep Learning
- Historical Trends in Deep Learning

Key Trends
1. Deep learning has a long and rich history, with popularity that has varied over time
2. Deep learning has become more powerful as the amount of available training data has increased
3. Deep learning models have grown in size over time as computer hardware and software infrastructure for deep learning has improved
4. Deep learning has solved increasingly complicated applications with increasing accuracy over time

Waves in Deep Learning
- Cybernetics (1940s-1960s): theories of biological learning; the perceptron
- Connectionism (1980s-1990s): back-propagation to train a neural network with one or two hidden layers
- Deep Learning (2006-)

Cybernetics (1940s-1960s)
- Theories of biological learning
- Implementations of the first models, such as the perceptron, allowing the training of a single neuron
- Linear model: f(x, w) = x_1 w_1 + ... + x_n w_n + b
- Limitation: cannot learn the XOR function (Minsky 1969); see the sketch below
- This caused the first major dip in the popularity of neural networks
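The XOR limitation is easy to verify numerically. This is a minimal sketch, not from the slides, using NumPy: the best possible linear fit to the four XOR points predicts 0.5 everywhere.

```python
# A minimal sketch (assumption: NumPy): a linear model
# f(x, w) = x1*w1 + x2*w2 + b cannot learn XOR. Least squares gives
# the best linear fit, and its predictions carry no information.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)        # XOR targets

A = np.hstack([X, np.ones((4, 1))])            # append a bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)      # best linear fit

print(A @ w)   # [0.5 0.5 0.5 0.5]: no linear separation of XOR
```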

Connectionism (1980s-1990s)
- Main idea: a large number of simple computational units can achieve intelligent behavior when networked together
- Universal approximation theorem (Cybenko 1989, Hornik 1991)
  - A feed-forward network with a single hidden layer containing a finite number of neurons can approximate any continuous function
  - This means simple neural networks can represent a wide variety of interesting functions given appropriate parameters; however, it does not guarantee that those parameters can be learned algorithmically
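The theorem is existential, but its flavor is easy to see empirically. Below is a minimal sketch, assuming PyTorch, of a single-hidden-layer network fitting sin(x); the layer width, learning rate, and iteration count are arbitrary assumptions.

```python
# A minimal sketch (assumption: PyTorch): one hidden layer approximates
# sin(x) on [-pi, pi]. The theorem guarantees such a network exists;
# whether gradient descent finds it is a separate question.
import math
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 50), nn.Tanh(), nn.Linear(50, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

x = torch.linspace(-math.pi, math.pi, 200).unsqueeze(1)
y = torch.sin(x)

for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()
print(loss.item())   # typically small after training, e.g. < 1e-3
```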

Connectionism (1980s-1990s)
- Key concepts arose during the connectionism movement of the 1980s
  - Distributed representation
  - Back-propagation
  - Modeling sequences with neural networks: RNN, LSTM
- Limitation: models were believed to be very difficult to train, especially deep models
- This caused the second major dip in the popularity of neural networks

Connectionism (1980s-1990s)
- Distributed representation: each input to a system should be represented by many features, and each feature should be involved in the representation of many possible inputs
  - E.g., a vision system can recognize cars, trucks, and birds, and these objects can each be red, green, or blue
  - One way of representing these inputs is a separate neuron that activates for each of the nine possible combinations
  - Distributed representation: three neurons for objects and three neurons for colors, six neurons in total
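The two encodings from the slide can be written out directly. This is a minimal NumPy sketch (the library choice and the one_hot helper are assumptions) contrasting nine combination units with a 3 + 3 distributed code.

```python
# A minimal sketch (assumption: NumPy) of the car/truck/bird example:
# one neuron per (object, color) combination needs 9 units, while a
# distributed code needs 3 + 3 = 6 units and shares features.
import numpy as np

objects = ["car", "truck", "bird"]
colors = ["red", "green", "blue"]

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

# One neuron per combination: "red" must be relearned for every object.
combo = one_hot(objects.index("car") * 3 + colors.index("red"), 9)

# Distributed: the "red" feature is shared across all objects.
distributed = np.concatenate([
    one_hot(objects.index("car"), 3),
    one_hot(colors.index("red"), 3),
])
print(combo, distributed, sep="\n")
```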

Deep Learning (2006-)
- New technologies enabled training deep neural networks
  - New unsupervised learning techniques: the deep belief network (Hinton, 2006) with greedy layer-wise pretraining
  - New activation functions, e.g., the rectified linear unit (see the sketch below)
  - Powerful computing architectures: clusters and GPUs
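For concreteness, here is a minimal NumPy sketch, not from the slides, of the rectified linear unit next to the older sigmoid.

```python
# A minimal sketch (assumption: NumPy): relu(x) = max(0, x). Unlike the
# sigmoid, its gradient does not saturate for large positive inputs,
# which is one reason it eased the training of deep networks.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))     # [0.  0.  0.  0.5 2. ]
print(sigmoid(x))  # squashes all values into (0, 1)
```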

Growing Datasets

MNIST Dataset

Why Do Growing Datasets Matter?
- The age of Big Data has made machine learning much easier: the key burden of statistical estimation (generalizing well to new data after observing only a small amount of data) has been considerably lightened
- Rule of thumb
  - A supervised deep learning algorithm will generally achieve acceptable performance with roughly 5,000 labeled examples per category
  - A deep learning algorithm can exceed human performance when trained on a dataset with at least 10 million labeled examples

Increasing Model Sizes
- A main insight of connectionism: animals become intelligent when many of their neurons work together
- The number of connections per neuron has been increasing continuously
  - But it is still smaller than that of the human brain

Number of Neurons
- Until recently, the total number of neurons in neural networks was very small
- Since the introduction of hidden units, artificial neural networks (ANNs) have doubled in size roughly every 2.4 years
- Unless new technologies allow faster scaling, ANNs will not reach the number of neurons in the human brain until around 2050
- The increase in model size is one of the most important trends in deep learning
  - Driven by faster CPUs, GPUs, faster network connectivity, and better software infrastructure for distributed computing

Number of Neurons

Increasing Accuracy, Complexity, and Real-World Impact
- Increasing accuracy: object recognition
  - The deep learning revolution became widely recognized when a CNN won the ILSVRC challenge by a wide margin

More on Increasing Accuracy
- Increasing accuracy in other areas
  - Speech recognition: deep learning decreased the error by 50%
  - Image segmentation
  - Machine translation

Increasing Complexity
- Neural networks have become able to solve increasingly complex problems
  - Automatic image transcription
  - Machine translation
  - Neural Turing machine: a neural network that learns to read from memory cells and write arbitrary content to memory cells
    - Enables self-programming: learning simple programs from examples of desired behavior, e.g., learning to sort lists of numbers
  - Playing video games

Real-World Impact
- DL is used in many top technology companies: Google, Microsoft, Facebook, IBM, and others
- Much software infrastructure has been developed: TensorFlow, Theano, Caffe, and others
- DL has made contributions to other sciences
  - Neuroscience: CNNs for object recognition provide a model of visual processing that neuroscientists can study
  - Helping to develop new medications
  - Automatically parsing the microscope images used to construct a 3-D map of the human brain

What You Need to Know
- Deep learning: an approach to machine learning that learns to represent the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more abstract representations computed in terms of less abstract ones
- Deep learning benefits heavily from advances in human brain research, statistics, mathematics, and computer science
- The recent tremendous growth of deep learning rests on powerful computers, larger datasets, and techniques for training deep networks
- Many opportunities and challenges remain for applications, theories, and methods

Questions?