Machine Learning & Deep Nets. Leon F. Palafox. December 4th, 2014

Introduction. What is Machine Learning? It is a rebranding of Artificial Intelligence, since we don't really care about replicating intelligence. It is a set of tools for analyzing data, making predictions, and extracting insights from it. It is a sub-branch of computer science and statistics. Areas of Machine Learning: Supervised Learning: Classification, Regression. Unsupervised Learning (Knowledge Discovery): Clustering, Mixture Models.

[Diagram: labeled data (e-mails from a Mail Inbox marked Spam / Not Spam) feeds an Algorithm (Naïve Bayes, Deep Nets, SVMs, Logistic Regression), which produces a System.] The elements that describe a single datum are called features; in this case, the features are the words in the e-mails. Each category (spam, not spam) will have features that characterize it. Spam: Offer, Viagra, medicine, Free, Conference in China. Not Spam: Hamilton, LPL, DTM, Mom, Dad.
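
As a rough illustration of this supervised pipeline, here is a minimal scikit-learn sketch; the toy e-mails, their labels, and the test message are all invented for this example.

```python
# Minimal sketch of the spam / not-spam setup with a Naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["free viagra offer", "cheap medicine for free",
          "conference in china, special offer", "hi mom, dinner at home?",
          "LPL meeting with Hamilton", "DTM data for mom and dad"]
labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()           # word counts are the features
X = vectorizer.fit_transform(emails)
clf = MultinomialNB().fit(X, labels)

print(clf.predict(vectorizer.transform(["free offer from china"])))  # expected: [1] (spam)
```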

[Diagram: unlabeled data (e-mails from a Mail Inbox) feeds an Algorithm (K-Means, LDA, Autoencoders), which discovers Topics; the slide showed a bar chart of topic frequencies.] The elements that describe a single datum are called features; in this case, the features are the words in the e-mails. Each topic (cluster) will have features that characterize it. Research: Mars, Proposal, DTM, HiRISE, Machine Learning, Deep Nets, Bayesian. Family: Mom, House, Mexico. Promotions: Computer, PS4, Cheap, Amazon, Deal. Classes: Grades, Homework, Questions, Office Time.
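
The unsupervised version of the same idea, sketched with K-Means; the toy e-mails and the number of clusters are assumptions made for illustration.

```python
# Minimal sketch: cluster e-mails into topics without any labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

emails = ["Mars proposal with HiRISE and Machine Learning",
          "Mom, about the house in Mexico",
          "Cheap PS4 deal on Amazon",
          "Homework grades and office time questions"]

X = TfidfVectorizer().fit_transform(emails)   # words as features, no labels
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(km.labels_)                             # cluster (topic) index per e-mail
```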

http://cs.stanford.edu/people/karpathy/nips2014/ http://sarah-palin.herokuapp.com/

Preprocessing Data. It's a pain, but it is needed. [Pipeline: Antialiasing Filter → Noise Filter → Spectrogram → Algorithm.]
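
A minimal sketch of such a chain for an audio signal, assuming SciPy; the signal, sampling rate, and filter cutoff below are arbitrary stand-ins.

```python
# Sketch of a preprocessing chain: filter the raw signal, then compute a
# spectrogram to use as features for the learning algorithm.
import numpy as np
from scipy.signal import butter, filtfilt, spectrogram

fs = 8000                                        # sampling rate (Hz)
t = np.arange(0, 1.0, 1.0 / fs)
x = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(t.size)  # noisy tone

b, a = butter(4, 1000 / (fs / 2), btype="low")   # antialiasing / noise filter
x_filt = filtfilt(b, a, x)

f, times, Sxx = spectrogram(x_filt, fs=fs)       # features for the algorithm
print(Sxx.shape)                                 # (frequency bins, time frames)
```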

Preprocessing Data

Who uses Machine Learning? Google: Spam Detection (Gmail), Ranking Algorithms (Google Search), Image Recognition (Google Images). Amazon: Recommendation Engines. Facebook: Feed personalization, News personalization. Disney, NTT, Toyota, Ford, etc.

So what are Deep Nets? First we need to understand what Neural Networks (NNs) are. NNs have gone through heavy rebranding throughout the years. In 1943, McCulloch and Pitts created the first model of an artificial neuron. By 1958, Rosenblatt had come up with the Perceptron, the cornerstone of modern NNs. In 1986, Rumelhart started the connectionism euphoria.

Background. Processing power was still an issue, and until 2006 common NNs were researched by only small clusters of people. Training was expensive, and the results were only marginally better (or worse) than SVMs or Logistic Regression. In 2006, Hinton and Bengio made huge discoveries about how to train NNs, and they were rebranded as Deep Nets. During this time, Convolutional Neural Networks (CNNs) had been a great tool for image pattern recognition.

Motivation. Deep Nets and CNNs are, by today's standards, the best algorithms for image pattern recognition. The three Big Kahunas of NNs and Deep Nets, Geoffrey Hinton, Yann LeCun, and Yoshua Bengio, are working actively with Google, Facebook, and the Université de Montréal, respectively.

Motivation. In January 2014, Google bought DeepMind, a startup with no web page, no product, and a single demo at NIPS (an AI conference). They bought it for $500 million. Facebook was deeply interested as well.

Perceptron. It tries to mimic a real neuron, since it has a nucleus that processes some inputs and gives an output. $h_{w,b}(x)$ is a function of all the inputs, and is composed of two terms.

Perceptron. $h_{w,b}(x) = f\left(\sum_{i=1}^{3} W_i x_i + b\right)$ [Diagram: three inputs entering the neuron with weights $w_1$, $w_2$, $w_3$.] $f$ is called the activation function, and it works as a way to discretize the outputs of the perceptron. One of the most common activation functions is the sigmoid function: $f(z) = \frac{1}{1 + \exp(-z)}$. This looks very familiar (it is the same sigmoid used in logistic regression).
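
In code, the perceptron above is a one-liner; this sketch uses NumPy, with arbitrary example values for x, W, and b.

```python
# A single perceptron: weighted sum of the inputs plus a bias, passed
# through the sigmoid activation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # note the minus sign

def perceptron(x, W, b):
    return sigmoid(np.dot(W, x) + b)   # h_{w,b}(x) = f(sum_i W_i x_i + b)

x = np.array([0.5, -1.0, 2.0])         # three inputs
W = np.array([0.1, 0.4, -0.2])         # weights w_1, w_2, w_3
b = 0.05                               # bias
print(perceptron(x, W, b))
```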

Neural Network. Naturally, a NN is a set of perceptrons interconnected with each other.

Neural Network. We can add as many layers and outputs as we want; for example, two binary outputs allow us to classify into four classes. We also regularize NNs, since they are also prone to overfitting.
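
To make the layering and the regularization concrete, here is a sketch of a forward pass through a two-layer network with an L2 weight penalty; the layer sizes and the penalty weight are arbitrary assumptions.

```python
# Two layers of perceptrons: 3 inputs -> 5 hidden units -> 2 binary outputs
# (up to 4 classes), with an L2 penalty that would be added to the loss.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)
W2, b2 = rng.normal(size=(2, 5)), np.zeros(2)

x = np.array([0.5, -1.0, 2.0])
h = sigmoid(W1 @ x + b1)                            # hidden layer
y = sigmoid(W2 @ h + b2)                            # two binary outputs

lam = 0.01
l2_penalty = lam * (np.sum(W1**2) + np.sum(W2**2))  # regularization term
print(y, l2_penalty)
```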

Problems of NNs. We need to answer two questions: How many layers are enough to solve a problem? How many hidden units should we use per layer? As you can imagine, training complexity increases as we increase hidden units. This can be reduced by avoiding a full interconnection. The elephant in the room is the Vanishing Gradient: as errors are backpropagated through many layers, the gradients shrink toward zero, so the early layers barely learn.
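
A small numeric illustration of the vanishing gradient: the sigmoid's derivative never exceeds 0.25, so the product of derivatives across many layers collapses toward zero.

```python
# The gradient reaching the first layer is (roughly) a product of sigmoid
# derivatives, one per layer; even at their maximum (0.25) this vanishes fast.
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)               # at most 0.25, at z = 0

grad = 1.0
for layer in range(20):                # 20 layers, pre-activations near 0
    grad *= sigmoid_grad(0.0)
print(grad)                            # ~9.1e-13: early layers barely move
```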

Autoencoders. An autoencoder is a NN trained so that its output reproduces its input.
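
A minimal sketch of the idea, assuming NumPy and made-up sizes: squeeze the input through a smaller hidden layer (the code), decode it back, and train to minimize the reconstruction error.

```python
# Autoencoder forward pass: encode to a small hidden code, decode back,
# and measure how well the reconstruction matches the input.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden = 784, 196                     # e.g. a 28x28 image, 196 codes
W_enc = rng.normal(scale=0.01, size=(n_hidden, n_in))
W_dec = rng.normal(scale=0.01, size=(n_in, n_hidden))

x = rng.random(n_in)                          # stand-in for one image
code = sigmoid(W_enc @ x)                     # compressed representation
x_hat = sigmoid(W_dec @ code)                 # reconstruction of the input
loss = np.mean((x - x_hat) ** 2)              # minimized during training
print(code.shape, loss)
```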

MNIST Dataset. A dataset of handwritten digits. It has a training set of 60,000 examples and a test set of 10,000 examples. Each digit is a 28x28 image (784 pixels). Each digit has a label that identifies which digit it represents (10 labels, the digits 0 through 9).
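
One common way to fetch the dataset is scikit-learn's OpenML loader, sketched below; by convention the first 60,000 examples form the training set.

```python
# Fetch MNIST (downloads on first use) and split it the conventional way.
from sklearn.datasets import fetch_openml

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_train, X_test = X[:60000] / 255.0, X[60000:] / 255.0   # pixels scaled to [0, 1]
y_train, y_test = y[:60000], y[60000:]
print(X_train.shape)   # (60000, 784): each digit is a 28x28 = 784-pixel image
```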

Autoencoders. Why would I want both the input and the output to be the same? Take the MNIST dataset as an example (28x28 input images). [Figures: features learned with 10 and with 80 hidden units in the autoencoder.]

Autoencoders. [Figures: features learned with 196 and with 500 hidden units in the autoencoder.]

Autoencoders and Deep Nets. We train an autoencoder, plug it into a NN, and then train the whole network. [Diagram: Input Layer → Autoencoder → Classification.] This simple modification is one of the most important advancements in NN practice in the past 20 years.
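
A sketch of that wiring, reusing the encoder weights from the autoencoder sketch above as the first layer of a classifier; the random matrices here merely stand in for pretrained weights, and the training loop is omitted.

```python
# Plug a (pre)trained encoder into a classifier: the encoder supplies the
# features, and a new output layer maps them to the 10 digit classes.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_enc = rng.normal(scale=0.01, size=(196, 784))   # stand-in for pretrained weights
W_out = rng.normal(scale=0.01, size=(10, 196))    # new classification layer

x = rng.random(784)                               # one MNIST-sized input
h = sigmoid(W_enc @ x)                            # features from the autoencoder
scores = W_out @ h                                # one score per digit class
print(int(np.argmax(scores)))                     # predicted digit
```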

Demo http://www.clarifai.com/

Important notes. We are still not entirely sure why it works: Some people say it is because using the pretrained weights as a starting point saves us much hassle. Some say that it artificially moves us to a better search space. Using the autoencoder as a preprocessing step has been shown to save steps when it comes to preprocessing algorithms. The autoencoder can find circles, edges, etc. by itself.