Machine Learning: Preliminaries & Overview

Machine Learning: Preliminaries & Overview Winter 2018

What is machine learning? Textbook definitions of machine learning: detecting patterns and regularities with a good and generalizable approximation (a "model" or "hypothesis"); execution of a computer program to optimize the parameters of the model using training data or past experience.

Machine Learning: automatically identifying patterns in data; automatically making decisions based on data. Hypothesis: Data + Learning Algorithm -> Behavior, rather than Data + Programmer or Expert -> Behavior.

Machine Learning in Computer Science: machine learning sits at the center of many areas, including Natural Language Processing, Biomedical/Chemical Informatics, Speech/Audio Processing, Human-Computer Interaction, Planning, Analytics, Robotics, Vision/Image Processing, and Financial Modeling.

Major Tasks
- Regression: predict a numerical value from other information; the output is a real value (e.g., $35/share).
- Classification: predict a categorical value; the output is one of a number of classes (e.g., "A").
- Clustering: identify groups of similar entities.
- Optimization

A Small Subset of Machine Learning Applications: Speech Recognition; NLP (natural language processing) and machine translation; Computer Vision; Medical Diagnosis; Autonomous Driving; Statistical Arbitrage; Signal Processing; Recommender Systems; World Domination; Fraud Detection; Social Media; Data Security; Search; A.I. & Robotics; Genomics; Computational Creativity; Hi Scores.

A Small Subset of Machine Learning Applications https://www.youtube.com/watch?v=v1eynij0rnk https://www.youtube.com/watch?v=sce-qedfxta

Mathematical Necessities: Probability, Statistics, Calculus, Vector Calculus, Linear Algebra, Algorithms.

Why do we need so much math? Probability density functions (PDFs) allow us to evaluate how likely a data point is under a model. We want to identify good PDFs (calculus) and to evaluate data points against a known PDF (algebra).
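
As a minimal illustration of "evaluating a data point against a known PDF" (a Python/NumPy sketch added here, not code from the slides; the mean and standard deviation are made-up values), the Gaussian distribution of the next slide is a natural choice of PDF:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of x under a 1-D Gaussian with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return coeff * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Illustrative (assumed) model: heights in inches modeled as N(mu=68, sigma=4).
print(gaussian_pdf(66.0, mu=68.0, sigma=4.0))   # fairly likely under the model
print(gaussian_pdf(90.0, mu=68.0, sigma=4.0))   # very unlikely under the model
```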

Gaussian Distributions We use Gaussian Distributions all over the place.

Types of Machine Learning Methods
- Supervised: explicit training examples with correct answers are provided (e.g., neural networks with back-propagation).
- Unsupervised: no feedback information is provided (e.g., unsupervised clustering based on similarity).
- Semi-supervised: some feedback information is provided, but it is not detailed; e.g., only a fraction of the examples are labeled, or reinforcement learning, where the reinforcement signal is a single-valued assessment of the current state.

Data, Data, Data. "There's no data like more data." All machine learning techniques rely on the availability of data to learn from. There is an ever-increasing amount of data being generated, but it's not always easy to process. Is all data equal? (Good) data can trump the choice of model!

Key Ingredients for Any Machine Learning Method
- Features (or "attributes"): the underlying representation for the hypothesis, model, or target function.
- Hypothesis space.
- Learning method.
- Data (a split sketch follows this list):
  - Training data: used to train the model.
  - Validation (or development) data: used to select model hyperparameters, to determine when to stop training, or to alter the training method.
  - Test data: used to evaluate the trained model.
- Evaluation method.
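
A hedged Python sketch of the train/validation/test split described above (the 60/20/20 proportions and the toy data are assumptions for illustration, not a prescription from the lecture):

```python
import numpy as np

def train_val_test_split(X, t, val_frac=0.2, test_frac=0.2, seed=0):
    """Randomly partition features X and targets t into train/val/test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (X[train_idx], t[train_idx]), (X[val_idx], t[val_idx]), (X[test_idx], t[test_idx])

# Toy data: 100 points with 3 features each and a binary target.
X = np.random.rand(100, 3)
t = np.random.randint(0, 2, size=100)
(train, val, test) = train_val_test_split(X, t)
```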

Assumption of all ML methods. Inductive learning hypothesis: any hypothesis that approximates the target concept well over a sufficiently large set of training examples will also approximate the concept well over other examples outside the training set. Q: What is the difference between induction and deduction?

Training Examples: Class 1 / Training Examples: Class 2 / Test example: Class = ? (repeated figure slides)

Feature Representations. How do we view data? An entity in the world (a web page, user behavior, speech or audio data, vision, wine, people, etc.) goes through feature extraction to yield a feature representation, which is what the machine learning algorithm actually sees. That feature representation is our focus.

Feature Representations

Height  Weight  Eye Color  Gender
66      170     Blue       Male
73      210     Brown      Male
72      165     Green      Male
70      180     Blue       Male
74      185     Brown      Male
68      155     Green      Male
65      150     Blue       Female
64      120     Brown      Female
63      125     Green      Female
67      140     Blue       Female
68      165     Brown      Female
66      130     Green      Female
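
A minimal Python/NumPy sketch (an added illustration, not the lecture's code) of turning rows like those above into numeric feature vectors; the one-hot encoding of eye color is an assumed design choice:

```python
import numpy as np

EYE_COLORS = ["Blue", "Brown", "Green"]

def encode_row(height, weight, eye_color):
    """Map one record to a numeric feature vector: [height, weight, one-hot eye color]."""
    one_hot = [1.0 if eye_color == c else 0.0 for c in EYE_COLORS]
    return np.array([height, weight] + one_hot)

# First two rows of the table above.
X = np.stack([encode_row(66, 170, "Blue"),
              encode_row(73, 210, "Brown")])
t = np.array(["Male", "Male"])   # target values (here: Gender)
print(X)
```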

Classification. Identify which of N classes a data point, x, belongs to; x is a column vector of features.

Target Values. In supervised approaches, in addition to a data point, x, we also have access to a target value, t. Goal of classification: identify a function y such that y(x) = t.
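
As a hedged sketch of one possible such function y, here is a nearest-neighbor rule in Python/NumPy (an illustrative choice; the lecture does not commit to a particular classifier here):

```python
import numpy as np

def make_1nn_classifier(X_train, t_train):
    """Return y(x): predict the target of the closest training point."""
    def y(x):
        dists = np.linalg.norm(X_train - x, axis=1)
        return t_train[np.argmin(dists)]
    return y

X_train = np.array([[66.0, 170.0], [73.0, 210.0], [64.0, 120.0], [63.0, 125.0]])
t_train = np.array(["Male", "Male", "Female", "Female"])
y = make_1nn_classifier(X_train, t_train)
print(y(np.array([65.0, 150.0])))   # predicts the class of the nearest neighbor
```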

Graphical Example of Classification (sequence of figure slides)

Decision Boundaries

Regression. Regression is a supervised machine learning task, so a target value, t, is given. Classification: nominal t. Regression: continuous t. Goal of regression: identify a function y such that y(x) = t.

Differences between Classification and Regression. Similar goals: identify y(x) = t. What are the differences? The form of the function, y (naturally), and the evaluation: root mean squared error, absolute value error, classification error, maximum likelihood. Evaluation drives the optimization that learns the function, y. (Sketches of these error measures follow.)
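
Hedged Python/NumPy sketches of the evaluation measures just listed (illustrative implementations, not the lecture's code):

```python
import numpy as np

def rmse(t, y):
    """Root mean squared error for regression."""
    return np.sqrt(np.mean((t - y) ** 2))

def mean_absolute_error(t, y):
    """Absolute value (L1) error for regression."""
    return np.mean(np.abs(t - y))

def classification_error(t, y):
    """Fraction of misclassified points for classification."""
    return np.mean(t != y)

t_reg = np.array([1.0, 2.0, 3.0]); y_reg = np.array([1.1, 1.9, 2.5])
print(rmse(t_reg, y_reg), mean_absolute_error(t_reg, y_reg))
t_cls = np.array([0, 1, 1, 0]); y_cls = np.array([0, 1, 0, 0])
print(classification_error(t_cls, y_cls))   # 0.25
```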

Graphical Example of Regression (sequence of figure slides)

Generalization Problem in Prediction/Classification

Common ML Pipeline
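
A hedged, minimal Python sketch of a common ML pipeline (an added illustration of the idea; the slide itself shows a diagram): split the data, train on the training set, tune on the validation set, and report on the test set. The nearest-mean classifier and the toy data are assumptions for the example.

```python
import numpy as np

# Toy data: two Gaussian blobs, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
t = np.array([0] * 50 + [1] * 50)

# 1. Split into train/validation/test.
idx = rng.permutation(100)
train, val, test = idx[:60], idx[60:80], idx[80:]

# 2. "Train" a very simple model: per-class means (nearest-mean classifier).
means = {c: X[train][t[train] == c].mean(axis=0) for c in (0, 1)}
predict = lambda x: min(means, key=lambda c: np.linalg.norm(x - means[c]))

# 3. Evaluate on validation (for tuning) and on test (for final reporting).
val_err = np.mean([predict(x) != y for x, y in zip(X[val], t[val])])
test_err = np.mean([predict(x) != y for x, y in zip(X[test], t[test])])
print(val_err, test_err)
```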

Confusion matrices, ROC curves, etc. The area under the (ROC) curve (AUC) is a common metric used to assess and compare classifiers.
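
A hedged Python/NumPy illustration (not from the slides) of a binary confusion matrix and of AUC computed via the rank-statistic view (the probability that a random positive example is scored above a random negative one):

```python
import numpy as np

def confusion_matrix(t, y):
    """Rows = true class (0/1), columns = predicted class (0/1)."""
    cm = np.zeros((2, 2), dtype=int)
    for true, pred in zip(t, y):
        cm[true, pred] += 1
    return cm

def auc(t, scores):
    """Probability that a random positive scores higher than a random negative."""
    pos = scores[t == 1]
    neg = scores[t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

t = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])   # classifier scores for class 1
y_hat = (scores >= 0.5).astype(int)
print(confusion_matrix(t, y_hat))
print(auc(t, scores))                      # 0.75 for this toy example
```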

Clustering. Clustering is an unsupervised learning task; there is no target value to shoot for. Identify groups of similar data points that are dissimilar from others. Partition the data into groups (clusters) that satisfy these constraints: 1. points in the same cluster should be similar; 2. points in different clusters should be dissimilar.
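
One standard way to do this is k-means; the following Python sketch (an illustrative choice of algorithm, not one prescribed by the lecture) alternates the two steps of Lloyd's algorithm:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate assigning points to the nearest center
    and recomputing each center as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

X = np.vstack([np.random.normal(0, 0.5, (30, 2)), np.random.normal(5, 0.5, (30, 2))])
labels, centers = kmeans(X, k=2)
```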

Graphical Example of Clustering (sequence of figure slides)

MNIST Classification (60k training / 10k test images). LeCun, Bengio, et al. (1998) used SVMs to get an error rate of 0.8%. More recent research using CNNs (a type of neural network) yields 0.23% error.

The Curse of Dimensionality. In ML we face a fundamental dilemma: to maintain a given model accuracy in higher dimensions, we need a huge amount of data! The data required to densely populate the space increases exponentially as the dimension increases. Points become nearly equally far apart in high-dimensional space (this is counter-intuitive).
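
A small Python demonstration of the last point (an added illustration, with arbitrary sample sizes): as the dimension grows, the ratio between the largest and smallest pairwise distances among random points shrinks toward 1.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 500):
    X = rng.random((100, d))                       # 100 random points in [0,1]^d
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    dists = dists[np.triu_indices(100, k=1)]       # unique pairwise distances
    print(d, dists.max() / dists.min())            # ratio approaches 1 as d grows
```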

Dealing with High Dimensionality. What can we do?
- Use domain knowledge: feature engineering.
- Make assumptions about dimensions:
  - Independence: count along each dimension separately.
  - Smoothness: propagate class counts to neighboring regions.
  - Symmetry: e.g., invariance to the order of dimensions.
- Perform dimensionality reduction (a PCA sketch follows this list).
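
One widely used dimensionality-reduction technique is principal component analysis (PCA); this Python sketch via the SVD is an illustrative example rather than a method the lecture specifies:

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]       # directions of maximal variance
    return X_centered @ components.T     # reduced representation

X = np.random.rand(100, 50)              # 100 points in 50 dimensions
X_reduced = pca(X, n_components=2)       # now 100 points in 2 dimensions
print(X_reduced.shape)                   # (100, 2)
```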

Bias-Variance Tradeoff. Whenever we train any type of ML algorithm/model we are making some model choices and fitting the parameters of that model. The more degrees of freedom (dof) the algorithm has, the more complicated the model that can be fitted (recall: overfitting). Note that a model can be bad for two basic reasons: (1) it is inaccurate and doesn't match the data well; (2) it is not very precise, meaning that there is a lot of variation in the results. (1) is known as bias; (2) is statistical variance.

Bias-Variance Tradeoff. The MSE (mean-squared error) decouples to reflect what is known as the bias-variance tradeoff:

MSE(θ̂) = E[(θ̂ − θ)²] = (E[θ̂] − θ)² + Var(θ̂) = Bias(θ̂)² + Var(θ̂)

where θ is the true parameter value and θ̂ is the parameter estimate.
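
A hedged numerical check of this decomposition in Python (an added example; the deliberately biased estimator and all constants are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 5.0                                   # true parameter (the mean)
n, trials = 20, 10000

estimates = np.empty(trials)
for i in range(trials):
    sample = rng.normal(theta, 2.0, size=n)
    estimates[i] = 0.9 * sample.mean()        # deliberately biased estimator of theta

bias = estimates.mean() - theta
variance = estimates.var()
mse = np.mean((estimates - theta) ** 2)
print(bias**2 + variance, mse)                # the two quantities agree
```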

Bias-Variance Tradeoff, in pictures (figure).