Epilogue: what have you learned this semester?

[Slide 1: the title over two running examples from the course: a decision tree on the words ʻViagraʼ and ʻlotteryʼ predicting spam vs. ham, and a two-dimensional scatter plot.]

What did you get out of this course?
What skills have you learned in this course that you feel would be useful?
What are the most important insights you gained this semester?
What advice would you give future students?
What was your biggest challenge in this course?
What would you like me to do differently?

What I hope you got out of this course: the machine learning toolbox.
- Formulating a problem as an ML problem
- Understanding a variety of ML algorithms
- Running and interpreting ML experiments
- Understanding what makes ML work: theory and practice

Learning scenarios we covered:
- Classification: discrete/categorical labels
- Regression: continuous labels
- Clustering: no labels
[Slide figure: a two-dimensional scatter plot of data points.]
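A minimal scikit-learn sketch of the three scenarios on toy data (the dataset generators and model choices here are illustrative assumptions, not the only options):

```python
from sklearn.datasets import make_classification, make_regression, make_blobs
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

# Classification: discrete labels
Xc, yc = make_classification(n_samples=200, n_features=2, n_redundant=0, random_state=0)
clf = LogisticRegression().fit(Xc, yc)

# Regression: continuous labels
Xr, yr = make_regression(n_samples=200, n_features=2, noise=5.0, random_state=0)
reg = LinearRegression().fit(Xr, yr)

# Clustering: no labels at all
Xu, _ = make_blobs(n_samples=200, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10).fit(Xu)
```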

A variety of learning tasks:
- Supervised learning
- Unsupervised learning
- Semi-supervised learning: access to a lot of unlabeled data
- Multi-label classification: each example can belong to multiple classes
- Multi-task classification: solving multiple related tasks

A variety of learning tasks (continued):
- Outlier/novelty detection. Novelty: anything that is not part of the normal behavior of a system.
- Reinforcement learning: learn actions to maximize payoff
- Structured output learning
[Slide fragment from Malik Magdon-Ismail's Learning From Data slides: hints/invariances, removing noisy data, and advanced validation (more efficient than cross-validation).]

Learning in structured output spaces:
- Handles prediction problems with complex output spaces
- Structured outputs are multivariate, correlated, and constrained
- A general way to solve many learning problems
Examples taken from Ben Taskar's 2007 NIPS tutorial.

Local vs. global: global classification takes advantage of correlations and satisfies the constraints in the problem.

Other techniques:
- Graphical models (conditional random fields, Bayesian networks)
- Bayesian model averaging

The importance of features and their representation. Choosing the right features is one of the most important aspects of applying ML. What you can do with features:
- Normalization
- Selection
- Construction
- Filling in missing values
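As a minimal sketch, here are those four operations chained as a scikit-learn pipeline (the particular transformers and the choice of k=10 are illustrative assumptions):

```python
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.feature_selection import SelectKBest, f_classif

feature_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),           # fill in missing values
    ("construct", PolynomialFeatures(degree=2)),          # construction: new features from old
    ("normalize", StandardScaler()),                      # normalization: zero mean, unit variance
    ("select", SelectKBest(score_func=f_classif, k=10)),  # selection: keep the 10 best features
])
# feature_pipe.fit_transform(X, y) applies all four steps in order.
```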

Types of models:
- Geometric: ridge regression, SVM, perceptron, neural networks. These learn a decision boundary such as $\mathbf{w}^T \mathbf{x} + b = 0$, predicting one class where $\mathbf{w}^T \mathbf{x} + b > 0$ and the other where $\mathbf{w}^T \mathbf{x} + b < 0$.
- Distance-based: k-nearest-neighbors
- Probabilistic: naive Bayes, e.g. $P(Y = \text{spam} \mid \text{ʻViagraʼ}, \text{ʻlotteryʼ})$
- Logical models (tree/rule based): decision trees, e.g. the ʻViagraʼ/ʻlotteryʼ tree predicting spam vs. ham
- Ensembles
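One way to make this taxonomy concrete, sketched with scikit-learn (the hyperparameters are placeholders, not recommendations):

```python
from sklearn.svm import SVC
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

models = {
    "geometric (linear SVM)": SVC(kernel="linear"),            # learns w^T x + b
    "geometric (ridge)": Ridge(alpha=1.0),
    "distance-based (k-NN)": KNeighborsClassifier(n_neighbors=5),
    "probabilistic (naive Bayes)": MultinomialNB(),            # e.g. P(Y | word counts)
    "logical (decision tree)": DecisionTreeClassifier(max_depth=3),
    "ensemble (random forest)": RandomForestClassifier(n_estimators=100),
}
```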

Loss + regularization. Many of the models we studied are based on a cost function of the form: loss + regularization. Example: ridge regression

$$\frac{1}{N} \sum_{i=1}^{N} (y_i - \mathbf{w} \cdot \mathbf{x}_i)^2 + \lambda\, \mathbf{w} \cdot \mathbf{w}$$
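The minimizer of this objective has a closed form; here is a minimal numpy sketch (no intercept term, and the code's lam plays the role of Nλ in the objective above):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Minimize ||y - Xw||^2 + lam * ||w||^2; solution w = (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```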

Loss + regularization for classification. SVM:

$$\frac{C}{N} \sum_{i=1}^{N} \max\bigl[1 - y_i h_{\mathbf{w}}(\mathbf{x}_i),\, 0\bigr] + \frac{1}{2}\, \mathbf{w} \cdot \mathbf{w}$$

The first term is the hinge loss, the second the $L_2$ regularizer. The hinge loss is a margin-maximizing loss function. Other regularizers can be used:
- $\|\mathbf{w}\|_1$ (the $L_1$ norm) leads to very sparse solutions and is non-differentiable.
- The elastic net regularizer: $\lambda \|\mathbf{w}\|_1 + (1 - \lambda) \|\mathbf{w}\|_2^2$
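This family of loss + regularizer objectives is exactly what scikit-learn's SGDClassifier optimizes; a hedged sketch, with the alpha values as illustrative placeholders:

```python
from sklearn.linear_model import SGDClassifier

svm_l2 = SGDClassifier(loss="hinge", penalty="l2", alpha=1e-4)            # the SVM objective above
svm_l1 = SGDClassifier(loss="hinge", penalty="l1", alpha=1e-4)            # very sparse w
svm_en = SGDClassifier(loss="hinge", penalty="elasticnet", l1_ratio=0.5)  # elastic net mix
```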

Loss + regularization for classification. Logistic regression replaces the hinge loss with the log loss:

$$\frac{1}{N} \sum_{i=1}^{N} \log\bigl(1 + \exp(-y_i h_{\mathbf{w}}(\mathbf{x}_i))\bigr) + \frac{\lambda}{2}\, \mathbf{w} \cdot \mathbf{w}$$

again with an $L_2$ regularizer. AdaBoost can be shown to optimize the exponential loss:

$$\frac{1}{N} \sum_{i=1}^{N} \exp\bigl(-y_i h_{\mathbf{w}}(\mathbf{x}_i)\bigr)$$
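All three surrogate losses are functions of the margin $m_i = y_i h_{\mathbf{w}}(\mathbf{x}_i)$; a small numpy sketch makes the comparison concrete (the base-2 scaling of the log loss is chosen so that it, too, upper-bounds the 0-1 loss):

```python
import numpy as np

m = np.linspace(-2, 2, 9)                     # margins m_i = y_i * h_w(x_i)
zero_one = (m <= 0).astype(float)             # the 0-1 loss the surrogates stand in for
hinge    = np.maximum(1 - m, 0)               # SVM: upper-bounds the 0-1 loss
exp_loss = np.exp(-m)                         # AdaBoost: also an upper bound
log_loss = np.log1p(np.exp(-m)) / np.log(2)   # logistic regression, in base 2
```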

Loss + regularization for regression. Ridge regression:

$$\frac{1}{N} \sum_{i=1}^{N} (y_i - \mathbf{w} \cdot \mathbf{x}_i)^2 + \lambda\, \mathbf{w} \cdot \mathbf{w}$$

Closed-form solution; sensitive to outliers. Lasso:

$$\frac{1}{N} \sum_{i=1}^{N} (y_i - \mathbf{w} \cdot \mathbf{x}_i)^2 + \lambda\, \|\mathbf{w}\|_1$$

Sparse solutions; non-differentiable. Alternative loss functions can also be used.
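In practice the difference shows up in the coefficients: lasso drives many of them exactly to zero, while ridge only shrinks them. A quick scikit-learn sketch (the synthetic dataset and alpha values are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

X, y = make_regression(n_samples=100, n_features=20, n_informative=5, random_state=0)
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)
print("ridge zero coefficients:", int(np.sum(np.abs(ridge.coef_) < 1e-8)))  # typically 0
print("lasso zero coefficients:", int(np.sum(np.abs(lasso.coef_) < 1e-8)))  # many
```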

Comparison of learning methods. Table 10.1 from The Elements of Statistical Learning: some characteristics of different learning methods. Key: ▲ = good, ◆ = fair, ▼ = poor.

| Characteristic | Neural Nets | SVM | Trees | MARS | k-NN, Kernels |
|---|---|---|---|---|---|
| Natural handling of data of mixed type | ▼ | ▼ | ▲ | ▲ | ▼ |
| Handling of missing values | ▼ | ▼ | ▲ | ▲ | ▲ |
| Robustness to outliers in input space | ▼ | ▼ | ▲ | ▼ | ▲ |
| Insensitive to monotone transformations of inputs | ▼ | ▼ | ▲ | ▼ | ▼ |
| Computational scalability (large N) | ▼ | ▼ | ▲ | ▲ | ▼ |
| Ability to deal with irrelevant inputs | ▼ | ▼ | ▲ | ▲ | ▼ |
| Ability to extract linear combinations of features | ▲ | ▲ | ▼ | ▼ | ◆ |
| Interpretability | ▼ | ▼ | ◆ | ▲ | ▼ |
| Predictive power | ▲ | ▲ | ▼ | ◆ | ▲ |

The scikit-learn algorithm cheat sheet: http://scikit-learn.org/stable/tutorial/machine_learning_map/

An extended version of the scikit-learn cheat sheet: https://medium.com/@chris_bour/an-extended-version-of-the-scikit-learn-cheat-sheet-5f46efc6cbb#.g942x8l3d

Applying machine learning:
- Always try multiple models. What would you start with?
- If accuracy is not high enough: design new features, or collect more data.
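In code, "always try multiple models" can be as simple as the sketch below (X and y stand in for your own labeled data, and the candidate list is illustrative):

```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

candidates = [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("SVM (RBF kernel)", SVC()),
    ("random forest", RandomForestClassifier(n_estimators=100)),
]
for name, model in candidates:
    scores = cross_val_score(model, X, y, cv=5)   # X, y: your labeled dataset
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```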