INTRODUCTION TO STATISTICAL MACHINE LEARNING (10/22/2010)

Outline
- Representing things: feature vector, training sample
- Unsupervised learning: clustering
- Supervised learning: classification, regression

Xiaojin Zhu, jerryzhu@cs.wisc.edu

Little green men
- The weight and height of 100 little green men.
- What can you learn from this data?

Representing things in Machine Learning
- An instance x represents a specific object ("thing").
- x is often represented by a D-dimensional feature vector x = (x_1, ..., x_D) ∈ R^D.
- Each dimension is called a feature; features can be continuous or discrete.
- x is a point in the D-dimensional feature space.
- This is an abstraction of the object: it ignores any other aspect, so two men with the same weight and height become identical instances.
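As a minimal sketch of this representation, each instance becomes one row of an n-by-D array; the numbers below are made up for illustration, not the lecture's data:

```python
import numpy as np

# Hypothetical weight (kg) and height (cm) rows; the lecture's actual
# 100-instance dataset is not reproduced here.
X = np.array([
    [3.2, 25.0],   # little green man 1
    [4.1, 31.5],   # little green man 2
    [2.8, 22.3],   # little green man 3
])

n, D = X.shape   # n instances, each a D-dimensional feature vector
print(n, D)      # -> 3 2
```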

Feature Representation Example
- Text document: vocabulary of size D (roughly 100,000 entries, from "aardvark" to "zulu").
- Bag of words: the feature vector holds the count of each vocabulary entry.
  - "To marry my true love" -> (3531:1 13788:1 19676:1)
  - "I wish that I find my soulmate this year" -> (3819:1 13448:1 19450:1 20514:1)
- Stopwords (the, of, at, in, ...) are often removed.
- A special out-of-vocabulary (OOV) entry catches all unknown words.

More Feature Representations
- Image: color histogram.
- Software: execution profile, i.e., the number of times each line is executed.
- Bank account: credit rating, balance, number of deposits in the last day, week, month, year, number of withdrawals.
- You and me: medical test 1, test 2, test 3, ...

Training Sample
- A training sample is a collection of instances x_1, ..., x_n, which is the input to the learning process; x_i = (x_i1, ..., x_iD).
- Assume these instances are sampled independently from an unknown (population) distribution P(x). We denote this by x_i ~iid P(x), where "iid" stands for independent and identically distributed.
- A training sample is the experience given to a learning algorithm; what the algorithm can learn from it varies.
- We introduce two basic learning paradigms: unsupervised learning and supervised learning.
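A minimal bag-of-words sketch, assuming a tiny hypothetical vocabulary and stopword list; the integer indices on the slide (3531, 13788, ...) come from the full ~100,000-word vocabulary and are not reproduced here:

```python
from collections import Counter

# Hypothetical toy vocabulary; index 0 is the out-of-vocabulary (OOV) entry.
vocab = {"<OOV>": 0, "marry": 1, "true": 2, "love": 3,
         "wish": 4, "find": 5, "soulmate": 6, "year": 7}
stopwords = {"to", "my", "i", "that", "this"}

def bag_of_words(text):
    """Sparse counts {vocab_index: count}; stopwords dropped, unknown words map to OOV."""
    words = [w for w in text.lower().split() if w not in stopwords]
    return dict(Counter(vocab.get(w, 0) for w in words))

print(bag_of_words("To marry my true love"))
# -> {1: 1, 2: 1, 3: 1}
```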

Unsupervised Learning
- No teacher: the training sample is x_1, ..., x_n, and that's it. No teacher provides supervision as to how individual instances should be handled.
- Common tasks:
  - clustering: separate the n instances into groups
  - novelty detection: find instances that are very different from the rest
  - dimensionality reduction: represent each instance with a lower-dimensional feature vector while maintaining key characteristics of the training sample

Clustering
- Group the training sample into k clusters, such that instances in the same cluster are similar and instances in different clusters are dissimilar.
- How many clusters do you see? There are many clustering algorithms.

Hierarchical Agglomerative Clustering
- Measures the distance between instances, e.g., with Euclidean distance.
- What about the distance between two clusters?
  - Single linkage: the distance between two clusters is the minimum distance over all pairs of instances, one from each cluster.
  - Complete linkage: replace min with max.
- Demo (a sketch follows below).
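A naive sketch of hierarchical agglomerative clustering under Euclidean distance, showing where single vs. complete linkage enters; this is a simple O(n^3) illustration, not the demo from the lecture:

```python
import numpy as np

def hac(X, k, linkage="single"):
    """Naive agglomerative clustering: repeatedly merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(X))]                 # start: each instance is its own cluster
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)    # pairwise Euclidean distances
    agg = min if linkage == "single" else max               # single linkage = min pair distance, complete = max
    while len(clusters) > k:
        _, ia, ib = min((agg(D[i, j] for i in a for j in b), ia, ib)
                        for ia, a in enumerate(clusters)
                        for ib, b in enumerate(clusters) if ia < ib)
        clusters[ia] += clusters[ib]                         # merge the closest pair of clusters
        del clusters[ib]
    return clusters

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(hac(X, k=2))   # -> [[0, 1], [2, 3]]
```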

Label
- In supervised learning, a teacher shows labels.
- Little green men: predict gender (M, F) from weight and height? Predict adult vs. juvenile from weight and height?
- A label y is the desired prediction on an instance x.
- Discrete label: classes M, F or A, J, often encoded as 0, 1 or -1, 1. Multiple classes: 1, 2, 3, ..., C; no class order is implied.
- Continuous label: e.g., blood pressure.

Supervised Learning
- A labeled training sample is a collection of instances (x_1, y_1), ..., (x_n, y_n).
- Assume (x_i, y_i) ~iid P(x, y). Again, P(x, y) is unknown.
- Supervised learning learns a function f: X -> Y in some function family F, such that f(x) predicts the true label y on future data x, where (x, y) ~ P(x, y).
- Classification: y is discrete. Regression: y is continuous.

Evaluation
- Training set error:
  - 0-1 loss for classification: (1/n) Σ_{i=1..n} 1[f(x_i) ≠ y_i]
  - squared loss for regression: (1/n) Σ_{i=1..n} (f(x_i) - y_i)^2
  - a low training error does not guarantee good predictions on future data: overfitting
- Test set error: the same loss measured on a separate test set.
- True error of f: E_{(x,y)~P(x,y)}[c(f(x), y)], where c() is an appropriate loss function.
- The goal of supervised learning is to find f* = argmin_{f in F} E_{(x,y)~P(x,y)}[c(f(x), y)].
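A minimal sketch of the two training-error losses above; the prediction and label arrays are placeholders, not lecture data:

```python
import numpy as np

def zero_one_error(y_pred, y_true):
    """0-1 loss averaged over a sample: the fraction of misclassified instances."""
    return np.mean(y_pred != y_true)

def squared_error(y_pred, y_true):
    """Squared loss averaged over a sample (mean squared error)."""
    return np.mean((y_pred - y_true) ** 2)

# Hypothetical predictions f(x_i) and true labels y_i.
y_true = np.array([1, 0, 1, 1])
y_pred = np.array([1, 1, 1, 0])
print(zero_one_error(y_pred, y_true))   # -> 0.5

# The true error E[c(f(x), y)] cannot be computed exactly, since P(x, y) is unknown;
# the error on a held-out test set is used to estimate it.
```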

k-Nearest-Neighbor (kNN)
- To classify an instance x, find the k training instances closest to x and predict the majority label among their y's (a code sketch follows after the summary).
- 1NN for little green men: the decision boundary.
- What if we want regression? Instead of a majority vote, take the average of the neighbors' y values.
- How to pick k? Split the data into training and tuning sets, classify the tuning set with different values of k, and pick the k that produces the least tuning-set error.

Summary
- Feature representation
- Unsupervised learning / clustering: hierarchical agglomerative clustering, single linkage, complete linkage
- Supervised learning / classification: k-nearest-neighbor, decision trees, neural networks
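A minimal kNN sketch with tuning-set selection of k, assuming Euclidean distance; the little-green-men numbers below are invented for illustration:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    """Classify x by majority vote among its k nearest training instances (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # For regression, replace this majority vote with np.mean(y_train[nearest]).
    return Counter(y_train[nearest]).most_common(1)[0][0]

def pick_k(X_train, y_train, X_tune, y_tune, candidates=(1, 3, 5)):
    """Pick the candidate k with the lowest tuning-set error."""
    def tune_error(k):
        preds = [knn_predict(X_train, y_train, x, k) for x in X_tune]
        return np.mean(np.array(preds) != y_tune)
    return min(candidates, key=tune_error)

# Hypothetical (weight, height) data; labels 0 = juvenile, 1 = adult.
X_train = np.array([[2.0, 20.0], [2.2, 21.0], [4.0, 35.0], [4.3, 36.0]])
y_train = np.array([0, 0, 1, 1])
X_tune  = np.array([[2.1, 20.5], [4.1, 35.5]])
y_tune  = np.array([0, 1])

k = pick_k(X_train, y_train, X_tune, y_tune)
print(k, knn_predict(X_train, y_train, np.array([3.0, 28.0]), k))
```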