Machine Learning: Dimensionality Reduction

Machine Learning: Dimensionality Reduction
slides thanks to Xiaoli Fern (CS534, Oregon State Univ., 2011)
Jeff Howbert, Introduction to Machine Learning, Winter 2012

Dimensionality reduction
Many modern data domains involve huge numbers of features / dimensions:
- Documents: thousands of words, millions of bigrams
- Images: thousands to millions of pixels
- Genomics: thousands of genes, millions of DNA polymorphisms

Why reduce dimensions?
High dimensionality has many costs:
- Redundant and irrelevant features degrade the performance of some ML algorithms
- Difficulty in interpretation and visualization
- Computation may become infeasible: what if your algorithm scales as O(n³)? Doubling the number of dimensions then multiplies the cost by eight.
- Curse of dimensionality


Steps in principal component analysis
- Mean-center the data.
- Compute the covariance matrix Σ.
- Calculate the eigenvalues and eigenvectors of Σ.
- The eigenvector with the largest eigenvalue λ_1 is the 1st principal component (PC).
- The eigenvector with the k-th largest eigenvalue λ_k is the k-th PC.
- λ_k / Σ_i λ_i = proportion of variance captured by the k-th PC.
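A minimal numpy sketch of these steps (not from the slides; the function name and interface are mine):

```python
import numpy as np

def pca(X):
    """PCA via eigendecomposition of the covariance matrix,
    following the steps above. X is (n_samples, n_features)."""
    # Step 1: mean-center the data
    X_centered = X - X.mean(axis=0)
    # Step 2: compute the covariance matrix Sigma
    Sigma = np.cov(X_centered, rowvar=False)
    # Step 3: eigenvalues and eigenvectors of Sigma
    # (eigh is appropriate because Sigma is symmetric)
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    # eigh returns eigenvalues in ascending order; sort descending
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # proportion of variance captured by each PC: lambda_k / sum_i lambda_i
    var_explained = eigvals / eigvals.sum()
    return X_centered, eigvecs, eigvals, var_explained
```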

Applying principal component analysis
- The full set of PCs forms a new orthogonal basis for the feature space, whose axes are aligned with the directions of maximum variance in the original data.
- Projecting the original data onto the first k PCs gives a reduced-dimensionality representation of the data.
- Transforming the reduced-dimensionality projection back into the original space gives a reduced-dimensionality reconstruction of the original data.
- The reconstruction will have some error, but it is often small and acceptable given the other benefits of dimensionality reduction.
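Continuing the sketch above, projection onto the first k PCs and reconstruction in the original space (the data below is synthetic and purely illustrative):

```python
# Synthetic data with decaying variance along each feature
X = np.random.RandomState(0).randn(200, 5) @ np.diag([3, 2, 1, 0.5, 0.1])
X_centered, PCs, eigvals, var_explained = pca(X)

k = 2
Z = X_centered @ PCs[:, :k]                   # reduced-dimensionality representation
X_recon = Z @ PCs[:, :k].T + X.mean(axis=0)   # reconstruction in original space

# Reconstruction error reflects the variance in the discarded PCs
mse = np.mean((X - X_recon) ** 2)
print(f"k={k}: reconstruction MSE = {mse:.4f}")
```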

PCA example (figure): original data; mean-centered data with PCs overlaid.

PCA example (figure): original data projected into the full PC space; original data reconstructed using only a single PC.

Choosing the dimension k
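The figure for this slide did not survive transcription. One common heuristic, an assumption here rather than necessarily the slide's method, is to keep the smallest k whose cumulative proportion of variance exceeds a threshold:

```python
# Assumed heuristic: smallest k capturing at least 95% of the variance,
# using var_explained from the pca() sketch above.
cumvar = np.cumsum(var_explained)
k = int(np.searchsorted(cumvar, 0.95)) + 1
print(f"keep k={k} PCs (cumulative variance {cumvar[k-1]:.2%})")
```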

PCA: a useful preprocessing step
- Helps reduce computational complexity.
- Can help supervised learning: reduced dimension → simpler hypothesis space; smaller VC dimension → less risk of overfitting.
- PCA can also be seen as noise reduction.
Caveats:
- Fails when the data consists of multiple separate clusters.
- The directions of greatest variance may not be the most informative (i.e., may not carry the greatest classification power).
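As a usage sketch with scikit-learn (which the slides do not mention; the dataset and parameters are illustrative), PCA as a preprocessing step in a supervised pipeline:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)  # 64 pixel features per image

# n_components=0.95 keeps enough PCs to capture 95% of the variance
pipe = make_pipeline(PCA(n_components=0.95),
                     LogisticRegression(max_iter=1000))
print(cross_val_score(pipe, X, y, cv=5).mean())
```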

Off-the-shelf classifiers
Per Tom Dietterich: "Methods that can be applied directly to data without requiring a great deal of time-consuming data preprocessing or careful tuning of the learning procedure."

Off-the-shelf criteria (table)
slide thanks to Tom Dietterich (CS534, Oregon State Univ., 2005)

Practical advice on machine learning
from Andrew Ng at Stanford
slides: http://cs229.stanford.edu/materials/ml-advice.pdf
video: http://www.youtube.com/v/sq8t9b-ugve (starting at 24:56)