W4240 Data Mining. Frank Wood. September 6, 2010


Introduction
Data mining is the search for patterns in large collections of data: learning models, then applying those models to large quantities of data.
Pattern recognition is concerned with automatically finding patterns in data / learning models.
Machine learning is pattern recognition with concern for computational tractability and full automation.
Data mining = Machine Learning = Applied Statistics, plus scale and computation.

Example Application: ALARM, an expert diagnostic system
Goal: inference in a given/known/hand-specified Bayesian network.
Figure: ALARM stands for A Logical Alarm Reduction Mechanism. This is a medical diagnostic system for patient monitoring. It is a nontrivial belief network with 8 diagnoses, 16 findings, and 13 intermediate variables. Described in [2].

Graphical Models
The ALARM network and most other probabilistic models can be expressed in the language of graphical models. Inference procedures such as the sum-product algorithm and belief propagation are general inference techniques that can be run on any discrete or linear-Gaussian graphical model.
Figure: directed graphical model (Chapter 8, Figure 22a, PRML [3]).

Graphical Models Cont.
Results:
The ability to compute the marginal distribution of any subset of variables in the graphical model, conditioned on any other subset of variables (whose values are observed / fixed).
Generalizes many, many inference procedures such as the Kalman filter, forward-backward, etc.
Can be used for parameter estimation in the case where all latent, unknown variables are parameters and all observations are fixed, known variables.
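
As a concrete illustration of the marginal computation above, here is a minimal Python sketch (a made-up three-node toy chain, not the ALARM network or course material) that computes p(C) for a chain A -> B -> C in two ways: by brute-force summation over the joint, and by pushing sums inward as in the sum-product algorithm.

import numpy as np

# Toy chain A -> B -> C with binary variables; all probability tables are
# illustrative choices, not taken from the slides.
p_a = np.array([0.6, 0.4])                  # p(A)
p_b_given_a = np.array([[0.7, 0.3],         # p(B | A = a), rows indexed by a
                        [0.2, 0.8]])
p_c_given_b = np.array([[0.9, 0.1],         # p(C | B = b), rows indexed by b
                        [0.4, 0.6]])

# Brute force: build the full joint p(A, B, C) and sum out A and B.
joint = p_a[:, None, None] * p_b_given_a[:, :, None] * p_c_given_b[None, :, :]
p_c_enum = joint.sum(axis=(0, 1))

# Sum-product style: push the sums inward (message passing along the chain).
msg_b = p_a @ p_b_given_a        # sum_a p(a) p(b|a)
p_c_mp = msg_b @ p_c_given_b     # sum_b msg(b) p(c|b)

print(p_c_enum, p_c_mp)          # both print the same marginal, [0.65 0.35]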

Another Application: Classification of handwritten digits
Goal: build a machine that can identify handwritten digits automatically.
Approaches:
Hand craft a set of rules that separate each digit from the next; the set of rules invariably grows large and unwieldy and requires many exceptions.
Learn a set of models for each digit automatically from labeled training data, i.e. mine a large collection of handwritten digits and produce a model of each, then use the models to do classification.
Formalism: each digit is a 28x28 pixel image, vectorized into a 784-entry vector x.

Handwritten Digit Recognition: Training Data
Figure: handwritten digits from the USPS.

Machine learning approach to digit recognition
Recipe:
Obtain a set of N digits {x_1, ..., x_N} called the training set.
Label (by hand) the training set to produce a label or target t for each digit image x.
Learn a function y(x) which takes an image x as input and returns an output in the same format as the target vector.
Terminology:
The process of determining the precise shape of the function y is known as the training or learning phase.
After training, the model (the function y) can be used to figure out which digit unseen images might be of; the set comprised of such data is called the test set.
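
A minimal sketch of this recipe in Python with scikit-learn (the course itself works in Matlab); note that scikit-learn's bundled digits are 8x8 images (64-entry vectors) rather than the 28x28 USPS images above, so this illustrates the workflow rather than reproducing the slide's setup.

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Training set {x_1, ..., x_N} with hand-supplied labels/targets t.
digits = load_digits()
x_train, x_test, t_train, t_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Learn a function y(x): here a simple multiclass logistic regression.
model = LogisticRegression(max_iter=1000)
model.fit(x_train, t_train)                    # training / learning phase

# After training, evaluate on the held-out test set of unseen images.
print("test accuracy:", model.score(x_test, t_test))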

Tools for the handwriting recognition job
Supervised regression/classification models: logistic regression, neural networks, support vector machines, naive Bayes classifiers.
Unsupervised clustering: Gaussian mixture model.
Model parameter estimation: maximum likelihood / expectation maximization, variational inference, sampling, sequential Monte Carlo... for all, batch or online.

Example Application: Trajectory Inference From Noisy Data
Goal: build a machine that can uncover and compute the true trajectory of an indirectly and noisily observed moving target.
Approaches:
Hand craft a set of rules that govern the possible movements of said target; the set of rules invariably grows large and unwieldy and requires many exceptions.
Learn a model of the kind of movements such a target can make and perform inference in said model.
Formalism: observed trajectories {x_n}_{n=1}^N, unobserved latent trajectories {z_n}_{n=1}^N.

Latent Trajectory Inference Problem Schematic
Figure: schematic of the trajectory inference problem (target position (x_t, y_t), with r_t and μ_t measured from the origin (0,0)).

Tools for Latent Trajectory Inference
Known/hand-crafted model, inference only: belief propagation, Kalman filter, particle filter, switching variants thereof, hidden Markov models.
Learning too / model parameter estimation: maximum likelihood / expectation maximization, variational inference, sampling, sequential Monte Carlo... for all, batch or online.
The trajectory need not be physical; it could be an economic indicator, completely abstract, etc.
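
The following is a minimal Kalman-filter sketch in Python/NumPy for a linear-Gaussian latent trajectory (a known, hand-crafted model, inference only). The constant-velocity dynamics, noise covariances, and trajectory length are illustrative assumptions, not values from the slides or from [5].

import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity dynamics on z_n = [position, velocity]
C = np.array([[1.0, 0.0]])               # we observe position only
Q = 0.01 * np.eye(2)                     # process noise covariance
R = np.array([[0.5]])                    # observation noise covariance

# Simulate a latent trajectory {z_n} and noisy observations {x_n}.
N = 50
z = np.zeros((N, 2)); z[0] = [0.0, 0.3]
for n in range(1, N):
    z[n] = A @ z[n - 1] + rng.multivariate_normal(np.zeros(2), Q)
x = z[:, :1] + rng.normal(0.0, np.sqrt(R[0, 0]), size=(N, 1))

# Filtering: alternate predict and update steps.
mu, P = np.zeros(2), np.eye(2)
filtered = []
for n in range(N):
    mu, P = A @ mu, A @ P @ A.T + Q                  # predict
    S = C @ P @ C.T + R                              # innovation covariance
    K = P @ C.T @ np.linalg.inv(S)                   # Kalman gain
    mu = mu + K @ (x[n] - C @ mu)                    # update mean
    P = (np.eye(2) - K @ C) @ P                      # update covariance
    filtered.append(mu[0])

print("RMSE of filtered vs. true position:",
      np.sqrt(np.mean((np.array(filtered) - z[:, 0]) ** 2)))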

Cool Trajectory Inference Application: Neural Decoding
Figure: actual (true) and predicted (reconstructed) hand positions, predicted from neural firing rates alone using a Kalman filter [5].

Another Application: Unsupervised Clustering
Forensic analysis of printed documents: infer the printer used to print a document from visual features.
Figure: PCA projection of printer features; representation of the projected signal by the first two principal components [1].
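
In the spirit of [1], here is a minimal PCA-plus-Gaussian-mixture clustering sketch in Python/scikit-learn. The "printer" features below are synthetic stand-ins; the actual feature extraction and data from the paper are not reproduced.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Two synthetic "printers", each a cluster in a 10-dimensional feature space.
features = np.vstack([rng.normal(0.0, 1.0, size=(100, 10)),
                      rng.normal(3.0, 1.0, size=(100, 10))])

projected = PCA(n_components=2).fit_transform(features)   # 2-D PCA projection
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(projected)
print(np.bincount(labels))   # roughly 100 samples assigned to each inferred printer cluster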

Another Unsupervised Clustering Application
Automatic discovery of the number of neurons and assignment of waveforms to neurons; essential to electrophysiological study of the brain.
Figure: automatically sorted action potential PCA projections, from sorting six channels of pursuit-tracking neural data; projections of waveforms from channels 1-6 onto the first two principal components [4].

A Big Unsupervised Clustering Application
Multinomial mixture model for automatic document clustering in information retrieval:

z_n | π ~ Discrete(π)
x_n | z_n = k, Θ ~ Multinomial(θ_k)

where x_n is a bag-of-words or feature representation of a document, z_n is a per-document class indicator variable, Θ = {θ_k}_{k=1}^K is a collection of probability vectors over types (or features), one per cluster k, and π = [π_1, ..., π_K], with Σ_k π_k = 1, is the class prior. Such a model can be used to cluster similar documents together for information retrieval (Google, Bing, etc.) purposes.
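
A minimal Python/NumPy sketch of the model above: ancestral sampling of one document from the generative process, followed by the closed-form posterior over its cluster indicator. The vocabulary size, number of clusters K, and parameter values are made up for illustration.

import numpy as np

rng = np.random.default_rng(2)
K, V, n_words = 2, 5, 20                       # clusters, vocabulary size, words per document
pi = np.array([0.4, 0.6])                      # class prior, sums to 1
theta = np.array([[0.5, 0.2, 0.1, 0.1, 0.1],   # per-cluster distributions over word types
                  [0.1, 0.1, 0.1, 0.2, 0.5]])

def sample_document():
    z = rng.choice(K, p=pi)                    # z_n | pi ~ Discrete(pi)
    x = rng.multinomial(n_words, theta[z])     # x_n | z_n = k, Theta ~ Multinomial(theta_k)
    return z, x

def posterior_over_cluster(x):
    # p(z = k | x) is proportional to pi_k * prod_v theta_{k,v}^{x_v}; work in log space.
    log_post = np.log(pi) + x @ np.log(theta).T
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

z, x = sample_document()
print("true cluster:", z, "bag of words:", x)
print("posterior over clusters:", posterior_over_cluster(x))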

Tools for Unsupervised Clustering
Known/hand-crafted model, inference only: K-means, Gaussian mixture models, multinomial mixture models.
Learning too / model parameter estimation: maximum likelihood / expectation maximization, variational inference, sampling, sequential Monte Carlo... for all, batch or online.

Tools for All
Maximum likelihood / expectation maximization, variational inference, sampling, sequential Monte Carlo... for all, batch or online.

Links and Syllabus
Course home page: http://www.stat.columbia.edu/~fwood/w4240/
Guest lectures may be sprinkled throughout the course.

Prerequisites
Linear algebra; multivariate calculus (matrix and vector calculus); probability and statistics at a graduate level; programming experience in some language like Pascal, Matlab, C++, Java, C, Fortran, Scheme, etc.
It is a good idea to familiarize yourself with PRML [3] Chapter 2 and Appendices B, C, D, and E, in particular: the multivariate Gaussian distribution; the discrete, multinomial, and Dirichlet distributions; Lagrange multipliers; and Matlab.

Bibliography I
[1] G.N. Ali, P.J. Chiang, A.K. Mikkilineni, G.T.C. Chiu, E.J. Delp, and J.P. Allebach. Application of principal components analysis and Gaussian mixture models to printer identification. In Proceedings of the IS&T's NIP20: International Conference on Digital Printing Technologies, volume 20, pages 301-305, 2004.
[2] I. Beinlich, H.J. Suermondt, R. Chavez, G. Cooper, et al. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. 1989.
[3] C. Bishop. Pattern Recognition and Machine Learning. Springer, New York, NY, 2006.
[4] F. Wood and M. J. Black. A nonparametric Bayesian alternative to spike sorting. Journal of Neuroscience Methods, to appear, 2008.

Bibliography II
[5] W. Wu, M. J. Black, Y. Gao, E. Bienenstock, M. Serruya, and J. P. Donoghue. Inferring hand motion from multi-cell recordings in motor cortex using a Kalman filter. In SAB'02 Workshop on Motor Control in Humans and Robots: On the Interplay of Real Brains and Artificial Devices, pages 66-73, August 2002.