Introduction to ML
Abhijit Mishra
Research Scholar, Center for Indian Language Technology
Department of Computer Science and Engineering, Indian Institute of Technology Bombay
Email: abhijitmishra@cse.iitb.ac.in
URL: http://www.cse.iitb.ac.in/~abhijitmishra

Task: Get mangoes of a particular type from the market

Task 1: Solve an equation
Task 2: Get mangoes of a particular type from the market
Randomness? Ambiguity? Nuances?

Randomness: slight variation in shape, size, color, odor, etc.
Ambiguity: similarity in size and color, yet belonging to different categories
Nuances: differences in size and color, yet belonging to the same category
How to make machines understand the

Introduction to ML Roadmap
- Definition of Machine Learning
- Learning to predict
  - Classification
  - Regression
- Learning Paradigms
  - Rule based
  - Statistical
  - Example Based
- Statistical Machine Learning
  - Supervised
  - Semi-supervised
  - Unsupervised
  - Reinforcement
- Supervised approaches
  - Probabilistic approaches
  - Non-probabilistic approaches
- Example - Text Classification
- Books, Online Courses and Tools

Definition of Machine Learning
Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. It explores the study and construction of algorithms that can learn from data and make predictions on it.
Applications:
- Pattern recognition (e.g., handwriting recognition, face detection, gesture detection)
- Prediction of events (e.g., stock market predictions, weather forecasting, prediction of diseases based on symptoms)
- Almost all popular online services (e.g., Google, Facebook, Amazon) use ML.
Source: https://en.wikipedia.org/wiki/machine_learning

Learning to Predict Classification
Classification is the problem of predicting to which of a set of categories (sub-populations) a new observation belongs.
Input: properties of the new observation
Output: $y \in \{c_1, \dots, c_N\}$, the class of the new observation
When $N = 2$, the problem is called a binary classification problem (e.g., classifying emails into spam or non-spam categories).
When $N > 2$, the problem is called an N-class or multi-class classification problem (e.g., classifying documents into multiple categories such as sports, health, and politics).

Learning to Predict Regression
When the output space of a predictor is the real numbers instead of nominal categories (as in classification), the prediction problem is referred to as statistical regression or simply regression.
Input: properties of the new observation
Output: $y$, where $y \in \mathbb{R}$
Example: predicting the temperature of a day given the climatic conditions of the previous day, or estimating the number of units of a new product that will be sold in a year.

Note: Structured prediction
Structured prediction deals with more complex output than the scalar output of classification and regression: the output is a structure made up of several parts.
Example: automatic text translation (output is a sentence in another language), parse tree generation (output is a tree structure), image captioning.
We will only focus on classification problems.

Learning Objective: Back to Mangoes
Task: Given some basic measurable properties of a certain mango, predict which category it belongs to.
Measurable properties (attributes/features): color, weight, smell, dimensions, taste?
Classes: Alphonso / Alice / Irwin

Learning Objectives
What to learn? Correspondences between various attributes of the input object and the classes.
How to learn?
- Rule based learning
- Statistical learning
- Example based learning

Learning Paradigms Rule Based
Learning is based on a set of rules handcrafted by humans.
    if (weight < 0.5 && color == yellow || color == green) { category = Alphonso; }
    else if (...) { category = Alice; }
The collection of rules, or rule base, has to be exhaustive enough to capture all the corner cases.
Problems: building the rule base is extremely hard, needs domain expertise, and is highly time-consuming.
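A rule base of this kind might look as follows in Python. This is only a sketch: the function name, the units, the grouping of the color test, and the second rule are assumptions made for illustration, since the slide leaves the second condition blank.

```python
def classify_mango(weight_kg, color):
    """Hand-written rule base. Thresholds and rules are illustrative only."""
    # First rule, read here as: light AND (yellow OR green).
    if weight_kg < 0.5 and color in ("yellow", "green"):
        return "Alphonso"
    # Hypothetical second rule; the slide does not give this condition.
    if weight_kg >= 0.5 and color == "red":
        return "Alice"
    # No rule fired: the rule base is not exhaustive.
    return "unknown"


print(classify_mango(0.3, "yellow"))
print(classify_mango(0.7, "purple"))
```

The final `return "unknown"` shows the weakness named above: every input that no rule anticipates falls through, so the rule base has to keep growing.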

Learning Paradigms Example Based
A very small set of examples with complete information (both input and class) is available. A template for each class is learned automatically. When a new observation arrives, the class is predicted from the template that fits the observation best.
Problems:
- Templates are generic representatives that are supposed to stand for the whole sub-population belonging to a class. For many problems it is quite hard to come up with such representatives from a small number of examples.
- The approach is susceptible to changes in the nature of the input data.
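One simple way to picture template learning is to take the mean feature vector of each class as its template and pick the closest template at prediction time. The slides do not say how templates are built, so the averaging, the Euclidean distance, and the sample numbers below are all assumptions.

```python
import math


def learn_templates(examples):
    """Average the feature vectors of each class to get one template per class."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(templates, features):
    """Return the class whose template is closest in Euclidean distance."""
    return min(templates, key=lambda label: math.dist(templates[label], features))


# (weight in kg, length in cm) -> class; made-up numbers for illustration.
examples = [
    ([0.25, 9.0], "Alphonso"),
    ([0.30, 10.0], "Alphonso"),
    ([0.60, 14.0], "Irwin"),
    ([0.70, 15.0], "Irwin"),
]
templates = learn_templates(examples)
print(predict(templates, [0.28, 9.5]))
```

With only two examples per class, one unusual mango shifts the template noticeably, which is the fragility described above.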

Learning Paradigms Statistical
- Beneficial if a large set of diverse examples is available.
- Feature-class correspondences are learned better.
- Easy to update the classifier if the nature of the input data changes.
- Can leverage the huge volume of data available on the web.
Problems:
- Overlearning can sometimes happen (referred to as overfitting).
- Feature selection affects system

Statistical Machine Learning- Supervised Approaches
Learning is based on a set of observations for which class labels are available.
[Figure: mangoes labelled Alphonso, Alice and Irwin, and a learned model]

Statistical Machine Learning- Semi-Supervised Approaches
Learning is based on a set of observations for which class labels are available AND another set of observations (typically larger than the labelled set) for which class labels are not available.
[Figure: mangoes labelled Alphonso, Alice and Irwin, and a learned model]

Statistical Machine Learning- Un-Supervised Approaches
Learning happens when no class labels are available.

Statistical Machine Learning- Reinforcement Learning
Learning happens with the objective of maximizing the reward associated with the task. Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented. Association is captured in terms of rewards.

Supervised Approaches Recap
Measurable properties (attributes/features): color, weight, smell, dimensions, taste?
Classes: Alphonso / Alice / Irwin

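Supervised Approaches Probabilistic Models
Given a set of features $x = (x_1, \dots, x_n)$, the classification decision of probabilistic models can be expressed as
$$\hat{c} = \arg\max_{c} P(c \mid x)$$
where $c$ ranges over the classes and $P(c \mid x)$ is the probability of class $c$ given the features.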

Supervised Approaches Naïve Bayes
$$P(c \mid x) = \frac{P(c)\, P(x \mid c)}{P(x)}$$
Posterior: $P(c \mid x)$. Prior: $P(c)$. Likelihood: $P(x \mid c)$.
The prior can be assumed to be a multinomial distribution for classification problems.

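Supervised Approaches Naïve Bayes (1)
Now, if we assume that the features $x_1, \dots, x_n$ are independent of each other given the class $c$, the likelihood factorizes:
$$P(x \mid c) = \prod_{i=1}^{n} P(x_i \mid c)$$
so the predicted class is $\hat{c} = \arg\max_{c} P(c) \prod_{i=1}^{n} P(x_i \mid c)$.
Note: The independence assumption may not hold for many real-life problems.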
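The Naïve Bayes decision can be sketched in a few lines of plain Python. The version below treats each feature as present or absent, works in log space to avoid underflow, and uses add-one smoothing; the smoothing, the function names, and the toy mango data are assumptions, not part of the slides.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """samples: list of (set_of_present_features, label)."""
    class_counts = Counter(label for _, label in samples)
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for features, label in samples:
        vocabulary |= features
        feature_counts[label].update(features)
    return class_counts, feature_counts, vocabulary


def predict_nb(model, features):
    """Return the class maximizing log P(c) + sum_i log P(x_i | c)."""
    class_counts, feature_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for f in vocabulary:
            # Add-one smoothed estimate of P(feature present | class).
            p = (feature_counts[label][f] + 1) / (n_c + 2)
            score += math.log(p if f in features else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented for illustration.
samples = [
    ({"yellow", "small"}, "Alphonso"),
    ({"yellow", "sweet"}, "Alphonso"),
    ({"red", "large"}, "Irwin"),
    ({"red", "sweet"}, "Irwin"),
]
model = train_nb(samples)
print(predict_nb(model, {"yellow", "sweet"}))
```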

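Supervised Approaches Logistic Regression
Remember: the classification decision is based on the posterior $P(c \mid x)$. In logistic regression the posterior is estimated directly:
$$P(c = 1 \mid x) = \frac{1}{1 + e^{-u}}$$
where $u$ follows a regular weighted linear equation:
$$u = b + \sum_{i=1}^{n} w_i x_i$$
The coefficients ($w_i$ and $b$) have to be learned during training.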
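A minimal sketch of logistic regression for two classes, assuming the usual sigmoid link and plain stochastic gradient descent on the log loss. The slides do not specify the training procedure; the names `weights` and `bias`, the learning rate, the epoch count, and the one-feature toy data are illustrative choices.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def predict_proba(weights, bias, x):
    """P(class = 1 | x), with u as a weighted linear combination of the features."""
    u = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(u)


def train(data, epochs=5000, lr=0.5):
    """Learn the coefficients by stochastic gradient descent on the log loss."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            error = predict_proba(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# One feature (say, weight in kg); label 1 for the heavier class. Made-up data.
data = [([0.2], 0), ([0.3], 0), ([0.7], 1), ([0.8], 1)]
weights, bias = train(data)
print(predict_proba(weights, bias, [0.25]) < 0.5)
print(predict_proba(weights, bias, [0.75]) > 0.5)
```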

Supervised Approaches: Non-Probabilistic Models
[Figure: points of Class-1 and Class-2 plotted in a two-dimensional feature space (x1, x2)]

Supervised Approaches: K-Nearest Neighbor
The K closest neighbors are decided based on a pre-defined distance measure. The class to which the largest number of those neighbors belongs becomes the winner class.
[Figure: a new point among points of Class-1 and Class-2]
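The voting procedure can be sketched as follows, assuming Euclidean distance as the pre-defined measure and simple majority voting. The function name, the value of `k`, and the sample points are illustrative.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to the query."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Two well-separated groups of 2-D points; made-up data.
train = [
    ((1.0, 1.0), "Class-1"), ((1.2, 0.8), "Class-1"), ((0.9, 1.1), "Class-1"),
    ((4.0, 4.2), "Class-2"), ((4.1, 3.9), "Class-2"), ((3.8, 4.0), "Class-2"),
]
print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (4.0, 4.0)))
```

An odd `k` avoids ties in the two-class case; with more classes a tie-breaking rule would be needed.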

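Distance/similarity measures
Euclidean distance (between vectors $X_1$ and $X_2$):
$$d(X_1, X_2) = \sqrt{\sum_{i} (x_{1i} - x_{2i})^2}$$
which is a special case ($p = 2$) of the Minkowski distance:
$$d_p(X_1, X_2) = \Big( \sum_{i} |x_{1i} - x_{2i}|^p \Big)^{1/p}$$
Cosine distance:
$$d_{\cos}(X_1, X_2) = 1 - \frac{X_1 \cdot X_2}{\lVert X_1 \rVert \, \lVert X_2 \rVert}$$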
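These measures are short enough to write out directly. The sketch below uses the standard textbook definitions, with cosine distance taken as one minus cosine similarity; the function names are illustrative.

```python
import math


def minkowski(x1, x2, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x1, x2)) ** (1.0 / p)


def euclidean(x1, x2):
    """Euclidean distance is the Minkowski distance with p = 2."""
    return minkowski(x1, x2, 2)


def cosine_distance(x1, x2):
    """One minus the cosine of the angle between the vectors (undefined for zero vectors)."""
    dot = sum(a * b for a, b in zip(x1, x2))
    norm1 = math.sqrt(sum(a * a for a in x1))
    norm2 = math.sqrt(sum(b * b for b in x2))
    return 1.0 - dot / (norm1 * norm2)


print(euclidean([0, 0], [3, 4]))
print(minkowski([0, 0], [3, 4], 1))
print(cosine_distance([1, 0], [0, 1]))
```

Cosine distance ignores vector length, so `[1, 2]` and `[2, 4]` are at distance zero from each other even though their Euclidean distance is not zero.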

Supervised Approaches: Support Vector Machines
[Figure: Class-1 and Class-2 points separated by the line $w \cdot x - b = 0$]
$$f(x, w, b) = \mathrm{sign}(w \cdot x - b)$$

SVMs: Specifying the boundary
[Figure: the plus-plane $w \cdot x - b = 1$ bounds the "Predict Class = +1" zone, the minus-plane $w \cdot x - b = -1$ bounds the "Predict Class = -1" zone, and the classifier boundary $w \cdot x - b = 0$ lies between them.]
M = margin width = $\frac{2}{\sqrt{w \cdot w}}$
Given a guess of w and b we can:
- compute whether all data points are in the correct half-planes
- compute the width of the margin
So now we just need to write a program to search the space of w's and b's to find the widest margin that matches all the datapoints. This is primarily done
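The two checks that can be made for a guessed w and b can be sketched as below. The code only evaluates a given guess and does not search for the best one; the function names, the sample points, and the choice to map a score of exactly zero to +1 are assumptions.

```python
import math


def decision(w, b, x):
    """Signed score w.x - b."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b


def classify(w, b, x):
    """sign(w.x - b); a score of exactly 0 is mapped to +1 here by convention."""
    return 1 if decision(w, b, x) >= 0 else -1


def all_correct(w, b, data):
    """True if every point lies on or beyond its own plane: y * (w.x - b) >= 1."""
    return all(y * decision(w, b, x) >= 1 for x, y in data)


def margin_width(w):
    """Distance between the plus-plane and the minus-plane: 2 / sqrt(w.w)."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Made-up points with labels +1 / -1, and one guess for w and b.
data = [((3.0, 3.0), 1), ((4.0, 3.0), 1), ((1.0, 1.0), -1), ((0.0, 1.0), -1)]
w, b = (0.5, 0.5), 2.0
print(all_correct(w, b, data), margin_width(w))
```

A search procedure would try many (w, b) pairs, keep only those for which `all_correct` holds, and prefer the one with the largest `margin_width`.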

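Supervised Approaches Decision Tree
[Figure: a training table whose columns are marked categorical, categorical, continuous and class, next to a decision tree with nodes MarSt (Small vs. Big, Medium), Color (Green vs. Yellow) and TaxInc (< 80), and leaves labelled A and B]
There could be more than one tree that fits the same data!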

Supervised Approaches Note
- It is important to decide on a set of features that adequately explains the data.
- Selecting an extremely small number of features may underspecify the data and may not help the classifier learn properly.
- As the number of features increases, the model complexity increases (i.e., more parameters have to be learned and the chance of overfitting increases).
- Very high-dimensional feature vectors make it unintuitive to analyze them, design distance functions, and perform combinatorics and optimization. This is known as the curse of dimensionality.

Example Text Classification
Text classification is an important problem in the fields of Natural Language Processing and Machine Learning.
Objective: assign a class label to a given text.
Example:
1: Obama won the election: Politics
2: Brasil lost the football match: Sports

Problems in Text Classification
- Lexical problems: presence of ambiguous words, e.g., Cricket (game) vs. Cricket (insect).
- Structural problems: complexity at the syntactic level, e.g., "Mohd. Kaif, who was the hero of the Natwest final match against England in 2002, has joined BJP and will be running for an MP position." (Politics)
- Semantic problems: complexity at the semantic level, e.g., "With the humiliating defeat in Bihar, INC's innings seems to be over."
- Pragmatic problems: e.g., "India lost to Zimbabwe yesterday." (Sports) vs. "Bernie lost to Clinton in New York." (Politics)

Text Classification Method
[Figure: pipeline diagram]
Training: some documents -> annotation -> training data (features and labels) -> MODEL (Naïve Bayes, SVM, Decision Tree etc.)
Prediction: any unseen document -> compute features -> MODEL -> prediction

Text Classification Feature Extraction
Example training sample (domain classification):
1: Obama won the election: Politics
2: Brasil lost the football match: Sports
Features:
Vocabulary: <Obama, won, the, election, Brasil, lost, football, match>
Bag-of-words features based on presence/absence (label 0 = Politics, 1 = Sports):
1: <1,1,1,1,0,0,0,0>:0
2: <0,0,1,0,1,1,1,1>:1
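The presence/absence vectors for the two training sentences can be reproduced with a few lines of Python. This sketch splits on whitespace and keeps words in order of first appearance; the function names are illustrative, and a real system would normally also lowercase the text and strip punctuation.

```python
def build_vocabulary(sentences):
    """Collect distinct words in order of first appearance."""
    vocabulary = []
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocabulary:
                vocabulary.append(word)
    return vocabulary


def presence_vector(sentence, vocabulary):
    """1 if the vocabulary word occurs in the sentence, else 0."""
    words = set(sentence.split())
    return [1 if word in words else 0 for word in vocabulary]


sentences = ["Obama won the election", "Brasil lost the football match"]
vocabulary = build_vocabulary(sentences)
vectors = [presence_vector(s, vocabulary) for s in sentences]
print(vocabulary)
print(vectors)
```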

Text Classification Training and Testing
Training: the weight of each feature towards a label is computed by the training algorithm. The weight decides how predictive the feature is.
Test: based on the features present in the test data, the combined weight is computed and a label is decided.
Problem: a feature in the test data may never have been seen in the training data (the data sparsity problem).
Solution: instead of taking bag-of-words features, consider bag of senses, word

Text Classification Evaluation Metric
The performance of classifiers is typically measured by Accuracy, Precision, Recall and F-Measure.
For a binary classification problem, if the class labels are positive and negative:
- True Positive (TP): number of test documents that are actually positive and are predicted positive.
- True Negative (TN): number of test documents that are actually negative and are predicted negative.
- False Positive (FP): number of test documents that are actually negative but are predicted positive.
- False Negative (FN): number of test documents that are actually positive but are predicted negative.

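Text Classification Evaluation Metric (1)
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F-Measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$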
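Counting TP, TN, FP and FN from a list of actual and predicted labels, and turning them into the four metrics, can be sketched as below. The function name, the label strings, and the convention of returning 0.0 when a denominator is zero are assumptions; the F-measure shown is the balanced harmonic mean of precision and recall.

```python
def evaluate(actual, predicted, positive="positive"):
    """Compute accuracy, precision, recall and F-measure for a binary task."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f_measure": f_measure}


# Made-up labels for eight test documents.
actual = ["positive", "positive", "positive", "negative",
          "negative", "negative", "negative", "positive"]
predicted = ["positive", "positive", "negative", "negative",
             "negative", "positive", "negative", "positive"]
scores = evaluate(actual, predicted)
print(scores)
```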

Text Classification - DEMO
Package: Scikit-learn (install the numpy, scipy, matplotlib and scikit-learn packages)
Demo: Naïve Bayes, SVM, KNN, Decision Tree

Books and Online Courses
Books:
- Machine Learning by Tom Mitchell
- Pattern Recognition and Machine Learning by Christopher M. Bishop
- Foundations of Machine Learning by Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar
- Machine Learning: A Probabilistic Perspective by Kevin Murphy
- Bayesian Reasoning and Machine Learning by David Barber
- Probabilistic Graphical Models: Principles and Techniques by Daphne Koller, Nir Friedman
Courses:
- Machine Learning, Stanford University (Coursera), Andrew Ng
- Mining Massive Datasets, Stanford Online

Tools
Java:
- Weka (for supervised/semi-supervised) (www.cs.waikato.ac.nz/ml/weka/)
- Mallet (for unsupervised) (www.mallet.cs.umass.edu)
Python:
- Scikit-Learn (http://scikit-learn.org/)
- Statsmodel (www.statsmodels.sourceforge.net)
R:
- R statistical packages (https://cran.r-project.org/web/packages/)

Thank you

Questions?
