Machine Learning. Lecture 1: Introduction to Machine Learning. Nevin L. Zhang

Machine Learning. Lecture 1: Introduction to Machine Learning. Nevin L. Zhang, lzhang@cse.ust.hk, Department of Computer Science and Engineering, The Hong Kong University of Science and Technology. This set of notes is based on internet resources and K. P. Murphy (2012), Machine Learning: A Probabilistic Perspective, MIT Press (Chapter 1).

What is Machine Learning? We are in the era of big data. There are about 1 trillion web pages; one hour of video is uploaded to YouTube every second, amounting to 10 years of content every day; the genomes of 1000s of people, each of which has a length of 3.8 × 10^9 base pairs, have been sequenced by various labs; Walmart handles more than 1M transactions per hour and has databases containing more than 2.5 petabytes (2.5 × 10^15 bytes) of information; ... This deluge of data calls for automated methods of data analysis, which is what machine learning provides.

What is Machine Learning? We define machine learning as a set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data or to perform other kinds of decision making under uncertainty.

Types of Machine Learning Machine learning algorithms are divided into three main types: supervised learning, unsupervised learning, and reinforcement learning. Deep learning can be applied to all three types of tasks. There are also many more specialized settings: semi-supervised learning, active learning, ensemble learning, transfer learning, ...

Supervised Learning Problem statement: Given a labeled training set D = {(x_i, y_i)}_{i=1}^N, learn a mapping y = f(x) from inputs x to outputs y. The training input x_i can be simply a vector of features (aka attributes, covariates), or a complex structured object such as an image, a document, or a graph. The output (aka response variable) y_i can be a categorical/nominal variable, in which case we have a classification problem, or a real-valued variable, in which case we have a regression problem.
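
As a concrete, hypothetical illustration of this setup, the sketch below builds a small feature matrix X with one row per training input and a label vector y; a categorical y makes the dataset a classification problem, while a real-valued y makes it a regression problem. The feature values and labels are made up purely for illustration.

```python
import numpy as np

# Hypothetical labeled training set D = {(x_i, y_i)}_{i=1}^N with N = 4 examples
# and 3 features per input (the numbers below are made up for illustration).
X = np.array([
    [5.1, 3.5, 1.4],   # x_1: a feature vector (e.g., three measured attributes)
    [4.9, 3.0, 1.4],   # x_2
    [6.3, 3.3, 6.0],   # x_3
    [5.8, 2.7, 5.1],   # x_4
])

# Categorical outputs -> a classification problem (here two classes, 0 and 1).
y_class = np.array([0, 0, 1, 1])

# Real-valued outputs -> a regression problem.
y_reg = np.array([0.2, 0.3, 2.1, 1.8])

print(X.shape)        # (4, 3): N = 4 inputs, each a vector of 3 features
print(y_class.shape)  # (4,): one label per input
```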

Classification From labeled training data, learn a mapping y = f(x) where y ∈ {1, ..., C}. When C = 2, we have a binary classification problem. When C > 2, we have a multiclass classification problem. We regard it as a function approximation problem: we assume that x and y are related by an unknown function y = f(x); the task is to obtain an estimate f̂ of f from the labeled training data. We then want to use f̂ to make predictions on novel inputs, meaning ones that we have not seen before (this is called generalization).
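
To make the idea of generalization concrete, the sketch below fits a simple classifier on one part of a labeled dataset and measures accuracy on held-out examples it has never seen. The use of scikit-learn and the synthetic dataset are assumptions for illustration; the notes do not prescribe any particular library or model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic labeled data (illustrative only): N = 500 inputs with 5 features, C = 2 classes.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# Hold out 30% of the data as "novel" inputs that the learner never sees during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Obtain an estimate f-hat of the unknown mapping from the training data.
f_hat = LogisticRegression().fit(X_train, y_train)

# Generalization: how well does f-hat predict on inputs it has not seen before?
print("training accuracy:", f_hat.score(X_train, y_train))
print("test accuracy:    ", f_hat.score(X_test, y_test))
```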

Classification: An Illustrative Example (Figure: left, a training set of colored shapes; right, the representation of the data as a design matrix.) There are three test cases to classify. It is clear how to classify the blue crescent, but the other two cases are less clear. This example shows that we need to use probability in classification.

A Probabilistic Perspective on Classification A probabilistic formulation of classification: from training data D = {(x_i, y_i)}_{i=1}^N, learn a conditional distribution p(y | x). Assign an instance x to the class with the maximum probability: ŷ = f̂(x) = argmax_{c=1,...,C} p(y = c | x). An advantage of the probabilistic method is that uncertainty is explicitly modeled. If the probability max_{c=1,...,C} p(y = c | x) is not high enough, we might want to delay the decision until more information becomes available.
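
The sketch below illustrates this decision rule on a made-up conditional distribution p(y | x): it picks the class with the highest probability and, as suggested above, declines to decide when the maximum probability is below a confidence threshold. The 0.7 threshold and the probability values are assumptions for illustration only.

```python
import numpy as np

def classify(p_y_given_x, threshold=0.7):
    """Return the class argmax_c p(y = c | x), or None if confidence is too low."""
    c_hat = int(np.argmax(p_y_given_x))
    if p_y_given_x[c_hat] < threshold:
        return None  # delay the decision until more information becomes available
    return c_hat

# Hypothetical predictive distributions over C = 3 classes for two instances.
print(classify(np.array([0.05, 0.90, 0.05])))  # 1    (confident prediction)
print(classify(np.array([0.40, 0.35, 0.25])))  # None (uncertainty too high)
```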

Real-World Classification Problems Object recognition and image classification (ImageNet); character recognition (recognizing handwritten characters); document classification (is a customer review positive or negative?); spam detection and filtering; intrusion detection; medical diagnosis; ...

Regression From labeled training data, learn a mapping y = f(x) where y is continuous. Example: each training example consists of a single real-valued input x_i ∈ R and a single real-valued response y_i ∈ R. Two possible models to fit to the data: a straight line and a quadratic function. In general, the inputs are high dimensional.
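
As a small illustration of this example, the sketch below fits both candidate models, a straight line (degree-1 polynomial) and a quadratic (degree-2 polynomial), to synthetic one-dimensional data; the data-generating function and noise level are assumptions made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression data: y is roughly quadratic in x, plus noise (illustrative only).
x = np.linspace(-3, 3, 30)
y = 0.5 * x**2 - x + 1 + rng.normal(scale=0.5, size=x.shape)

# Fit the two candidate models by least squares.
line = np.polyfit(x, y, deg=1)   # straight line: y ≈ a*x + b
quad = np.polyfit(x, y, deg=2)   # quadratic:     y ≈ a*x^2 + b*x + c

# Compare the training error of the two fits.
for name, coeffs in [("line", line), ("quadratic", quad)]:
    residuals = y - np.polyval(coeffs, x)
    print(name, "mean squared error:", np.mean(residuals**2))
```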

Real-World Regression Problems Predict tomorrow's stock market price given current market conditions and other possible side information. Predict the age of a viewer watching a given video on YouTube. Predict the location in 3D space of a robot arm end effector, given control signals (torques) sent to its various motors. Predict the amount of prostate specific antigen (PSA) in the body as a function of a number of different clinical measurements. Predict the temperature at any location inside a building using weather data, time, door sensors, ...

Unsupervised Learning Sometimes we have only unlabeled data D = {x_i}_{i=1}^N, where there isn't a response variable. The goal of unsupervised learning is to discover interesting structures/patterns in the data. Some examples of unsupervised learning: clustering, dimensionality reduction, structure discovery, ...

Clustering Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).
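
A minimal sketch of clustering in practice, using k-means from scikit-learn on synthetic unlabeled data; the library, the number of clusters, and the data are all assumptions made for illustration, since the notes do not commit to a particular clustering algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Unlabeled data: three blobs of 2-D points (no response variable is given).
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.4, size=(50, 2)),
    rng.normal(loc=[3, 3], scale=0.4, size=(50, 2)),
    rng.normal(loc=[0, 3], scale=0.4, size=(50, 2)),
])

# Group the points into k = 3 clusters so that points in the same cluster
# are closer to their own cluster center than to the other centers.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(kmeans.cluster_centers_)   # the three discovered group centers
print(kmeans.labels_[:10])       # cluster assignment of the first few points
```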

Real-World Clustering Problems Market researchers use cluster analysis to partition the general population of consumers into market segments and to better understand the relationships between different groups of consumers/potential customers, for use in market segmentation, product positioning, new product development, and selecting test markets. In the study of social networks, clustering may be used to recognize communities within large groups of people. In human genetic clustering, the similarity of genetic data is used to infer population structures. Recommender systems are designed to recommend new items based on a user's tastes; they sometimes use clustering algorithms to predict a user's preferences based on the preferences of other users in the same cluster. ...

Dimensionality Reduction When dealing with high-dimensional data, it is often useful to reduce the dimensionality by projecting the data to a lower-dimensional subspace. This is called dimensionality reduction, and it is often used in data visualization.
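
As an illustrative sketch, the code below projects synthetic 10-dimensional data onto its two leading principal components with scikit-learn's PCA, one common choice for this kind of projection; the library and the data are assumptions, not something the notes prescribe.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic high-dimensional data: 200 points in 10 dimensions whose variation
# mostly lies in a 2-D subspace (illustrative only).
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

# Project onto a 2-D subspace, e.g., for visualization in a scatter plot.
Z = PCA(n_components=2).fit_transform(X)

print(X.shape)  # (200, 10) original high-dimensional data
print(Z.shape)  # (200, 2)  low-dimensional projection
```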

Structure Discovery Sometimes we would like to discover a graph structure describing how a set of variables are related. In the following example, we have a structure describing how word occurrences are related in a collection of documents. There are latent variables, which can be interpreted as topics.
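
The model in the example involves latent topic variables, which is beyond a short snippet; as a much simpler, hypothetical illustration of recovering a graph over observed variables, the sketch below connects two words whenever their document-occurrence indicators are strongly correlated. The word list, documents, and 0.5 threshold are all made up, and this is not the latent-variable method shown in the figure.

```python
import numpy as np

words = ["game", "team", "score", "market", "stock"]
docs = [
    "the team won the game with a late score",
    "the stock market fell as the market reacted",
    "a record score for the home team in the final game",
    "stock prices rose on the market today",
]

# Binary word-occurrence indicators: one row per document, one column per word.
X = np.array([[1 if w in d.split() else 0 for w in words] for d in docs], dtype=float)

# Connect two words with an edge when their occurrence patterns are strongly correlated.
corr = np.corrcoef(X, rowvar=False)
edges = [(words[i], words[j])
         for i in range(len(words)) for j in range(i + 1, len(words))
         if corr[i, j] > 0.5]
print(edges)  # pairs of words that tend to occur in the same documents
```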

Reinforcement Learning In reinforcement learning, an agent learns how to act or behave from occasional reward or punishment signals. It is the way dolphins in Ocean Park learn amazing tricks. Currently, the most famous reinforcement learning system is AlphaGo.
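
As a tiny, hypothetical illustration of learning from reward signals alone, the sketch below runs tabular Q-learning on a made-up 5-state corridor where only reaching the rightmost state gives a reward; the environment, learning rate, and discount factor are all assumptions, and value-based methods like this are revisited later in the course.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2          # states 0..4; actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions)) # value estimates, learned only from rewards
alpha, gamma, eps = 0.5, 0.9, 0.2   # learning rate, discount factor, exploration rate

for episode in range(200):
    s = 0
    while s != n_states - 1:        # an episode ends at the rightmost (goal) state
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward only at the goal
        # Q-learning update: move Q(s, a) toward the reward plus discounted future value.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))  # greedy action per state; "right" (1) in the non-terminal states
```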

Deep Learning Deep learning is a class of machine learning algorithms that: use a cascade of multiple layers of nonlinear processing units for feature extraction and transformation, where each successive layer uses the output from the previous layer as input; learn in supervised (e.g., classification) and/or unsupervised (e.g., pattern analysis) manners; and learn multiple levels of representations that correspond to different levels of abstraction, where the levels form a hierarchy of concepts.
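
The sketch below shows this cascade structure directly: a two-hidden-layer network in plain NumPy in which each layer applies a nonlinear transformation to the previous layer's output. The layer sizes and random weights are assumptions for illustration; no training is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# A cascade of layers: input (4 features) -> hidden (8) -> hidden (8) -> output (3 classes).
# Weights are random here purely to illustrate the structure; training would adjust them.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(8, 3)), np.zeros(3)

x = rng.normal(size=(1, 4))     # one input example

h1 = relu(x @ W1 + b1)          # layer 1: nonlinear transformation of the raw features
h2 = relu(h1 @ W2 + b2)         # layer 2: takes layer 1's output as its input
logits = h2 @ W3 + b3           # output layer: scores for the 3 classes
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over class scores

print(probs)                    # a distribution over the 3 classes
```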

Deep Learning Deep learning has a distinctive advantage: automatic feature extraction. Deep models automatically learn the features relevant to solving the problem, which reduces the burden on the programmer of selecting features by hand.

A Brief History of AI and Machine Learning

A Brief History of Machine Learning

We will cover...
Supervised Learning: Linear and Polynomial Regression; Logistic and Softmax Regression; Generative Models for Classification; Learning Theory.
Deep Learning: Deep Feedforward Networks; Convolutional Neural Networks; Recurrent Neural Networks.
Unsupervised Learning: Variational Autoencoders; Generative Adversarial Networks; Mixture Models.
Reinforcement Learning: Basic RL; Value-Based Deep RL; Policy-Based Deep RL.

The No Free Lunch Theorem The No Free Lunch theorem states that no single algorithm works best for every problem: the assumptions under which an algorithm excels on one problem may not hold for another. It is therefore common in machine learning to try multiple algorithms and pick the one that works best for the particular problem at hand.
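
A minimal sketch of that practice, comparing a few off-the-shelf classifiers by cross-validation on a synthetic dataset; the specific models, the use of scikit-learn, and the data are assumptions chosen only to illustrate trying multiple algorithms and comparing them.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Synthetic classification problem (illustrative only).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# Try several algorithms; none is expected to win on every possible problem.
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validation accuracy
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```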

Questions about an ML Algorithm
1. What does it do? (User)
2. How does it work? (Programmer)
3. Why does it work the way it does? Pros and cons w.r.t. alternatives. (Algorithm Designer)
4. Why can it achieve its goal? (Theoretician)
We will focus mostly on the first three questions.