Theme Introduction: Learning from Data
Ke Chen, Machine Learning and Optimization Research Group


Learning from Data: Where does all this fit?
[Diagram: Learning from Data shown in relation to Artificial Intelligence, Statistics / Mathematics, Data Mining, Machine Perception, Computer Vision, Computational Audition and Robotics.]
(No definition of a field is perfect; the diagram above is just one interpretation, mine ;-)

Learning from Data
The world is drowning in data.
Book sales: Amazon makes 250,000 sales/deliveries per day
Genetics: 100,000 genes sequenced while-u-wait (almost)
Search: ~10 billion Google Images; ~50 hours of video uploaded to YouTube per minute
Health records: NHS plan to have 60m+ electronic records in place by 2016
This theme studies algorithms that enable us to extract meaning from data.

Learning from Data
Data is recorded from some real-world phenomenon. What might we want to do with that data?
Prediction: what can we predict about this phenomenon?
Description: how can we describe/understand this phenomenon in a new way?

Period 1 (Oct/Nov): COMP61011 Foundations of Machine Learning (Prediction)
Period 2 (Nov/Dec): COMP61021 Modeling & Visualization of High Dimensional Data (Description)
COMP61011: an introductory course unit of Machine Learning. Lecturers: Gavin Brown and Ross King

Machine Learning and Data Mining: Spam emails
How can we predict if something is spam/genuine?

Machine Learning and Data Mining: Medical Records / Novel Drugs
What characteristics of a patient indicate they may react well/badly to a new drug? How can we predict whether it will potentially hurt rather than help them?

Building Models of the Data
HISTORICAL HEALTH RECORDS
x1      x2      Label
98.7    157.6   1
93.6    138.8   0
42.8    171.9   0
92.8    154.5   1
Historical records -> Learning Algorithm -> Model
New record (x1, x2) = (85.2, 160.3) -> Model -> Predicted Health Status: 1 (healthy)
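The slide does not say which learning algorithm produces the model. As a minimal sketch, assuming a 1-nearest-neighbour rule (nearest neighbour methods appear in the COMP61011 syllabus), the four historical records and the new record from the slide can be run as follows. The function name `predict_nearest_neighbour` is made up for this illustration.

```python
import math

# Historical health records from the slide: ((x1, x2), label).
TRAINING_RECORDS = [
    ((98.7, 157.6), 1),
    ((93.6, 138.8), 0),
    ((42.8, 171.9), 0),
    ((92.8, 154.5), 1),
]


def predict_nearest_neighbour(query, records):
    """Return the label of the record closest to `query` (Euclidean distance).

    Assumption: a 1-nearest-neighbour rule stands in for the unspecified
    "Learning Algorithm" box on the slide.
    """
    def distance(record):
        (x1, x2), _label = record
        return math.hypot(query[0] - x1, query[1] - x2)

    _features, label = min(records, key=distance)
    return label


# New record from the slide.
prediction = predict_nearest_neighbour((85.2, 160.3), TRAINING_RECORDS)
print(prediction)  # 1, the same status the slide predicts
```

Here the closest historical record is (92.8, 154.5), which carries label 1, so this simple rule happens to agree with the slide's prediction. A different algorithm could draw the boundary elsewhere.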


Period 1 (Oct/Nov): COMP61011 Foundations of Machine Learning (Prediction)
Period 2 (Nov/Dec): COMP61021 Modeling & Visualization of High Dimensional Data (Description)
COMP61021: an advanced course unit of Machine Learning. Lecturer: Ke Chen

Modeling and Visualization of High Dimensional Data: Feature extraction
A small number of salient facial features can be learned from facial images for different applications, e.g. face recognition. How can we extract such features?

Modeling and Visualization of High Dimensional Data: Gene Maps
The human body has about 24,000 active genes. Soon you will be able to buy your own gene map for a few hundred pounds. How can we visualize this?

Modeling and Visualization of High Dimensional Data: Image processing
Gesture recognition: how can we represent the motion of a human with so many complex joints and angles?

Pre-requisite knowledge
Vectors
Matrix properties, e.g. determinant, rank, inverse
Vector Space properties, e.g. orthonormal basis
Eigenvectors and Eigenvalues
Matrix Calculus, e.g. derivatives in matrix form
Optimisation basics, e.g. Lagrange multipliers
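As a quick self-check on two of the items above (determinants, eigenvectors and eigenvalues), here is a small sketch in plain Python. The matrix `A` and all function names are hypothetical and chosen only for illustration. The closed form used here assumes a symmetric 2x2 matrix with a non-zero off-diagonal entry.

```python
import math


def determinant_2x2(matrix):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = matrix
    return a * d - b * c


def eigenpairs_symmetric_2x2(matrix):
    """Eigenvalues and (unnormalised) eigenvectors of a symmetric 2x2 matrix.

    Assumes the off-diagonal entry b is non-zero. The eigenvalues are the
    roots of lambda^2 - trace * lambda + det = 0.
    """
    (a, b), (_, d) = matrix
    trace = a + d
    det = determinant_2x2(matrix)
    root = math.sqrt(trace * trace - 4 * det)  # real for symmetric matrices
    pairs = []
    for eigenvalue in ((trace + root) / 2, (trace - root) / 2):
        # First row of (A - lambda I) v = 0 gives (a - lambda) v1 + b v2 = 0,
        # which v = (b, lambda - a) satisfies.
        pairs.append((eigenvalue, (b, eigenvalue - a)))
    return pairs


def mat_vec(matrix, vector):
    """Matrix-vector product for small tuples."""
    return tuple(sum(m * v for m, v in zip(row, vector)) for row in matrix)


A = ((2.0, 1.0), (1.0, 2.0))  # hypothetical example matrix
for eigenvalue, eigenvector in eigenpairs_symmetric_2x2(A):
    # Each line shows lambda, v and A v; A v should equal lambda * v.
    print(eigenvalue, eigenvector, mat_vec(A, eigenvector))
```

For this matrix the two eigenvectors are orthogonal, which is the property behind the "orthonormal basis" item in the list: after normalising them to unit length they form such a basis.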

Learning from Data: Prerequisites
MATHEMATICS: This is a mathematical subject. You must be comfortable with probabilities and algebra.
PROGRAMMING: You must be able to program, and pick up a new language relatively easily. We provide support for Matlab.
http://studentnet.cs.manchester.ac.uk/pgt/comp61011
http://studentnet.cs.manchester.ac.uk/pgt/comp61021

Matlab (MATrix LABoratory)
Interactive scripting language
Interpreted (i.e. no compiling)
Objects possible, not compulsory
Dynamically typed
Flexible GUI / plotting framework
Large libraries of tools
Highly optimized for maths
Available free from Uni, but usable only when connected to our network (e.g. via VPN)
Module-specific software supported on school machines only.

Learning from Data: Why NOT to do this!
1. If you don't like maths. 61011 is reasonably challenging, but 61021 is really very HARD. Another valid name for machine learning is Computational Statistics.
2. If you are not a confident programmer. This is an MSc in computer science. You HAVE to be able to code well. You are highly likely to fail this unit if you cannot. People did last year.
3. If you have the "I want to use machine learning to do X" syndrome. This is a real technical subject. It's not magic.
BTW: You will learn nothing about Big Data, or how to deal with it.

Syllabus
COMP61011 (Foundations of Machine Learning)
Linear Models
Support Vector Machines
Nearest Neighbour Methods
Decision Trees
Combining Models: ensemble methods, mixtures of experts, boosting
Feature Selection
Probabilistic Classifiers and Bayes' Theorem
Algorithm assessment: overfitting, generalisation, comparing two algorithms
COMP61021 (Modeling and Visualizing High Dimensional Data)
Background/introduction
Mathematics basics
Principal component analysis (PCA)
Linear discriminant analysis (LDA)
Self-organising map (SOM)
Multi-dimensional scaling (MDS)
Isometric feature mapping (ISOMAP)
Locally linear embedding (LLE)
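To give a flavour of the first COMP61021 technique listed above, here is a rough sketch of principal component analysis in plain Python. It is not the course's implementation: the data in `POINTS` is made up, the function name is hypothetical, and power iteration is just one simple way to find the leading eigenvector of the covariance matrix.

```python
import math


def first_principal_component(points, iterations=200):
    """Return (unit direction, variance along it) of the leading component.

    Assumptions: the points are not all identical, and the start vector is
    not orthogonal to the leading eigenvector of the covariance matrix.
    """
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    centred = [[p[j] - mean[j] for j in range(dim)] for p in points]

    # Sample covariance matrix (dim x dim).
    cov = [
        [sum(row[i] * row[j] for row in centred) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]

    # Power iteration: apply the covariance matrix, then renormalise.
    v = [1.0 / (j + 1) for j in range(dim)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("data has no variance")
        v = [x / norm for x in w]

    # Rayleigh quotient v^T C v: the variance captured along direction v.
    variance = sum(
        v[i] * cov[i][j] * v[j] for i in range(dim) for j in range(dim)
    )
    return v, variance


# Made-up 2-D data lying roughly along the line x2 = x1.
POINTS = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 3.8), (5.0, 5.2)]
direction, variance = first_principal_component(POINTS)
print(direction, variance)
```

Projecting each centred point onto `direction` would reduce this 2-D data to one dimension while keeping most of its variance, which is the sense in which PCA supports modelling and visualizing high dimensional data.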

Textbooks
No textbook is compulsory. Lecture notes will be provided in class.
Ethem Alpaydin (2014): Introduction to Machine Learning (3rd Ed.), MIT Press.