Theme Introduction: Learning from Data
  Ke Chen
  Machine Learning and Optimization Research Group
Learning from Data
  Where does all this fit?
  [Diagram of related fields: Artificial Intelligence, Statistics / Mathematics, Data Mining, Machine Perception, Computer Vision, Computational Audition, Robotics, Learning from Data]
  (No definition of a field is perfect; the diagram above is just one interpretation, mine ;-)
Learning from Data
  The world is drowning in data.
  Book sales: Amazon makes 250,000 sales/deliveries per day
  Genetics: 100,000 genes sequenced while-u-wait (almost)
  Search: ~10 billion Google Images / ~50 hrs of video per minute uploaded to YouTube
  Health records: NHS plans to have 60m+ electronic records in place by 2016
  This theme studies algorithms that enable us to extract meaning from data.
Learning from Data
  Data is recorded from some real-world phenomenon.
  What might we want to do with that data?
  Prediction: what can we predict about this phenomenon?
  Description: how can we describe/understand this phenomenon in a new way?
Period 1 (Oct/Nov): COMP61011 Foundations of Machine Learning (Prediction)
Period 2 (Nov/Dec): COMP61021 Modeling & Visualization of High Dimensional Data (Description)
  COMP61011 is an introductory course unit of Machine Learning.
  Lecturers: Gavin Brown and Ross King
Machine Learning and Data Mining
  Spam emails
  How can we predict if something is spam/genuine?
Machine Learning and Data Mining
  Medical Records / Novel Drugs
  What characteristics of a patient indicate they may react well/badly to a new drug?
  How can we predict whether it will potentially hurt rather than help them?
Building Models of the Data
  HISTORICAL HEALTH RECORDS
    x1      x2      Label
    98.7    157.6   1
    93.6    138.8   0
    42.8    171.9   0
    92.8    154.5   1
  Historical records -> Learning Algorithm -> Model
  New input (x1, x2) = (85.2, 160.3) -> Model -> Predicted Health Status: 1 (healthy)
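The slide does not say which learning algorithm produces the model. As a minimal sketch, assuming a 1-nearest-neighbour rule (one of the methods on the COMP61011 syllabus) and Euclidean distance, the four records above can be used to predict the label of the new input. The function name predict_nearest_neighbour is hypothetical, chosen only for this illustration.

```python
import math

# Historical health records from the slide: ((x1, x2), label)
records = [
    ((98.7, 157.6), 1),
    ((93.6, 138.8), 0),
    ((42.8, 171.9), 0),
    ((92.8, 154.5), 1),
]

def predict_nearest_neighbour(query, training):
    """Return the label of the training record closest to `query`.

    Assumption: closeness is plain Euclidean distance on (x1, x2).
    The slide itself does not specify the algorithm or the distance.
    """
    _, label = min(training, key=lambda rec: math.dist(query, rec[0]))
    return label

# New patient from the slide
prediction = predict_nearest_neighbour((85.2, 160.3), records)
print(prediction)  # 1, i.e. "healthy" in the slide's labelling
```

Here the "model" is simply the stored records plus the distance rule; other algorithms in the unit build more compact models from the same kind of table.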
COMP61021 Modeling & Visualization of High Dimensional Data (Period 2, Nov/Dec; Description)
  An advanced course unit of Machine Learning.
  Lecturer: Ke Chen
Modeling and Visualization of High Dimensional Data
  Feature extraction
  A small number of salient facial features can be learned from facial images for different applications, e.g. face recognition.
  How can we extract such features?
Modeling and Visualization of High Dimensional Data
  Gene Maps
  The human body has about 24,000 active genes. Soon you will be able to buy your own gene map for a few hundred pounds.
  How can we visualize this?
Modeling and Visualization of High Dimensional Data
  Image processing
  Gesture recognition: how can we represent the motion of a human with so many complex joints and angles?
Pre-requisite knowledge
  Vectors
  Matrix properties, e.g. determinant, rank, inverse
  Vector space properties, e.g. orthonormal basis
  Eigenvectors and eigenvalues
  Matrix calculus, e.g. derivatives in matrix form
  Optimisation basics, e.g. Lagrange multipliers
Learning from Data: Prerequisites
  MATHEMATICS
  This is a mathematical subject. You must be comfortable with probabilities and algebra.
  PROGRAMMING
  You must be able to program, and pick up a new language relatively easily. We provide support for Matlab.
  http://studentnet.cs.manchester.ac.uk/pgt/comp61011
  http://studentnet.cs.manchester.ac.uk/pgt/comp61021
Matlab
  MATrix LABoratory
  Interactive scripting language
  Interpreted (i.e. no compiling)
  Objects possible, not compulsory
  Dynamically typed
  Flexible GUI / plotting framework
  Large libraries of tools
  Highly optimized for maths
  Available free from Uni, but usable only when connected to our network (e.g. via VPN)
  Module-specific software supported on school machines only.
Learning from Data: Why NOT to do this!
  1. If you don't like maths. 61011 is reasonably challenging, but 61021 is really very HARD. Another valid name for machine learning is Computational Statistics.
  2. If you are not a confident programmer. This is an MSc in computer science. You HAVE to be able to code well. You are highly likely to fail this unit if you cannot. People did last year.
  3. If you have the "I want to use machine learning to do X" syndrome. This is a real technical subject. It's not magic.
  BTW: You will learn nothing about Big Data, or how to deal with it.
Syllabus
  COMP61011 (Foundations of Machine Learning)
    Linear Models
    Support Vector Machines
    Nearest Neighbour Methods
    Decision Trees
    Combining Models: ensemble methods, mixtures of experts, boosting
    Feature Selection
    Probabilistic Classifiers and Bayes' Theorem
    Algorithm assessment: overfitting, generalisation, comparing two algorithms
  COMP61021 (Modeling and Visualizing High Dimensional Data)
    Background/introduction
    Mathematics basics
    Principal component analysis (PCA)
    Linear discriminant analysis (LDA)
    Self-organising map (SOM)
    Multi-dimensional scaling (MDS)
    Isometric feature mapping (ISOMAP)
    Locally linear embedding (LLE)
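To give a flavour of how the prerequisite maths (covariance matrices, eigenvectors) feeds into the COMP61021 topics, here is a rough sketch of principal component analysis on a tiny made-up 2-D data set. Everything in it is an assumption for illustration: the data points are invented, the helper names are hypothetical, and power iteration is swapped in to find only the leading eigenvector instead of a full eigendecomposition.

```python
import math

def mean_center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of mean-centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_principal_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue of `cov` by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit-length start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(d))  # Rayleigh quotient
    return v, eigenvalue

# Made-up points lying close to the line x2 = x1 (not course data)
points = [[2.0, 1.9], [4.0, 4.1], [6.0, 6.0], [8.0, 8.1]]

centered = mean_center(points)
cov = covariance(centered)
direction, variance = first_principal_component(cov)

# Project each 2-D point onto the principal direction: a 1-D description
projected = [sum(r[j] * direction[j] for j in range(2)) for r in centered]
print(direction, variance, projected)
```

For this data the principal direction lies close to the diagonal, so a single coordinate per point captures almost all of the variance; that is the sense in which PCA gives a lower-dimensional description of the data.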
Textbooks
  No compulsory textbook. Lecture notes will be provided in class.
  Ethem Alpaydin (2014): Introduction to Machine Learning (3rd Ed.), MIT Press.