
Lecture 25. Revision (the content of this deck is non-examinable)
COMP90051 Statistical Machine Learning, Semester 2, 2017
Lecturer: Trevor Cohn
Copyright: University of Melbourne

This lecture
* Project wrap-up
* Exam tips
* Reflections on the subject
* Q&A session

Project 2
Well done everyone!

SVHN: House Numbers from photos
* Taken from Google Street View images
* Manual bounding boxes by AMT workers
* Becoming a new standard benchmark problem, following MNIST
* 200k images, about 600k digits
* Varying resolution
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., and Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning. Deep Learning and Unsupervised Feature Learning Workshop, NIPS.

Processing pipeline
1. Extract images from bounding boxes for each digit
2. Normalise colours
3. Flatten to greyscale
4. Filter out instances with low contrast
5. Resize to 64x64
[Figure 3: Samples from the SVHN dataset. Notice the large variation in font, colour, lighting conditions, etc. Blue bounding boxes are the AMT-worker-marked bounding boxes of the different characters; median character height in the original images is 28 pixels.]
On the official dataset: SOTA ~90%, human 98%
A sketch of this pipeline follows below.
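A minimal sketch of the five pipeline steps in Python, assuming Pillow and NumPy; the function name, bounding-box format, and contrast threshold are illustrative choices, not the official SVHN tooling.

```python
# Illustrative sketch of the preprocessing pipeline above (not the official
# SVHN tooling). Assumes Pillow and NumPy; the threshold is a made-up value.
import numpy as np
from PIL import Image

def preprocess_digit(path, box, size=64, min_contrast=10.0):
    """Crop a digit's bounding box, flatten to greyscale, filter low-contrast
    instances, and resize to size x size. Returns None for rejected images."""
    img = Image.open(path).crop(box)          # 1. extract the bounding box
    grey = img.convert("L")                   # 2-3. normalise colours / greyscale
    pixels = np.asarray(grey, dtype=np.float64)
    if pixels.std() < min_contrast:           # 4. drop low-contrast instances
        return None
    return grey.resize((size, size))          # 5. resize to 64x64

# Hypothetical usage: preprocess_digit("house.png", box=(10, 5, 40, 50))
```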

Kaggle rankings
* Often a large change in ranking vs the public leaderboard
* Scored based on ranking, with ties assigned equal rank

Exam Tips
Don't panic!

Exam tips: don't panic!
* Attempt all questions
  * Give your best guess whenever you don't know the answer
* Finish easy questions first (do questions in any order)
* Start each question on a new page (not sub-questions)
* If you can't answer part of a question, skip it and do the rest of the question
  * you can still get marks for later parts of the question
  * we don't multiply penalise for carrying errors forward
* Answers in point form are fine

What's non-examinable?
* Green slides
* This deck (well, it's just a review)
* Anything that was in workshops but not in lectures
Note that material covered in the reading is fair game.

Changes from last year
* Last year's exam questions are representative of what you will get at the exam
  * Make sure you understand the solutions!
* Dropped topics in 2017
  * active learning
  * semi-supervised learning
* New topics in 2017
  * independence semantics in PGMs, HMM details
  * deeper coverage of kernels & basis functions, optimisation, regularisation

Exam format
* Four parts A, B, C, D; worth 13, 17, 10, 10 marks
* Total of 50 marks, split into 11 questions
* 180 minutes (3 hours), so 3.6 min / mark
* A = short answer (1-2 sentences, based on # marks)
* B = method questions
* C = numeric / algebraic questions
* D = design & application scenarios

Sample A questions (each 1-2 marks)

2. In words or a mathematical expression, what is the marginal likelihood for a Bayesian probabilistic model? [1 mark]
* Acceptable: the joint likelihood of the data and prior, after marginalising out the model parameters
* Acceptable: $p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta$, where $x$ is the data, $\theta$ the model parameter(s), $p(x \mid \theta)$ the likelihood and $p(\theta)$ the prior
* Acceptable: the expected likelihood of the data, under the prior

4. In words, what does $\Pr(A, B \mid C) = \Pr(A \mid C)\Pr(B \mid C)$ say about the dependence of A, B, C? [1 mark]
* A and B are conditionally independent given C.
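As a concrete illustration of the marginal likelihood formula above, the sketch below computes $p(x)$ numerically for a coin-flip model with a uniform Beta(1, 1) prior; the toy data and the SciPy-based quadrature are illustrative choices, not part of the exam material.

```python
# Hedged numerical illustration of p(x) = integral of p(x|theta) p(theta) dtheta
# for a Bernoulli model with a uniform prior. Toy setup, not from the exam.
from scipy import integrate, stats

k, n = 7, 10                                   # toy data: 7 heads in 10 flips

def integrand(theta):
    likelihood = stats.binom.pmf(k, n, theta)  # p(x | theta)
    prior = stats.beta.pdf(theta, 1, 1)        # p(theta), uniform on [0, 1]
    return likelihood * prior

marginal, _ = integrate.quad(integrand, 0.0, 1.0)
print(marginal)  # matches the closed form 1/(n+1) for a uniform prior: ~0.0909
```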

Sample B question (each 3-6 marks)

Question 3: Kernel methods [2 marks]
1. Consider a 2-dimensional dataset, where each point is represented by two features and the label $(x_1, x_2, y)$. The features are binary, the label is the result of the XOR function, and so the data consists of the four points (0, 0, 0), (0, 1, 1), (1, 0, 1) and (1, 1, 0). Design a feature space transformation that would make the data linearly separable. [1 mark]
* Acceptable: new feature space $(x_3)$, where $x_3 = (x_1 - x_2)^2$
2. Intuitively, what does the Representer Theorem say? [1 mark]
* Acceptable: a large class of linear models can be formulated such that both training and making predictions require data only in the form of a dot product
* Acceptable: the solution to the SVM (the weight vector) lies in the span of the data
* Acceptable: $w^{\star} = \sum_{i=1}^{n} \alpha_i y_i x_i$ or something similar
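The suggested transform can be checked directly: with $x_3 = (x_1 - x_2)^2$ the new feature equals the XOR label exactly, so any threshold between 0 and 1 on $x_3$ separates the classes. A small sketch in plain NumPy, for illustration only:

```python
# Verify that adding x3 = (x1 - x2)^2 makes XOR linearly separable.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                      # XOR labels

x3 = (X[:, 0] - X[:, 1]) ** 2                   # the new feature
print(x3)                                       # [0 1 1 0] -- identical to y

# The plane x3 = 0.5 separates the two classes.
predictions = (x3 > 0.5).astype(int)
assert np.array_equal(predictions, y)
```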

Sample C question (each 2-3 marks)

Question 5: Statistical Inference [3 marks]
Consider the following directed PGM [figure not reproduced; the answer tables imply A and B are parents of C], where each random variable is Boolean-valued (True or False).
1. Write the format (with empty values) of the conditional probability tables for this graph. [1 mark]
2. Suppose we observe n sets of values of A, B, C (complete observations). The maximum-likelihood principle is a popular approach to training a model such as the one above. What does it say to do? [1 mark]
3. Suppose we observe 5 training examples for (A, B, C): (F, F, F); (F, F, T); (F, T, F); (T, F, T); (T, T, T). Determine the maximum-likelihood estimates for your tables. [1 mark]

Sample C question (cont.)

1. CPTs [1 mark]

Pr(A=True) = ?
Pr(B=True) = ?

A  B  | Pr(C=True | A, B)
T  T  | ?
T  F  | ?
F  T  | ?
F  F  | ?

2. MLE [1 mark]
* Acceptable: it says to choose the values in the tables that maximise the likelihood of the data.
* Acceptable: $\arg\max \prod_{i=1}^{n} \Pr(A = a_i)\Pr(B = b_i)\Pr(C = c_i \mid A = a_i, B = b_i)$ over the values in the tables.

3. Show MLE [1 mark]
The MLE decouples when we have fully-observed data, and for discrete data, as in this case where the variables are all Boolean, we just count. Pr(A = True) is 2/5, since we observe A as True in two out of five observations. Similarly, for B the probability of True is 2/5. Finally, for each configuration TT, TF, FT, FF of A, B we count the times we see C as True as a fraction of the total times we observe that configuration. So we get probabilities of C = True of 1.0, 1.0, 0.0 and 0.5 respectively. (A counting sketch follows below.)
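The counting argument can be mirrored in a few lines of Python; the code below is an illustrative sketch of relative-frequency MLE for these tables, not part of the model answer.

```python
# MLE for fully observed Boolean CPTs reduces to relative frequencies.
from collections import Counter

# The five training examples (A, B, C) from the question.
data = [(False, False, False), (False, False, True), (False, True, False),
        (True, False, True), (True, True, True)]

n = len(data)
p_A = sum(a for a, _, _ in data) / n            # Pr(A=True) = 2/5
p_B = sum(b for _, b, _ in data) / n            # Pr(B=True) = 2/5

ab_counts, abc_counts = Counter(), Counter()
for a, b, c in data:
    ab_counts[(a, b)] += 1
    if c:
        abc_counts[(a, b)] += 1

# Pr(C=True | A, B) for each observed configuration of (A, B).
p_C = {ab: abc_counts[ab] / ab_counts[ab] for ab in ab_counts}
print(p_A, p_B, p_C)   # 0.4, 0.4, and {TT: 1.0, TF: 1.0, FT: 0.0, FF: 0.5}
```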

A Deeper Insight
A selection of additional topics, with the aim of providing deeper insight into the main lecture content

Networks in real life: the Internet
Image: OPTE Project Map (CC2)

Networks in real life: gene regulatory network
Fragment of the network model by Hamid Bolouri and Eric Davidson

Networks in real life: transport map

Network analysis (1/4)
* Analysis of large-scale real-world networks has recently attracted considerable attention from the research and engineering communities
* A network/graph is a list of pairwise relations (edges) between a set of objects (vertices)
* Example problems / types of analysis:
  * Link prediction
  * Identifying frequent subgraphs
  * Identifying influential vertices
  * Community finding

Network analysis (2/4)
* A community is a group of vertices that interact more frequently within their own group than with those outside the group
  * Families
  * Friend circles
  * Websites (communities of webpages)
  * Groups of proteins that maintain a specific function in a cell
* This is essentially the definition of a cluster in unsupervised learning
Image: Girvan and Newman, Community structure in social and biological networks, PNAS, 2002

Network analysis (3/4)
* Why community detection?
  * Understanding the system behind the network (e.g., the structure of society)
  * Identifying roles of vertices (e.g., hubs, mediators)
  * Summary graphs (vertices = communities, edges = connections between communities)
  * Facilitating distributed computing (e.g., placing data from the same community on the same server or core)
* There are many community detection algorithms; let's have a look at just one of the ideas

Network analysis (4/4)
* Communities are joined to one another by relatively few edges, which tend to form bridges
* Cut the bridges to obtain communities
* One such algorithm is called normalised cuts, which is equivalent to spectral clustering (a sketch follows below)
[Figure: Santa Fe Institute collaboration network. Different vertex shapes correspond to primary divisions of the institute.]
Image: Girvan and Newman, Community structure in social and biological networks, PNAS, 2002
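As a rough sketch of the spectral idea (split a graph in two by thresholding the Fiedler vector, the Laplacian eigenvector with the second-smallest eigenvalue), consider the toy adjacency matrix below: two triangles joined by a single bridge edge. This is an illustration of the principle, not the full normalised-cuts algorithm.

```python
# Toy two-way spectral partition: cut the bridge between two triangles.
import numpy as np

A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],   # vertex 2 -- vertex 3 is the bridge
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])

D = np.diag(A.sum(axis=1))          # degree matrix
L = D - A                           # (unnormalised) graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]             # eigenvector of the 2nd-smallest eigenvalue
communities = (fiedler > 0).astype(int)
print(communities)                  # e.g. [0 0 0 1 1 1]: the bridge is cut
```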

Reflections on the Subject

Supervised learning
* Essentially a task of function approximation
* A function can be defined
  * theoretically, by listing the mapping
  * algorithmically
  * analytically
* Every equation is an algorithm, but not every algorithm is an equation

Supervised learning
* Simple, more interpretable methods (e.g., linear regression) vs more complicated black-box models (e.g., random forests)
* Apparent dichotomy: prediction quality vs interpretability
* However, some complex models are interpretable
  * Convolutional Neural Networks
  * In any black-box model, one can study the effects of removing features to get insight into which features are useful (a sketch follows below)
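A minimal sketch of that feature-removal idea, assuming scikit-learn and using a stock dataset as a stand-in: drop one feature at a time, re-evaluate, and compare against the baseline score.

```python
# Illustrative feature-ablation loop (dataset and model are placeholders).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
baseline = cross_val_score(RandomForestClassifier(random_state=0),
                           X, y, cv=5).mean()

for j in range(min(5, X.shape[1])):            # check the first few features
    X_drop = np.delete(X, j, axis=1)           # remove feature j
    score = cross_val_score(RandomForestClassifier(random_state=0),
                            X_drop, y, cv=5).mean()
    # A large accuracy drop suggests feature j carries useful signal.
    print(f"feature {j}: accuracy change {score - baseline:+.4f}")
```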

What is Machine Learning?
Machine learning: "a set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform other kinds of decision making under uncertainty (such as planning how to collect more data!)" (Murphy)
Related terms: data mining, pattern recognition, statistics, data science, artificial intelligence

I'll first stay here, then move to the office hour room

Thank you and good luck!