Stat 613 Fall 2017 Genevera Allen

Material on the Midterm & What you need to know:

1. Regression, Penalized Regression, Non-linear Regression. For an applied word problem, you should be able to decide which method would be appropriate and justify your choice. You should be able to mathematically characterize properties of penalties and penalized regression estimators.

2. Classification: KNN, Nearest Centroid / Naive Bayes, Discriminant Analysis, Logistic / Multinomial Regression, SVMs / Kernel SVMs. For an applied word problem, you should be able to decide which method would be appropriate and justify your choice. You should be able to mathematically characterize properties of various classifiers.

3. Model Validation. You should be able to recognize situations where the process of statistical learning, model selection, and/or model assessment is done incorrectly or in a way that will bias the results. You should be able to set up correct procedures for selecting tuning parameters and assessing the model fit in applied scenarios.

4. Matrix Factorizations: PCA, Sparse PCA, ICA, NMF, and MDS. From an applied word problem, you should be able to decide which method would be appropriate and justify your choice.

5. Clustering: K-means, Hierarchical, and Biclustering. From an applied word problem, you should be able to decide which method would be appropriate and justify your choice.

6. Additionally, you should be able to examine a new problem mathematically to understand its properties and relate it to the statistical learning methods covered in class.
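As a concrete instance of item 1, recall that when the columns of X are orthonormal (XᵀX = I), the lasso estimate is the soft-thresholded OLS estimate: β̂_j = sign(β̂_j^OLS) · max(|β̂_j^OLS| − λ, 0). The sketch below is a study aid, not part of the original handout; it assumes numpy is available and checks the closed form numerically against perturbations of the objective:

```python
import numpy as np

def soft_threshold(z, lam):
    # Elementwise soft-thresholding operator S(z, lam)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

rng = np.random.default_rng(0)
n, p = 50, 5
# QR factorization gives X with orthonormal columns: X^T X = I_p
X, _ = np.linalg.qr(rng.normal(size=(n, p)))
beta_true = np.array([3.0, -2.0, 0.0, 0.5, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

lam = 0.4
beta_ols = X.T @ y                      # OLS solution when X^T X = I
beta_lasso = soft_threshold(beta_ols, lam)

def objective(b):
    # Lasso objective: (1/2)||y - Xb||_2^2 + lam * ||b||_1
    return 0.5 * np.sum((y - X @ b) ** 2) + lam * np.sum(np.abs(b))

# The closed-form solution should (weakly) beat nearby perturbations
for _ in range(200):
    b = beta_lasso + 0.05 * rng.normal(size=p)
    assert objective(beta_lasso) <= objective(b) + 1e-9
```

Soft-thresholding both shrinks large coefficients by λ and sets small ones exactly to zero, which is the mechanism behind the lasso's sparsity.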

Stat 613 Fall 2017 Genevera Allen

Sample Midterm Exam Questions

Disclaimer: These are sample questions from past exams that are meant to serve as examples of the types of questions that may be on your midterm. They are not comprehensive and do not reflect the full scope of problems that will be on the exam. Your actual exam questions may be harder and/or easier than these questions.

1. Suppose you are fitting a linear regression model with response Y ∈ R^n and predictors X ∈ R^(n×p), where the columns of X are orthogonal.

(a) What is the solution to the lasso problem: minimize (1/2)‖Y − Xβ‖₂² + λ‖β‖₁?

(b) What is the solution to the non-negative lasso problem: minimize (1/2)‖Y − Xβ‖₂² + λ‖β‖₁ subject to β_j ≥ 0 for j = 1, ..., p?

2. A business analyst is trying to predict market demand for a product over the next six months. He has 90 features of interest measured from 275 stores and decides to use the elastic net for his prediction. To select the optimal regularization parameters, he uses five-fold cross-validation. As his boss wants an estimate of the prediction error, he runs five-fold cross-validation again. For each fold, he fits the elastic net with the previously selected regularization parameter value to four-fifths of the data and uses the one-fifth left out to estimate the prediction error. He averages the prediction error over the five folds and reports this to his boss. Is this an unbiased estimate of the prediction error? If so, why? If not, why not, and how would you alter the procedure to obtain an unbiased estimate?

3. A friend proposes a new penalty function P_γ(t) = log(γt + 1) / log(γ + 1) for some parameter γ > 0. Suppose you use this function in a linear regression setting, minimizing (1/2)‖Y − Xβ‖₂² + λP_γ(β).

(a) What is the behavior of this penalty function? Justify this mathematically.

(b) If γ → 0, to which other penalty is this most similar?

(c) If γ → ∞, to which other penalty is this most similar?

(d) Is this penalty convex?

(e) Describe a scenario in which you may want to use this penalty over other, more common penalties.

4. For each of the following classification scenarios, which method would you recommend? Why? If you feel you need more information, specify what information you need and how this information would change your recommendation.

(a) A scientist cares only about misclassification error. She is trying to predict 10 classes based on 530 samples and 62 predictors.

(b) A scientist wants to find out which variables are most important for classifying between two classes. He has 180 samples and 5,600 features.

(c) A scientist has data that are highly correlated. She wants to find out which variables are most important for classifying between two classes.

(d) A scientist wants to classify between two classes with 8,000 observations and 64 features. He cares only about prediction error.

5. A medical researcher runs PCA on his microarray data consisting of 24,000 genes and 105 Glioblastoma tumor samples. The scatterplot of PC1 versus PC2 reveals three tight clusters of the samples. The researcher is elated, as he thinks he has discovered three new subtypes (groups of patients exhibiting similar genomic profiles) of Glioblastoma. To check his discovery, he runs PCA on a similar microarray data set with 19,000 genes and 72 samples obtained from a colleague. The scatterplot of PC1 versus PC2 no longer shows any clustering of the samples. The researcher is now confused and unsure of which set of results he should believe.

(a) What happened here? What could explain the researcher's findings?

(b) Would you recommend another approach? If so, what? Justify your responses.

6. For each of the following, choose the best combination of loss function plus penalty from the following lists. Justify your choice.

Loss functions: absolute error, squared error, logistic loss, hinge loss.

Penalties: lasso, ridge, adaptive lasso, elastic net.

(a) An oil company has measured p ≈ 10,000 geological features (very complex, highly correlated features) for n ≈ 1,000,000 samples of prospective locations for drilling a new well. They want to find out which features are most important for predicting a well's two-year productivity levels (continuous).

(b) An online advertising company is trying to predict whether an individual will like a YouTube video based on their demographic information and browsing history. They have a sample of n = 11,923 likes or dislikes for the video and p = 62 features.

(c) A scientist has tested n = 52 rats for sensitivity to a particular drug (continuous) along with a custom-built protein array of p = 648 proteins. She wants to know not only which proteins are associated with drug sensitivity, but also the extent to which they are associated.

(d) A neuroscientist wants to build a neural decoder that can most accurately classify whether a rat is moving to the left or the right in a maze based on the firing patterns of p ≈ 5,000 recorded neurons. The rat was in the maze for a total of n = 320 time segments for which the direction (left or right) of movement was recorded.

7. List similarities AND differences between the two given methods.

(a) Quadratic Discriminant Analysis vs. Support Vector Machines with a second-degree polynomial kernel.

(b) Hierarchical Clustering vs. Forward Step-wise Regression.

(c) K-Means Clustering vs. Naive Bayes classifier.

(d) Adaptive Lasso vs. Lasso.

(e) Multi-Dimensional Scaling vs. Principal Components Analysis.
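For part (e) of question 7, one key similarity can be checked numerically: classical (metric) MDS applied to Euclidean distances recovers the PCA scores, up to a sign flip of each component. The following numpy sketch is a study aid, not part of the original handout; it double-centers the squared-distance matrix to recover the Gram matrix, then compares the resulting embedding to PCA scores computed via the SVD:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 4
X = rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)

# PCA scores: project centered data onto the right singular vectors
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = Xc @ Vt.T                    # equivalently U * s

# Classical MDS: double-center the squared Euclidean distance matrix
D2 = np.sum((Xc[:, None, :] - Xc[None, :, :]) ** 2, axis=2)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ D2 @ J                     # Gram matrix of the centered data
evals, evecs = np.linalg.eigh(B)
idx = np.argsort(evals)[::-1][:p]         # top-p eigenpairs, descending
mds_scores = evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0))

# The two embeddings agree up to a sign flip of each column
for j in range(p):
    d_minus = np.linalg.norm(pca_scores[:, j] - mds_scores[:, j])
    d_plus = np.linalg.norm(pca_scores[:, j] + mds_scores[:, j])
    assert min(d_minus, d_plus) < 1e-8
```

The difference between the two methods shows up when the dissimilarities are not Euclidean distances: MDS only needs a dissimilarity matrix, while PCA needs the original data matrix.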