Foundations of Small-Sample-Size Statistical Inference and Decision Making

Foundations of Small-Sample-Size Statistical Inference and Decision Making Vasileios Maroulas Department of Mathematics Department of Business Analytics and Statistics University of Tennessee November 3, 2016

Outline
Tests of significance for the population mean; Caveats; Other tests of significance; Alternatives; Concluding remarks. V. Maroulas (maroulas@math.utk.edu) (University of Tennessee) Inference and Decision Making November 3, 2016 2 / 31

Introduction
A significance test is a formal procedure for comparing observed data with a hypothesis whose truth we want to assess. The hypothesis is a statement about the parameters in a population or model. The results of a test are expressed in terms of a probability that measures how well the data and the hypothesis agree.

Terminology
Null hypothesis, denoted H_0. The test of significance is designed to assess the strength of the evidence against the null hypothesis. The null hypothesis is usually a statement of "no effect" or "no difference" (the default assumption that nothing happened or changed).
Alternative hypothesis, denoted H_1 or H_a. It is the competing claim with respect to H_0; it must be decided whether it is one-sided or two-sided.
Test statistic: measures compatibility between the null hypothesis and the data. It is used to calculate the probability needed for our test of significance.
p-value: the probability, computed assuming that H_0 is true, that the test statistic would take a value as extreme as or more extreme than the one actually observed. The smaller the p-value, the stronger the evidence against H_0.
α, the level of significance: the decisive value of p. If p ≤ α, we say that the data are statistically significant at level α.

Example 1
In agricultural modeling, the earth's temperature plays an important role. We want to compare ground-based vs. air-based temperature sensors. Ground-based sensors are expensive, and air-based sensors (on satellites or airplanes) measuring infrared wavelengths may be biased. Temperature data were collected by ground-based and air-based sensors at 10 locations, and we want to test whether they differ.

Null vs. Alternative Hypothesis
Hypotheses always refer to some population or model, not to a particular outcome. For this reason, we state H_0 and H_1 in terms of population parameters. Let µ be the population mean difference between ground and air temperatures.
H_0: µ = 0 vs. H_1: µ ≠ 0
If there is reason to believe, before any data collection, that the parameter being tested is necessarily restricted to one particular "side" of H_0, then H_1 is one-sided.
Left-tailed test: H_0: µ = 0 vs. H_1: µ < 0, or right-tailed test: H_0: µ = 0 vs. H_1: µ > 0.

Test Statistic
The test is based on a statistic that estimates the parameter that appears in the hypotheses. If H_0 is true, then we expect the estimate to take a value "close" to the parameter value specified by H_0. Values of the estimate far from the parameter value in H_0 yield evidence against H_0.
test statistic = (estimate − hypothesized value) / (standard deviation of the estimate)
The test statistic is a random variable with a distribution that we know.

Test Statistic for Example 1
Recall: test statistic = (estimate − hypothesized value) / (standard deviation of the estimate). The hypothesized value is µ = 0. The estimate of the mean is the average of the differences provided by the data; for these data, d̄ = 1.55. Let's assume that we know (typically not true) that the population standard deviation is σ = 2. Then
z = (d̄ − 0) / (σ/√n) = (1.55 − 0) / (2/√10) = 2.4508
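The z statistic above can be reproduced in a few lines; a minimal sketch, assuming only the summary values quoted on the slide (d̄ = 1.55, σ = 2, n = 10):

```python
import math

# Summary values from the example: mean difference, assumed known
# population standard deviation, and sample size.
d_bar = 1.55
sigma = 2.0
n = 10

# z = (estimate - hypothesized value) / (standard deviation of the estimate)
z = (d_bar - 0) / (sigma / math.sqrt(n))
print(round(z, 4))  # 2.4508
```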

p-value
The key to calculating the p-value is the sampling distribution of the test statistic. Assuming that the data are normal (this needs to be checked), z is a realization of Z from the standard normal distribution N(0, 1).
H_1: µ < µ_0: p = P(Z ≤ z)
H_1: µ > µ_0: p = P(Z ≥ z)
H_1: µ ≠ µ_0: p = 2P(Z ≥ |z|)
[Figure: standard normal density curves showing the lower-tail, upper-tail, and two-tailed critical regions; in the two-tailed case each tail has probability 0.025, so the probability outside the limits is 0.05.]
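The three p-value cases can be computed directly from the standard normal distribution; a sketch using scipy, plugging in the value z = 2.4508 from the example:

```python
from scipy.stats import norm

z = 2.4508  # observed test statistic from the example

# H1: mu < mu_0  -> lower-tail probability
p_left = norm.cdf(z)
# H1: mu > mu_0  -> upper-tail probability
p_right = norm.sf(z)          # 1 - cdf, with better numerical accuracy
# H1: mu != mu_0 -> both tails
p_two = 2 * norm.sf(abs(z))

print(p_left, p_right, p_two)
```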

Back to the Example
p = 2P(Z ≥ 2.4508) = 0.0143. A mean difference as large as that observed would occur only about 14 times in 1,000 samples (of size 10) if the population mean difference were 0. This is convincing evidence that the mean difference between ground- and air-based measured temperatures is not zero.
[Figure: standard normal density with critical values at ±2.4508; the probability outside the limits is 0.0143.]

α, the Level of Significance
A p-value is more informative than a "reject-or-not" decision on H_0. However, a quick way of assessment is needed. The α level of significance specifies how much evidence against H_0 you require to be decisive.
If p-value ≤ α, reject H_0 (accept H_1). If p-value > α, then the data do not provide sufficient evidence to reject H_0.

Assumption so far: known variance. H_0: µ = c vs. H_1: µ ≠ c. Recall: z statistic = (x̄ − c) / (σ/√n). Typically the variance is unknown and needs to be estimated; we do so with the sample standard deviation s. The test statistic (for the population mean) becomes: t statistic = (x̄ − c) / (s/√n). The test follows the same strategy (compute the p-value and compare it with α).

Example 1
H_0: µ = 0 vs. H_1: µ ≠ 0
t = (1.55 − 0) / (0.7706/√10) = 6.458
p-value = 2P(T_9 ≥ 6.458) ≈ 0.0002
A mean difference as large as that observed would occur fewer than 2 times in 10,000 samples (of size 10) if the population mean difference were 0. Since p-value < α, reject H_0.
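The same one-sample t test can be run end to end with scipy; a sketch on hypothetical paired differences (illustrative numbers only, not the lecture's data):

```python
import numpy as np
from scipy import stats

# Hypothetical ground-minus-air temperature differences at n = 10 locations
# (illustrative values; the lecture's raw data are not reproduced here).
d = np.array([1.2, 2.1, 0.9, 1.8, 1.5, 2.3, 1.1, 1.7, 1.4, 2.0])

n = d.size
d_bar = d.mean()
s = d.std(ddof=1)  # sample standard deviation

# t = (estimate - hypothesized value) / (s / sqrt(n))
t_manual = (d_bar - 0) / (s / np.sqrt(n))

# scipy computes the same statistic and the two-sided p-value against T_{n-1}
result = stats.ttest_1samp(d, popmean=0)
print(t_manual, result.statistic, result.pvalue)
```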

Robustness of t Tests
t tests are not robust against outliers (x̄ and s are not resistant to outliers). Example: the average height of soybean plants at the R1 stage of their growth is 16". Now imagine 3 plants with height 16" and 3 with height 20"; their average is 18". t tests are robust against deviations from normality, but not against outliers or the presence of strong skewness.
[Figure: histogram of right-skewed data.]

Some Advice
Small sample size: use the t test if the data are close to normal. If outliers are present, do not use t. Moderate sample size: use the t test except in the presence of strong skewness or outliers. Large sample size: use the t test even for clearly skewed distributions (transform the data first, e.g. take logarithms).
[Figures: histograms of right-skewed data and of the same data after a log transform.]
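The log-transform advice can be checked numerically; a sketch using scipy's sample skewness on hypothetical right-skewed (lognormal) data, standing in for the skewed data pictured on the slide:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)

# Hypothetical right-skewed sample: lognormal draws are strongly
# right-skewed, and their logarithm is exactly normal.
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)

skew_raw = skew(x)          # large and positive for right-skewed data
skew_log = skew(np.log(x))  # near 0: the log brings the data close to normal

print(round(skew_raw, 2), round(skew_log, 2))
```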

Checking for Outliers and Skewness
Tools: normal quantile (QQ) plot, stemplot, boxplot.
[Figure: QQ plot of the Example 1 sample data versus the standard normal distribution.]
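A quantile-quantile check can be done without drawing the plot; a sketch with scipy.stats.probplot, whose least-squares fit correlation r is close to 1 for normal-looking data and drops when an outlier is present (both samples below are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical samples: one normal, one with a gross outlier injected.
clean = rng.normal(loc=0, scale=1, size=50)
with_outlier = np.append(clean, 25.0)  # a single extreme point

# probplot returns the ordered data vs. normal quantiles plus a
# least-squares fit (slope, intercept, r).
(_, _), (_, _, r_clean) = stats.probplot(clean)
(_, _), (_, _, r_out) = stats.probplot(with_outlier)

print(round(r_clean, 3), round(r_out, 3))
```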

Inference for standard deviations, proportions, or parameters related to regression: different hypotheses, but the same strategy. The only things that change are the test statistic and its associated distribution. For proportions: with a small sample size, use the binomial distribution; with a large sample size, use the normal distribution.
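For small-sample proportions, the exact binomial test works directly with the binomial distribution; a sketch with scipy on hypothetical counts (9 successes out of 12 trials, testing H_0: p = 0.5 two-sided):

```python
from scipy.stats import binomtest

# Hypothetical small sample: 9 successes in 12 trials.
# The exact test sums Binomial(12, 0.5) tail probabilities; no normal
# approximation is involved, so it is valid at any sample size.
result = binomtest(9, n=12, p=0.5, alternative='two-sided')

print(result.pvalue)
```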

Summary
The point of a test of significance is to provide a clear statement of the degree of evidence provided by the sample against H_0. We wrote p-value ≤ α; however, there is no sharp border between significant and not significant, only increasingly strong evidence against H_0 as the p-value decreases. When H_0 (no effect or no difference) can be rejected at the usual level α = 0.05, there is good evidence that an effect is present (though it could be small). Design your study carefully and plot your data.

To p or not to p?
Alternatives: a Bayesian approach to hypothesis testing; attempt a statistical learning approach (classification, clustering).

Statistical Learning Example: Classification
Consider a set of data obtained from soybean plants. Each soybean plant has exactly one disease. The goal is to "understand" the characteristics of 4 different types of soybean diseases, given features extracted from the plant, so that when we are given a new soybean crop we are able to predict accurately what kind of disease it may have. There are p = 35 predictors, based on the condition and attributes of leaves, fruit pods, seeds, etc., but only n = 12 examples, 3 for each disease class! Dataset sampled from the UC Irvine Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/soybean+(small)

A Small Dataset of Soybeans
Due to the small sample size, we want to maximize the amount of data we can use to build the model. Can we use all of the data to build the model? No! We need to validate the model to ensure that our accuracy results are not biased. One option: leave-one-out cross-validation. Train the model on all but one data point, see how the model performs on the held-out instance, and average the error over all the instances.
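Leave-one-out cross-validation is a short loop; a sketch with scikit-learn on a tiny synthetic dataset that mimics the n = 12, 4-class setting (illustrative data, not the real soybean records):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical dataset: 12 examples, 4 classes (3 each), 5 features.
X = rng.normal(size=(12, 5))
y = np.repeat([0, 1, 2, 3], 3)
X += y[:, None]  # shift each class so the labels are learnable

correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    # Train on 11 points, predict the single held-out point.
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    correct += int(model.predict(X[test_idx])[0] == y[test_idx][0])

accuracy = correct / len(X)  # average over all 12 held-out predictions
print(accuracy)
```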

Logistic Regression: A Statistics Approach
We first model using logistic regression. Logistic regression models the log probability ratio log(probability of disease 1 / probability of disease 2) linearly in the predictors. Parameters are estimated by an optimization method (the maximum likelihood approach), and the significance of predictors can be tested using significance tests (similar to those discussed earlier).

Logistic Regression for the Soybeans Dataset
Fit logistic regression on 11 points; predict on the 12th point; measure the error (or accuracy) by asking "did I get it right?"; repeat 12 times so that every point is held out once.
Model: Logistic Regression — Accuracy: 91.67%
91.67% means that 11 out of 12 predictions were correct.
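The fit/hold-out/repeat recipe compresses to one call in scikit-learn; a sketch using cross_val_score, with scikit-learn's built-in iris dataset standing in for the soybean data:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)

# One score per held-out point (1 if classified correctly, else 0);
# the mean of the scores is the leave-one-out accuracy.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=LeaveOneOut())
accuracy = scores.mean()
print(round(accuracy, 4))
```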

Something Different: Decision Trees
Decision trees are recursive partitioning algorithms that produce a tree-like structure. These structures represent patterns in an underlying data set. The top node is the root node, specifying a testing condition whose outcome corresponds to a branch leading to an internal node. The terminal nodes (leaf nodes) of the tree assign the classifications.

Decision Tree
Splitting decision: the strategy is to minimize the impurity at the leaf level. Stopping decision: avoid overfitting; if you split too much, you get many pure classes but with very few members in them. Assignment decision: what class should be assigned to a leaf node? Take the majority class within the leaf node.
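The split/stop/assign decisions can be inspected directly in a fitted tree; a sketch with scikit-learn's export_text (iris again as a stand-in dataset; max_depth is one simple way to realize the stopping decision):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# Splits minimize Gini impurity by default; max_depth caps the splitting
# ("stopping decision") to avoid overfitting.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The printout shows each test condition; every leaf reports the class
# assigned by majority vote ("assignment decision").
print(export_text(tree))
```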

Back to the Soybean Problem
Now attempt to model using a decision tree. The model builds a tree (using 11 data points) to create the purest nodes at each step, and leaf nodes are labeled according to the majority class. New examples (the 12th point) are then sent down the tree and classified according to the label of the leaf they end up in.
Model: Logistic Regression — Accuracy: 91.67%; Decision Tree — Accuracy: 75%
75% means that 9 out of 12 were classified correctly. Can we do better?

Turning Decision Trees into Random Forests
Stochastically generate a large number of decision trees. At each split within each tree, use a random subset of the predictors instead of all of them. Predict on a new example (soybean) by taking the majority class prediction over the K trees.
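The two randomization ingredients (many bootstrapped trees, a random predictor subset at each split) map directly onto scikit-learn's parameters; a sketch, with iris as a stand-in dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# n_estimators = K trees; max_features = size of the random predictor
# subset considered at each split; bootstrap resampling is on by default.
forest = RandomForestClassifier(n_estimators=200, max_features='sqrt',
                                random_state=0).fit(X, y)

# A new example is classified by majority vote over the K trees.
print(forest.predict(X[:1]))
```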

Take-Home Message
Model: Logistic Regression — Accuracy: 91.67%; Decision Tree — 75%; Random Forest — 100%
Statistical learning methods may sometimes be more appropriate than more traditional methods. When dealing with a small dataset, statistical learning techniques such as leave-one-out cross-validation allow training on a large portion of the dataset while still giving a good estimate of the true error.

Conclusion
We dived into the nuts and bolts of hypothesis testing. Use hypothesis testing with caution, especially with small-sample-size data (e.g., look for outliers and skewness). There is nothing wrong with the p-value; however, it needs to be taken for what it is: a probability such that the smaller it is, the stronger the evidence against H_0. And there are alternatives, e.g., statistical learning.