Data Mining with Weka

Data Mining with Weka. Class 1, Lesson 1: Introduction

Data Mining with Weka
- a practical course on how to use Weka for data mining
- explains the basic principles of several popular algorithms

Data Mining with Weka
What's data mining?
- We are overwhelmed with data
- Data mining is about going from data to information, information that can give you useful predictions
- Example: You're at the supermarket checkout. You're happy with your bargains, and the supermarket is happy you've bought some more stuff
- Data mining vs. machine learning

Data Mining with Weka
What's Weka?
- A bird found only in New Zealand?
- Data mining workbench: Waikato Environment for Knowledge Analysis
- Machine learning algorithms for data mining tasks
  - 100+ algorithms for classification
  - 75 for data preprocessing
  - 25 to assist with feature selection
  - 20 for clustering, finding association rules, etc.

Data Mining with Weka
What will you learn?
- Load data into Weka and look at it
- Use filters to preprocess it
- Explore it using interactive visualization
- Apply classification algorithms
- Interpret the output
- Understand evaluation methods and their implications
- Understand various representations for models
- Explain how popular machine learning algorithms work
- Be aware of common pitfalls with data mining
Use Weka on your own data and understand what you are doing!

Class 1: Getting started with Weka
- Install Weka
- Explore the "Explorer" interface
- Explore some datasets
- Build a classifier
- Interpret the output
- Use filters
- Visualize your data set

Course organization
- Class 1: Getting started with Weka
- Class 2: Evaluation
- Class 3: Simple classifiers
- Class 4: More classifiers
- Class 5: Putting it all together
Lessons and activities:
- Lesson 1.1 to Lesson 1.6
- Activity 1 to Activity 6

Textbook
This textbook discusses data mining, and Weka, in depth:
Data Mining: Practical Machine Learning Tools and Techniques, by Ian H. Witten, Eibe Frank and Mark A. Hall. Morgan Kaufmann, 2011.
The ebook format is available.

Data Mining with Weka. Class 1, Lesson 2: Exploring the Explorer

Lesson 1.2: Exploring the Explorer
Course outline:
- Class 1: Getting started with Weka
  - Lesson 1.1: Introduction
  - Lesson 1.2: Exploring the Explorer
  - Lesson 1.3: Exploring datasets
  - Lesson 1.4: Building a classifier
  - Lesson 1.5: Using a filter
  - Lesson 1.6: Visualizing your data
- Class 2: Evaluation
- Class 3: Simple classifiers
- Class 4: More classifiers
- Class 5: Putting it all together

Lesson 1.2: Exploring the Explorer
- Download from http://www.cs.waikato.ac.nz/ml/weka (for Windows, Mac, Linux)
- Weka 3.6.10 (the latest stable version of Weka at the time of the course; includes datasets for the course)
- It's important to get the right version, 3.6.10

Lesson 1.2: Exploring the Explorer
[Screenshot with callouts: performance comparisons, graphical interface, command line interface]

Lesson 1.2: Exploring the Explorer
The weather data: each row is an instance, each column an attribute.

 #   Outlook   Temp  Humidity  Windy  Play
 1   Sunny     Hot   High      False  No
 2   Sunny     Hot   High      True   No
 3   Overcast  Hot   High      False  Yes
 4   Rainy     Mild  High      False  Yes
 5   Rainy     Cool  Normal    False  Yes
 6   Rainy     Cool  Normal    True   No
 7   Overcast  Cool  Normal    True   Yes
 8   Sunny     Mild  High      False  No
 9   Sunny     Cool  Normal    False  Yes
10   Rainy     Mild  Normal    False  Yes
11   Sunny     Mild  Normal    True   Yes
12   Overcast  Mild  High      True   Yes
13   Overcast  Hot   Normal    False  Yes
14   Rainy     Mild  High      True   No
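To make the terms "instance", "attribute" and "attribute value" concrete, here is a minimal Python sketch that holds the weather data from this slide in memory and summarises it. It is only an illustration in plain Python, not how Weka stores data internally; the names ATTRIBUTES, INSTANCES, rows, attribute_values and class_counts are invented for this example.

```python
from collections import Counter

# Attribute (column) names from the slide; "Play" is the last column.
ATTRIBUTES = ["Outlook", "Temp", "Humidity", "Windy", "Play"]

# The 14 instances (rows) from the slide, in order.
INSTANCES = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]

# One dict per instance, mapping attribute name to attribute value.
rows = [dict(zip(ATTRIBUTES, instance)) for instance in INSTANCES]

# The distinct values each attribute takes in this dataset.
attribute_values = {
    name: sorted({row[name] for row in rows}) for name in ATTRIBUTES
}

# How often each value of "Play" occurs.
class_counts = Counter(row["Play"] for row in rows)

print(len(rows), "instances,", len(ATTRIBUTES), "attributes")
for name, values in attribute_values.items():
    print(f"{name}: {values}")
print("Play counts:", dict(class_counts))
```

This is roughly the kind of summary the Explorer shows when you select an attribute: its distinct values and how many instances have each one.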

Lesson 1.2: Exploring the Explorer
[Screenshot: open file weather.nominal.arff]

Lesson 1.2: Exploring the Explorer
[Screenshot with callouts: attributes, attribute values]

Lesson 1.2: Exploring the Explorer
- Install Weka
- Get datasets
- Open Explorer
- Open a dataset (weather.nominal.arff)
- Look at attributes and their values
- Edit the dataset
- Save it?
Course text
- Section 1.2: The weather problem
- Chapter 10: Introduction to Weka

Data Mining with Weka. Class 1, Lesson 3: Exploring datasets

Lesson 1.3: Exploring datasets
The weather data: each row is an instance, each column an attribute.

 #   Outlook   Temp  Humidity  Windy  Play
 1   Sunny     Hot   High      False  No
 2   Sunny     Hot   High      True   No
 3   Overcast  Hot   High      False  Yes
 4   Rainy     Mild  High      False  Yes
 5   Rainy     Cool  Normal    False  Yes
 6   Rainy     Cool  Normal    True   No
 7   Overcast  Cool  Normal    True   Yes
 8   Sunny     Mild  High      False  No
 9   Sunny     Cool  Normal    False  Yes
10   Rainy     Mild  Normal    False  Yes
11   Sunny     Mild  Normal    True   Yes
12   Overcast  Mild  High      True   Yes
13   Overcast  Hot   Normal    False  Yes
14   Rainy     Mild  High      True   No

Lesson 1.3: Exploring datasets
[Screenshot: open file weather.nominal.arff, with callouts: attributes, attribute values, class]

Lesson 1.3: Exploring datasets
Classification, sometimes called "supervised learning"
- Dataset: classified examples
- "Model" that classifies new examples
- Classified example: attribute 1, attribute 2, ..., attribute n, class
- Instance: fixed set of features, each either
  - discrete ("nominal"), or
  - continuous ("numeric")
- Class:
  - discrete: "classification" problem
  - continuous: "regression" problem
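The last distinction on this slide can be written as a one-line rule: the type of the class attribute decides the kind of problem. The sketch below assumes a hypothetical representation in which a numeric attribute is described by the string "numeric" and a nominal attribute by the list of its allowed values; the function name problem_type is invented for this example.

```python
def problem_type(class_type):
    """Name the learning problem implied by the class attribute's type.

    class_type is either the string "numeric" (continuous class) or a
    non-empty list of allowed values (discrete, i.e. nominal, class).
    This representation is an assumption made for this sketch.
    """
    if class_type == "numeric":
        return "regression"
    if isinstance(class_type, (list, tuple)) and class_type:
        return "classification"
    raise ValueError(f"unrecognised class attribute type: {class_type!r}")


# The weather data has a nominal class (Play: Yes or No).
print(problem_type(["Yes", "No"]))
# A continuous class would make it a regression problem instead.
print(problem_type("numeric"))
```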

Lesson 1.3: Exploring datasets
[Screenshot: open file weather.numeric.arff, with callouts: attributes, attribute values, class]

Lesson 1.3: Exploring datasets
[Screenshot: open file glass.arff]

Lesson 1.3: Exploring datasets
- The classification problem
- weather.nominal, weather.numeric
- Nominal vs numeric attributes
- ARFF file format
- glass.arff dataset
- Sanity checking attributes
Course text
- Section 11.1: Preparing the data; Loading the data into the Explorer
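The ARFF file format mentioned above is plain text: a header with @relation and @attribute declarations, followed by @data and one comma-separated line per instance. The sketch below reads a small hand-written fragment in that style. The fragment is made up for illustration and is not a copy of weather.nominal.arff or weather.numeric.arff, and parse_arff is a hypothetical helper, not part of Weka. It handles only nominal and numeric attributes and ignores quoting, sparse data and other features of the real format.

```python
# A hand-written fragment in ARFF style (illustrative only).
SAMPLE_ARFF = """\
% Lines starting with a percent sign are comments.
@relation weather_sketch
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Parse a simple ARFF text into (relation, attributes, data).

    attributes is a list of (name, type) pairs, where type is either a
    list of nominal values or a lower-case type name such as "numeric".
    data is a list of rows, each a list of strings.
    """
    relation = None
    attributes = []
    data = []
    in_data = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            data.append([value.strip() for value in line.split(",")])
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()  # ARFF keywords are case-insensitive
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            name, _, spec = rest.strip().partition(" ")
            spec = spec.strip()
            if spec.startswith("{") and spec.endswith("}"):
                # Nominal attribute: the braces list the allowed values.
                attr_type = [v.strip() for v in spec[1:-1].split(",")]
            else:
                attr_type = spec.lower()
            attributes.append((name, attr_type))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, data


relation, attributes, data = parse_arff(SAMPLE_ARFF)
print(relation)
for name, attr_type in attributes:
    print(name, attr_type)
print(len(data), "instances")
```

Note how the header already tells you which attributes are nominal and which are numeric, which is the distinction this lesson draws between weather.nominal and weather.numeric.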

Data Mining with Weka. Class 1, Lesson 4: Building a classifier

Lesson 1.4: Building a classifier
Use J48 to analyze the glass dataset
- Open file glass.arff (or leave it open from the last lesson)
- Check the available classifiers
- Choose the J48 decision tree learner (trees > J48)
- Run it
- Examine the output
- Look at the correctly classified instances and the confusion matrix
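The two numbers to look at in the output, the proportion of correctly classified instances and the confusion matrix, are both simple counts over (actual class, predicted class) pairs. The sketch below computes them for a handful of made-up labels; they are not J48's results on glass.arff, and the function names are invented for this example.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return a matrix where rows are actual classes and columns are predicted."""
    pair_counts = Counter(zip(actual, predicted))
    return [[pair_counts[(a, p)] for p in labels] for a in labels]


def accuracy(actual, predicted):
    """Fraction of instances whose predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up labels for illustration only.
labels = ["a", "b", "c"]
actual = ["a", "a", "a", "b", "b", "c", "c", "c"]
predicted = ["a", "a", "b", "b", "c", "c", "c", "a"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
print("Correctly classified:", accuracy(actual, predicted))
```

Correct predictions sit on the diagonal of the matrix; every off-diagonal entry counts instances of one class that were predicted as another.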

Lesson 1.4: Building a classifier
Investigate J48
- Open the configuration panel
- Check the "More" information
- Examine the options
- Use an unpruned tree
- Look at leaf sizes
- Set minNumObj to 15 to avoid small leaves
- Visualize the tree using the right-click menu

Lesson 1.4: Building a classifier
From C4.5 to J48
- ID3 (1979)
- C4.5 (1993)
- C4.8 (1996?)
- C5.0 (commercial)
- J48

Lesson 1.4: Building a classifier
- Classifiers in Weka
- Classifying the glass dataset
- Interpreting J48 output
- J48 configuration panel
  - option: pruned vs unpruned trees
  - option: avoid small leaves
- J48 ~ C4.5
Course text
- Section 11.1: Building a decision tree; Examining the output

Data Mining with Weka. Class 1, Lesson 5: Using a filter

Lesson 1.5: Using a filter
Use a filter to remove an attribute
- Open weather.nominal.arff (again!)
- Check the filters
  - supervised vs unsupervised
  - attribute vs instance
- Choose the unsupervised attribute filter Remove
- Check the "More" information; look at the options
- Set attributeIndices to 3 and click OK
- Apply the filter
- Recall that you can Save the result
- Press Undo
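What the Remove filter does in the steps above can be sketched in a few lines of plain Python: drop the column at the given position from the attribute list and from every instance. Attribute indices in Weka count from 1, so index 3 is Humidity in the weather data. The function remove_attributes is a hypothetical helper written for this illustration, not Weka's implementation, and only the first three instances of the weather data are used.

```python
def remove_attributes(attribute_names, rows, indices):
    """Return copies of the names and rows without the given 1-based columns."""
    drop = {index - 1 for index in indices}  # convert to 0-based positions
    kept_names = [n for i, n in enumerate(attribute_names) if i not in drop]
    kept_rows = [
        [value for i, value in enumerate(row) if i not in drop]
        for row in rows
    ]
    return kept_names, kept_rows


attribute_names = ["Outlook", "Temp", "Humidity", "Windy", "Play"]
rows = [
    ["Sunny", "Hot", "High", "False", "No"],
    ["Sunny", "Hot", "High", "True", "No"],
    ["Overcast", "Hot", "High", "False", "Yes"],
]

# Equivalent in spirit to setting attributeIndices to 3.
new_names, new_rows = remove_attributes(attribute_names, rows, [3])
print(new_names)
for row in new_rows:
    print(row)
```

The function returns new lists and leaves the originals untouched, which mirrors the fact that in the Explorer you can press Undo to get the removed attribute back.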

Lesson 1.5: Using a filter
Remove instances where humidity is high
- Supervised or unsupervised?
- Attribute or instance?
- Look at them
- Select RemoveWithValues
- Set attributeIndex
- Set nominalIndices
- Apply
- Undo
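An instance filter works on rows rather than columns. The sketch below shows the idea behind removing instances where humidity is high, using the first six instances of the weather data. For readability it selects the attribute and value by name, whereas the RemoveWithValues filter is configured with indices (attributeIndex and nominalIndices); remove_with_value is a hypothetical helper, not Weka's implementation.

```python
def remove_with_value(attribute_names, rows, attribute, value):
    """Return only the rows whose given attribute does NOT have the given value."""
    column = attribute_names.index(attribute)
    return [row for row in rows if row[column] != value]


attribute_names = ["Outlook", "Temp", "Humidity", "Windy", "Play"]
rows = [
    ["Sunny", "Hot", "High", "False", "No"],
    ["Sunny", "Hot", "High", "True", "No"],
    ["Overcast", "Hot", "High", "False", "Yes"],
    ["Rainy", "Mild", "High", "False", "Yes"],
    ["Rainy", "Cool", "Normal", "False", "Yes"],
    ["Rainy", "Cool", "Normal", "True", "No"],
]

remaining = remove_with_value(attribute_names, rows, "Humidity", "High")
print(len(rows), "instances before,", len(remaining), "after")
for row in remaining:
    print(row)
```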

Lesson 1.5: Using a filter
Fewer attributes, better classification!
- Open glass.arff
- Run J48 (trees > J48)
- Remove Fe
- Remove all attributes except RI and Mg
- Look at the decision trees
- Use the right-click menu to visualize decision trees

Lesson 1.5: Using a filter
- Filters in Weka
- Supervised vs unsupervised, attribute vs instance
- To find the right one, you need to look!
- Filters can be very powerful
- Judiciously removing attributes can
  - improve performance
  - increase comprehensibility
Course text
- Section 11.2: Loading and filtering files

Data Mining with Weka. Class 1, Lesson 6: Visualizing your data

Lesson 1.6: Visualizing your data
Using the Visualize panel
- Open iris.arff
- Bring up the Visualize panel
- Click one of the plots; examine some instances
- Set x axis to petalwidth and y axis to petallength
- Click on "Class colour" to change the colour
- Bars on the right correspond to attributes: click for x axis; right-click for y axis
- Jitter slider
- Show Select Instance: Rectangle option
- Submit, Reset, Clear and Save
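The Jitter slider addresses a common plotting problem: instances with identical values land on the same pixel and look like a single point. The general remedy is to displace each point by a small random amount before drawing it. The sketch below shows that general idea only; how Weka computes its jitter is not described in these slides, and the jitter function and its parameters are invented for this example.

```python
import random


def jitter(points, amount, seed=0):
    """Return the points with each coordinate shifted by at most `amount`.

    A fixed seed makes the displacement repeatable between runs.
    """
    rng = random.Random(seed)
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


# Three instances with identical values would overlap exactly in a plot.
overlapping = [(0.2, 1.4), (0.2, 1.4), (0.2, 1.4)]
spread = jitter(overlapping, amount=0.05)
for point in spread:
    print(point)
```

Jitter changes only where points are drawn, not the underlying data, so it is safe to use purely as a viewing aid.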

Lesson 1.6: Visualizing your data
Visualizing classification errors
- Run J48 (trees > J48)
- Visualize classifier errors (from Results list)
- Plot predictedclass against class
- Identify errors shown by confusion matrix

Lesson 1.6: Visualizing your data
- Get down and dirty with your data
- Visualize it
- Clean it up by deleting outliers
- Look at classification errors (there's a filter that allows you to add classifications as a new attribute)
Course text
- Section 11.2: Visualization
