Getting started with Weka. Yishuang Geng, Kexin Shi, Pei Zhang, Angel Trifonov, Jiefeng He, Xiaolu Xiong

Lesson 1.1 - Introduction

Purpose of this course
- Take the mystery out of data mining.
- Show how to use the Weka workbench for data mining.
- Explain the basic principles of several popular algorithms.

Data mining with Weka
What's data mining?
- We are overwhelmed with data.
- Data mining is about going from the raw data to information.
What could data mining do?
- You're at the supermarket checkout: you're happy with your bargains, and the supermarket is happy you've bought some more stuff.
- You want a child, but you and your partner can't have one.

What is Weka?
1. A bird found only in New Zealand
2. Waikato Environment for Knowledge Analysis
Weka includes:
- 100+ algorithms for classification
- 75 for data preprocessing
- 25 to assist with feature selection
- 20 for clustering, finding association rules, etc.

Textbook
Data Mining: Practical Machine Learning Tools and Techniques, by Ian H. Witten, Eibe Frank and Mark A. Hall. Morgan Kaufmann, 2011.

Learning outcomes of the course
- Load data into Weka and look at it
- Use filters to preprocess it
- Explore it using interactive visualization
- Apply classification algorithms
- Interpret the output
- Understand evaluation methods and their implications
- Understand various representations for models
- Explain how popular machine learning algorithms work
- Be aware of common pitfalls with data mining
- Use Weka on your own data and understand what you are doing!

A simple application
You want to monitor the firefighters' status, but you cannot get into the burning houses to watch them.

A simple application
Motion Detection Using RF Signals for the First Responder in Emergency Operations
- Firefighters wear sensors that monitor their physiological information and have personal-area communication capability to reach a centroid node.
- The centroid node has local-area communication capability to link to the terminals outside the burning house.
- If we want to monitor their motion, what should we do?

Existing approaches
Pros
- High detection rate.
- Low computational cost.
Cons
- Add extra load to the firefighter.
- Limited sensor location, usually on shoes.
- Lack of capability for detecting multiple motions; mainly used for fall detection.

Raw data

Data mining

Information from the raw data

Summary
- Why take this course
- Materials
- Weka
- Textbook
- Course schedule
- Lectures
- Activities
- Assessments
- Learning outcomes
- A simple application

Lesson 1.2 - Exploring the Explorer

Setting up Weka
- Download the latest version (Weka 3.6.10) from http://www.cs.waikato.ac.nz/ml/weka/downloading.html
- Self-extracting executable
- Java VM included (if needed)
- Create a shortcut to the Data folder in your computer's My Documents
- Use the Weka shortcut from the program folder

Weka Interface
Weka interfaces:
- Explorer
- Experimenter
- GUI
- Command-line
Explorer will be used the most.

Explorer Interface
Explorer panels
- Preprocess
- Opening datasets: File
- Filter: Supervised, Unsupervised

Filters
Difference
An additional two kinds of filtering:
- Instances
- Attributes

More Preprocess
- Information: relation, attributes, instances
- Selected attribute: name, type, other info
- Attributes: editing, removing
- Class
- Visualization
- Status and log

Lesson 1.3 - Exploring datasets

Classification

Nominal vs. Numerical

ARFF file format
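As a rough illustration of what an ARFF file contains, the sketch below parses a tiny ARFF string in plain Python. It is not Weka's own loader: the sample relation, its rows, and the `parse_arff` helper are made up for this example, and the parser assumes only the simplest case (`@relation`, `@attribute` with nominal or numeric types, and comma-separated `@data` rows without quoting).

```python
# Minimal ARFF reader: a sketch covering only @relation, @attribute and
# comma-separated @data rows (no quoted values, sparse rows, or date types).

# Made-up sample in the style of the weather data (hypothetical rows).
SAMPLE_ARFF = """\
% Lines starting with % are comments
@relation weather.sample

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,FALSE,no
overcast,83,FALSE,yes
rainy,70,TRUE,no
"""

NUMERIC_TYPES = ("numeric", "real", "integer")


def parse_arff(text):
    """Return (relation, attributes, rows) from a simple ARFF string.

    attributes is a list of (name, kind); kind is the string 'numeric',
    another type name such as 'string', or a list of nominal values.
    """
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (_, kind), value in zip(attributes, values):
                if value == "?":
                    row.append(None)          # ARFF marks missing values with ?
                elif kind == "numeric":
                    row.append(float(value))  # numeric attribute
                else:
                    row.append(value)         # nominal (or other) attribute
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # Nominal attribute: the allowed values are listed in braces.
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            elif spec.lower() in NUMERIC_TYPES:
                kind = "numeric"
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE_ARFF)
for name, kind in attributes:
    print(name, "->", kind)
```

The header shows the nominal vs. numerical distinction directly: a nominal attribute lists its allowed values in braces, while a numeric one only names its type.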

Lesson 1.4 - Building a classifier
- Classifying the glass dataset
- Interpreting J48 output
- J48 configuration panel
... option: pruned vs unpruned trees
... option: avoid small leaves
Jiefeng

What is the percentage of correctly classified instances? Use the confusion matrix to determine how many headlamps instances were misclassified as build wind float.
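Both questions can be read off a confusion matrix with a little arithmetic. The sketch below assumes the layout Weka prints (rows are actual classes, columns are predicted classes). The three-class matrix is hypothetical and is not the real J48 output for the glass dataset; the helper names are also made up for illustration.

```python
# Hypothetical confusion matrix (NOT real J48 output for the glass data).
# Rows are actual classes, columns are predicted classes.
LABELS = ["build wind float", "build wind non-float", "headlamps"]
MATRIX = [
    [50, 15, 5],   # actual: build wind float
    [12, 60, 4],   # actual: build wind non-float
    [3, 2, 24],    # actual: headlamps
]


def percent_correct(matrix):
    """Percentage of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


def misclassified_as(matrix, labels, actual, predicted):
    """Number of instances of class `actual` that were predicted as `predicted`."""
    return matrix[labels.index(actual)][labels.index(predicted)]


print(f"Correctly classified: {percent_correct(MATRIX):.2f}%")
print("headlamps classified as build wind float:",
      misclassified_as(MATRIX, LABELS, "headlamps", "build wind float"))
```

The diagonal holds the correct predictions, so every off-diagonal cell is a specific kind of mistake: find the row of the actual class and the column of the predicted class.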

Turning pruning off results in larger trees, and often yields worse results because the classifier may "overfit" the data. However, in some cases the unpruned tree performs better.

1.4 Summary
Building a classifier
- Classifying the glass dataset
- Interpreting J48 output
- J48 configuration panel
... option: pruned vs unpruned trees
... option: avoid small leaves

Lesson 1.5 - Using a filter

Use a filter to remove an attribute
1. Open weather.nominal.arff.
2. Check the filters.
3. Set attributeIndices to 3 and click OK.
4. Apply the filter.
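The effect of the steps above can be sketched in plain Python. This is not Weka's filter code, only an illustration of what removing attribute 3 does to the data. It assumes 1-based attribute indices, as in the Explorer, and assumes the attribute order outlook, temperature, humidity, windy, play. The `remove_attributes` helper and the sample rows are hypothetical.

```python
def remove_attributes(names, rows, indices):
    """Drop the attributes at the given 1-based positions from names and rows."""
    drop = {i - 1 for i in indices}  # convert to 0-based positions
    kept_names = [n for i, n in enumerate(names) if i not in drop]
    kept_rows = [[v for i, v in enumerate(row) if i not in drop] for row in rows]
    return kept_names, kept_rows


# Assumed attribute order and a few sample rows in the style of the weather data.
names = ["outlook", "temperature", "humidity", "windy", "play"]
rows = [
    ["sunny", "hot", "high", "FALSE", "no"],
    ["overcast", "hot", "high", "FALSE", "yes"],
    ["rainy", "mild", "high", "FALSE", "yes"],
]

new_names, new_rows = remove_attributes(names, rows, [3])
print(new_names)
for row in new_rows:
    print(row)
```

Under these assumptions, index 3 corresponds to humidity, so every instance keeps four values and the remaining attributes shift left by one position.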

Lesson 1.6 - Visualizing your data

Raw data visualization

Sepalwidth vs. petalwidth

Zoom in

Error visualization

Thank you! Questions?