Getting started with Weka
Yishuang Geng, Kexin Shi, Pei Zhang, Angel Trifonov, Jiefeng He, Xiaolu Xiong
Lesson 1.1 - Introduction
Purpose of this course
Take the mystery out of data mining.
Show how to use the Weka workbench for data mining.
Explain the basic principles of several popular algorithms.
Data mining with Weka
What's data mining?
We are overwhelmed with data. Data mining is about going from raw data to information.
What could data mining do?
You're at the supermarket checkout: you're happy with your bargains, and the supermarket is happy you've bought some more stuff.
You want a child, but you and your partner can't have one.
What is Weka?
1. A bird found only in New Zealand
2. Waikato Environment for Knowledge Analysis
Weka includes:
100+ algorithms for classification
75 for data preprocessing
25 to assist with feature selection
20 for clustering, finding association rules, etc.
Textbook
Data Mining: Practical Machine Learning Tools and Techniques, by Ian H. Witten, Eibe Frank and Mark A. Hall. Morgan Kaufmann, 2011.
Learning outcomes of the course
Load data into Weka and look at it
Use filters to preprocess it
Explore it using interactive visualization
Apply classification algorithms
Interpret the output
Understand evaluation methods and their implications
Understand various representations for models
Explain how popular machine learning algorithms work
Be aware of common pitfalls with data mining
Use Weka on your own data and understand what you are doing!
A simple application
You want to monitor the firefighters' status, but you cannot get into the burning houses to watch them.
A simple application: Motion Detection Using RF Signals for the First Responder in Emergency Operations
Firefighters carry sensors that monitor their physiological information and communicate over a personal area link with a centroid node.
The centroid node has local area communication capability that links it to terminals outside the burning house.
If we want to monitor the firefighters' motion, what should we do?
Existing approaches
Pros: high detection rate; low computational cost.
Cons: add extra load to the firefighter; limited sensor locations, usually on the shoes; cannot detect multiple motions, so they are mainly used for fall detection.
Raw data
Data mining
Information from the raw data
Summary
Why take this course
Materials: Weka, textbook
Course schedule: lectures, activities, assessments
Learning outcomes
A simple application
Lesson 1.2 - Exploring the Explorer
Setting up Weka
Download the latest version (Weka 3.6.10) from http://www.cs.waikato.ac.nz/ml/weka/downloading.html
Self-extracting executable, Java VM included (if needed)
Create a shortcut to the Data folder in your computer's My Documents.
Use the Weka shortcut from the program folder.
Weka interfaces
Explorer, Experimenter, GUI, command line
The Explorer will be used the most.
Explorer interface
Explorer panels: Preprocess
Opening datasets: File
Filters: supervised, unsupervised
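Filters
Difference: supervised filters take the class attribute into account; unsupervised filters do not.
Each of these is further divided into two kinds of filtering: attribute filters and instance filters.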
More Preprocess information
Current relation: relation name, number of attributes, number of instances
Selected attribute: name, type, other information
Attributes: editing, removing
Class
Visualization
Status and log
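To make the selected-attribute statistics concrete, here is a minimal Java sketch of the kind of summary the Preprocess panel reports: minimum, maximum, mean and standard deviation for a numeric attribute, and label counts for a nominal one. AttributeSummary and its methods are hypothetical names, not part of Weka, and using the sample (n-1) standard deviation is an assumption about how the panel computes it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Weka: mimics the per-attribute summary
// shown in the Explorer's Preprocess panel.
final class AttributeSummary {

    // Returns {minimum, maximum, mean, standard deviation} of a numeric attribute.
    static double[] numericSummary(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        double mean = sum / values.length;
        double sumSquaredDiffs = 0.0;
        for (double v : values) {
            sumSquaredDiffs += (v - mean) * (v - mean);
        }
        // Assumption: sample standard deviation (divide by n - 1).
        double stdDev = Math.sqrt(sumSquaredDiffs / (values.length - 1));
        return new double[] {min, max, mean, stdDev};
    }

    // Counts each label of a nominal attribute. Labels appear in first-seen
    // order here; Weka itself lists them in the order the ARFF header declares.
    static Map<String, Integer> nominalCounts(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] s = numericSummary(new double[] {2, 4, 4, 4, 5, 5, 7, 9});
        System.out.printf("min=%.1f max=%.1f mean=%.1f stdDev=%.3f%n", s[0], s[1], s[2], s[3]);
        System.out.println(nominalCounts(List.of("sunny", "overcast", "rainy", "sunny")));
    }
}
```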
Lesson 1.3 - Exploring datasets
Classification
Nominal vs. Numerical
ARFF file format
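In an ARFF file, each attribute is declared in the header before the @data section: a nominal attribute lists its values in braces, while a numeric one is declared as numeric (real and integer are treated the same way). The following minimal Java sketch reads a header and reports which attributes are nominal and which are numeric. It assumes unquoted attribute names and ignores sparse data and other ARFF details; ArffHeaderSketch is a hypothetical class for illustration, not Weka's own ARFF loader, and the sample header is made up in the style of the weather data.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified ARFF header reader (not Weka's loader).
final class ArffHeaderSketch {

    // Maps each attribute name to "nominal", "numeric" or "other", in declaration order.
    static Map<String, String> attributeKinds(String arffText) {
        Map<String, String> kinds = new LinkedHashMap<>();
        for (String rawLine : arffText.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comments
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@data")) {
                break; // the header ends where the data section begins
            }
            if (!lower.startsWith("@attribute")) {
                continue; // e.g. the @relation line
            }
            // Simplifying assumption: "@attribute <name> <type>" with an unquoted name.
            String[] parts = line.substring("@attribute".length()).trim().split("\\s+", 2);
            String name = parts[0];
            String type = parts.length > 1 ? parts[1].trim().toLowerCase(Locale.ROOT) : "";
            String kind;
            if (type.startsWith("{")) {
                kind = "nominal"; // {value1, value2, ...}
            } else if (type.equals("numeric") || type.equals("real") || type.equals("integer")) {
                kind = "numeric";
            } else {
                kind = "other"; // e.g. string or date
            }
            kinds.put(name, kind);
        }
        return kinds;
    }

    public static void main(String[] args) {
        String arff = "% made-up weather-style header\n"
                + "@relation weather\n"
                + "@attribute outlook {sunny, overcast, rainy}\n"
                + "@attribute temperature numeric\n"
                + "@attribute humidity numeric\n"
                + "@attribute windy {TRUE, FALSE}\n"
                + "@attribute play {yes, no}\n"
                + "@data\n"
                + "sunny,85,85,FALSE,no\n";
        System.out.println(attributeKinds(arff));
    }
}
```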
Lesson 1.4 - Building a classifier
Classifying the glass dataset
Interpreting J48 output
J48 configuration panel
... option: pruned vs. unpruned trees
... option: avoid small leaves
Jiefeng
What is the percentage of correctly classified instances? Use the confusion matrix to determine how many headlamps instances were misclassified as build wind float.
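Both questions can be answered mechanically. In Weka's confusion matrix, rows are the actual classes and columns are the predicted ("classified as") classes, so the percentage correct is the diagonal sum divided by the total, and the number of class X instances classified as Y is the single cell in row X, column Y. The Java sketch uses a made-up 3-class matrix, not the glass results; ConfusionMatrixSketch and its methods are hypothetical names.

```java
// Hypothetical helper: reads a confusion matrix laid out as Weka prints it,
// with one row per actual class and one column per predicted class.
final class ConfusionMatrixSketch {

    // Percentage of instances on the diagonal (actual class == predicted class).
    static double percentCorrect(int[][] matrix) {
        long correct = 0;
        long total = 0;
        for (int actual = 0; actual < matrix.length; actual++) {
            for (int predicted = 0; predicted < matrix[actual].length; predicted++) {
                total += matrix[actual][predicted];
                if (actual == predicted) {
                    correct += matrix[actual][predicted];
                }
            }
        }
        return 100.0 * correct / total;
    }

    // How many instances whose true class is `actual` were classified as `predicted`.
    static int classifiedAs(int[][] matrix, int actual, int predicted) {
        return matrix[actual][predicted];
    }

    public static void main(String[] args) {
        // Made-up 3-class example (classes a, b, c); not the glass results.
        int[][] matrix = {
            {8, 1, 1}, // actual a
            {2, 6, 0}, // actual b
            {0, 1, 5}, // actual c
        };
        System.out.println("Correctly classified: " + percentCorrect(matrix) + " %");
        System.out.println("b classified as a: " + classifiedAs(matrix, 1, 0));
    }
}
```

For the glass question, the same reading applies: find the headlamps row and take the entry in the build wind float column.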
Turning pruning off results in larger trees and often yields worse results, because the classifier may "overfit" the training data. However, in some cases the unpruned tree performs better.
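The "avoid small leaves" option in the J48 configuration panel is J48's minNumObj parameter (default 2), which, like pruning, limits how far the tree grows. Here is a simplified Java sketch of a C4.5-style check, assuming the rule that a candidate split is accepted only if at least two of its branches receive at least minNumObj instances. It illustrates the idea and is not Weka's implementation; MinLeafSketch is a hypothetical name.

```java
// Hypothetical sketch of a minimum-leaf-size check in the spirit of J48's minNumObj.
final class MinLeafSketch {

    // A split is allowed only if at least two branches get >= minNumObj instances.
    static boolean splitAllowed(int[] branchSizes, int minNumObj) {
        int bigEnough = 0;
        for (int size : branchSizes) {
            if (size >= minNumObj) {
                bigEnough++;
            }
        }
        return bigEnough >= 2;
    }

    public static void main(String[] args) {
        int[] branches = {20, 3, 1}; // instances reaching each branch of a candidate split
        for (int minNumObj : new int[] {2, 5}) {
            System.out.println("minNumObj=" + minNumObj
                    + " -> split allowed: " + splitAllowed(branches, minNumObj));
        }
    }
}
```

Raising minNumObj rejects splits that would create tiny leaves, so the tree stays smaller and is less likely to fit noise in the training data.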
1.4 Summary
Building a classifier
Classifying the glass dataset
Interpreting J48 output
J48 configuration panel
... option: pruned vs. unpruned trees
... option: avoid small leaves
Lesson 1.5 - Using a filter
Use a filter to remove an attribute
Open weather.nominal.arff
Check the filters: choose unsupervised > attribute > Remove
Set attributeIndices to 3 (humidity) and click OK
Apply the filter
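The effect of these steps can be sketched in plain Java: removing attribute index 3 deletes the third column (humidity, in weather.nominal.arff) from the attribute list and from every instance. RemoveAttributeSketch is a hypothetical helper that works on rows of strings; it shows what the filter does to the data, not how Weka implements it, and the sample row is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: drops one column from a table of string rows.
final class RemoveAttributeSketch {

    // Removes the column at a 1-based index (as in attributeIndices = 3) from every row.
    static List<List<String>> removeAttribute(List<List<String>> rows, int oneBasedIndex) {
        int column = oneBasedIndex - 1; // the Explorer numbers attributes from 1
        List<List<String>> result = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> copy = new ArrayList<>(row); // copy, so immutable input rows are fine
            copy.remove(column);                       // remove(int) removes by position
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> header = List.of("outlook", "temperature", "humidity", "windy", "play");
        List<List<String>> table = List.of(header, List.of("sunny", "hot", "high", "FALSE", "no"));
        for (List<String> row : removeAttribute(table, 3)) {
            System.out.println(row);
        }
    }
}
```

In Weka's own Java API, the corresponding filter is weka.filters.unsupervised.attribute.Remove, configured with setAttributeIndices("3"), given the data via setInputFormat, and applied with Filter.useFilter; check the Weka Javadoc for the version you are using.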
Lesson 1.6 - Visualizing your data
Raw data visualization
Sepalwidth vs. petalwidth
Zoom in
Error visualization
Thank you! Questions?