Lesson 1: Import Datasets, Basic Statistics, Descriptive Statistics, and Statistics by Category/Group


Welcome to the very first lesson for Azure ML Studio. Since Microsoft only announced the product in July 2014, how to use ML Studio is still quite a mystery to many analytics and business intelligence professionals. Therefore, Neal Analytics will present hands-on tutorials on this new product on a regular basis so that everyone can take advantage of all the benefits it offers.

Let's start easy. This lesson will illustrate:
- How to import datasets
- How to quickly obtain basic statistical information for the dataset
- How to obtain additional descriptive statistical information for the dataset
- How to do the same for each category or group in the dataset

In this lesson, a dataset called chickwts from R is downloaded and saved as a CSV file. chickwts contains 71 rows with two columns named Weight and Feed. Weight is numeric, while Feed is categorical with six levels: horsebean, linseed, soybean, sunflower, meatmeal, and casein.

Import Dataset
First, let's import the data.
1. Open the internet browser of your choice.
2. Enter the URL: http://studio.azureml.net
3. Enter your personal log-in information.

4. Once you are logged in, ML Studio should look like this:
5. Click on New.
6. Click on Dataset.

7. Click on From Local File.
8. Either enter the directory and file name, or click on Browse to locate chickwts.csv.
Note: Since chickwts already exists on my computer, a green check mark appears next to the Existing dataset box.

9. Since the original dataset includes a header, select Generic CSV File with a header (.csv) under Select a type for the new dataset.
10. Click on the check mark.
11. ML Studio should return to the Home page.

12. To check whether chickwts.csv has been properly imported, either:
   a. Click on Experiment if ML Studio already contains experiments
      i. Click on any experiment in the list

   b. Or, if no experiments exist (or simply for the sake of this lesson), start a new experiment by clicking on New
      i. Click on Experiment

      ii. The resulting screen should look like this:
      iii. Click on the title at the top to rename the experiment

13. Whether an existing or a new experiment is used, click on Saved Datasets.
14. chickwts.csv should be included in the drop-down list.
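Outside ML Studio, the same import (a CSV file whose first row holds the column names) can be reproduced in a few lines of Python, which is handy for cross-checking the numbers later in this lesson. This is a minimal sketch using only the standard csv module; read_rows, load_csv, and column are hypothetical helper names, not part of ML Studio, and the file name chickwts.csv with a header row reading weight,feed (as in R's chickwts) is an assumption.

```python
import csv
from typing import Dict, Iterable, List, Optional

# One parsed row: column name -> cell text, or None for an empty (missing) cell.
Row = Dict[str, Optional[str]]


def read_rows(lines: Iterable[str]) -> List[Row]:
    """Parse CSV lines whose first line is a header into a list of dicts.

    Empty cells are stored as None so they can later be counted as missing values.
    """
    reader = csv.DictReader(lines)
    return [{k: (v if v != "" else None) for k, v in row.items()} for row in reader]


def load_csv(path: str) -> List[Row]:
    """Open a CSV file with a header row (the analogue of 'Generic CSV File with a header')."""
    with open(path, newline="", encoding="utf-8") as f:
        return read_rows(f)


def column(rows: List[Row], name: str, numeric: bool = False) -> list:
    """Pull out one column; optionally convert non-missing cells to float."""
    values = [row.get(name) for row in rows]
    if numeric:
        return [float(v) if v is not None else None for v in values]
    return values
```

Usage: rows = load_csv("chickwts.csv") followed by weights = column(rows, "weight", numeric=True). If the file matches the dataset described above, len(rows) should be 71.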

Basic Statistics
Basic statistics include the mean, median, min, max, and standard deviation. Additional information includes unique values, missing values, and feature type.
1. Click, hold, and drag chickwts.csv into the workspace.

2. Right-click on the tiny circle at the bottom of the module.

3. Click on Visualize.
4. Just like that, the basic statistics should be listed in table form like this:
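The Basic Statistics view can also be recomputed by hand as a sanity check. The sketch below is a minimal Python version under stated assumptions: missing values are represented as None, a column counts as numeric only if every non-missing value is an int or float, and the standard deviation uses the sample (n - 1) formula, which is the one that reproduces the Std. Dev. row of the Excel table later in this lesson. Whether ML Studio uses exactly these conventions (and how it labels feature types) is an assumption, and basic_statistics is a hypothetical helper name.

```python
import statistics
from typing import Dict, Optional, Sequence, Union

# A cell is a number, a category string, or None when missing.
Value = Optional[Union[float, str]]


def basic_statistics(values: Sequence[Value]) -> Dict[str, object]:
    """Per-column summary similar to ML Studio's Visualize view (conventions assumed)."""
    present = [v for v in values if v is not None]  # None marks a missing value
    result: Dict[str, object] = {
        "Unique Values": len(set(present)),
        "Missing Values": len(values) - len(present),
    }
    # Treat the column as numeric only if every non-missing value is a number.
    numeric = bool(present) and all(isinstance(v, (int, float)) for v in present)
    result["Feature Type"] = "numeric" if numeric else "categorical"
    if numeric:
        result["Mean"] = statistics.mean(present)
        result["Median"] = statistics.median(present)
        result["Min"] = min(present)
        result["Max"] = max(present)
        # Sample standard deviation (n - 1 denominator); undefined for a single value.
        result["Standard Deviation"] = (
            statistics.stdev(present) if len(present) > 1 else float("nan")
        )
    return result
```

Applied to the ten horsebean weights, it gives a mean of 160.2, a median of 151.5, a min of 108, a max of 227, and a standard deviation of about 38.63, matching the horsebean column of the Excel table in the by-category section.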

Descriptive Statistics
In some cases, more statistical information is desired, and a dedicated module provides it. The information offered by this module includes:
- Count (number of values)
- Unique Value Count (number of unique values)
- Missing Value Count (number of missing values)
- Min
- Max
- Mean
- Mean Deviation
- 1st Quartile
- Median
- 3rd Quartile
- Mode
- Range
- Sample Variance
- Sample Standard Deviation
- Sample Skewness
- Sample Kurtosis
- P0.5 (0.5th percentile)
- P1 (1st percentile)
- P5 (5th percentile)
- P95 (95th percentile)
- P99 (99th percentile)
- P99.5 (99.5th percentile)
1. Continuing from where Basic Statistics left off, there are two ways to locate the Descriptive Statistics module:
   a. Click on Statistical Functions
   b. Type Descriptive Statistics in the search bar above

2. Click, hold, and drag the Descriptive Statistics module into the workspace.
3. Connect the two modules by dragging the connection arrow from the tiny circle at the bottom of the chickwts.csv module to the tiny circle at the top of the Descriptive Statistics module.

4. The resulting chart should look something like this:
5. Click on Save (optional but recommended).

6. Click on Run.
7. If the experiment ran successfully, a green check mark should appear inside the Descriptive Statistics module.

8. Now, right-click on the tiny circle at the bottom of the Descriptive Statistics module.
9. Click on Visualize.

10. The result looks like this. Scroll to the right to see the remaining statistics.

Basic and Descriptive Statistics by Category or Group
Sometimes one wishes to obtain statistical information for each category in the dataset. In Excel, this information can be computed and displayed like so:

           horsebean  linseed  soybean   sunflower  meatmeal   casein
Mean       160.2      218.75   246.4286  328.91667  276.90909  323.5833
Median     151.5      221      248       328        263        342
Min        108        141      158       226        153        216
Max        227        309      329       423        380        404
Std. Dev.  38.625841  52.2357  54.12907  48.836384  64.900623  64.43384

In ML Studio, the dataset must be split by category before basic statistics can be calculated. A special module named Split divides a dataset into two parts (not more) based on the settings in its properties.
1. Begin with a clear workspace, then click, hold, and drag the chickwts.csv dataset into it.
2. Search for the Split module; there are two ways:
   a. Click on Data Transformation

      i. Click on Sample and Split
   b. Or simply type split into the search bar to bring up the Split module

3. Click, hold, and drag the Split module into the workspace.
4. Connect chickwts.csv to Split by dragging an arrow from the tiny circle at the bottom of chickwts.csv to the tiny circle at the top of Split.

5. The resulting chart should look something like this:
6. Now, click on the Split module to highlight it, then in the Properties window on the right, select Regular Expression for Splitting mode.
7. Under Regular expression, type \"feed" horsebean

8. Click Save, then Run.
9. Once a green check mark appears in the Split module, click on the tiny circle at the bottom left of the module.
10. Click on Visualize.

11. This should provide statistical results for the horsebean category.
12. Clicking on the tiny circle at the bottom right of the Split module, then Visualize, gives the statistics for the rest of the dataset (all other categories combined), which may have little meaning at this point.

13. To obtain basic statistics for all the categories, keep dragging Split modules into the workspace until Number of Split Modules = Number of Categories - 1 (five Split modules for the six feed types).
14. Chain the Split modules by dragging an arrow from the tiny circle at the bottom right of each module to the tiny circle at the top of the next module, like so:

15. For each of the remaining Split modules, click on it to highlight it, then set the following properties:

                    2nd Split Module    3rd Split Module    4th Split Module    5th Split Module
Splitting mode      Regular Expression  Regular Expression  Regular Expression  Regular Expression
Regular expression  \"feed" linseed     \"feed" soybean     \"feed" sunflower   \"feed" meatmeal

16. Click Save, then Run.

17. A green check mark must appear in each Split module.
18. Clicking on the tiny circle at the bottom left of each Split module, then Visualize, shows the basic statistics for its corresponding category. The last Split module is special: its left tiny circle shows statistics for one category (meatmeal) and its right one for the last category (casein).

19. To go a step further and obtain descriptive statistics, connect the tiny circle at the bottom left of each Split module (and, for the last Split module, the bottom right one as well) to its own Descriptive Statistics module, so that the final chart looks like this:

20. Click Save, then Run.
21. Make sure all the modules have green check marks.

22. Now, clicking on the unconnected tiny circle at the bottom of each Descriptive Statistics module, then Visualize, provides additional statistics for the corresponding category in the chickwts.csv dataset.

Hopefully, this lesson has taught the steps needed to obtain basic and descriptive statistics for any dataset that involves both numeric and categorical values. Along the way, a few recurring techniques (such as connecting modules) have been repeated and should be ingrained by now. This is the launching pad for more sophisticated, yet still simple, analyses in ML Studio. Until next time.
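To cross-check the per-category output outside ML Studio, the chain of Split modules followed by Descriptive Statistics modules can be mirrored in Python. This minimal sketch assumes that a Split module in Regular Expression mode sends rows whose chosen column matches the expression to its left output and all other rows to its right output, as described in steps 9 to 12; chaining one split per category except the last reproduces the Number of Categories - 1 layout. regex_split, chained_split, percentile, descriptive_statistics, and statistics_by_category are hypothetical names. Only a subset of the module's outputs is computed, and the formulas are assumptions, not ML Studio's documented ones: quartiles and percentiles use linear interpolation (the rule of Excel's PERCENTILE.INC), Mean Deviation is the mean absolute deviation from the mean, ties for Mode go to the smallest value, and skewness and kurtosis are omitted because the lesson does not say which estimators ML Studio uses.

```python
import math
import re
import statistics
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]


def regex_split(rows: List[Row], column: str, pattern: str) -> Tuple[List[Row], List[Row]]:
    """One Split module in Regular Expression mode (assumed semantics):
    matching rows go to the left output, everything else to the right output."""
    rx = re.compile(pattern)
    left = [r for r in rows if rx.search(str(r[column]))]
    right = [r for r in rows if not rx.search(str(r[column]))]
    return left, right


def chained_split(rows: List[Row], column: str, categories: Sequence[str]) -> Dict[str, List[Row]]:
    """Chain len(categories) - 1 splits: each one peels a category off on the left
    and feeds its right output into the next split."""
    groups: Dict[str, List[Row]] = {}
    remainder = rows
    for cat in categories[:-1]:
        # Unanchored match on the category name, like the expressions used in the lesson.
        groups[cat], remainder = regex_split(remainder, column, re.escape(cat))
    groups[categories[-1]] = remainder  # right output of the last split
    return groups


def percentile(sorted_vals: List[float], p: float) -> float:
    """Linear-interpolation percentile on sorted data (PERCENTILE.INC rule; assumed)."""
    k = (len(sorted_vals) - 1) * p / 100
    lo = math.floor(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def descriptive_statistics(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """A subset of the Descriptive Statistics module's outputs for one numeric column."""
    present = sorted(v for v in values if v is not None)
    n = len(present)
    out: Dict[str, float] = {
        "Count": len(values),  # assumed to include missing entries
        "Unique Value Count": len(set(present)),
        "Missing Value Count": len(values) - n,
    }
    if n == 0:
        return out
    mean = statistics.mean(present)
    counts = Counter(present)
    top = max(counts.values())
    out.update({
        "Min": present[0],
        "Max": present[-1],
        "Mean": mean,
        "Mean Deviation": sum(abs(v - mean) for v in present) / n,
        "1st Quartile": percentile(present, 25),
        "Median": statistics.median(present),
        "3rd Quartile": percentile(present, 75),
        "Mode": min(v for v, c in counts.items() if c == top),  # ties -> smallest value
        "Range": present[-1] - present[0],
        # Sample (n - 1) estimators need at least two values.
        "Sample Variance": statistics.variance(present) if n > 1 else math.nan,
        "Sample Standard Deviation": statistics.stdev(present) if n > 1 else math.nan,
    })
    for p in (0.5, 1, 5, 95, 99, 99.5):
        out[f"P{p:g}"] = percentile(present, p)
    return out


def statistics_by_category(rows: List[Row], value_col: str, category_col: str,
                           categories: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Split by category with chained splits, then describe each piece."""
    groups = chained_split(rows, category_col, categories)
    return {cat: descriptive_statistics([r[value_col] for r in grp]) for cat, grp in groups.items()}
```

With rows loaded from chickwts.csv (weights converted to numbers), statistics_by_category(rows, "weight", "feed", ["horsebean", "linseed", "soybean", "sunflower", "meatmeal", "casein"]) should reproduce the Mean, Median, Min, Max, and Std. Dev. rows of the Excel table above. Each peeled-off group plays the role of one Split module's left output, and the final remainder stands in for the last module's right output (casein in this lesson).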