Laboratory of Machine Learning with Python

Laboratory of Machine Learning with Python
NumPy / Matplotlib / Scikit-learn
Paolo Dragone, University of Trento

Machine Learning with Python
http://lion0b.disi.unitn.it:9999 (only available within the DISI network)
Password: ml-lab2

Setup (on your own machine)
Make sure you are using Python 3 for the following steps.
Install NumPy, SciPy, Matplotlib, Scikit-learn and Jupyter:
>> pip install numpy scipy matplotlib scikit-learn jupyter
Download and extract the material for the Scikit-learn lab:
http://disi.unitn.it/~passerini/teaching/2017-2018/machinelearning/
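Before opening the notebook, it can help to confirm that the libraries are importable from the Python interpreter you intend to use. The following is a minimal sketch, not part of the lab material: the function name check_environment and the list REQUIRED are illustrative. Note that Scikit-learn is imported under the name sklearn.

```python
import importlib.util
import sys

# Import names of the libraries used in the lab (illustrative list).
# Scikit-learn is imported as "sklearn".
REQUIRED = ["numpy", "scipy", "matplotlib", "sklearn"]


def check_environment(modules=REQUIRED):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


# Report the interpreter version and which libraries were found.
print("Python version:", ".".join(str(part) for part in sys.version_info[:3]))
for name, found in check_environment().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Any library reported as MISSING can be installed with pip as described above.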

Setup: Jupyter notebook
Open a terminal in the folder containing the extracted archive and run:
>> jupyter notebook
Open the browser at the given address, then open the sklearn-lab.ipynb file containing the lecture notebook.

Setup: Jupyter notebook
Execute a cell by selecting it and either clicking the Run button in the page header or pressing Shift+Enter. The output appears just below the cell. You can modify the code as you wish and execute it again.

Assignment
For the second Machine Learning assignment you will solve a classification task using Scikit-learn on one of the given datasets. Each available dataset is already split into a training set and a test set. You have access to the labels of the training examples, but the labels of the test set are hidden.
Your task is to choose a dataset, train a classifier on the training set and predict the labels of the test set. To pass the assignment, your classifier has to classify the examples in the test set with higher accuracy than the reference baseline for the chosen dataset. Additionally, you need to test your algorithm via cross-validation over the training set and produce a report containing the results obtained.
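The train-then-predict workflow described above can be sketched as follows. Everything specific here is an assumption made for illustration: the data is synthetic (generated with make_classification) rather than one of the assignment datasets, LogisticRegression is an arbitrary choice of classifier, and a DummyClassifier stands in for the reference baseline, whose real accuracy is given in each dataset's README. In the actual assignment the test labels are hidden, so the accuracy can only be obtained from the course server.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an assignment dataset (assumption, not the real data).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Simulate the given split into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Train a classifier on the training set only.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Predict the labels of the test examples.
y_pred = clf.predict(X_test)

# Hypothetical baseline: always predict the most frequent training label.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
baseline_accuracy = baseline.score(X_test, y_test)
print(f"classifier accuracy: {accuracy:.3f}")
print(f"baseline accuracy:   {baseline_accuracy:.3f}")
```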

Assignment: Datasets
- OCR: Optical Character Recognition
- Spambase: Spam email classification
- Presidential campaign tweets: Classification of tweets from D. Trump and H. Clinton

Assignment: Material
Download the assignment material:
http://disi.unitn.it/~passerini/teaching/2017-2018/machinelearning/
The material contains:
- The three datasets, each one containing:
    - The training set examples;
    - The training set labels;
    - The test set examples;
    - A README containing info about the dataset (this file also contains the reference baseline accuracy);
    - Other info files.
- A helper script.

Assignment: Helper
The helper script can be used to test your predictions. Given a file containing the predicted labels, the helper script sends the labels to our server and receives the prediction accuracy. You can use it as follows:
>> ./helper.py your.email dataset test-labels.txt
The first parameter is your unitn email, the second is the dataset label (one of ocr, spambase and tweets), and the third is the path to the file containing the predicted labels. This file should contain one label per line, in the same order as the file containing the examples. The labels should be in the same format as the labels in the training set.
The helper also prints the current best accuracy achieved by any of you on that dataset, just to add a bit of healthy competition! :)
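The label file expected by the helper (one label per line, in the order of the test examples) can be produced with a few lines of Python. This is a minimal sketch: the function names write_labels and read_labels and the placeholder predictions are illustrative, and the file is written to a temporary directory only to keep the example self-contained.

```python
import os
import tempfile
from pathlib import Path


def write_labels(labels, path):
    """Write one label per line, preserving the order of the test examples."""
    text = "".join(f"{label}\n" for label in labels)
    Path(path).write_text(text, encoding="utf-8")


def read_labels(path):
    """Read the labels back as a list of strings, one per line."""
    return Path(path).read_text(encoding="utf-8").splitlines()


# Placeholder predictions; in practice these come from your classifier.
predicted = [1, 0, 0, 1]

with tempfile.TemporaryDirectory() as folder:
    labels_path = os.path.join(folder, "test-labels.txt")
    write_labels(predicted, labels_path)
    print(read_labels(labels_path))
```

In practice you would write the file next to the helper script and pass its path as the third parameter of helper.py.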

Assignment: Step-by-step
1. Choose a dataset;
2. Experiment with a classification algorithm of your choosing;
3. Test your classifier using cross-validation over the training set;
4. Write a report describing the learning algorithm used and discussing the results obtained. The report should contain at least:
    - The average precision, recall, and F1 over the folds. With cross_val_score you can specify "precision", "recall" and "f1" for the scoring parameter. For the OCR dataset, in which you do multiclass classification, use weighted averaging, i.e. "precision_weighted", "recall_weighted" and "f1_weighted";
    - The plot of the learning curve, as shown in the lecture;
5. Train your classifier over the full training set;
6. Use the classifier to predict the examples in the test set;
7. Place the labels in a file, in the same order as you read the test examples and in the same format as the labels in the training set.
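The cross-validation metrics and the learning-curve data required for the report can be computed along these lines. This is a sketch under stated assumptions: the three-class synthetic data stands in for a multiclass dataset such as OCR, LogisticRegression and 5 folds are arbitrary choices, and the variable names are illustrative. For a binary dataset the scoring names would be "precision", "recall" and "f1".

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, learning_curve

# Synthetic three-class training set (assumption, not the real data).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, n_classes=3, random_state=0
)

clf = LogisticRegression(max_iter=1000)

# Average precision, recall and F1 over the folds, with weighted averaging
# because the task is multiclass.
cv_means = {}
for metric in ("precision_weighted", "recall_weighted", "f1_weighted"):
    scores = cross_val_score(clf, X, y, cv=5, scoring=metric)
    cv_means[metric] = scores.mean()
    print(f"{metric}: {scores.mean():.3f} (+/- {scores.std():.3f})")

# Learning curve: scores as a function of the number of training examples.
train_sizes, train_scores, valid_scores = learning_curve(
    clf, X, y, cv=5, train_sizes=np.linspace(0.2, 1.0, 5), scoring="f1_weighted"
)

# Average over the folds (axis 1) to get one point per training-set size.
mean_train = train_scores.mean(axis=1)
mean_valid = valid_scores.mean(axis=1)
for size, tr, va in zip(train_sizes, mean_train, mean_valid):
    print(f"{size:4d} examples: train={tr:.3f} validation={va:.3f}")
```

The arrays train_sizes, mean_train and mean_valid can then be passed to Matplotlib's plot function to draw the learning curve for the report.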

Assignment: Submit
After completing the assignment, submit it via email:
- Send an email to paolo.dragone@unitn.it (cc: passerini@disi.unitn.it)
- Subject: sklearnsubmit2017
- Attachment: id_name_surname.zip containing:
    - The text file containing the final predictions;
    - The code used to produce the predictions, the results and the plots;
    - The report in PDF format.
NOTE
- No group work
- This assignment is mandatory in order to enroll in the oral exam