Introduction to Machine Learning

Reto Wüest
University of Geneva

Course Description

With ever more data available in electronic form, automated methods of data analysis are becoming increasingly important, including in the social sciences. Machine learning refers to a set of methods that can automatically detect patterns in data, or learn from data. The analyst can then use the uncovered patterns to make accurate predictions and decisions under uncertainty. This course will introduce participants to the fundamentals of machine learning. Students will leave the course with a thorough understanding of the core issues in machine learning (prediction and inference, supervised and unsupervised learning, overfitting, bias-variance trade-off), knowledge of some of the most widely used machine learning methods, and the ability to apply these methods in their own research.

All course materials are available at http://retowuest.github.io/ml2018/.

Software

The course will use the open-source software R, which is freely available for download at https://www.r-project.org/. We will interact with R through the user interface RStudio, which can be downloaded at https://www.rstudio.com/products/rstudio/download/.

Prerequisites

Participants are expected to have a solid understanding of linear and binary regression models. The course will also assume at least a basic familiarity with the R statistical programming language.

Schedule

Session 1: Introduction to Machine Learning (March 6, 2018, 13:00-17:00)

The first session will provide an introduction to machine learning. We will discuss the goals of machine learning (prediction, inference, or both), the difference between supervised and unsupervised machine learning, the problem of overfitting, and the bias-variance trade-off. We will then get to know the first class of important supervised learning methods, namely shrinkage methods (Ridge regression and the Lasso).

Time         Topic
13:00-13:30  Introductions and course overview
13:30-14:00  General introduction to machine learning (prediction and inference, supervised and unsupervised learning)
14:00-14:45  Assessing model accuracy (overfitting, bias-variance trade-off, cross-validation)
15:15-15:45  Shrinkage methods I: Ridge regression
15:45-16:15  Shrinkage methods II: The Lasso
16:15-17:00  Application of Ridge regression and the Lasso

James et al., An Introduction to Statistical Learning, Ch. 2 and 6
Hastie et al., The Elements of Statistical Learning, Ch. 3 and 7
Shalev-Shwartz and Ben-David, Understanding Machine Learning, Ch. 5, 13
Bishop, Pattern Recognition and Machine Learning, Ch. 12
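To give a rough sense of what "shrinkage" means for Ridge regression and the Lasso, the sketch below works through the simplest possible case: one predictor and no intercept, where both estimators have a closed form. The course itself uses R; this sketch is written in plain Python without any packages, and the data, function names, and penalty values are made up for illustration only.

```python
# Illustrative sketch: one predictor, no intercept, made-up data.

def ridge_coef(x, y, lam):
    """Minimize sum((y_i - b*x_i)^2) + lam * b^2 over b."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


def lasso_coef(x, y, lam):
    """Minimize sum((y_i - b*x_i)^2) + lam * |b| over b (soft-thresholding)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * shrunk / sxx


# Hypothetical data with a least-squares slope of exactly 2.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

for lam in [0.0, 30.0, 120.0, 1000.0]:
    print(lam, ridge_coef(x, y, lam), lasso_coef(x, y, lam))
```

In this toy setting the Ridge coefficient moves toward zero as the penalty grows but never reaches it, whereas the Lasso coefficient becomes exactly zero once the penalty is large enough. That difference is why the Lasso can also be used for variable selection.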

Session 2: Classification and Regression Trees (CART) (March 27, 2018, 13:00-17:00)

The second session will deal with tree-based methods, which are another important and highly flexible class of supervised learning methods. After an introduction to the basics of decision trees and a general discussion of the advantages and disadvantages of tree-based models, we will look at three specific widely-used tree-based methods: bagging, random forests, and boosting.

Time         Topic
13:00-13:30  Introduction to classification and regression trees
13:30-14:00  Advantages and disadvantages of trees
14:00-14:45  Bagging, random forests
15:15-16:00  Boosting
16:00-16:30  Application I: Random forests
16:30-17:00  Application II: Boosting

James et al., An Introduction to Statistical Learning, Ch. 8
Hastie et al., The Elements of Statistical Learning, Ch. 9, 10, and 15
Shalev-Shwartz and Ben-David, Understanding Machine Learning, Ch. 18
Lantz, Machine Learning with R, Ch. 11
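As a minimal sketch of the bagging idea named in this session, the code below fits many one-split regression trees ("stumps") to bootstrap resamples of a small data set and averages their predictions. The course itself uses R; this is plain Python without any packages, and the data, function names, number of trees, and seed are assumptions chosen for illustration only.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing the squared error."""
    best = None  # (sse, threshold, left_mean, right_mean)
    # Candidate thresholds: every distinct x except the largest,
    # so that both sides of the split are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # All x values identical: fall back to a constant prediction.
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Average the predictions of stumps fit to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(s(x) for s in stumps) / len(stumps)


# Hypothetical data: the outcome jumps from about 1 to about 3 between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

predict = bagged_stumps(xs, ys)
print(predict(1), predict(8))
```

A random forest would additionally restrict each split to a random subset of the predictors. With a single predictor, as here, that step has nothing to do, so the sketch shows bagging only.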

Session 3: Unsupervised Learning (April 17, 2018, 13:00-17:00)

In the third session, we will move to unsupervised machine learning methods. We will cover two important unsupervised learning techniques: principal components analysis (PCA) and clustering analysis (K-means clustering and hierarchical clustering).

Time         Topic
13:00-13:30  Introduction to unsupervised learning
13:30-14:15  Principal components analysis (PCA)
14:15-14:45  K-means clustering
15:15-16:00  Hierarchical clustering
16:00-16:30  Application I: PCA
16:30-17:00  Application II: Clustering methods

James et al., An Introduction to Statistical Learning, Ch. 10
Hastie et al., The Elements of Statistical Learning, Ch. 14
Shalev-Shwartz and Ben-David, Understanding Machine Learning, Ch. 22, 23
Bishop, Pattern Recognition and Machine Learning, Ch. 12
Barber, Bayesian Reasoning and Machine Learning, Ch. 15
Lantz, Machine Learning with R, Ch. 9
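The K-means clustering covered in this session can be sketched in a few lines using the standard alternating procedure (often called Lloyd's algorithm): assign each point to its nearest center, then move each center to the mean of its assigned points, and repeat until the assignments stop changing. The course itself uses R; this is plain Python without any packages. The data, function names, and the naive choice of the first k points as starting centers are assumptions made for illustration only.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Cluster points into k groups; returns (centers, labels)."""
    # Naive initialization (an assumption of this sketch): the first k points.
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Hypothetical data: two visibly separated groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

centers, labels = kmeans(points, k=2)
print(labels)
print(centers)
```

The result of K-means generally depends on the starting centers, so in practice the algorithm is usually run from several random starts and the solution with the smallest within-cluster variation is kept.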

References

Barber, David. 2016. Bayesian Reasoning and Machine Learning. New York: Cambridge University Press. Available for free as a PDF. URL: http://web4.cs.ucl.ac.uk/staff/d.barber/pmwiki/pmwiki.php?n=brml.homepage

Bishop, Christopher M. 2006. Pattern Recognition and Machine Learning. New York: Springer.

Hastie, Trevor, Robert Tibshirani and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer. Available for free as a PDF. URL: https://web.stanford.edu/~hastie/elemstatlearn/

James, Gareth, Daniela Witten, Trevor Hastie and Robert Tibshirani. 2013. An Introduction to Statistical Learning with Applications in R. New York: Springer. Available for free as a PDF. URL: http://www-bcf.usc.edu/~gareth/isl/

Lantz, Brett. 2015. Machine Learning with R. 2nd ed. Birmingham: Packt Publishing.

Shalev-Shwartz, Shai and Shai Ben-David. 2014. Understanding Machine Learning: From Theory to Algorithms. New York: Cambridge University Press. Available for free as a PDF. URL: http://www.cs.huji.ac.il/~shais/understandingmachinelearning/