Machine Learning for Social Sciences

Reto Wüest
University of Geneva

Instructor Biography

Reto Wüest is a postdoctoral researcher in the Department of Political Science and International Relations at the University of Geneva. After pre-doctoral fellowships at Princeton University and New York University, he received his PhD from the University of Geneva in 2016. He studies legislative behavior and political representation using quantitative methods and machine learning techniques. His research has been published in West European Politics and Swiss Political Science Review.

Course Description

With ever more data available in electronic form, automated methods of data analysis are becoming increasingly important in the social sciences as well. Machine learning refers to a set of methods that can automatically detect patterns in data, or learn from data. The analyst can then use the uncovered patterns to make accurate predictions and decisions under uncertainty. This course will introduce participants to the fundamentals of machine learning. Students will leave the course with a thorough understanding of the core issues in machine learning (prediction and inference, supervised and unsupervised learning, overfitting, bias-variance trade-off), knowledge of some of the most widely used machine learning methods, and the ability to apply these methods in their own research.

Software

The course will use the open-source software R, which is freely available for download at https://www.r-project.org/. We will interact with R through the user interface RStudio, which can be downloaded at https://www.rstudio.com/products/rstudio/download/.

Prerequisites

Participants are expected to have a solid understanding of linear and binary regression models. The course will also assume at least a basic familiarity with the R statistical programming language.

Schedule

Session 1: Introduction to Machine Learning (July 2, 2018, 09:00-13:00)

The first session will provide an introduction to machine learning. We will discuss the goals of machine learning (prediction, inference, or both), the difference between supervised and unsupervised machine learning, the problem of overfitting, and the bias-variance trade-off. We will then get to know the first class of important supervised learning methods, namely shrinkage methods (Ridge regression and the Lasso).

Class Schedule

Time         Topic
09:00-09:30  Introductions and course overview
09:30-10:00  General introduction to machine learning (prediction and inference, supervised and unsupervised learning)
10:00-10:45  Assessing model accuracy (overfitting, bias-variance trade-off, cross-validation)
11:15-11:45  Shrinkage methods I: Ridge regression
11:45-12:15  Shrinkage methods II: The Lasso
12:15-13:00  Application of Ridge regression and the Lasso

Main Readings

James et al., An Introduction to Statistical Learning, Ch. 2 and 6
Hastie et al., The Elements of Statistical Learning, Ch. 3 and 7
Shalev-Shwartz and Ben-David, Understanding Machine Learning, Ch. 5, 13
Bishop, Pattern Recognition and Machine Learning, Ch. 12

Session 2: Classification and Regression Trees (CART) (July 3, 2018, 09:00-13:00)

The second session will deal with tree-based methods, which are another important and highly flexible class of supervised learning methods. After an introduction to the basics of decision trees and a general discussion of the advantages and disadvantages of tree-based models, we will look at three specific widely-used tree-based methods: bagging, random forests, and boosting.

Class Schedule

Time         Topic
09:00-09:30  Introduction to classification and regression trees
09:30-10:00  Advantages and disadvantages of trees
10:00-10:45  Bagging, random forests
11:15-12:00  Boosting
12:00-12:30  Application I: Random forests
12:30-13:00  Application II: Boosting

Main Readings

James et al., An Introduction to Statistical Learning, Ch. 8
Hastie et al., The Elements of Statistical Learning, Ch. 9, 10, and 15
Shalev-Shwartz and Ben-David, Understanding Machine Learning, Ch. 18
Lantz, Machine Learning with R, Ch. 11

Session 3: Unsupervised Learning (July 4, 2018, 09:00-13:00)

In the third session, we will move to unsupervised machine learning methods. We will cover two important unsupervised learning techniques: principal components analysis (PCA) and clustering analysis (K-means clustering and hierarchical clustering).

Class Schedule

Time         Topic
09:00-09:30  Introduction to unsupervised learning
09:30-10:15  Principal components analysis (PCA)
10:15-10:45  K-means clustering
11:15-12:00  Hierarchical clustering
12:00-12:30  Application I: PCA
12:30-13:00  Application II: Clustering methods

Main Readings

James et al., An Introduction to Statistical Learning, Ch. 10
Hastie et al., The Elements of Statistical Learning, Ch. 14
Shalev-Shwartz and Ben-David, Understanding Machine Learning, Ch. 22, 23
Bishop, Pattern Recognition and Machine Learning, Ch. 12
Barber, Bayesian Reasoning and Machine Learning, Ch. 15
Lantz, Machine Learning with R, Ch. 9

References

Barber, David. 2016. Bayesian Reasoning and Machine Learning. New York: Cambridge University Press. Available for free as a PDF. URL: http://web4.cs.ucl.ac.uk/staff/d.barber/pmwiki/pmwiki.php?n=brml.homepage

Bishop, Christopher M. 2006. Pattern Recognition and Machine Learning. New York: Springer.

Hastie, Trevor, Robert Tibshirani and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer. Available for free as a PDF. URL: https://web.stanford.edu/~hastie/elemstatlearn/

James, Gareth, Daniela Witten, Trevor Hastie and Robert Tibshirani. 2013. An Introduction to Statistical Learning with Applications in R. New York: Springer. Available for free as a PDF. URL: http://www-bcf.usc.edu/~gareth/isl/

Lantz, Brett. 2015. Machine Learning with R. 2nd ed. Birmingham: Packt Publishing.

Shalev-Shwartz, Shai and Shai Ben-David. 2014. Understanding Machine Learning: From Theory to Algorithms. New York: Cambridge University Press. Available for free as a PDF. URL: http://www.cs.huji.ac.il/~shais/understandingmachinelearning/