An Introduction to Data Science


Coordinating unit: 240 - ETSEIB - Barcelona School of Industrial Engineering
Teaching unit: 715 - EIO - Department of Statistics and Operations Research
Academic year: 2018
Degree:
BACHELOR'S DEGREE IN INDUSTRIAL TECHNOLOGY ENGINEERING (Syllabus 2010). (Teaching unit Optional)
BACHELOR'S DEGREE IN MATERIALS ENGINEERING (Syllabus 2010). (Teaching unit Optional)
BACHELOR'S DEGREE IN CHEMICAL ENGINEERING (Syllabus 2010). (Teaching unit Optional)
ECTS credits: 4.5
Teaching languages: English

Teaching staff
Coordinator: JOSEP GINEBRA
Others: JOSEP GINEBRA

Opening hours
Timetable: Monday and Wednesday from 11:30 to 13:30

Prior skills
To have passed Estadística.

Degree competences to which the subject contributes
Specific:
1. Basic knowledge of the use and programming of computers, operating systems, databases and computer software with engineering applications.
2. Knowledge and capacities to organise and manage projects. Knowing the organisational structure and functions of a project office.
Transversal:
3. EFFICIENT ORAL AND WRITTEN COMMUNICATION. Communicating verbally and in writing about learning outcomes, thought-building and decision-making. Taking part in debates about issues related to one's own field of specialization.
4. TEAMWORK. Being able to work as a team player, either as a member or as a leader. Contributing to projects pragmatically and responsibly, by reaching commitments in accordance with the resources that are available.

Teaching methodology
All classes will be taught in a computer lab. The data analysis will be done with MINITAB and with R. Every week there will be small data analysis assignments to be done at home. Students will have to do a final project.

Learning objectives of the subject
At the end of the course the student should be able to identify situations where it is useful to analyze data, to identify the model and/or method of analysis best suited to the data, to build a model that summarizes the information in the data and allows predictions to be made, to reduce the dimensionality of multivariate data and visualize it, to implement supervised and unsupervised classification algorithms, and to evaluate the quality of the results obtained.

Study load
Total learning time: 112h 30m
Hours medium group: 45h
Self study: 67h 30m

Content

Chapter 1: Introduction
Learning time: 3h 30m. Guided activities: 1h. Self study: 1h.
1.- Problems: Association, prediction and classification.
2.- Tools: Statistical models and multivariate analysis.

Chapter 2: Linear models for a continuous response
Learning time: 3. Theory classes: 6h. Laboratory classes: 6h. Guided activities: 6h. Self study: 12h.
1.- Normal linear model.
2.- Model fit; least squares and robust regression.
3.- ANOVA table and goodness of fit measures.
4.- Inference on the model parameters.
5.- Prediction.
6.- Model checking.
7.- Model selection.
8.- Cross validation and lack of fit tests.
9.- Model interpretation; bias, collinearity and causality.
10.- Use of categorical explanatory variables.
11.- Comparison of means.
12.- Analysis of two-level factorial designs.

Chapter 3: Non-linear models for a continuous response
Learning time: 6h. Self study: 3h.
1.- Normal non-linear model.
2.- Model fit.
3.- Inference.
4.- Model checking.

Chapter 4: Categorical and discrete response models
Learning time: 22h 30m. Theory classes: 4h 30m. Laboratory classes: 4h 30m. Guided activities: 4h 30m. Self study: 9h.
1.- Generalized linear model.
2.- Count response models.
3.- Binary response models.
4.- Model fit.
5.- Inference.
6.- Model checking.
7.- Prediction.
8.- Model interpretation.
9.- Contingency tables and models for a polytomous response.

Chapter 5: Time series models
Learning time: 13h. Theory classes: 3h. Laboratory classes: 3h. Guided activities: 3h. Self study: 4h.
1.- Description of a time series.
2.- AR models.
3.- MA models.
4.- ARIMA models.
5.- Seasonal ARIMA models.

Chapter 6: Visualization of multivariate data (Dimensionality reduction)
Learning time: 6h. Self study: 3h.
1.- Principal components analysis.
2.- Correspondence analysis.

Chapter 7: Cluster analysis (Unsupervised classification)
Learning time: 6h 30m. Guided activities: 1h 30m. Self study: 2h.
1.- Hierarchical methods.
2.- Partition methods (k-means algorithm).
3.- Variable cluster analysis.

Chapter 8: Discriminant analysis (Supervised classification)
Learning time: 8h 30m. Guided activities: 1h 30m. Self study: 4h.
1.- Linear discriminant.
2.- Quadratic discriminant.
3.- Logistic discriminant.

Chapter 9: Non-parametric regression and classification models
Learning time: 4h 30m. Self study: 1h 30m.
1.- Local smoothers.
2.- Nearest neighbors.
3.- Additive models.
4.- Classification and regression trees.
5.- Neural networks.

Qualification system
There will be a take-home midterm exam and an in-class final exam.
Grade = 0.1 Assignments + 0.3 Final Project + 0.1 Midterm + 0.5 Final Exam

Bibliography

Basic:
Hastie, Trevor; Tibshirani, Robert; Friedman, J. The elements of statistical learning: data mining, inference and prediction. 2nd ed. New York: Springer Verlag, 2011. ISBN 9780387848570.
Peña, Daniel. Regresión y diseño de experimentos. Madrid: Alianza, 2002. ISBN 9788420693897.
Venables, William N; Ripley, B.D. Modern Applied Statistics with S. 4th ed. New York: Springer Verlag, 2003. ISBN 0387954570.
Peña, Daniel. Análisis de datos multivariantes. Madrid: McGrawHill, 2008. ISBN 9788448136109.
Weisberg, Sanford. Applied Linear Regression. 3rd ed. New York: Wiley, 2005. ISBN 0471663794.
Everitt, B.S.; Dunn, G. Applied Multivariate Data Analysis [online]. 2nd ed. New York: Wiley, 2010 [Accessed: 27/06/2014]. Available at: <http://onlinelibrary.wiley.com/book/10.1002/9781118887486>. ISBN 9781118887486.
Greenacre, Michael J. Correspondence Analysis in Practice. 2nd ed. Boca Raton: Chapman and Hall, 2007. ISBN 9781584886167.
James, Gareth [et al.]. An Introduction to statistical learning: with applications in R. New York: Springer Verlag, 2013. ISBN 9781461471370.

Complementary:
Clarke, Bertrand; Fokoue, Ernest; Zhang, Hao Helen. Principles and Theory for Data Mining and Machine Learning. Berlin: Springer Verlag, 2009. ISBN 9780387981345.
Dobson, Annette J. An Introduction to Generalized Linear Models. 3rd ed. Boca Raton: Chapman Hall, 2008. ISBN 9781584889502.
Johnson, Richard; Wichern, Dean. Applied multivariate statistical analysis. 6th ed. Englewood Cliffs, N.J: Pearson, 2007. ISBN 9780131877153.
Peña, Daniel. Análisis de series temporales. Madrid: Alianza, 2005. ISBN 8420691283.
Wakefield, Jon. Bayesian and frequentist regression methods. New York: Springer Verlag, 2013. ISBN 9781441909244.