Data Analysis for Business and Industry


Coordinating unit: 240 - ETSEIB - Barcelona School of Industrial Engineering
Teaching unit: 715 - EIO - Department of Statistics and Operations Research
Academic year: 2017
Degree: BACHELOR'S DEGREE IN INDUSTRIAL TECHNOLOGY ENGINEERING (Syllabus 2010). (Teaching unit Optional)
BACHELOR'S DEGREE IN MATERIALS ENGINEERING (Syllabus 2010). (Teaching unit Optional)
BACHELOR'S DEGREE IN CHEMICAL ENGINEERING (Syllabus 2010). (Teaching unit Optional)
ECTS credits: 4,5
Teaching languages: Catalan

Teaching staff
Coordinator: JOSEP GINEBRA
Others: JOSEP GINEBRA

Opening hours
Timetable: Monday and Wednesday from 11:30 to 13:30

Prior skills
To have passed Estadística.

Degree competences to which the subject contributes
Specific:
1. Basic knowledge of the use and programming of computers, operating systems, databases and computer software with an engineering application.
2. Knowledge and capacities to organise and manage projects. Knowing the organisational structure and functions of a project office.
Transversal:
3. EFFICIENT ORAL AND WRITTEN COMMUNICATION. Communicating verbally and in writing about learning outcomes, thought-building and decision-making. Taking part in debates about issues related to one's own field of specialization.
4. TEAMWORK. Being able to work as a team player, either as a member or as a leader. Contributing to projects pragmatically and responsibly, by reaching commitments in accordance with the resources that are available.

Teaching methodology
One half of the classes will be expository in nature, and the other half will take place in a computer lab, where we will use MINITAB. At the end of each computer lab session, students will be given assignments that must be handed in at the next computer lab session and will be graded. In the second half of the semester there will be a final project.

Learning objectives of the subject
At the end of the course the student should be able to identify situations where it is useful to analyze data, to identify the model and/or method of analysis best suited to the data, to build a model that summarizes the information in the data and allows predictions to be made, to reduce the dimensionality of multivariate data and visualize it, to implement supervised and unsupervised classification algorithms, and to evaluate the quality of the results obtained.

Study load
Total learning time: 112h 30m
Hours large group:
Hours medium group: 45h
Hours small group:
Guided activities:
Self study: 67h 30m

Content

Chapter 1: Introduction
Learning time: 3h 30m
Guided activities: 1h
Self study: 1h

1.- Problems: Association, prediction and classification.
2.- Tools: Statistical models and multivariate analysis.

Chapter 2: Linear models for a continuous response
Learning time: 45h
Theory classes: 9h
Laboratory classes: 9h
Guided activities: 9h
Self study: 18h

1.- Normal linear model.
2.- Model fit; least squares and robust regression.
3.- ANOVA table and goodness of fit measures.
4.- Inference on the model parameters.
5.- Prediction.
6.- Model checking.
7.- Model selection.
8.- Lack of fit tests.
9.- Model interpretation; bias, collinearity and causality.
10.- Use of categorical explanatory variables.
11.- Comparison of means.
12.- Analysis of two-level factorial designs.
13.- Cross validation and PRESS.

Chapter 3: Non-linear models for a continuous response
Learning time: 6h
Self study: 3h

1.- Normal non-linear model.
2.- Model fit.
3.- Inference.
4.- Model checking.

Chapter 4: Discrete response models
Learning time: 2
Theory classes: 4h
Laboratory classes: 4h
Guided activities: 4h
Self study: 8h

1.- Generalized linear model.
2.- Count response models.
3.- Binary response models.
4.- Model fit.
5.- Inference.
6.- Model checking.
7.- Prediction.
8.- Model interpretation.
9.- Contingency tables and models for a polytomous response.

Chapter 5: Time series models
Learning time: 13h
Theory classes: 3h
Laboratory classes: 3h
Guided activities: 3h
Self study: 4h

1.- Description of a time series.
2.- AR models.
3.- MA models.
4.- ARMA models.
5.- ARIMA models.
6.- Seasonal ARIMA models.

Chapter 6: Visualization of multivariate data
Learning time: 6h
Self study: 3h

1.- Principal components analysis.
2.- Correspondence analysis.

Chapter 7: Cluster analysis
Learning time: 6h 30m
Guided activities: 1h 30m
Self study: 2h

1.- Hierarchical methods.
2.- Partition methods (k-means algorithm).
3.- Variable cluster analysis.

Chapter 8: Discriminant analysis
Learning time: 8h 30m
Guided activities: 1h 30m
Self study: 4h

1.- Linear discriminant.
2.- Quadratic discriminant.
3.- Logistic discriminant.
4.- Nearest neighbors.

Chapter 9: Non-parametric regression and classification models
Learning time: 2h
Theory classes: 30m
Laboratory classes: 30m
Self study: 1h

1.- Local smoothers.
2.- Additive models.
3.- Classification and regression trees.
4.- Neural networks.

Qualification system
There will be a midterm and a final exam.
Grade = 0,1 Assignments + 0,3 Final Project + 0,1 Midterm + 0,5 Final Exam

Bibliography

Basic:
Hastie, Trevor; Tibshirani, Robert; Friedman, J. The elements of statistical learning: data mining, inference and prediction. 2nd ed. New York: Springer Verlag, 2011. ISBN 9780387848570.
Peña, Daniel. Regresión y diseño de experimentos. Madrid: Alianza, 2002. ISBN 9788420693897.
Venables, William N; Ripley, B.D. Modern Applied Statistics with S. 4th ed. New York: Springer Verlag, 2003. ISBN 0387954570.
Peña, Daniel. Análisis de datos multivariantes. Madrid: McGraw-Hill, 2008. ISBN 9788448136109.
Weisberg, Sanford. Applied Linear Regression. 3rd ed. New York: Wiley, 2005. ISBN 0471663794.
Everitt, B.S.; Dunn, G. Applied Multivariate Data Analysis [online]. 2nd ed. New York: Wiley, 2010 [Accessed: 27/06/2014]. Available at: <http://onlinelibrary.wiley.com/book/10.1002/9781118887486>. ISBN 9781118887486.
Greenacre, Michael J. Correspondence Analysis in Practice. 2nd ed. Boca Raton: Chapman and Hall, 2007. ISBN 9781584886167.
James, Gareth [et al.]. An Introduction to statistical learning: with applications in R. New York: Springer Verlag, 2013. ISBN 9781461471370.

Complementary:
Clarke, Bertrand; Fokoue, Ernest; Zhang, Hao Helen. Principles and Theory for Data Mining and Machine Learning. Berlin: Springer Verlag, 2009. ISBN 9780387981345.
Dobson, Annette J. An Introduction to Generalized Linear Models. 3rd ed. Boca Raton: Chapman and Hall, 2008. ISBN 9781584889502.
Johnson, Richard; Wichern, Dean. Applied multivariate statistical analysis. 6th ed. Englewood Cliffs, N.J: Pearson, 2007. ISBN 9780131877153.
Peña, Daniel. Análisis de series temporales. Madrid: Alianza, 2005. ISBN 8420691283.
Wakefield, Jon. Bayesian and frequentist regression methods. New York: Springer Verlag, 2013. ISBN 9781441909244.
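
Worked example of the grading formula
The weighting given under the qualification system (0,1 Assignments + 0,3 Final Project + 0,1 Midterm + 0,5 Final Exam) can be illustrated with a short calculation. The following is a minimal sketch in Python, not part of the official course material. It assumes that all four components are scored on the same scale (0 to 10 is used here, although the syllabus does not state the scale); the function name `final_grade` and the sample scores are hypothetical.

```python
# Weights copied from the syllabus formula:
# Grade = 0,1 Assignments + 0,3 Final Project + 0,1 Midterm + 0,5 Final Exam
WEIGHTS = {
    "assignments": 0.1,
    "final_project": 0.3,
    "midterm": 0.1,
    "final_exam": 0.5,
}


def final_grade(scores):
    """Return the weighted course grade for a dict of component scores.

    Assumption: every component is scored on the same scale (e.g. 0-10).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "assignments": 8.0,
    "final_project": 7.0,
    "midterm": 6.0,
    "final_exam": 5.0,
}
print(round(final_grade(example), 2))
```

Because the final exam carries half of the weight, a one-point change in the exam score moves the grade by 0,5 points, whereas a one-point change in the assignments or the midterm moves it by only 0,1.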