Coordinating unit: 240 - ETSEIB - Barcelona School of Industrial Engineering
Teaching unit: 715 - EIO - Department of Statistics and Operations Research
Academic year: 2017
Degree:
BACHELOR'S DEGREE IN INDUSTRIAL TECHNOLOGY ENGINEERING (Syllabus 2010). (Teaching unit Optional)
BACHELOR'S DEGREE IN MATERIALS ENGINEERING (Syllabus 2010). (Teaching unit Optional)
BACHELOR'S DEGREE IN CHEMICAL ENGINEERING (Syllabus 2010). (Teaching unit Optional)
ECTS credits: 4.5
Teaching languages: Catalan

Teaching staff
Coordinator: JOSEP GINEBRA
Others: JOSEP GINEBRA

Opening hours
Timetable: Monday and Wednesday from 11:30 to 13:30

Prior skills
Students must have passed Estadística (Statistics).

Degree competences to which the subject contributes
Specific:
1. Basic knowledge of the use and programming of computers, operating systems, databases and computer software with an engineering application.
2. Knowledge and capacities to organise and manage projects. Knowing the organisational structure and functions of a project office.
Transversal:
3. EFFICIENT ORAL AND WRITTEN COMMUNICATION. Communicating verbally and in writing about learning outcomes, thought-building and decision-making. Taking part in debates about issues related to one's own field of specialization.
4. TEAMWORK. Being able to work as a team player, either as a member or as a leader. Contributing to projects pragmatically and responsibly, by reaching commitments in accordance with the resources that are available.

Teaching methodology
Half of the classes will be expository, and the other half will take place in a computer lab, where MINITAB will be used. At the end of each computer lab session, students will be given an assignment that must be handed in at the next computer lab session and will be graded. In the second half of the semester there will be a final project.

Learning objectives of the subject
At the end of the course the student should be able to:
- identify situations where it is useful to analyze data;
- identify the model and/or method of analysis best suited to the data;
- build a model that summarizes the information in the data and allows predictions to be made;
- reduce the dimensionality of multivariate data and visualize it;
- implement supervised and unsupervised classification algorithms;
- evaluate the quality of the results obtained.

Study load
Total learning time: 112h 30m
Hours large group:
Hours medium group: 45h (40%)
Hours small group:
Guided activities:
Self study: 67h 30m (60%)

Content

Chapter 1: Introduction
Learning time: 3h 30m
Guided activities: 1h; Self study: 1h
1.- Problems: Association, prediction and classification.
2.- Tools: Statistical models and multivariate analysis.

Chapter 2: Linear models for a continuous response
Learning time: 45h
Theory classes: 9h; Laboratory classes: 9h; Guided activities: 9h; Self study: 18h
1.- Normal linear model.
2.- Model fit; least squares and robust regression.
3.- ANOVA table and goodness of fit measures.
4.- Inference on the model parameters.
5.- Prediction.
6.- Model checking.
7.- Model selection.
8.- Lack of fit tests.
9.- Model interpretation; bias, collinearity and causality.
10.- Use of categorical explanatory variables.
11.- Comparison of means.
12.- Analysis of two-level factorial designs.
13.- Cross validation and PRESS.

Chapter 3: Non-linear models for a continuous response
Learning time: 6h
Self study: 3h
1.- Normal non-linear model.
2.- Model fit.
3.- Inference.
4.- Model checking.

Chapter 4: Discrete response models
Learning time: 20h
Theory classes: 4h; Laboratory classes: 4h; Guided activities: 4h; Self study: 8h
1.- Generalized linear model.
2.- Count response models.
3.- Binary response models.
4.- Model fit.
5.- Inference.
6.- Model checking.
7.- Prediction.
8.- Model interpretation.
9.- Contingency tables and models for a polytomous response.

Chapter 5: Time series models
Learning time: 13h
Theory classes: 3h; Laboratory classes: 3h; Guided activities: 3h; Self study: 4h
1.- Description of a time series.
2.- AR models.
3.- MA models.
4.- ARMA models.
5.- ARIMA models.
6.- Seasonal ARIMA models.

Chapter 6: Visualization of multivariate data
Learning time: 6h
Self study: 3h
1.- Principal components analysis.
2.- Correspondence analysis.

Chapter 7: Cluster analysis
Learning time: 6h 30m
Guided activities: 1h 30m; Self study: 2h
1.- Hierarchical methods.
2.- Partition methods (k-means algorithm).
3.- Variable cluster analysis.

Chapter 8: Discriminant analysis
Learning time: 8h 30m
Guided activities: 1h 30m; Self study: 4h
1.- Linear discriminant.
2.- Quadratic discriminant.
3.- Logistic discriminant.
4.- Nearest neighbors.

Chapter 9: Non-parametric regression and classification models
Learning time: 2h
Theory classes: 30m; Laboratory classes: 30m; Self study: 1h
1.- Local smoothers.
2.- Additive models.
3.- Classification and regression trees.
4.- Neural networks.

Qualification system
There will be a midterm and a final exam.
Grade = 0.1 Assignments + 0.3 Final Project + 0.1 Midterm + 0.5 Final Exam

Bibliography
Basic:
Hastie, Trevor; Tibshirani, Robert; Friedman, J. The elements of statistical learning: data mining, inference and prediction. 2nd ed. New York: Springer Verlag, 2011. ISBN 9780387848570.
Peña, Daniel. Regresión y diseño de experimentos. Madrid: Alianza, 2002. ISBN 9788420693897.
Venables, William N; Ripley, B.D. Modern Applied Statistics with S. 4th ed. New York: Springer Verlag, 2003. ISBN 0387954570.
Peña, Daniel. Análisis de datos multivariantes. Madrid: McGraw-Hill, 2008. ISBN 9788448136109.
Weisberg, Sanford. Applied Linear Regression. 3rd ed. New York: Wiley, 2005. ISBN 0471663794.
Everitt, B.S.; Dunn, G. Applied Multivariate Data Analysis [online]. 2nd ed. New York: Wiley, 2010 [Consultation: 27/06/2014]. Available at: <http://onlinelibrary.wiley.com/book/10.1002/9781118887486>. ISBN 9781118887486.
Greenacre, Michael J. Correspondence Analysis in Practice. 2nd ed. Boca Raton: Chapman and Hall, 2007. ISBN 9781584886167.
James, Gareth [et al.]. An Introduction to statistical learning: with applications in R. New York: Springer Verlag, 2013. ISBN 9781461471370.
Complementary:
Clarke, Bertrand; Fokoue, Ernest; Zhang, Hao Helen. Principles and Theory for Data Mining and Machine Learning. Berlin: Springer Verlag, 2009. ISBN 9780387981345.
Dobson, Annette J. An Introduction to Generalized Linear Models. 3rd ed. Boca Raton: Chapman and Hall, 2008. ISBN 9781584889502.
Johnson, Richard; Wichern, Dean. Applied multivariate statistical analysis. 6th ed. Englewood Cliffs, N.J: Pearson, 2007. ISBN 9780131877153.
Peña, Daniel. Análisis de series temporales. Madrid: Alianza, 2005. ISBN 8420691283.
Wakefield, Jon. Bayesian and frequentist regression methods. New York: Springer Verlag, 2013. ISBN 9781441909244.
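
Worked example of the grade formula
As an illustration of the weighted formula given under Qualification system, the following minimal Python sketch computes a course grade from the four components. The weights are those stated in the course guide; the function name `final_grade`, the dictionary keys and the sample marks are hypothetical, and the 0-10 marking scale is an assumption that the guide does not state.

```python
# Weights as stated in the Qualification system formula.
WEIGHTS = {
    "assignments": 0.1,
    "final_project": 0.3,
    "midterm": 0.1,
    "final_exam": 0.5,
}


def final_grade(marks):
    """Return the weighted course grade for a dict of component marks."""
    missing = set(WEIGHTS) - set(marks)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)


# Hypothetical marks on an assumed 0-10 scale (not taken from the course guide).
example_marks = {
    "assignments": 8.0,
    "final_project": 7.0,
    "midterm": 6.0,
    "final_exam": 5.0,
}

print(round(final_grade(example_marks), 2))
```

Since the weights add up to 1, the result stays on the same scale as the component marks, and the final exam alone accounts for half of it.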