240ST014 - Data Analysis of Transport and Logistics

Coordinating unit: 240 - ETSEIB - Barcelona School of Industrial Engineering
Teaching unit: 715 - EIO - Department of Statistics and Operations Research
Academic year: 2017
Degree: MASTER'S DEGREE IN SUPPLY CHAINS, TRANSPORT AND MOBILITY (Syllabus 2014). (Teaching unit Compulsory)
MASTER'S DEGREE IN INDUSTRIAL ENGINEERING (Syllabus 2014). (Teaching unit Optional)
MASTER'S DEGREE IN STATISTICS AND OPERATIONS RESEARCH (Syllabus 2013). (Teaching unit Optional)
ECTS credits: 5
Teaching languages: English

Teaching staff
Coordinator: Montero Mercadé, Lídia

Requirements
Students must have sufficient knowledge of algebra and mathematical analysis to assimilate concepts regarding probability, univariate distributions of random variables, numerical series, matrix algebra, functions of real variables in one or more dimensions, differentiation and integration. Students must also have basic programming skills in pseudocode or in a high-level programming language.

Degree competences to which the subject contributes
Specific:
CESCTM2. Develop procedures for collecting transportation data that take into account the specific nature of such data; apply the appropriate techniques to process and analyze the data and draw conclusions for use in the models that require them.
CESCTM3. Design and conduct studies of demand analysis, demand modeling and structuring for different transport models.
CETM2. Understand and quantify the fundamentals of capacity in transport and mobility systems; determine the safety, quality and sustainability of transport infrastructure; and optimize the operation of these systems.

Teaching methodology
Learning in the course consists of three distinct phases:
1. Acquisition of specific knowledge through the study of literature and material provided by teachers.
2. Acquisition of skills in specific techniques of data analysis, exploitation of information and statistical modeling.
3. Integration of knowledge, skills and competencies (specific and generic) by solving short case studies.

Theory classes present the foundations of the methodologies and techniques of the subject.

The laboratory classes are intended to teach the use of specific techniques for problem solving in statistical data analysis, using appropriate software tools. Students first follow and take notes on the analysis carried out by the teacher, and then solve a similar one during the self-learning hours. That exercise focuses on the current block and comes with a questionnaire included in the description of the laboratory sessions. The short case study described in the questionnaire has to be solved within at most 1 week, as indicated by the lecturer during the lab session. Feedback will be provided before the next laboratory session, where common problems found by the teacher will be discussed in the first 20 minutes.

The course case studies, which students work on in groups during the self-learning hours, serve to put knowledge, skills and competences into practice by solving a case provided by the teacher and related to Logistics, Transportation and Mobility.

R is the statistical software selected for data analysis and modeling. The capabilities of common professional software (TransCAD, EMME4, VISSUM) are presented and related to R tools.

Learning objectives of the subject
- Learn how to make a report on data quality (missing data profile, univariate and bivariate outlier detection). Missing data recovery.
- Learn how to use and interpret fundamental concepts in probability and statistics from a practical point of view when using R statistical software: random event, population, sample, random variable, common continuous and discrete random variables. Point and interval estimates. Computational statistical inference.
- Learn how to analyze databases, including numeric and graphical univariate description and bivariate and multivariate tools. Determination of the significant characteristics of groups of individuals.
- Learn how to make a profile for a target response, either quantitative or qualitative. Feature selection.
- Learn the basic principles of classification: hierarchical classification techniques and K-nearest neighbors. Perform and validate a proposed classification using R software.
- Know how to model a numeric response using general linear regression: formulation, estimation and interpretation of statistical models using R software.
- Know how to interpret indicators for model comparison and selection in general regression models: goodness-of-fit statistics (R2, F-test for nested models, AIC, BIC, etc.).
- Know how to validate general regression models: outliers and influential data.
- Know how to apply general regression models to the generation/attraction of trips at Zones of Transport (ZAT).
- Know how to model discrete choices with generalized linear models: formulation, estimation and interpretation of statistical models using R software.
- Know how to interpret indicators for model comparison and selection in generalized linear models: goodness-of-fit statistics (Pearson X2, deviance test for nested models, AIC, BIC, etc.).
- Know how to validate generalized linear models.
- Know how to forecast with general linear models and binary choice generalized linear models using R. Apply them to the modal choice between pairs of Zones of Transport (ZAT). Aggregated vs disaggregated models.
- Know the basic principles of Sampling Theory: point and interval estimates.
- Learn how to compute relative vs absolute errors for estimates of means, totals and proportions in random sampling and stratified sampling.

Study load
Total learning time: 125h
Hours large group: 0h (0.00%)
Hours medium group: 30h (24.00%)
Hours small group: 15h (12.00%)
Guided activities: 0h (0.00%)
Self study: 80h (64.00%)

Content

Block 1. Introduction to Data Analysis in Transportation and Logistics
Learning time: 6h (Theory classes: 2h, Practical classes: 1h, Self study: 3h)

Description:
Introduction to common data collections and surveys in Logistics, Transportation and Mobility: home-based surveys, O-D surveys, cordon surveys, stated and revealed preference surveys. Traffic data collection: inductive loop sensors and new technologies (Bluetooth data, wireless magnetic sensor data, etc.).

Related activities:
Theory class and Introduction to R in Laboratory.

Specific objectives:
The purpose of the subject is to provide students with the knowledge and skills to meet the exploratory data analysis and data mining needs of organizations and professional practice in the field of Supply Chain, Transportation and Mobility; that is, to take advantage of the data stored by stakeholders in order to integrate automatic systems that aid decision making and traffic operations and management. The underlying idea is that data are a valuable asset for stakeholders, and that exploring them reveals the information they contain.

The course is built around solving case-study problems. It is divided into four areas: Exploratory Data Analysis and Interpretation (summary description); Computational Statistical Inference; Modeling and Prediction tools; and Design of Questionnaires and Sampling Design. The subject gives a solid background in the techniques to manage, analyze, model and extract knowledge from current massive data sets, databases, the Internet, etc., as well as in the techniques to exploit that knowledge in the sector.

Block 2. Exploratory Data Analysis
Learning time: 18h (Theory classes: 4h, Practical classes: 2h, Self study: 12h)

Description:
Exploratory Data Analysis: numerical and graphical tools for univariate/bivariate data (quantitative and qualitative characteristics). Missing data: profile and recovery. Detection of univariate and bivariate outliers. Association measures for multivariate data (Pearson/Spearman correlation). Example of massive data: traffic counts (missing data recovery, outlier detection, filtering).

Specific objectives:
Learn how to make a report on data quality (missing data profile, univariate and bivariate outlier detection). Missing data recovery. Learn how to analyze databases, including numeric and graphical univariate description and bivariate and multivariate tools in R. Determination of the significant characteristics of groups of individuals.
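As an illustration of the univariate outlier detection listed in Block 2, the following minimal sketch applies Tukey's fences (values beyond 1.5 times the interquartile range from the quartiles). The syllabus does not state which rule the course uses, so the rule itself is an assumption, as are the function name `tukey_outliers` and the traffic counts, which are invented for the example. The course works in R; Python is used here only for the sketch.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # Quartiles with linear interpolation between the sample min and max.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical 15-minute loop-detector counts (vehicles); invented for illustration.
counts = [10, 12, 11, 13, 12, 11, 95]
print(tukey_outliers(counts))  # prints [95]
```

A flagged count is only a candidate: in a data-quality report it would still be checked against the sensor context before being filtered or recovered.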

Block 3. Computational Statistical Inference
Learning time: 24h (Theory classes: 6h, Practical classes: 4h, Self study: 14h)

Description:
Basic statistical elements used in transportation and logistics: common univariate distributions (binomial, multinomial, Poisson, exponential, Weibull, gamma, (log)logistic, (log)normal, etc.) with emphasis on moments and characteristic parameters (location, scale and shape). Input Data Analysis. Computational statistical inference for means, proportions and variances according to groups: parametric and nonparametric tests (Chi2, Anderson-Darling, Wilcoxon, Kruskal-Wallis, Bartlett, etc.).

Specific objectives:
Learn how to use and interpret fundamental concepts in probability and statistics from a practical point of view when using R statistical software: random event, population, sample, random variable, common continuous and discrete random variables. Point and interval estimates. Computational statistical inference. Input Data Analysis and Model Fitting.

Block 4. Statistical Modeling through Regression
Learning time: 24h (Theory classes: 8h, Practical classes: 4h, Self study: 12h)

Description:
Modeling through multiple regression models. Least squares estimation. Properties. Conditions for optimal inferential properties and problems when those conditions fail. Transforming variables. Diagnostic tools and statistics: residuals, influential data and outliers. General Linear Model: how to introduce qualitative variables as explanatory variables (definition of dummy variables). Main effects and interactions between factors and covariates: model interpretation and validation. F-test to compare nested models. Example: modeling home-to-work morning peak trip generation between Transportation Zones (ZAT) with R software, using the lm() method and step() for best model selection.

Specific objectives:
Modeling of numeric responses: formulation, estimation and interpretation of statistical models using R software. Model comparison and selection: goodness-of-fit statistics (R2, F-test for nested models, AIC, BIC, etc.). Diagnose general linear models: outliers and influential data. Learn how to forecast a numeric target using a general linear model. Learn how to apply general linear models to the generation/attraction models between Zones of Transport (ZAT).
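To make the least squares estimation and the R2 statistic of Block 4 concrete, here is a minimal sketch for the single-predictor case. The course itself fits these models in R with lm(); this Python version only reproduces the closed-form computation. The function name `fit_simple_ols` and the zone data (households and morning-peak trips per zone) are assumptions invented for the example, not course material.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical zones: households per zone and home-to-work trips generated.
households = [100, 200, 300, 400]
trips = [130, 210, 330, 410]
intercept, slope, r2 = fit_simple_ols(households, trips)
print(f"trips = {intercept:.2f} + {slope:.2f} * households, R2 = {r2:.3f}")
```

The multiple regression case with dummy variables and interactions described in the block follows the same principle, with the normal equations solved in matrix form.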

Block 5. Modeling Binary Response Data
Learning time: 18h (Theory classes: 4h, Practical classes: 2h, Self study: 12h)

Description:
Modeling binary discrete data through generalized regression models: role of the link function, ML estimation, properties, model validation and interpretation. Forecasting. ROC curve. Deviance test to compare nested models. Case study of mode selection between public and private modes according to trip and individual characteristics: glm() for the binomial family in R.

Specific objectives:
Learn how to model discrete choices with generalized linear models: formulation, estimation and interpretation of statistical models using R software. Learn how to perform model comparison and best model selection. Know how to compute goodness-of-fit statistics and related tests with R (Pearson X2, deviance test for nested models, AIC, BIC, etc.). Learn how to diagnose a generalized linear model for a discrete binary choice using R. Learn how to apply generalized linear models for binary choice to the modal split between pairs of Zones of Transport (ZAT). Understand the pros and cons of aggregated vs disaggregated data formats for models.

Block 6. Introduction to Sampling Theory
Learning time: 12h (Theory classes: 4h, Self study: 8h)

Description:
Introduction to Sampling Theory: random sampling and stratified sampling. Point and interval estimates for means, totals and proportions in random sampling. Selection of the sample size to satisfy absolute/relative errors in random and stratified sampling. Example: setting the sample size of a home-based survey.

Specific objectives:
Know the basic principles of Sampling Theory: point and interval estimates. Learn how to compute relative vs absolute error estimates for means, totals and proportions in random sampling and stratified sampling.
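The sample-size selection mentioned in Block 6 can be sketched for the simplest case: estimating a proportion under simple random sampling with a target absolute error. The sketch uses the standard normal-approximation formula n0 = z^2 p(1-p) / e^2 with an optional finite population correction. The syllabus does not give a formula, so this choice, the function name `sample_size_proportion`, and the figures in the usage lines are assumptions for illustration only.

```python
import math
from statistics import NormalDist


def sample_size_proportion(abs_error, p=0.5, confidence=0.95, population=None):
    """Sample size to estimate a proportion within +/- abs_error.

    Uses n0 = z^2 * p * (1 - p) / abs_error^2 (normal approximation);
    p = 0.5 is the most conservative choice. If a population size is
    given, the finite population correction is applied.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    n0 = z ** 2 * p * (1 - p) / abs_error ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


# Hypothetical home-based survey: +/- 5 percentage points at 95% confidence.
print(sample_size_proportion(0.05))                    # prints 385
print(sample_size_proportion(0.05, population=10000))  # prints 370
```

A relative error target r on a proportion p corresponds to the absolute error r * p, so the same function covers that case once a prior guess of p is available.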

Block 7. Introductory Data Mining
Learning time: 12h (Theory classes: 2h, Practical classes: 2h, Self study: 8h)

Description:
Data Mining in massive data: useful methods for Logistics and Transportation. Classification: segmentation of the population of a study area; hierarchical classification with R. Principal components as a tool for reduction of dimensionality. Example: satisfaction survey for transport users of a bus network.

Specific objectives:
Know how to turn data into information that is of use for decision making. Learn how to perform profiling in R. Learn dimensionality reduction strategies. Learn how to perform a hierarchical classification in R. Learn how to perform K-means in R.

Assessment: Quiz and Final Exam
Learning time: 11h (Theory classes: 5h, Self study: 6h)

Description:
Assessment: Quiz and Final Exam.

Qualification system
The evaluation of the course integrates the three phases of the learning process: knowledge, skills and competencies.

Knowledge is assessed by one quiz and the final exam (F1 and F2 scores), held in the middle and in the last week of the course respectively.

Skills and competencies are assessed through the delivery of m practices (m>1) based on the short case studies and related to the contents of the course. Each of the blocks, except the first one, might involve a practice that students carry out in groups (at most 3 persons). The L score is the average of the m practice scores. Students have to quantify the hours spent on each practice and deliver it through ATENEA's DUE Task. Feedback for formative evaluation will be given by the lecturer within at most 10 days and before the next laboratory session, where common problems and mistakes will be discussed in the first 20 minutes.

The final grade is obtained by weighting the three scores:
Final Mark = 0.65 F + 0.35 L, where F = Max(F2, 0.3 F1 + 0.7 F2).

Regulations for carrying out activities
Students may bring the slides presented in the theory sessions, a calculator, statistical tables, etc. Solutions to exams from previous years are not allowed, but solutions to case studies available on the ADTL website are allowed.
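The grading rule stated under Qualification system (Final Mark = 0.65 F + 0.35 L, with F = Max(F2, 0.3 F1 + 0.7 F2)) can be checked with a short worked example. The function name `final_mark` and the sample scores are hypothetical, and a 0-10 scale is assumed because the syllabus does not state the scale.

```python
def final_mark(f1, f2, lab):
    """Final mark from the quiz (f1), final exam (f2) and practice average (lab).

    The quiz only counts when it raises the exam component:
    F = max(F2, 0.3 * F1 + 0.7 * F2), Final Mark = 0.65 * F + 0.35 * L.
    """
    f = max(f2, 0.3 * f1 + 0.7 * f2)
    return 0.65 * f + 0.35 * lab


# Quiz better than the exam: the quiz lifts F from 6.0 to 6.6.
print(round(final_mark(f1=8, f2=6, lab=7), 2))  # prints 6.74
# Quiz worse than the exam: the quiz is ignored and F stays at 6.0.
print(round(final_mark(f1=4, f2=6, lab=7), 2))  # prints 6.35
```

Because of the Max, a poor quiz score never lowers the final mark; only the final exam and the practices can.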

Bibliography

Basic:
- Washington, S.P.; Karlaftis, M.G.; Mannering, F.L. Statistical and Econometric Methods for Transportation Data Analysis. 2nd ed. Boca Raton: Chapman and Hall, 2011. ISBN 9781420082852.
- Dalgaard, Peter. Introductory Statistics with R [on line]. 2nd ed. New York: Springer, 2008 [Consultation: 05/10/2017]. Available on: <http://dx.doi.org/10.1007/978-0-387-79054-1>. ISBN 9780387790534.
- Clairin, Rémy; Brion, Philippe. Manual de Muestreo. Madrid: La Muralla, 2001. ISBN 8471337118.
- Fox, John. Applied Regression Analysis and Generalized Linear Models. 2nd ed. Los Angeles: SAGE, 2008. ISBN 9780761930426.
- Fox, John; Weisberg, Sanford. An R Companion to Applied Regression. 2nd ed. Thousand Oaks: SAGE, 2011. ISBN 9781412975148.
- Ortúzar S., Juan de Dios; Willumsen, Luis G. Modelling Transport. 4th ed. Chichester: John Wiley & Sons, 2011. ISBN 9780470760390.

Other resources:
Course website:
- Course planning.
- Lecture notes and slides used in lectures.
- Description of the practice sessions, questionnaires for each block and case studies.
- Case study: data (Excel and MS-R) and description of the context and the target variable(s).
- Guidelines for case studies, presented as a list of questions to guide the analysis.
- Quizzes and final exams from previous years.

Hyperlink: ADTL teaching website, http://www-eio.upc.es/teaching/adtl/

Computer material: ATENEA tasks, used to deliver assignments.