GIE - Management of Statistical Information


Coordinating unit: 200 - FME - School of Mathematics and Statistics
Teaching unit: 707 - ESAII - Department of Automatic Control
               723 - CS - Department of Computer Science
               1004 - UB - (ENG)Universitat de Barcelona
               715 - EIO - Department of Statistics and Operations Research
Academic year: 2016
ECTS credits: 5
Teaching languages: English

Teaching staff
Coordinator: PEDRO FRANCISCO DELICADO USEROS
Others: Second semester:
  PEDRO FRANCISCO DELICADO USEROS - A, B
  JOAQUIN GABARRÓ VALLÉS - A, B
  ALEXANDRE PERERA LLUNA - A, B
  ÀLEX SÁNCHEZ PLA - A, B

Prior skills
Compulsory subject for all students. Students have already developed several abilities in Statistics and/or Operations Research in the previous semester. They must have basic computing and programming skills, such as those developed in the mandatory course "Statistical Computation and Optimization". A B2 level of English (Cambridge First Certificate, TOEFL PBT >550) is required.

Degree competences to which the subject contributes
Specific:
3. CE-1. Ability to design and manage the collection of information, and to code, handle, store and process it.
4. CE-4. Ability to use different inference procedures to answer questions, identifying the properties of different estimation methods and their advantages and disadvantages, tailored to a specific situation and a specific context.
5. CE-5. Ability to formulate and solve real decision-making problems in different application areas, choosing the statistical method and the optimization algorithm most suitable for each occasion.
6. CE-6. Ability to use appropriate software to perform the calculations needed to solve a problem.
7. CE-7. Ability to understand advanced statistical and operations research papers. Knowledge of the research procedures for both the production of new knowledge and its transmission.
8. CE-8. Ability to discuss the validity, scope and relevance of these solutions and to present and defend the conclusions.
Transversal:
1. ENTREPRENEURSHIP AND INNOVATION: Being aware of and understanding how companies are organised and the principles that govern their activity, and being able to understand employment regulations and the relationships between planning, industrial and commercial strategies, quality and profit.
2. EFFECTIVE USE OF INFORMATION RESOURCES: Managing the acquisition, structuring, analysis and display of data and information in the chosen area of specialisation and critically assessing the results obtained.
10. FOREIGN LANGUAGE: Achieving a level of spoken and written proficiency in a foreign language, preferably English, that meets the needs of the profession and the labour market.
11. TEAMWORK: Being able to work in an interdisciplinary team, whether as a member or as a leader, with the aim of contributing to projects pragmatically and responsibly and making commitments in view of the resources that are available.

Teaching methodology
The course is divided into 3 modules that are taught in succession. Each module takes up one third of the sessions. All classes are theoretical-practical; in them, teachers present and discuss the basic concepts of each module. The support material will be published in advance on Athena (teaching guide, contents, course slides, examples, evaluation activities schedule, bibliography, ...). Students should devote the autonomous learning hours to studying the course subjects, extending the bibliography and following up the laboratory practices.

Learning objectives of the subject
This course presents and discusses tools and techniques to prepare students for their professional development. The course consists of three main modules.

MODULE 1
The first module is a crash course in scientific Python for data analysis (around 15h). The crash course includes three main stages:
* Introduction to the Python language as a tool. Workflow, IPython, IPython notebook (Jupyter), basic types, mutability and immutability, and object-oriented programming.
* Short introduction to numerical Python and matplotlib for graphical visualization.
* Introduction to scientific kits for data analysis with machine learning. Principal component analysis, clustering and supervised analysis with multivariate data.

MODULE 2
The second module covers relational databases. At the end of this module, students should be able to work fluently with a client/server relational DB system such as PostgreSQL or MariaDB. More specifically, they should be able to:
* Query an existing DB.
* Update an existing DB and create a (small) DB.
* Work with tools such as triggers and stored procedures.
* Understand the problems of concurrent access and their solutions.

MODULE 3
An important aspect of dealing with data is that they are often found on the web in formats that require some preprocessing before they can be analyzed. This module explores techniques for understanding these formats so that you can retrieve data from the web and extract the desired information. The first part of the module introduces the most common web technologies, how they relate to each other, and some tools to manipulate and extract information. Then the most common formats for storing web information (HTML, XML, JSON) are presented, together with tools to extract it, such as XPath and CSS selectors. Finally, we introduce some R packages suitable for processing web information. Specifically, at the end of the module students should:
* Be familiar with the main technologies used to store information on the web.
* Recognize the different formats that can be used for storage.
* Know how to extract information from these formats using specific R packages.
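The formats named in Module 3 can be illustrated with a minimal sketch. The module itself works with R packages; this example uses Python's standard library instead, purely as an assumption for illustration. The sample data and the function names `codes_from_json` and `ects_from_xml` are hypothetical and do not come from the course material.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample records, invented for illustration only.
JSON_TEXT = '{"courses": [{"code": "A", "ects": 5}, {"code": "B", "ects": 6}]}'
XML_TEXT = """
<courses>
  <course code="A"><ects>5</ects></course>
  <course code="B"><ects>6</ects></course>
</courses>
"""


def codes_from_json(text):
    """Parse a JSON document and return the course codes it lists."""
    data = json.loads(text)
    return [course["code"] for course in data["courses"]]


def ects_from_xml(text, code):
    """Return the ECTS value for one course, or None if it is absent.

    ElementTree supports only a limited subset of XPath, which includes
    the attribute predicate used here.
    """
    root = ET.fromstring(text.strip())
    node = root.find(f"./course[@code='{code}']/ects")
    return int(node.text) if node is not None else None
```

The same data is reachable by key lookup in JSON and by a path expression in XML, which is the distinction between formats that the module description draws.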

Study load
Total learning time: 125h
Hours large group: 30h (24.00%)
Hours medium group: 0h (0.00%)
Hours small group: 15h (12.00%)
Guided activities: 0h (0.00%)
Self study: 80h (64.00%)

Content

Introduction to Python
a. Why Python?
b. Python History
c. Installing Python
d. Python resources

Working with Python
a. Workflow
b. ipython vs. CLI
c. Text Editors
d. IDEs
e. Notebook

Getting started with Python
a. Introduction
b. Getting Help
c. Basic types
d. Mutable and immutable
e. Assignment operator
f. Controlling execution flow
g. Exception handling
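As a minimal sketch of two items in the outline above (mutable versus immutable types, and exception handling), the following may help. All function and variable names are hypothetical and chosen only for this illustration.

```python
def append_item(items, value):
    """Mutate the list passed in; every name bound to it sees the change."""
    items.append(value)
    return items


def extend_tuple(items, value):
    """Tuples are immutable, so this builds and returns a new tuple."""
    return items + (value,)


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when dividing by zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None


original_list = [1, 2]
alias = original_list          # assignment binds a second name, no copy is made
append_item(alias, 3)          # original_list is modified as well

original_tuple = (1, 2)
new_tuple = extend_tuple(original_tuple, 3)  # original_tuple is left unchanged
```

The assignment `alias = original_list` does not copy the list, which is why the mutation is visible through both names.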

Functions and Object Oriented Programming
a. Defining Functions
b. Input and Output
c. Standard Library
d. Object-oriented programming

Introduction to NumPy
a. Overview
b. Arrays
c. Operations on arrays
d. Advanced arrays (ndarrays)
e. Notes on Performance (%timeit in ipython)

Matplotlib
a. Introduction
b. Figures and Subplots
c. Axes and Further Control of Figures
d. Other Plot Types
e. Animations

Python scikits
a. Introduction
b. scikit-timeseries
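The function, class and performance items in the outline above can be sketched together. This is an illustrative example under stated assumptions, not course material: the class `RunningMean` and the function `mean_of` are hypothetical, and the standard-library `timeit` module stands in for the `%timeit` magic, which is only available inside IPython.

```python
import timeit


class RunningMean:
    """Accumulate values one at a time and expose their mean."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        self.count += 1
        self.total += value

    @property
    def mean(self):
        if self.count == 0:
            raise ValueError("no values added yet")
        return self.total / self.count


def mean_of(values):
    """Plain function that uses the class above."""
    acc = RunningMean()
    for value in values:
        acc.add(value)
    return acc.mean


# Time 100 calls; the result is the total elapsed time in seconds.
elapsed = timeit.timeit(lambda: mean_of(range(1000)), number=100)
```

The measured time depends on the machine, so only comparisons between alternatives run under the same conditions are meaningful.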

scikit-learn
a. Datasets
b. Sample generators
c. Unsupervised Learning
d. Supervised Learning
   i. Linear and Quadratic Discriminant Analysis
   ii. Nearest Neighbors
   iii. Support Vector Machines
e. Feature Selection

Practical Introduction to Scikit-learn
a. Solving an eigenfaces problem
   i. Goals
   ii. Data description
   iii. Initial Classes
   iv. Importing data
b. Unsupervised analysis
   i. Descriptive Statistics
   ii. Principal Component Analysis
   iii. Clustering
c. Supervised Analysis
   i. k-nearest Neighbors
   ii. Support Vector Classification
   iii. Cross validation

Introduction to relational databases
Learning time: 5h
Theory classes: 2h
Laboratory classes: 3h
Basic database concepts such as tables and tuples. First steps in PostgreSQL.
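The basic concepts listed for the relational database introduction (tables and tuples) can be sketched as follows. The course uses PostgreSQL; this example assumes Python's built-in `sqlite3` module with an in-memory database so that it runs without a server. The `student` table, its columns and its rows are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one table and three tuples (rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, grade REAL)"
    )
    conn.executemany(
        "INSERT INTO student (name, grade) VALUES (?, ?)",
        [("Ana", 8.5), ("Ben", 6.0), ("Cai", 7.5)],
    )
    conn.commit()
    return conn


def students_with_grade_at_least(conn, threshold):
    """Query the table; each result row comes back as a Python tuple."""
    cursor = conn.execute(
        "SELECT name, grade FROM student WHERE grade >= ? ORDER BY grade DESC",
        (threshold,),
    )
    return cursor.fetchall()
```

A call such as `students_with_grade_at_least(build_demo_db(), 7.0)` returns the matching rows ordered by grade. The `?` placeholders pass values separately from the SQL text, which avoids building queries by string concatenation.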

SQL and relational algebra
Learning time: 5h
Theory classes: 2h
Laboratory classes: 3h
Queries, insertions and deletions, joins. Elements of relational algebra. Ordering, grouping, averages.

Transactions
Learning time: 5h
Theory classes: 2h
Laboratory classes: 3h
Problems of concurrent access. ACID properties. Different levels of isolation.

Web data processing
Learning time: 15h
Theory classes: 15h
1. Introduction to technology for web data. (1.5h)
2. The languages and formats for the web: HTML, XML, JSON, XPath, CSS (4.5h)
3. Programs and communication protocols: HTTP (1.5h)
4. Retrieving web data: web scraping and text mining (4.5h)
5. Data project management and case studies (3h)

Qualification system
There will be a grade for each module, derived from an exam or from a final project, depending on the module. The final grade will be the average of the grades of the 3 modules.
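The atomicity part of the ACID properties listed under the Transactions unit in the content outline can be illustrated with a short sketch. It assumes Python's built-in `sqlite3` module rather than the PostgreSQL system used in the course, and the `account` table, its `CHECK` constraint and the function names are hypothetical.

```python
import sqlite3


def make_accounts():
    """Create an in-memory table whose balances may never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO account (name, balance) VALUES (?, ?)",
        [("a", 100), ("b", 0)],
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money between accounts as a single all-or-nothing transaction."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if an exception is raised inside it.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit above was undone too.
        return False


def balances(conn):
    """Return the current balances as a dictionary keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

The credit is applied before the debit on purpose: when the debit violates the constraint, the rollback also removes the credit that had already succeeded, so the two updates take effect together or not at all. Isolation levels and concurrent access are not covered by this single-connection sketch.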

Bibliography

Basic:
Langtangen, H.P. A Primer on Scientific Programming with Python [on line]. Springer, 2011. Available on: <https://hplgit.github.io/primer /doc/pub/half/book.pdf>. ISBN 978-3-642-18365-2.
Munzert, S.; Rubba, R.; Meißner, P.; Nyhuis, D. Automated data collection with R: A Practical guide to web scraping and text mining. Wiley, 2015. ISBN 978-1118834817.
Nolan, D.; Lang, D.T. XML and web technologies for data sciences with R. Springer, 2014. ISBN 978-1-4614-7899-7.
Shapiro, B.E. Scientific Computation: Python Hacking for Math Junkies. Sherwood Forest Books, 2015. ISBN 9780692366936.
Stones, Richard; Matthew, Neil. Beginning databases with PostgreSQL: from novice to professional [on line]. 2nd ed. USA: Apress, 2005. Available on: <http://site.ebrary.com/lib/upcatalunya/docdetail.action?docid=10150839>. ISBN 978-1-59059-478-0.
Suehring, S.; Valade, J. PHP, MySQL, & HTML5 All-in-One For Dummies. Wiley, 2013. ISBN 978-1-118-21370-4.

Complementary:
Garcia-Molina, Hector; Ullman, Jeffrey D.; Widom, Jennifer. Database Systems: the complete book. 2nd ed. USA: Pearson, 2009. ISBN 0131873253.
Spector, P. Concepts in computing with data (Stat 133, UC Berkeley) [on line]. Berkeley, 2011. Available on: <http://www.stat.berkeley.edu/ spector/s133/index >.