Faculty of Science Course Syllabus Department of Mathematics and Statistics Introduction to Data Mining with R STAT 2450 Winter 2016

Instructor(s): Hong Gu (hgu@dal.ca)
Lectures: TR 11:35-12:55 (LSC-COMMON AREA C244)
Laboratories: None
Tutorials: None

Course Description

This course provides an introduction to data mining and R programming, suited for science students. Data mining methods include a vast set of tools developed in different areas for identifying the patterns in data. Students will learn programming methods for manipulating and exploring data through learning the basic ideas of some clustering, regression and classification methods. No prior programming knowledge is assumed.

Course Prerequisites

MATH 1000 and either STAT/MATH 1060 or STAT/MATH 2060

Course Objectives/Learning Outcomes

- Explain the key differences between the tasks of classification, clustering, regression, and dimensionality reduction
- Identify the key differences between supervised and unsupervised learning paradigms
- Explain how noisy observations affect the result of data mining methods
- Recognize the concept of class imbalance when constructing classifiers
- Design data mining experiments using R and existing data mining tools
- Apply the Nearest Neighbours method for supervised learning tasks
- Estimate the effects of hyperparameters on the resulting performance of data mining methods

- Propose a suitable visualization design for a particular combination of data characteristics and application tasks
- Write reasonably complex (100-150 line) modular procedural scripts in the R language to solve common data tasks
- Apply file operations on given data sets for reading and writing
- Explain and use the concept of loops to perform repetitive tasks
- Develop and use arithmetic expressions comprising arithmetic operators, constants, and variables
- Explain what an algorithm is
- Design (reusable) functions to divide the solution of a problem into simpler steps
- Manipulate and interpret the data frame in R
- Explain and use the concept of conditional structures to perform decision-making
- Apply the CART-based decision tree learning method for supervised learning tasks
- Explain model complexity with regard to the bias-variance trade-off
- Explain the concepts of over-fitting and under-fitting
- Apply the K-fold cross-validation and hold-out validation techniques for assessing the performance of a predictive model
- Apply the grid search method for hyperparameter optimization
- Recognize how to evaluate the performance of predictive models using R² and classification accuracy
- Explain how support vector machines discover an optimal hyperplane for classification-based tasks
- Interpret how kernel methods can be applied to solve non-linear problems using linear methods
- Discuss how to introduce a soft margin on a support vector machine with the cost hyperparameter
- Apply the single-layer perceptron learning algorithm for constructing a classifier
- Describe the backpropagation algorithm for training the weights of a feed-forward neural network
- Explain the effects of momentum and early stopping while training neural networks

- Discuss the implications of the universal approximation theorem
- Explain the procedure for creating a bagged learning ensemble using bootstrap sampling
- Elaborate on the processes taken by the random forest learning algorithm for supervised learning tasks
- Explain how random forests can be used for analyzing feature importance
- Discuss implications of the No Free Lunch Theorem in the context of data mining
- Apply the DBSCAN clustering algorithm for discovering density-based clusters
- Apply the K-means algorithm for discovering centroid-based clusters
- Apply principal component analysis to project data onto lower dimensions

Course Materials

Textbook: Introduction to Statistical Learning with Applications in R (Second Edition) by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, published by Springer, 2009
Course website: http://web.cs.dal.ca/~kallada/stat2450/

Course Assessment

Component      Weight (% of final grade)    Date
Final Exam     35                           Scheduled by Registrar
Assignments    65                           6-8 assignments, approximately bi-weekly

Other Course Requirements

Conversion of numerical grades to Final Letter Grades follows the Dalhousie Common Grade Scale:

A+ (90-100)    B+ (77-79)    C+ (65-69)    D (50-54)
A  (85-89)     B  (73-76)    C  (60-64)    F (<50)
A- (80-84)     B- (70-72)    C- (55-59)
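As a worked illustration of the assessment arithmetic above, the following minimal sketch combines the two component weights (35% final exam, 65% assignments) and converts the result to a letter grade on the Dalhousie Common Grade Scale, treating marks below 50 as F. The function names and the round-half-up step are assumptions made for this example; the syllabus does not say how fractional marks are handled.

```python
# Weights from the Course Assessment table (percent of the final grade).
WEIGHTS = {"final_exam": 35, "assignments": 65}

# Lower bound of each band in the Dalhousie Common Grade Scale, highest first.
GRADE_BANDS = [
    (90, "A+"), (85, "A"), (80, "A-"),
    (77, "B+"), (73, "B"), (70, "B-"),
    (65, "C+"), (60, "C"), (55, "C-"),
    (50, "D"),
]


def final_mark(final_exam, assignments):
    """Weighted numerical mark; both inputs are percentages from 0 to 100."""
    total = WEIGHTS["final_exam"] * final_exam + WEIGHTS["assignments"] * assignments
    return total / 100


def letter_grade(mark):
    """Convert a numerical mark to a letter grade."""
    # Assumption (not stated in the syllabus): round half up to a whole
    # number first, because the bands are given as integer ranges.
    rounded = int(mark + 0.5)
    for lower_bound, letter in GRADE_BANDS:
        if rounded >= lower_bound:
            return letter
    return "F"  # anything below 50


if __name__ == "__main__":
    mark = final_mark(final_exam=80, assignments=90)
    print(mark, letter_grade(mark))
```

For example, an 80 on the final exam and a 90 assignment average give 0.35 × 80 + 0.65 × 90 = 86.5, which falls in the A band under the rounding assumption above.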

Course Policies

Credit cannot be given for late assignments.

ACCOMMODATION POLICY FOR STUDENTS

Students may request accommodation as a result of barriers related to disability, religious obligation, or any characteristic protected under Canadian Human Rights legislation. The full text of Dalhousie's Student Accommodation Policy can be accessed here: http://www.dal.ca/dept/university_secretariat/policies/academic/student-accommodationpolicy-wef-sep--1--2014.html

Students who require accommodation for classroom participation or the writing of tests and exams should make their request to the Advising and Access Services Centre (AASC) prior to or at the outset of the regular academic year. More information and the Request for Accommodation form are available at www.dal.ca/access

ACADEMIC INTEGRITY

Academic integrity, with its embodied values, is seen as a foundation of Dalhousie University. It is the responsibility of all students to be familiar with behaviours and practices associated with academic integrity. Instructors are required to forward any suspected cases of plagiarism or other forms of academic cheating to the Academic Integrity Officer for their Faculty.

The Academic Integrity website (http://academicintegrity.dal.ca) provides students and faculty with information on plagiarism and other forms of academic dishonesty, and has resources to help students succeed honestly. The full text of Dalhousie's Policy on Intellectual Honesty and Faculty Discipline Procedures is available here: http://www.dal.ca/dept/university_secretariat/academic-integrity/academic-policies.html

STUDENT CODE OF CONDUCT

Dalhousie University has a student code of conduct, and it is expected that students will adhere to the code during their participation in lectures and other activities associated with this course.
In general: The University treats students as adults free to organize their own personal lives, behaviour and associations subject only to the law, and to University regulations that are necessary to protect
- the integrity and proper functioning of the academic and nonacademic programs and activities of the University or its faculties, schools or departments;
- the peaceful and safe enjoyment of University facilities by other members of the University and the public;
- the freedom of members of the University to participate reasonably in the programs of the University and in activities on the University's premises;
- the property of the University or its members.

The full text of the code can be found here: http://www.dal.ca/dept/university_secretariat/policies/student-life/code-ofstudent-conduct.html

SERVICES AVAILABLE TO STUDENTS

The following campus services are available to help students develop skills in library research, scientific writing, and effective study habits. The services are available to all Dalhousie students and, unless noted otherwise, are free.

Service: General Academic Advising
Support Provided: Help with
- understanding degree requirements and academic regulations
- choosing your major
- achieving your educational or career goals
- dealing with academic or other difficulties
Location: Ground floor, Rm G28, Bissett Centre for Academic Success
Contact: In person: Rm G28. By appointment:
- e-mail: advising@dal.ca
- Phone: (902) 494-3077
- Book online through MyDal

Service: Dalhousie Libraries
Support Provided:
- Help to find books and articles for assignments
- Help with citing sources in the text of your paper and preparation of bibliography
Location: Ground floor, Librarian offices
Contact: In person: Service Point (Ground floor). By appointment: Identify your subject librarian (URL below) and contact by email or phone to arrange a time: http://dal.beta.libguides.com/sb.php?subject_id=34328

Service: Studying for Success (SFS)
Support Provided:
- Help to develop essential study skills through small group workshops or one-on-one coaching sessions
- Match to a tutor for help in course-specific content (for a reasonable fee)
Location: 3rd floor, Coordinator Rm 3104, Study Coaches Rm 3103
Contact: To make an appointment:
- Visit main office (main floor, Rm G28)
- Call (902) 494-3077
- Email Coordinator at: sfs@dal.ca or
- Simply drop in to see us during posted office hours
All information can be found on our website: www.dal.ca/sfs

Service: Writing Centre
Support Provided: Meet with coach/tutor to discuss writing assignments (e.g., lab report, research paper, thesis, poster)
- Learn to integrate source material into your own work appropriately
- Learn about disciplinary writing from a peer or staff member in your field
Location: Ground floor, Learning Commons & Rm G25
Contact: To make an appointment:
- Visit the Centre (Rm G25) and book an appointment
- Call (902) 494-1963
- Email writingcentre@dal.ca
- Book online through MyDal
We are open six days a week. See our website: writingcentre.dal.ca