City University of Hong Kong Course Syllabus, offered by the Department of Computer Science with effect from Semester B 2017/18

Part I Course Overview

Course Title: Fundamentals of Data Science
Course Code: CS3481
Course Duration: One semester
Credit Units: 3 credits
Level: B3
Proposed Area (for GE courses only): Arts and Humanities; Study of Societies, Social and Business Organisations; Science and Technology
Medium of Instruction: English
Medium of Assessment: English
Prerequisites: CS2204 Fundamentals of Internet Applications Development
Precursors: Nil
Equivalent Courses: Nil
Exclusive Courses: CS4483 Data Warehousing and Data Mining

Part II Course Details

1. Abstract
(A 150-word description about the course)

This course aims to explore the important field of data science. The syllabus covers the main techniques in statistical data modelling and the main algorithms in data science, which include predictive modelling, cluster analysis, association rule mining and text mining. In addition, different applications of data science techniques in the real world, such as web mining, business analytics and health informatics, will be discussed.

2. Course Intended Learning Outcomes (CILOs)
(CILOs state what the student is expected to be able to do at the end of the course according to a given standard of performance.)

Columns: No.; CILOs#; Weighting* (if applicable); Discovery-enriched curriculum related learning outcomes A1, A2, A3 (please tick where appropriate)

1. Identify the main characteristics of different techniques in data science through observation of their operations.
2. Perform a critical assessment of current techniques in data science.
3. Implement the main algorithms in data science in a computationally efficient way.
4. Propose new solutions for real world information analytics problems by improving and combining current data science techniques.

Total weighting: 100%

* If weighting is assigned to CILOs, they should add up to 100%.
# Please specify the alignment of CILOs to the Gateway Education Programme Intended Learning outcomes (PILOs) in Section A of Annex.

A1: Attitude
Develop an attitude of discovery/innovation/creativity, as demonstrated by students possessing a strong sense of curiosity, asking questions actively, challenging assumptions or engaging in inquiry together with teachers.

A2: Ability
Develop the ability/skill needed to discover/innovate/create, as demonstrated by students possessing critical thinking skills to assess ideas, acquiring research skills, synthesizing knowledge across disciplines or applying academic knowledge to self-life problems.

A3: Accomplishments
Demonstrate accomplishment of discovery/innovation/creativity through producing/constructing creative works/new artefacts, effective solutions to real-life problems or new processes.

3. Teaching and Learning Activities (TLAs)
(TLAs designed to facilitate students' achievement of the CILOs.)

Teaching pattern:
Suggested lecture/tutorial/laboratory mix: 2 hrs. lecture; 1 hr. tutorial.

Columns: TLA; Brief Description; CILO No. (1, 2, 3, 4); Hours/week (if applicable)

Lecture (2 hrs/wk)
This course will focus on introducing the fundamental and state-of-the-art techniques in data science.

Tutorial (1 hr/wk)
Students will work on a set of problems on the principles and applications of data science, and present their solutions in the class.

Assignments/Project (6 hrs/wk for 6 weeks)
The completion of the assignments/projects gives students an opportunity to implement existing algorithms in data science in a computationally efficient way, and allows them to create new designs for information analytics systems.

4. Assessment Tasks/Activities (ATs)
(ATs are designed to assess how well the students achieve the CILOs.)

Columns: Assessment Tasks/Activities; CILO No. (1, 2, 3, 4); Weighting*; Remarks

Continuous Assessment: 50%
  Assignments/Projects: 30%
  Mid-term Examination: 20%
Examination^: 50% (duration: 2 hours)
Total: 100%

* The weightings should add up to 100%.
^ For a student to pass the course, at least 30% of the maximum mark for the examination must be obtained.
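The weightings in the assessment table above can be combined as in the following minimal sketch. It only encodes what the table states: the three task weightings and the 30% examination hurdle. The function names, the 0-100 score scale and the sample scores are assumptions made for illustration, and the sketch does not decide an overall pass or grade because the syllabus does not give those thresholds here.

```python
# Weightings (in percent) and the examination hurdle are taken from the
# assessment table; everything else is a hypothetical illustration.
WEIGHTS = {"assignments": 30, "midterm": 20, "examination": 50}
EXAM_HURDLE_PERCENT = 30  # at least 30% of the maximum examination mark


def overall_mark(scores):
    """Weighted course mark on a 0-100 scale.

    Each score is assumed to be a percentage (0-100) of that task's maximum.
    """
    return sum(WEIGHTS[task] * scores[task] for task in WEIGHTS) / 100


def meets_exam_hurdle(scores):
    """True if the examination score reaches the 30% hurdle."""
    return scores["examination"] >= EXAM_HURDLE_PERCENT


# Hypothetical student: strong coursework, weak examination.
example = {"assignments": 80.0, "midterm": 70.0, "examination": 25.0}
print(overall_mark(example))       # (30*80 + 20*70 + 50*25) / 100 = 50.5
print(meets_exam_hurdle(example))  # False: 25 is below the 30% hurdle
```

The example shows why the hurdle is checked separately from the weighted total: coursework marks cannot compensate for an examination score below 30%.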

5. Assessment Rubrics
(Grading of student achievements is based on student performance in assessment tasks/activities with the following rubrics.)

Columns: Assessment Task; Criterion; Excellent (A+, A, A-); Good (B+, B, B-); Fair (C+, C, C-); Marginal (D); Failure (F)

1. Assignments/Project
1.1 Capacity for effectively implementing data science algorithms in a computationally efficient way.
1.2 Capability to create new solutions for real world information analytics problems by improving and combining different data science techniques.

2. Mid-term Examination
2.1 Ability to explain in detail the principles of different data science techniques.
2.2 Capability to correctly apply a suitable data science technique to solve an information analytics problem.

3. Examination
3.1 Capacity for understanding the main characteristics of different data science techniques in depth.
3.2 Capability to perform a critical assessment of current data science techniques.
3.3 Ability to integrate different data science techniques for addressing real world information analytics problems.

Jun 2017

Part III Other Information
(more details can be provided separately in the teaching plan)

1. Keyword Syllabus
(An indication of the key topics of the course.)

Data pre-processing, statistical data modelling, predictive modelling, classifier evaluation, cluster analysis, association rule mining, text mining.

Syllabus

1. Knowledge discovery process
Introduction of the knowledge discovery process in three stages: data pre-processing, data mining, and knowledge representation. Basic data pre-processing techniques including data cleaning, selection, integration, transformation and reduction will be discussed.

2. Statistical data modelling
Introduction of fundamental concepts of statistical data modelling, which include random variables, probability distribution functions, probability density functions, covariance matrix, correlation coefficient, linear regression, sampling, statistical inference and multivariate statistical analysis.

3. Predictive modelling
Introduction of the main predictive modelling techniques for data science, which include decision tree, nearest neighbour classifier, probabilistic classification, and connectionist models. In addition, the issues of classification performance evaluation and model selection will be discussed.

4. Cluster analysis
Introduction of the main clustering techniques: partitional, hierarchical, and density-based clustering. Important algorithms such as k-means, agglomerative hierarchical clustering, and DBSCAN will be discussed. Related issues in outlier analysis and detection will be introduced.

5. Association rule mining
Introduction of the Apriori algorithm for frequent pattern mining and association rule mining, and the comparison of different measures for evaluating the association patterns. Mining of frequent patterns in data streams will also be discussed.

6. Text mining
Introduction of the vector space model for document representation, the term frequency-inverse document frequency (tf-idf) approach for term weighting, and proximity measures such as cosine similarity for document comparison. Different algorithms in text mining such as document clustering and text classification will also be discussed.

2. Reading List

2.1 Compulsory Readings
(Compulsory readings can include books, book chapters, or journal/magazine articles. There are also collections of e-books, e-journals available from the CityU Library.)

1. Tan P. N., Steinbach M. and Kumar V. (2018) Introduction to Data Mining. Addison Wesley, 2nd edition.

2.2 Additional Readings
(Additional references for students to learn to expand their knowledge about the subject.)

1. Bramer M. (2013) Principles of Data Mining. Springer, 2nd edition.
2. Han J. and Kamber M. (2011) Data Mining: Concepts and Techniques. Morgan Kaufmann, 3rd edition.
3. Witten I., Frank E., Hall M. and Pal C. (2016) Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 4th edition.
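As an illustration of the text mining topic in the syllabus (vector space model, tf-idf term weighting and cosine similarity), the following is a minimal sketch in plain Python. The particular tf-idf variant (term count divided by document length, multiplied by the natural log of N/df), the whitespace tokenizer, the function names and the three sample documents are all assumptions chosen for illustration; the syllabus does not prescribe a specific formula or implementation.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return one {term: weight} dict per document.

    Assumed weighting: (count / document length) * ln(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_u * norm_v)


# Hypothetical mini-corpus.
docs = [
    "data mining finds patterns in data",
    "text mining finds patterns in documents",
    "clustering groups similar records",
]
vecs = tf_idf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # positive: several shared terms
print(cosine_similarity(vecs[0], vecs[2]))  # 0.0: no shared terms
```

Representing each document as a sparse dictionary keeps the sketch short; a term that occurs in every document receives weight zero under this variant, so it does not contribute to the similarity.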