Data Mining (CSE 572)


Note: The outline below is subject to modifications and updates.

Lead: Ayan Banerjee, Ph.D. Updated 02/2019

About this Course

Data mining was once called knowledge discovery in databases. Advances in processing power and speed over the last decade have allowed users to move beyond manual, tedious, and time-consuming practices to quick, easy data analysis that harnesses the power of machine learning and high-performance computing. This course will introduce you to the fundamentals of data mining and pattern recognition. You will gain a deeper understanding of data through hands-on experience in the topic areas of big data analysis, classification, clustering, and association rule mining. Advanced topics such as reinforcement learning, deep learning, transfer learning, and Google's DeepMind will also be covered. By the end of the course, you will be able to apply state-of-the-art data mining technology to real-world applications, analyze and compare competing techniques, and design optimal solutions for a given set of application-driven constraints.

Specific topics covered include:
- Fundamentals
- Machine Learning
- Data Collection
- Deep Learning
- Data Visualization
- Reinforcement Learning
- Algorithms

Required Prior Knowledge and Skills
- Intermediate understanding of core concepts of data mining
- Basics of statistics
- Programming (in a language such as Python or MATLAB)

Learning Outcomes

Learners completing this course will be able to:
- Differentiate among major data mining techniques such as classification, cluster analysis, and association rule mining
- Apply common data mining algorithms to discover relationships and patterns in large datasets
- Implement more advanced learning algorithms such as deep learning and reinforcement learning
- Utilize open source tools to build a data mining project to solve a specific problem

Projects
- Project 1: Activity Recognition
- Project 2: User Dependent Analysis

Course Content

Instruction
- Video lectures
- Office hours
- Live sessions with instructional team
- Knowledge checks
- Practice quizzes
- Discussion questions

Assessment
- Graded quizzes (auto-graded)
- Individual project(s) (instructor-graded)
- Midterm exam (auto-graded)
- Final exam (auto-graded)

Estimated Workload/Time Commitment Per Week

Approximately 20 hours per week.

Technology Requirements

Hardware: Standard with major OS

Software and Other: Standard technology integrations will be provided through Coursera

Course Outline

Unit 1: Core Concepts of Data Mining

1.1 Explain the history and purpose of data mining across multiple disciplines
1.2 Differentiate what is and what is not data mining
1.3 Describe different data mining tasks
1.4 Recognize attributes of data needed for data mining
1.5 Review and summarize data exploration techniques for use in initial data analysis

Welcome and Start Here
Module 1: History and Purpose of Data Mining
Module 2: Data Attributes Needed for Data Mining
Module 3: Review of Initial Data Exploration Techniques
Assignment: Activity Recognition Direction

Unit 2: Existing Techniques: Classification

2.1 Define classification and classification applications
2.2 Compare and contrast common classification techniques
2.3 Apply common algorithms used in data mining

Module 1: Introduction to Classification
Module 2: Introduction to Classification Tasks
Module 3: Classification Issues

Unit 3: Alternative Classification Techniques

3.1 Define instance-based classifiers
3.2 Use the basics of probability theory to calculate the Bayes classifier
3.3 Use probability estimation to calculate the Naive Bayes classifier
3.4 Recognize the basic structure of neural networks
3.5 Identify the perceptron learning algorithm
3.6 Recall the artificial neural network learning model
3.7 Explain the underlying concepts behind support vector machines and why they work

Module 1: Alternative Techniques
Module 2: Artificial Neural Networks
Module 3: Support Vector Machines

Unit 4: Clustering

4.1 Define cluster analysis
4.2 Differentiate what is and what is not cluster analysis
4.3 Categorize different types of clusters
4.4 Use common algorithmic measures to evaluate clusters
4.5 Analyze DBSCAN in relation to other clustering methods

Module 1: K-Means Clustering
Module 2: Hierarchical Clustering
Module 3: Cluster Validity
Module 4: DBSCAN

Unit 5: Association Rule Mining

5.1 Apply association rule mining to discover relationships in large datasets
5.2 Use inferencing techniques to analyze association rule analysis results
5.3 Identify ways to reduce the computational complexity of frequent itemset generation
5.4 Describe how to efficiently generate rules from frequent itemsets

Module 1: Introduction to Basic Concepts of Association Rule Mining
Module 2: Apriori Principle

Unit 6: Big Data Tools

6.1 Describe components that comprise deep learning
6.2 Implement a deep neural network using common tools such as Keras or Theano
6.3 Describe the structure and usage of Restricted Boltzmann Machines
6.4 Design a Restricted Boltzmann Machine algorithm to create a movie recommendation application
6.5 Describe the structure of deep autoencoders, and describe different application scenarios where they can be used
6.6 Apply deep autoencoders to sample data to derive low-dimensional representations
6.7 Compare open source tools that allow for fast implementation of data mining tasks

Module 1: Deep Learning Introduction

Unit 7: Reinforcement Learning

7.1 Describe the agents, environments, states, actions, and rewards that comprise reinforcement learning
7.2 Describe a Markov decision process (MDP) and how it is different from Markov chains
7.3 Describe the difference between MDP learning and reinforcement learning
7.4 Describe the usage of reinforcement learning in automatically solving Atari games
7.5 Describe slate MDP and its difference from MDP
7.6 Describe the usage of attention in reinforcement learning
7.7 Describe the usage of reinforcement learning in a commercially used recommendation system (DeepMind)
7.8 Describe multiple ways to solve a reinforcement learning problem

Module 1: Reinforcement Learning
Module 2: Markov Decision Process
Module 3: Solving Reinforcement Learning Problems

Unit 8: Course Wrap-Up

8.1 Complete the final exam
8.2 *Optional: Complete and submit your Portfolio Inclusion Report

About ASU

Established in Tempe in 1885, Arizona State University (ASU) has developed a new model for the American Research University, creating an institution that is committed to access, excellence and impact. As the prototype for a New American University, ASU pursues research that contributes to the public good, and ASU assumes major responsibility for the economic, social and cultural vitality of the communities that surround it. Recognizing the university's groundbreaking initiatives, partnerships, programs and research, U.S. News and World Report has named ASU the most innovative university in all three years it has had the category. The innovation ranking is due at least in part to a more than 80 percent improvement in ASU's graduation rate in the past 15 years, the fact that ASU is the fastest-growing research university in the country, and the emphasis on inclusion and student success that has led to more than 50 percent of the school's in-state freshmen coming from minority backgrounds.

About Ira A. Fulton Schools of Engineering

Structured around grand challenges and improving the quality of life on a global scale, the Ira A. Fulton Schools of Engineering at Arizona State University integrates traditionally separate disciplines and supports collaborative research in the multidisciplinary areas of biological and health systems; sustainable engineering and the built environment; matter, transport and energy; and computing and decision systems. In the largest engineering program in the United States, students can pursue their educational and career goals through 25 undergraduate degrees or 39 graduate programs and rich experiential education offerings. The Fulton Schools are dedicated to engineering programs that combine a strong core foundation with top faculty and a reputation for graduating students who are aggressively recruited by top companies or become superior candidates for graduate studies in medicine, law, engineering and science.

About the School of Computing, Informatics, & Decision Systems Engineering

The School of Computing, Informatics, and Decision Systems Engineering advances developments and innovation in artificial intelligence, big data, cybersecurity and digital forensics, and software engineering. Our faculty are winning prestigious honors in professional societies, resulting in leadership of renowned research centers in homeland security operational efficiency, data engineering, and cybersecurity and digital forensics. The school's rapid growth in student enrollment isn't limited to ASU's Tempe and Polytechnic campuses, as it continues to lead in online education. In addition to the Online Master of Computer Science, the school also offers an Online Bachelor of Science in Software Engineering, and the first four-year, completely online Bachelor of Science in Engineering program in engineering management.

Creators

Dr. Banerjee is an Assistant Research Professor at the School of Computing, Informatics, and Decision Systems Engineering, Arizona State University. His research interests include pervasive computing in healthcare and the analysis and safety verification of embedded system software. Dr. Banerjee currently focuses on data-driven analysis and modeling in many different domains, including diet monitoring, gesture recognition, and biological process modeling. He works closely with government agencies such as the Food and Drug Administration and medical agencies such as the Mayo Clinic. Dr. Banerjee is also interested in hybrid system-based modeling and safety verification of closed-loop control systems that interact with the physical environment, also known as Cyber-Physical Systems. In addition, his work includes developing management algorithms for sustainable data centers using renewable sources of energy.