MIS 464 DATA ANALYTICS - Spring 2019
Hsinchun Chen, Professor, Department of MIS

Instructor: Hsinchun Chen, Ph.D., Professor, Management Information Systems Dept, Eller College of Management, University of Arizona
Time/Classroom: T/TH 9:30AM-10:45AM, MCCL 126
Instructor's Office Hours: T/TH 2:00-3:00PM or by appointment
Office/Phone: MCCL 430X, (520) 621-4153
Email/Web site: hchen@eller.arizona.edu; https://ai.arizona.edu/about/director (email is the best way to reach me!)
Class Web site: http://ai.eller.arizona.edu/hchen/mis464/ (VERY IMPORTANT!)
Teaching Assistants (TAs):
  Shuo Yu, shuoyu@email.arizona.edu, Ph.D. student (office: MCCL 430 Cubicle #34-35)
  Hongyi Zhu, zhuhy@email.arizona.edu, Ph.D. student (office: MCCL 430 Cubicle #36-37)
TA Office Hours: TA hours will be announced via email.

CLASS MATERIAL (Optional)
  Data Mining: Practical Machine Learning Tools and Techniques, by Witten, Frank, Hall & Pal, 4th Edition, 2017, Morgan Kaufmann (also with a 5-week MOOC course). See more at: http://www.cs.waikato.ac.nz/ml/weka/
  Artificial Intelligence: A Modern Approach, by Russell & Norvig, 3rd Edition, 2010, Prentice Hall
  Deep Learning, by Goodfellow, Bengio & Courville, 2016, MIT Press
Additional readings and handouts will be distributed in class and made available through the class web site.

COURSE OBJECTIVES
Business intelligence and analytics and the related field of big data analytics have become increasingly important in both the academic and the business communities over the past two decades. The IBM Tech Trends Report identified business analytics as one of the four major technology trends in the 2010s and beyond. A report by the McKinsey Global Institute predicted that by 2018, the United States alone would face a shortage of 140,000 to 190,000 people with deep data analytical skills, as well as a shortfall of 1.5 million data-savvy managers with the know-how to analyze big data to make effective decisions.
Big data and data science have begun to transform different facets of society, from e-commerce and global logistics to smart health and cyber security. This undergraduate senior-level course (elective) will cover the important concepts and techniques relating to data analytics, including statistical foundations, data mining methods, data visualization, AI, deep learning, and web mining techniques that are applicable to emerging e-commerce, government, and health and security applications. The course contains lectures, readings, lab sessions, and hands-on projects. Most business school seniors are welcome. The course will require some basic computing and database background. The course will prepare students to become data scientists or data-savvy managers for different businesses.

PREREQUISITE FOR THE COURSE
Programming experience in selected modern programming languages (e.g., Java, C, C++, Python) and DBMS (SQL).

COURSE TOPICS (selected topics will be covered)

Topic 1: Introduction (the field of MIS & CS)
  From computational design science in MIS to applied data science in CS
  Business intelligence and analytics, opportunities & techniques
  Emerging AI applications, from face recognition to autonomous vehicles
  Data, text and web mining overview: AI, ML, deep learning
  Data mining and web computing tools (by TAs): Weka, Tableau, Hadoop, Spark

Topic 2: Web Mining (the changing world)
  Web 1.0, 1995-: WWW, search engines, surface web, spidering, graph search, genetic algorithms
  Web 2.0, 2005-: deep web, web services & mashups, social media, crowdsourcing systems, network sciences
  Web 3.0, 2010-: IoT, mobile & cloud computing, big data analytics, dark web, mobile analytics, cybersecurity
  Web 4.0, 2015-: AI-empowered society, image/face, translation, drones, autonomous vehicles, health, security

Topic 3: Data Mining (the analytics techniques)
  Symbolic learning: decision trees, random forests
  Statistical analysis: regression, principal component analysis, Naïve Bayes
  Statistical machine learning: Support Vector Machines, Hidden Markov Models, Conditional Random Fields
  Neural networks and soft computing: feedforward networks, self-organizing maps, genetic algorithms
  Network analysis: social network analysis, graph models
  Deep learning: Convolutional NN, Recurrent NN, Long Short-Term Memory
  Representation learning: transfer learning, deep generative models

Topic 4: Text Mining (handling unstructured text)
  Digital library and search engines
  Information retrieval & extraction: vector space model, entity & topic extraction
  Authorship analysis: lexical, syntactic, structural, and semantic analysis
  Sentiment and affect analysis: lexicon-based, machine learning based
  Information visualization: scientific, text and web visualization

Topic 5: Emerging Research in Data and Web Mining (major conferences, groups, opportunities)
  Emerging research in major data and web mining conferences: ACM KDD, IEEE ICDM, WWW, ACM SIGIR, ACM CHI, AAAI, IJCAI, ICML, NIPS, ICLR
  Key journals: MISQ, ISR, IEEE TKDE, JAMIA, JBI, JASIST
  Emerging research in major academic institutions: Stanford, Berkeley, CMU, MIT
  Emerging research in major industry research labs: Google, Facebook, Amazon, Baidu, Microsoft
  Emerging data and web mining applications: health, security, e-commerce, AV, drones, robotics

GRADING POLICY
  Project proposal                        5%
  Extra credit assignments               10%
  Midterm exam                           30%
  Review paper                           15%
  Research project                       40%
  Class attendance and participation     10%
  TOTAL                                 110%

MIDTERM EXAM (30%)
The midterm exam will be closed book, closed notes, and in short-essay format (8-10 questions). The questions will be based mostly on classroom lectures. There will be NO Final Exam for this class. Academic integrity will be strictly enforced. Consequences for cheating will be severe.

REVIEW PAPER PRESENTATION AND PROPOSAL (20%)
Each student will be required to form a two-person team. Each team will select an emerging data analytics topic of interest and develop a comprehensive review paper (5 pages, IEEE format) on the topic. A secondary literature review will be needed, based on recent papers published in the press, magazines, conferences, and journals. Each team (both students) will be required to present their review in the second half of the semester (10 minutes each). The instructor will suggest selected emerging topics for consideration. A paper review and project proposal will be needed in the third week of the semester.

EXTRA CREDIT ASSIGNMENTS (10%)
In order to improve students' hands-on data analytics knowledge and to facilitate final project execution, we are adding two Extra Credit Assignments this semester: the first on Tableau and the second on Weka. Each team is required to identify 1-2 public data sources (e.g., data.gov, Kaggle) in the application area of their final Research Project (e.g., security, health, finance, e-commerce) and execute selected meaningful data exploration/visualization or analytics functions. Each assignment is worth 5% of the final grade. A team report summarizing results with screenshots (5 pages, IEEE format) needs to be submitted within two weeks for each assignment. No literature review is required.

RESEARCH PROJECT PRESENTATION AND PAPER (40%)
Each team will be required to propose and execute an interesting data-driven research project in data analytics for applications of interest to the students. The instructor will suggest suitable data and algorithms for consideration. The class TAs will also provide assistance in data preparation and analytics using selected open source tools. Each team (both students) will present at the end of the semester (15 minutes), and a final research paper (8 pages, IEEE format) will be submitted after all presentation sessions. The instructor will provide details about the final paper format and structure. Students are expected to gain significant hands-on data analytics experience through the project.

LECTURES, ATTENDANCE, AND ACADEMIC INTEGRITY
Students are required to attend all lectures on time and honor academic integrity. Missing classes will result in loss of points or administrative drop by the instructor. Students are required to send excuse notes (via email) to the instructor before missing classes. Students are permitted to bring laptops to the classroom for note-taking purposes, but not for checking email or web surfing. A professional attitude and a strong work ethic are needed for this class. Students are encouraged to consult the instructor for advice and help.

LAB SESSIONS and GUEST SPEAKERS
Selected lab sessions will be provided during the semester on the following topics: web services, cloud computing platforms, Tableau, Weka, etc. Selected guest speakers will present in the class.

COURSE OUTLINE (tentative)

DATE        TOPIC                                    CONTENT/NOTES
Jan 10      Syllabus & registration                  Class roster, syllabus
Jan 15 (T)  MIS, CS, design science                  Readings, discussions
Jan 17      Big data, applications                   Readings, discussions
Jan 22 (T)  BI, data analytics                       Readings, discussions
Jan 24      AI, deep learning                        Readings, discussions; PROPOSAL DUE (REVIEW & RESEARCH, 5%)
Jan 29 (T)  Web Computing & Mining
Jan 31      Tableau, Cloud, Hadoop, SPARK            TA session
Feb 5 (T)   Web 1.0, Surface Web
Feb 7       Search engine, graph search              Readings, lecture
Feb 12 (T)  Web 2.0, Social Web
Feb 14      Deep web, social media, SNA              Readings, lecture
Feb 19 (T)  Web 3.0, Mobile Web, IoT, dark web       ASSIGNMENT 1 DUE (TABLEAU, 5%)
Feb 21      Web 4.0, AI Web
Feb 26 (T)  Data Mining
Feb 28      Symbolic learning, AI, decision trees    ID3, RF
Mar 4-8     SPRING RECESS                            NO CLASS
Mar 12 (T)  MIDTERM EXAM (30%)
Mar 14      Statistical analysis, regression, Bayes
Mar 19 (T)  DM tools, Weka                           TA session
Mar 21      Statistical ML, SVM, CRF                 Readings, lecture
Mar 26 (T)  Neural networks, Backprop                Readings, lecture
Mar 28      Deep learning, review                    Readings, lecture
Apr 2 (T)   REVIEW PAPER PRESENTATION (15%)          ASSIGNMENT 2 DUE (WEKA, 5%)
Apr 4       REVIEW PAPER PRESENTATION
Apr 9 (T)   Deep learning, CNN                       Readings, lecture
Apr 11      Text Mining
Apr 16 (T)  IR/IE, Sentiment analysis                Readings, lecture
Apr 18      Information Visualization                Readings, lecture
Apr 23 (T)  RESEARCH PROJECT PRESENTATION (30%)
Apr 25      RESEARCH PROJECT PRESENTATION
Apr 30 (T)  RESEARCH PROJECT PRESENTATION
May 3-9     FINAL EXAM WEEK                          NO EXAM FOR MIS 464
May 9       FINAL PROJECT PAPER DUE (10%)
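
GRADE TALLY (illustrative sketch)
As a sanity check on the percentages listed in the grading policy and course outline, the following minimal Python sketch tallies the graded components. It assumes the final grade is a simple weighted sum and that the extra credit is added on top of a 100% base; the 110% total suggests this, but the syllabus does not spell out the exact computation. The names WEIGHTS and final_grade and the example scores are hypothetical and used only for illustration.

```python
# Component weights (percent of final grade) as listed in the grading policy.
WEIGHTS = {
    "project_proposal": 5,
    "extra_credit_assignments": 10,  # two assignments at 5% each
    "midterm_exam": 30,
    "review_paper": 15,
    "research_project": 40,          # 30% presentation + 10% final paper
    "attendance_participation": 10,
}


def final_grade(scores):
    """Return the weighted sum of component scores.

    `scores` maps a component name to the fraction earned (0.0 to 1.0).
    Components missing from `scores` count as zero.
    Assumption: the grade is a plain weighted sum (not confirmed by the syllabus).
    """
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())


# Total points available, including the 10% extra credit.
total_available = sum(WEIGHTS.values())  # 110

# Hypothetical scores for one student (fractions earned per component).
example_scores = {
    "project_proposal": 1.0,
    "extra_credit_assignments": 0.5,  # e.g., only one of the two assignments done
    "midterm_exam": 0.8,
    "review_paper": 0.9,
    "research_project": 0.9,
    "attendance_participation": 1.0,
}

print(f"Points available: {total_available}")
print(f"Example final grade: {final_grade(example_scores):.1f}")
```

Under these assumptions, a student who skips both extra credit assignments can still reach 100 points, and the extra credit can only raise the total.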