MSA 8150: Machine Learning for Analytics
Syllabus for Spring 2016

Contents
1 Catalog Description
  1.1 Sections
  1.2 Instructor
  1.3 Contact the instructor
  1.4 Course Web-site
2 Overview
  2.1 Intended audience
  2.2 Learning objectives
3 Schedule
4 Readings by Session
  4.1 References
  4.2 E-Books and other resources
  4.3 Software
5 Evaluation
6 Machine Learning Headlines
7 Homework
8 Group Project
9 Examinations
10 Literature
11 Workload Expectations
12 Student Behavior
  12.1 Discrimination and harassment
  12.2 Official department class policies

1 Catalog Description

The current catalog description of this course can be found in the University's Catalog: http://www.gsu.edu/es/catalogs_courses.html

A recent university catalog description follows:

The course will cover theory, methods, and tools for automated inference from data. This introductory course will include (1) supervised learning, (2) unsupervised learning methods, (3) graphical structure models, and (4) deep learning. The course will prepare students in the fundamentals of machine learning, as well as provide practical skills in applying current software tools to machine inference from large data sets.

1.1 Sections

Room: RCB Buckhead Center 306
Days: Thursdays
Time: 7:15 PM to 9:45 PM

1.2 Instructor

Dr. Péter Molnár
Email: pmolnar@gsu.edu
Office: (404) 413-7713
Office hours: TBA & by Appointment.

1.3 Contact the instructor

During the term, it is highly recommended that you contact the instructor, in person or via email. I am available to help you focus your projects, gain access to resources, and answer your questions. Please try to see me, phone me, or e-mail me at least once during the term to discuss your project. Your class members are also a good source of help.

1.4 Course Web-site

Class information will be posted on the D2L site. There will be links to other web-sites with course-related material.

2 Overview

This class covers the principles of machine learning, emphasizing fundamentals, methods, and tools.

2.1 Intended audience

Anyone with a keen interest in machine learning and machine inference will do well in this course. It is mainly geared toward producing data analysts.

2.2 Learning objectives

Upon successful completion of this course, you will accomplish the following objectives and outcomes. In particular, students who complete this course will gain "ready for work" skills (along with theory), including:

1. Specifying and reasoning about machine learning problems
2. Applying machine learning tools & techniques

Specific objectives include the following:

1. Understand the machine learning context
2. Understand common machine learning models
3. Understand and apply common mining tools
4. Create descriptive and predictive models from data
5. Know current concepts in machine learning and prediction
6. Demonstrate critical thinking, integrative reasoning, & communication skills

3 Schedule

The course schedule is shown in Table 1. However, the topics and readings may change according to the interests and abilities of the class. See the Academic Calendar. Materials may be updated up to 24 hours prior to class; please check before attending class.

Session  Date     Topic                                         Readings              In class             Due
1        Jan. 14  Introduction                                  1, 3.6
2        Jan. 21  Information Based Learning                    4.1-3, 4.4.4-5        Lecture, Quiz, Demo
3        Jan. 28  Information Based Learning                                          Hands-on, Panel 1
4        Feb. 4   Similarity Based Learning                     5.1-5.3, 5.4.[1,3,6]  Lecture, Quiz        HW 1
5        Feb. 11  Similarity Based Learning                                           Hands-on, Panel 2
6        Feb. 18  Probability Based Learning                    6.1-3, 6.4.1          Lecture, Quiz        HW 2
7        Feb. 25  Probability Based Learning                                          Hands-on, Panel 3
8        Mar. 3   Exam 1
9        Mar. 10  Error Based Learning                          7.1-3, 7.4.4-7        Lecture, Quiz        HW 3
10       Mar. 24  Error Based Learning                                                Hands-on, Panel 4    Project Proposal
11       Mar. 31  Evaluation                                    8                     Lecture, Quiz        HW 4
12       Apr. 7   Artificial Neural Networks and Deep Learning  TBA                   Lecture, Quiz
13       Apr. 14  Artificial Neural Networks and Deep Learning                        Demo, Panel 5
14       Apr. 21  Project Highlights                                                  Presentations        Group Project
15       Apr. 28  Exam 2

Table 1: Additional reading material will be posted on the web-site. Unless indicated otherwise, chapter numbers refer to the primary textbook.

4 Readings by Session

Readings provide content for class discussions, so they must be read prior to class. Do not get more than one week ahead of the class in the readings. Occasionally, readings may be changed one week prior to their presentation in class. Table 1 shows the readings by session.

4.1 References

Students must have access to the primary textbook:

Primary Textbook: Kelleher, Mac Namee, D'Arcy. Machine Learning for Predictive Data Analytics. MIT Press, 2015.

Readings

1. Murphy, K.P. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
2. Mitchell, T.M. Machine Learning. McGraw-Hill, 1997.

Some books can be accessed as e-books from Books24x7. Most articles have a URL, which can be used to download the article. (This assumes that you are on the university network directly or via VPN. You may be prompted for your campus ID and password.) Some articles may be available only from our web site. To find other articles, use the method described in Section 10.

4.2 E-Books and other resources

Consider the e-books a good resource; they are free to our students. See this note: http://www2.cis.gsu.edu/cis/news/newandnoteworthy2.asp

Books 24x7. Access from the GSU online library: http://homer.gsu.edu/search/databases/proxy/gll25038; select the link Books24x7.

IT ebooks. http://it-ebooks.info Select "Search ebook by: Title". The site hosts complete textbooks for download in PDF.

Assignment                  Percentage
Exam 1                      20%
Exam 2                      20%
Quizzes (top 5)             10%
Homework (4)                20%
Group Project               20%
Machine Learning Headlines  10%
Total                       100%

Table 2: Deliverables and their weights.

Grade  Percentage    Grade  Percentage
A+     97            C+     73
A      90            C      70
A-     87            C-     67
B+     83            D      60
B      80            F      < 60
B-     77

Table 3: Breakdown of how grades will be assigned under this system.

4.3 Software

The course uses open source software that runs on Windows, OS X, and Linux. Your laptop should have Python 3 and Jupyter (aka IPython Notebook) installed. Installing the Anaconda distribution (https://www.continuum.io/downloads) is a good way to get most of the required packages.

5 Evaluation

Students are evaluated by the deliverables summarized in Table 2.
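
As an illustration only, the weighting in Table 2 and the grade scale in Table 3 can be sketched in a few lines of Python. This sketch makes assumptions that the syllabus does not state: every component is scored from 0 to 100, the quiz component is the plain average of the five best quiz scores (missing quizzes count as zero), the percentages in Table 3 are minimum cutoffs, and no rounding is applied. The function names are hypothetical; the instructor's actual computation may differ.

```python
# Weights from Table 2, in percent of the final score.
WEIGHTS = {
    "exam_1": 20,
    "exam_2": 20,
    "quizzes": 10,
    "homework": 20,
    "group_project": 20,
    "headlines": 10,
}

# Cutoffs from Table 3, highest first.
# Assumption: each percentage is the minimum needed for that grade.
CUTOFFS = [
    ("A+", 97), ("A", 90), ("A-", 87),
    ("B+", 83), ("B", 80), ("B-", 77),
    ("C+", 73), ("C", 70), ("C-", 67),
    ("D", 60),
]


def quiz_component(quiz_scores, keep=5):
    """Average of the `keep` best quiz scores (0-100 each).

    Assumption: if fewer than `keep` quizzes were taken,
    the missing ones count as zero.
    """
    best = sorted(quiz_scores, reverse=True)[:keep]
    return sum(best) / keep


def final_percentage(components):
    """Weighted sum of component scores (0-100 each), per Table 2."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS) / 100


def letter_grade(percentage):
    """Map a final percentage to a letter grade, per Table 3."""
    for letter, minimum in CUTOFFS:
        if percentage >= minimum:
            return letter
    return "F"


# Hypothetical student, for illustration only.
example = {
    "exam_1": 85,
    "exam_2": 78,
    "quizzes": quiz_component([90, 80, 100, 70, 60, 95]),
    "homework": 92,
    "group_project": 88,
    "headlines": 95,
}
total = final_percentage(example)
print(total, letter_grade(total))
```

Keeping the weights as whole percentages and dividing once at the end avoids most floating-point surprises near the grade cutoffs.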

6 Machine Learning Headlines

Each student will be randomly assigned to a panel, though there is no coordination among the panel members. The objective of this activity is to present a (somewhat) recent article or news item concerning machine learning and to generate a class discussion around it. Each panelist has the opportunity to pitch their pick to the class in a 3-minute presentation. After the presentations are completed, the panel will moderate a discussion about the presented topics. Each panelist should ensure that their selected topic stays relevant during the conversation.

Criteria for selecting a headline:

- Show the relevance of machine learning for everyone
- Present machine learning course materials in the context of real, ongoing problems
- Generate discussion about machine learning, in particular tradeoffs, decision-making, and consequences of machine learning for organizations and people

In your 3-minute presentation:

- Show the news article(s), blogs, etc.
- Present a few (not more than 3) PowerPoint slides summarizing the articles and the machine learning issues, and provide issues and questions for subsequent discussion

7 Homework

Homework assignments are a continuation of the hands-on activities in class. Detailed information about each activity and the expectations for successful completion are provided with the instructions. See the web site for the most recent and detailed information on these assignments. Homework assignments are individual assignments!

8 Group Project

The project should showcase some of the methods and tools introduced in this course. Teams can comprise up to five students and should form within the first few weeks of the term. Teams are free to choose a data set for their project. Teams should start early on developing their project proposal. The use of proprietary or classified data sets is not allowed. Project deliverables include a detailed report, functioning code, and a poster. Details about requirements and evaluation criteria will be posted on the web-site.

9 Examinations

Quizzes are given in the middle of the class and comprise only a few questions. However, some questions may require some thinking and calculations. The format of the exams is similar to that of the quizzes. Use of books and electronic devices is prohibited during the exam. You are allowed to bring your own cheat sheet, which can be up to 4 pages long (two double-sided sheets). The cheat sheet needs to be turned in with the exam. (Make copies for your records.)

10 Literature

Search for peer-reviewed articles using keywords:

1. Scan the web (in particular using scholarly search engines)
   (a) http://scholar.google.com/
   (b) http://academic.live.com/
   (c) http://citeseer.ist.psu.edu/
2. Scan using library databases (@GSU)
   (a) http://www.galileo.usg.edu
   (b) In particular, the following databases
       i. ABI/INFORM Complete
       ii. ACM Digital Library
       iii. IEEE Xplore

11 Workload Expectations

Students should plan for 2-3 hours of work outside of class each week for each course credit hour. Thus, a 3-credit course averages between 6 and 9 hours of student work outside of the classroom each week. See GSU sites for Academic Success:

http://www2.gsu.edu/~wwwcam/incept/successtips.html
http://www2.gsu.edu/~wwwctr/sac/studyskills.htm

Self-Managed Teams: Teams will be allowed for some activities during the term. Please note that unless the activity is explicitly identified as a team activity, I expect everyone to perform their own work (your hands on the keyboard). For team activities, you will be allowed to work with partners of your choosing. Initial teams must be established by the second week of classes. Established teams may continue working together on subsequent team activities. Team membership may change during the term if problems arise. However, team members must be designated within one week of the due date for the team activity. Exception: you may withdraw from a team at any time and submit an assignment individually.

Teams will submit one assignment for all team members. In most cases, each member of the team will get the same score. However, an individual's score may be reduced at the discretion of the instructor. Each team assignment must include the following:

- Tasks completed by each member.
- Percentage of the total work completed by each member.

Any individual with a low team contribution will be removed from their team.

Arbitration: There will be a one-week arbitration period after graded activities are returned. Within that one-week period, you are encouraged to discuss any assumptions and/or misinterpretations that you made on the activity that may have influenced your grade.

Attendance: If you are unable to attend a class session, it is your responsibility to acquire the class notes, assignments, announcements, etc. from a classmate. The instructor will not give private lectures for those who miss class.

Submission of Deliverables: Unless specific prior approval is obtained, no deliverable will be accepted after the specified due date. If you have a legitimate personal emergency (e.g., a health problem) that may impair your ability to submit a deliverable on time, you must take the initiative to contact the instructor before the due date/time (or as soon after your emergency as possible) to communicate the situation.

Make-up exams will not be given. However, if a student has a planned absence, he or she may take the exam earlier with the permission of the instructor.

12 Student Behavior

Behavior in class should be professional at all times. People must treat each other with dignity and respect in order for scholarship to thrive. Behaviors that are disruptive to learning will not be tolerated and may be referred to the Office of the Dean of Students for disciplinary action.

12.1 Discrimination and harassment

Discrimination and/or harassment will not be tolerated in the classroom. In most cases, discrimination and/or harassment violates Federal and State laws and/or University Policies and Regulations. Intentional discrimination and/or harassment will be referred to the Affirmative Action Office and dealt with in accordance with the appropriate rules and regulations.

Unintentional discrimination and/or harassment is just as damaging to the offended party, but it usually results from people not understanding the impact of their remarks or actions on others, or from insensitivity to the feelings of others. We must all strive to work together to create a positive learning environment. This means that each individual should be sensitive to the feelings of others, and tolerant of the remarks and actions of others. If you find the remarks and actions of another individual to be offensive, please bring it to their attention. If you believe those remarks and actions constitute intentional discrimination and/or harassment, please bring it to my attention.

12.2 Official department class policies

1. Prerequisites are strictly enforced. Students failing to complete any of the prerequisites with a grade of C or higher will be administratively withdrawn from this course with loss of tuition fees. There are no exceptions, except as granted by the instructor with the approval of the department.
2. Students are expected to attend all classes and group meetings, except when precluded by emergencies, religious holidays, or bona fide extenuating circumstances.
3. Students who, for non-academic reasons beyond their control, are unable to meet the full requirements of the course should notify the instructor, by email, as soon as this is known and prior to the class meeting. Incompletes may be given if a student has ONE AND ONLY ONE outstanding assignment.
4. A W grade will be assigned if a student withdraws before mid-semester if (and only if) he/she has maintained a passing grade up to the point of withdrawal. Withdrawals after the mid-semester date will result in a grade of WF. See the GSU catalog or registrar's office for details.
5. Spirited class participation is encouraged and informed discussion in class is expected. This requires completing readings and assignments before class.
6. All exams and individual assignments are to be completed by the student alone with no help from any other person.
7. Collaboration within groups is encouraged for project work. However, collaboration between project groups will be considered cheating.
8. Copying work from the Internet without a proper reference is considered plagiarism and subject to disciplinary action as delineated in the GSU Student Handbook.
9. Any non-authorized collaboration will be considered cheating and the student(s) involved will have an Academic Dishonesty charge completed by the instructor and placed on file in the Dean's office and the CIS Department. All instructors regardless of the type of assignment will apply this Academic Dishonesty policy equally to all students.

Abstracted from GSU's Student Handbook, Student Code of Conduct, "Policy on Academic Honesty and Procedures for Resolving Matters of Academic Honesty":

(a) http://www2.gsu.edu/%7ewwwdos/codeofconduct_conpol.html
(b) http://www2.gsu.edu/~wwwcam/

As members of the academic community, students are expected to recognize and uphold standards of intellectual and academic integrity. The University assumes as a basic and minimum standard of conduct in academic matters that students be honest and that they submit for credit only the products of their own efforts. Both the ideals of scholarship and the need for fairness require that all dishonest work be rejected as a basis for academic credit. They also require that students refrain from any and all forms of dishonorable or unethical conduct related to their academic work.

Students are expected to discuss with faculty the expectations regarding course assignments and standards of conduct. Here are some examples and definitions that clarify the standards by which academic honesty and academically honorable conduct are judged at GSU.

Plagiarism. Plagiarism is presenting another person's work as one's own. Plagiarism includes any paraphrasing or summarizing of the works of another person without acknowledgment, including the submitting of another student's work as one's own. Plagiarism frequently involves a failure to acknowledge in the text, notes, or footnotes the quotation of the paragraphs, sentences, or even a few phrases written or spoken by someone else. The submission of research or completed papers or projects by someone else is plagiarism, as is the unacknowledged use of research sources gathered by someone else when that use is specifically forbidden by the faculty member. Failure to indicate the extent and nature of one's reliance on other sources is also a form of plagiarism. Any work, in whole or part, taken from the Internet or other computer-based resource without properly referencing the source (for example, the URL) is considered plagiarism. A complete reference is required in order that all parties may locate and view the original source. Finally, there may be forms of plagiarism that are unique to an individual discipline or course, examples of which should be provided in advance by the faculty member. The student is responsible for understanding the legitimate use of sources, the appropriate ways of acknowledging academic, scholarly or creative indebtedness, and the consequences of violating this responsibility.

Cheating on Examinations. Cheating on examinations involves giving or receiving unauthorized help before, during, or after an examination. Examples of unauthorized help include the use of notes, texts, or "crib sheets" during an examination (unless specifically approved by the faculty member), or sharing information with another student during an examination (unless specifically approved by the faculty member). Other examples include intentionally allowing another student to view one's own examination and collaboration before or after an examination if such collaboration is specifically forbidden by the faculty member.

Unauthorized Collaboration. Submission for academic credit of a work product, or a part thereof, represented as its being one's own effort, which has been developed in substantial collaboration with another person or source or with a computer-based resource is a violation of academic honesty. It is also a violation of academic honesty knowingly to provide such assistance. Collaborative work specifically authorized by a faculty member is allowed.

Falsification. It is a violation of academic honesty to misrepresent material or fabricate information in an academic exercise, assignment or proceeding (e.g., false or misleading citation of sources, the falsification of the results of experiments or of computer data, false or misleading information in an academic context in order to gain an unfair advantage).

Multiple Submissions. It is a violation of academic honesty to submit substantial portions of the same work for credit more than once without the explicit consent of the faculty member(s) to whom the material is submitted for additional credit. In cases in which there is a natural development of research or knowledge in a sequence of courses, use of prior work may be desirable, even required; however the student is responsible for indicating in writing, as a part of such use, that the current work submitted for credit is cumulative in nature.