CS4491/CS7265 BIG DATA ANALYTICS: INTRODUCTION TO THE COURSE. Mingon Kang, PhD, Computer Science, Kennesaw State University
Self-Introduction: Mingon Kang, PhD. Homepage: http://ksuweb.kennesaw.edu/~mkang9 (or search Google for "Mingon Kang"; it is the top result). Research interests: bioinformatics, machine learning, data mining, and big data analytics. Projects you may be interested in: several genomics projects in bioinformatics; facial image recognition (gender/age/emotion).
Now it's your turn: your name, program/year, and where you are from; your interests in computer science; your favorites; what you expect from Big Data Analytics. If you are in the online course, introduce yourself in D2L under Discussions > Self-Introduction.
Course Information. Instructor: Dr. Mingon Kang. Office: J-339. Email: mkang9@kennesaw.edu (I only reply to emails that are sent from KSU student email accounts and include the course number). Office hours: Tue/Wed, 1-5 pm, or anytime my door is open. Course materials: homework assignments, lecture slides, and other materials will be posted in D2L. All lectures will be recorded.
Choice of Language. You can use your favorite language, but R, Matlab, and Python are highly recommended. The course will briefly introduce R/Python in case you have no experience with these scripting languages. Why? They make file I/O of textual data easier, make matrix manipulation easier, and allow fast prototyping.
Topics in Machine Learning. Classification problems: decision trees, linear models, naïve Bayes classifiers, logistic regression, Fisher linear discriminant analysis, k-nearest neighbors. Data representation: principal component analysis. Clustering problems: k-means.
Topics in Big Data Analytics: MapReduce framework, Hadoop, Apache Spark, applications in big data analytics.
Textbook: Advanced Analytics with Spark: Patterns for Learning from Data at Scale, by Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills, O'Reilly Media, 2015.
Reference Books in Machine Learning Pattern Recognition and Machine Learning, Christopher M. Bishop, 6-edition, Springer-Verlag New York, 2006
Reference Book: The Elements of Statistical Learning, Hastie, Tibshirani, Friedman, 2nd edition, Springer, 2009
Evaluation (tentative). Attendance (CS4491 & CS7265: 5%): if a student misses more than 4 sessions (class meetings), the student's final grade for the course may be reduced by 5%. Homework assignments (4-5 assignments; CS4491 & CS7265-W01: 45%, CS7265: 40%): all are programming assignments. Exams (40%): Exam 1 (20%) and Exam 2 (20%). Project (10%): machine learning (6%) and Spark (4%). Presentation (5%, CS7265 only): PhD students will be asked to give additional presentations. Late submission policy: see the syllabus.
Grade Evaluation
Grade | CS4491 | CS7265
A | 90% - 100% | 90% - 100%
B | 75% - 89% | 80% - 89%
C | 60% - 74% | 70% - 79%
D | 45% - 59% | 60% - 69%
F | 44% or below | 59% or below
Project: two components, individual work. One component in machine learning/data mining algorithms: you can choose one ML project from Kaggle datasets (www.kaggle.com) or the UC Irvine Machine Learning Repository (http://archive.ics.uci.edu/ml/). One component in big data analytics, drawn from the textbook.
Academic Integrity. Academic dishonesty includes: cheating; plagiarism; collusion; the submission for credit of any work or materials that are attributable in whole or in part to another person; taking an examination for another person; any act designed to give unfair advantage to a student, or the attempt to commit such an act.
How to succeed in this class: THINK hard, not WORK hard. Scientific thinking. A passion to learn something NEW. ASK ME questions (office hours). Begin homework assignments EARLY.
Programming? It looks like building a house. Timeline to build a house: layout (floor plan), excavation, footing/foundation, framing, mechanicals, insulation, drywall, paint.
House!!
A work of art like Antoni Gaudí's?
Before beginning the course, let's discuss the origins of computer science.
Philosophy. Definition of the word: "The study of the fundamental nature of knowledge, reality, and existence, especially when considered as an academic discipline." (Oxford Dictionary). The word literally means "love of wisdom" or "friend of wisdom".
Philosophy: the flooding of the Nile; logic, describing the world logically (around 500 BC); from God to human; ancient Graeco-Roman philosophy: Socrates, Plato, Aristotle, etc.
Philosophers: Aristotle, Gottfried Wilhelm Leibniz, George Boole, Bertrand Russell, Alan Turing
Aristotle (384-322 BC). He played many different roles: physics, biology, music, linguistics, zoology, economics, politics. How can we understand such a diverse world? LOGIC
Gottfried Wilhelm Leibniz: German philosopher (1646-1716). Known as one of the founding fathers of calculus. Wanted to prove all phenomena using binary logic, converting the world into binary logic.
George Boole: English mathematician, philosopher, and logician (1815-1864). Author of The Laws of Thought. Inventor of Boolean logic. Note that Boolean logic can be used to implement binary arithmetic.
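To make the last point concrete, here is a minimal Python sketch that adds two binary numbers using nothing but the Boolean operations AND, OR, and XOR. Python is used because it is one of the course's recommended languages. The function names `half_adder`, `full_adder`, and `ripple_carry_add` are illustrative choices, not from the lecture. Bits are assumed to be 0/1 integers listed most significant bit first, and the two inputs are assumed to have equal length.

```python
from __future__ import annotations

# On single bits (0 or 1), Python's ^, &, | behave as Boolean XOR, AND, OR.


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two bits; return (sum_bit, carry_bit)."""
    return a ^ b, a & b


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add two bits plus an incoming carry, built from two half adders."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2  # at most one of c1, c2 can be 1


def ripple_carry_add(x: list[int], y: list[int]) -> list[int]:
    """Add two equal-length bit lists (most significant bit first)."""
    result = []
    carry = 0
    # Walk from the least significant bit (rightmost) to the most significant.
    for a, b in zip(reversed(x), reversed(y)):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    result.append(carry)  # the final carry becomes the new top bit
    return list(reversed(result))


# 0101 (5) + 0011 (3) = 01000 (8)
print(ripple_carry_add([0, 1, 0, 1], [0, 0, 1, 1]))  # prints [0, 1, 0, 0, 0]
```

Chaining full adders from the least significant bit upward, as above, is called a ripple-carry adder; it is the simplest way to add integers with logic gates, though real processors often use faster designs such as carry-lookahead adders.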
Bertrand Russell: British philosopher, logician, mathematician, historian, writer, social critic, and political activist (1872-1970). Wanted to build perfect mathematics from perfect logic. Co-author, with Alfred North Whitehead, of Principia Mathematica, published in three volumes in 1910, 1912, and 1913: 1,994 pages in total!
Principia Mathematica, ✸54.43: "From this proposition it will follow, when arithmetical addition has been defined, that 1+1=2." (Volume I, 1st edition, page 379)
Alan Turing (1912-1954). Automate logic: if everything can be explained by logic, we may implement the logic automatically rather than manually. Introduced the Turing machine, a model of a general-purpose computer, and the Turing test (paper: https://www.csee.umbc.edu/courses/471/papers/turing.pdf)
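To illustrate what "a model of a general-purpose computer" means, below is a minimal Python sketch of a single-tape Turing machine: a tape, a read/write head, a current state, and a transition table mapping (state, symbol read) to (symbol to write, head move, next state). The simulator and the example program `INCREMENT`, which adds 1 to a binary number, are hypothetical illustrations, not from the lecture. The state names `right`, `carry`, and `done` are arbitrary. The `max_steps` guard is an added assumption so that a table that never halts fails loudly instead of looping forever.

```python
# A minimal single-tape Turing machine simulator (an illustrative sketch).
BLANK = "_"

# Hypothetical example program: increment a binary number by 1.
# Keys are (state, symbol read); values are (symbol to write, move, next state).
INCREMENT = {
    ("right", "0"): ("0", "R", "right"),      # scan right over the digits
    ("right", "1"): ("1", "R", "right"),
    ("right", BLANK): (BLANK, "L", "carry"),  # past the last digit: step back
    ("carry", "1"): ("0", "L", "carry"),      # 1 + carry = 0, carry moves left
    ("carry", "0"): ("1", "L", "done"),       # 0 + carry = 1, finished
    ("carry", BLANK): ("1", "L", "done"),     # ran off the left end: new digit
}


def run_turing_machine(transitions, tape, state, halt_state, max_steps=10_000):
    """Run from the leftmost tape cell until the machine reaches halt_state."""
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    steps = 0
    while state != halt_state:
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        symbol = cells.get(head, BLANK)  # unvisited cells read as blank
        write, move, state = transitions[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    if not cells:
        return ""
    # Read back the visited part of the tape, trimming blanks at both ends.
    span = range(min(cells), max(cells) + 1)
    return "".join(cells.get(i, BLANK) for i in span).strip(BLANK)


print(run_turing_machine(INCREMENT, "1011", "right", "done"))  # prints 1100
```

Here the simulator plays the role of fixed hardware and the transition table plays the role of a program: changing only the table changes what is computed. Turing's universal machine goes one step further by reading such a table from its own tape.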
Summary: Aristotle (384-322 BC): modern disciplines; Gottfried Wilhelm Leibniz (1646-1716): binary logic; George Boole (1815-1864): Boolean logic; Bertrand Russell (1872-1970): Principia Mathematica; Alan Turing (1912-1954): automated logic. See http://www.datesandevents.org/events-timelines/07-computer-history-timeline.htm