Subject Code: CSE4020
Machine Learning
L,T,P,J,C: 2,0,2,4,4
Indicative Pre-requisite: MAT2001 - Statistics for Engineers

Objective
The course introduces the theoretical foundations, algorithms, methodologies, and applications of Machine Learning, and also provides practical knowledge for handling and analysing data sets covering a variety of real-world applications.

Expected Outcomes
After successfully completing the course, the student should be able to:
1. Recognize the characteristics of machine learning that make it useful for solving real-world problems.
2. Identify real-world applications of machine learning.
3. Identify and apply appropriate machine learning algorithms for analyzing data for a variety of problems.
4. Implement different machine learning algorithms using R and Python for analyzing data.
5. Design test procedures in order to evaluate a model.
6. Combine several models in order to gain better results.
7. Make choices for a model for new machine learning tasks based on reasoned argument.

SLO
5. Having design thinking capability
7. Having computational thinking (ability to translate vast data into abstract concepts and to understand database reasoning)
9. Having problem solving ability - solving social issues and engineering problems

Modules

Module 1: Introduction to Machine Learning
L Hrs: 3; SLO: 5
Topics: What is Machine Learning, Examples of Various Learning Paradigms, Perspectives and Issues, Version Spaces, Finite and Infinite Hypothesis Spaces, PAC Learning

Module 2: Supervised Learning - I
L Hrs: 4; SLO: 5,7,9
Topics: Learning a Class from Examples; Linear, Non-linear, Multi-class and Multi-label Classification; Generalization Error Bounds: VC Dimension; Decision Trees: ID3, Classification and Regression Trees; Regression: Linear Regression, Multiple Linear Regression, Logistic Regression

Module 3: Supervised Learning - II
L Hrs: 5; SLO: 5,7,9
Topics: Neural Networks: Introduction, Perceptron, Multilayer Perceptron; Support Vector Machines: Linear and Non-Linear, Kernel Functions; K-Nearest Neighbors

Module 4: Ensemble Learning
L Hrs: 3; SLO: 5,7,9
Topics: Model Combination Schemes, Voting, Error-Correcting Output Codes; Bagging: Random Forests; Boosting: AdaBoost; Stacking

Module 5: Unsupervised Learning - I
L Hrs: 7; SLO: 5,7,9
Topics: Introduction to Clustering; Hierarchical: AGNES, DIANA; Partitional: K-means Clustering, K-Mode Clustering; Self-Organizing Map; Expectation Maximization; Gaussian Mixture Models

Module 6: Unsupervised Learning - II
L Hrs: 3; SLO: 5,7,9
Topics: Principal Component Analysis (PCA), Locally Linear Embedding (LLE), Factor Analysis

Module 7: Machine Learning in Practice
L Hrs: 3; SLO: 5,7
Topics: Design, Analysis and Evaluation of Machine Learning Experiments; Feature Selection Mechanisms; Other Issues: Imbalanced Data, Missing Values, Outliers

Module 8: Recent Trends in Machine Learning
L Hrs: 2

Lab (Indicative List of Experiments)
Hrs: 30; SLO: 5,7
1. Implement Decision Tree learning
2. Implement Logistic Regression
3. Implement classification using Multilayer Perceptron
4. Implement classification using SVM
5. Implement AdaBoost
6. Implement Bagging using Random Forests
7. Implement K-means Clustering to find natural patterns in data
8. Implement Hierarchical Clustering
9. Implement K-mode Clustering
10. Implement Principal Component Analysis for Dimensionality Reduction
11. Implement Multiple Correspondence Analysis for Dimensionality Reduction
12. Implement Gaussian Mixture Model using Expectation Maximization
13. Evaluate ML algorithms with balanced and unbalanced datasets
14. Compare Machine Learning algorithms
15. Implement the k-nearest neighbours algorithm

Project
Hrs: 60 [Non Contact hrs]; SLO: 5,7,9
# Generally a team project [5 to 10 members].
# Concepts studied in XXXX should have been used.
# A down-to-earth application and an innovative idea should have been attempted.
# Report to be submitted in digital format, with all drawings produced using a software package.
# Assessment on a continuous basis with a minimum of 3 reviews.

Projects may be given as group projects. The following are sample projects that can be given to students to implement:
1. Solving Data Science problems from the Kaggle website
2. Applying Machine Learning algorithms in the field of biometrics for reliable and robust identification of humans from their personal traits, mainly for security and authentication purposes
3. Applying Machine Learning for OCR and Video Analytics
4. Applying Machine Learning algorithms in the field of Natural Language Processing for document clustering and sentiment analysis
5. Applying Machine Learning for Fraud Detection, Customer Segmentation, etc.

Note: Students can download real-world data sets for different Machine Learning tasks from https://archive.ics.uci.edu/ml/datasets.html and http://sci2s.ugr.es/keel/datasets.php#sub1 and use them for the projects.

Text Books
1. Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, Prentice Hall of India, Third Edition, 2014.

Reference Books
2. Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar, "Foundations of Machine Learning", MIT Press, 2012.
3. Tom Mitchell, "Machine Learning", McGraw Hill, 3rd Edition, 1997.
4. Charu C. Aggarwal, "Data Classification: Algorithms and Applications", CRC Press, 2014.
5. Charu C. Aggarwal, "Data Clustering: Algorithms and Applications", CRC Press, 2014.
6. Kevin P. Murphy, "Machine Learning: A Probabilistic Perspective", The MIT Press, 2012.
Machine Learning

Knowledge Areas that contain topics and learning outcomes covered in the course

Knowledge Area | Total Hours of Coverage
CS: IS (Intelligent Systems) | 30

Body of Knowledge coverage
[List the Knowledge Units covered in whole or in part in the course. If in part, please indicate which topics and/or learning outcomes are covered. For those not covered, you might want to indicate whether they are covered in another course or not covered in your curriculum at all. This section will likely be the most time-consuming to complete, but is the most valuable for educators planning to adopt the CS2013 guidelines.]

KA | Knowledge Unit | Topics Covered | Hours
CS: IS | IS/Basic Machine Learning | Introduction to Machine Learning | 3
CS: IS | IS/Advanced Machine Learning | Supervised Learning - I; Unsupervised Learning - I; Machine Learning in Practice; Supervised Learning - II; Unsupervised Learning - II; Ensemble Learning; Recent Trends | 27
Total hours | | | 30
Where does the course fit in the curriculum?
[In what year do students commonly take the course? Is it compulsory? Does it have prerequisites, required following courses? How many students take it?]
This course is an elective course, suitable from the 5th semester onwards. Knowledge of any one programming language is essential.

What is covered in the course?
[A short description, and/or a concise list of topics - possibly from your course syllabus. (This is likely to be your longest answer.)]

Part I: Introduction to Machine Learning
It introduces the concept of learning and various aspects of machine learning, such as the different learning paradigms: supervised, unsupervised, semi-supervised, and reinforcement learning.

Part II: Supervised Learning - I
It introduces supervised learning algorithms for classification and regression. Learning a Class from Examples; Linear, Non-linear, Multi-class and Multi-label Classification; Generalization Error Bounds: VC Dimension; Decision Trees: ID3, Classification and Regression Trees; Regression: Linear Regression, Multiple Linear Regression, Logistic Regression

Part III: Supervised Learning - II
It introduces advanced supervised learning algorithms for classification. Neural Networks: Introduction, Perceptron, Multilayer Perceptron; Support Vector Machines: Linear and Non-Linear, Kernel Functions; K-Nearest Neighbours

Part IV: Ensemble Learning
It introduces ensemble methods, which combine different models. Model Combination Schemes, Voting, Error-Correcting Output Codes; Bagging: Random Forests; Boosting: AdaBoost; Stacking
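The voting scheme named in Part IV can be illustrated with a minimal sketch in Python, one of the two languages used in the lab. Everything in it is an assumption made for illustration and is not taken from the course material: the make_stump and majority_vote helpers, the thresholds, and the sample points are all hypothetical.

```python
from collections import Counter


def make_stump(feature_index, threshold):
    """Hypothetical base classifier: predicts 1 if the chosen
    feature exceeds the threshold, otherwise 0."""
    def predict(x):
        return 1 if x[feature_index] > threshold else 0
    return predict


def majority_vote(classifiers, x):
    """Combine the base classifiers by plurality voting:
    return the label predicted by the most classifiers."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Three illustrative stumps; an odd count avoids ties for binary labels.
ensemble = [make_stump(0, 0.5), make_stump(1, 0.5), make_stump(0, 0.8)]

# Made-up two-feature sample points.
samples = [(0.9, 0.2), (0.3, 0.9), (0.6, 0.7)]
predictions = [majority_vote(ensemble, x) for x in samples]
print(predictions)
```

The sketch shows only the combination step. In the lab, the base models would be trained on data, and a library implementation would normally be used instead of hand-written stumps.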
Part V: Unsupervised Learning - I
It introduces various clustering techniques. Hierarchical: AGNES, DIANA; Partitional: K-means Clustering, K-means++, K-Mode Clustering; Self-Organizing Map; Expectation Maximization; Gaussian Mixture Models

Part VI: Unsupervised Learning - II
It introduces dimensionality reduction techniques such as Principal Component Analysis (PCA), Locally Linear Embedding (LLE), and Factor Analysis.

Part VII: Machine Learning in Practice
Design, Analysis and Evaluation of Machine Learning Experiments; Feature Selection Mechanisms; Other Issues: Imbalanced Data, Missing Values, Outliers

What is the format of the course?
[Is it face to face, online or blended? How many contact hours? Does it have lectures, lab sessions, discussion classes?]
This course is designed with 100 minutes of in-classroom sessions per week, 60 minutes of video/reading instructional material per week, 100 minutes of lab hours per week, and 200 minutes of non-contact time per week spent on implementing a course-related project. Generally, the course should combine lectures, in-class discussion, case studies, guest lectures, mandatory off-class reading material, and quizzes.

How are students assessed?
[What type, and number, of assignments are students expected to do? (papers, problem sets, programming projects, etc.) How long do you expect students to spend on completing assessed work?]
Students are assessed on a combination of group activities, classroom discussion, projects, and continuous and final assessment tests. Additional weightage is given based on their rank in crowd-sourced projects or Kaggle-like competitions. Students can also earn additional weightage for a certificate of completion of a related MOOC.