Machine Learning Nanodegree Syllabus


Artificial Neural Networks, TensorFlow, and Machine Learning Algorithms

Before You Start

Prerequisites: To succeed in this program, we recommend experience programming in Python and knowledge of inferential statistics, probability, linear algebra, and calculus. If you've never programmed before, or want a refresher, you can prepare for this Nanodegree with Lessons 1-4 of Intro to Computer Science.

Educational Objectives: This program will teach you how to become a Machine Learning Engineer, build machine learning models, and apply them to data sets in fields like finance, healthcare, education, and more.

Length of Program*: Hours
Frequency of Classes: Self-paced
Textbooks required: None
Instructional Tools Available: Video lectures, 1:1 appointments, forum support, in-classroom mentorship

*This is a self-paced program and the length is an estimate of the total hours the average student may take to complete all required coursework, including lecture and project time. Actual hours may vary.

Project 0: Titanic Survival Exploration

In this project, you will create decision functions that attempt to predict survival outcomes from the 1912 Titanic disaster based on each passenger's features, such as sex and age. You will start with a simple algorithm and increase its complexity until you are able to accurately predict the outcomes for at least 80% of the passengers in the provided data.

By the end of this project you'll be able to:
- Use basic Python code to clean a dataset for analysis
- Run code to create visualizations from the wrangled data
- Analyze trends shown in the visualizations and report your conclusions
- Determine if this program is a good fit for your time and talents
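The idea of a hand-written decision function in Project 0 can be sketched as follows. This is a minimal illustration, not the project's starter code: the passenger records are made up for the example, and the names `predict_survival` and `accuracy` are hypothetical.

```python
# Toy records invented for illustration only; they are NOT taken from the
# Titanic dataset used in the project.
passengers = [
    {"sex": "female", "age": 29, "survived": 1},
    {"sex": "male", "age": 40, "survived": 0},
    {"sex": "male", "age": 8, "survived": 1},
    {"sex": "female", "age": 52, "survived": 1},
    {"sex": "male", "age": 31, "survived": 1},
]


def predict_survival(passenger):
    """Hand-written rule: predict survival for women and for boys under 10."""
    if passenger["sex"] == "female":
        return 1
    if passenger["age"] < 10:
        return 1
    return 0


def accuracy(records, predict):
    """Fraction of records whose prediction matches the recorded outcome."""
    correct = sum(1 for r in records if predict(r) == r["survived"])
    return correct / len(records)


print(f"Accuracy: {accuracy(passengers, predict_survival):.0%}")
```

Starting from a single rule and adding conditions one at a time makes it easy to see how much each extra condition changes the accuracy.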

Project 1: Predicting Boston Housing Prices

The Boston housing market is highly competitive, and you want to be the best real estate agent in the area. To compete with your peers, you decide to leverage a few basic machine learning concepts to assist you and a client with finding the best selling price for their home. Luckily, you've come across the Boston Housing dataset, which contains aggregated data on various features for houses in Greater Boston communities, including the median value of homes for each of those areas. Your task is to build an optimal model based on a statistical analysis with the tools available. This model will then be used to estimate the best selling price for your clients' homes.

Supporting Lesson Content: Model Evaluation and Validation

STATISTICAL ANALYSIS
- Identify key features of datasets, such as the mean, median, mode, standard deviation, and quantiles.

DATA MODELING
- Learn the basic types of data.
- Learn how to handle datasets in sklearn.

EVALUATION AND VALIDATION
- Test a model, and use metrics such as accuracy and recall to compare and improve models.

MANAGING ERROR AND COMPLEXITY
- Learn the types of error, such as overfitting and underfitting.
- Learn to identify them using learning curves and model complexity.
- Apply techniques such as cross validation to improve your models.

Project 2: Find Donors for CharityML

CharityML is a fictitious charity organization located in the heart of Silicon Valley that was established to provide financial support for people eager to learn machine learning. After nearly 32,000 letters sent to people in the community, CharityML determined that every donation they received came from someone who was making more than $50,000 annually. To expand their potential donor base, CharityML has decided to send letters to residents of California, but only to those most likely to donate to the charity. With nearly 15 million working Californians, CharityML has brought you on board to help build an algorithm to best identify potential donors and reduce the overhead cost of sending mail. Your goal will be to evaluate and optimize several different supervised learners to determine which algorithm will provide the highest donation yield while also reducing the total number of letters being sent.
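One way to read the Project 2 goal, offered here as an assumption rather than the project's stated metric, is in terms of recall (how many real donors receive a letter) and precision (how many letters reach real donors). The sketch below computes both in plain Python; the function name `precision_recall` and the labels are made up for illustration.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 means 'donor'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # donors we mailed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # wasted letters
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # donors we missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels for illustration: 1 = donor, 0 = non-donor.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A learner with higher precision sends fewer wasted letters, while one with higher recall reaches more of the actual donors; comparing learners on both makes the trade-off explicit.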

Supporting Lesson Content: Supervised Learning

SUPERVISED LEARNING TASKS
- Difference between Regression and Classification
- Learn to predict values with Linear Regression
- Learn to predict states using Logistic Regression

DECISION TREES
- Train Decision Trees to predict states
- Use Entropy to build decision trees recursively

ARTIFICIAL NEURAL NETWORKS
- Learn the definition of a Neural Network
- Learn to train them using backpropagation
- Build a neural network starting from a single perceptron

SUPPORT VECTOR MACHINES
- Learn to train a Support Vector Machine to separate data linearly
- Use Kernel Methods to train SVMs on data that is not linearly separable

NONPARAMETRIC MODELS
- Instance Based Learning

BAYESIAN METHODS
- Learn the Bayes rule, and how to apply it to predicting data using the Naive Bayes algorithm
- Train models using Bayesian Learning
- Use Bayesian Inference to create Bayesian Networks of several variables
- Bayes NLP Mini-Project

ENSEMBLE OF LEARNERS
- Enhance traditional algorithms via boosting
- Random forests
- AdaBoost

Project 3: Creating Customer Segments

In this project you will apply unsupervised learning techniques to product spending data collected for customers of a wholesale distributor in Lisbon, Portugal, to identify customer segments hidden in the data. You will first explore the data by selecting a small subset to sample and determining whether any product categories correlate highly with one another. Afterwards, you will preprocess the data by scaling each product category and then identifying (and removing) unwanted outliers. With the good, clean customer spending data, you will apply PCA transformations to the data and implement clustering algorithms to segment the transformed customer data. Finally, you will compare the segmentation found with an additional labeling and consider ways this information could assist the wholesale distributor with future service changes.

Supporting Lesson Content: Unsupervised Learning

CLUSTERING
- Learn the basics of clustering data
- Cluster data with the K-means algorithm
- Cluster data with Single Linkage Clustering
- Gaussian models and Expectation Maximization

FEATURE ENGINEERING
- Learn to scale features in your data
- Learn to select the best features for training data

DIMENSIONALITY REDUCTION
- Reduce the dimensionality of the data using Principal Component Analysis and Independent Component Analysis

Project 4: Train a Smartcab to Drive

In the not-so-distant future, taxicab companies across the United States no longer employ human drivers to operate their fleet of vehicles. Instead, the taxicabs are operated by self-driving agents, known as smartcabs, to transport people from one location to another within the cities those companies operate in. In major metropolitan areas, such as Chicago, New York City, and San Francisco, an increasing number of people have come to depend on smartcabs to get to where they need to go as safely and reliably as possible. Although smartcabs have become the transport of choice, concerns have arisen that a self-driving agent might not be as safe or reliable as human drivers, particularly when considering city traffic lights and other vehicles. To alleviate these concerns, your task as an employee of a national taxicab company is to use reinforcement learning techniques to construct a demonstration of a smartcab operating in real time to prove that both safety and reliability can be achieved.

Supporting Lesson Content: Reinforcement Learning

REINFORCEMENT LEARNING
- Learn the basics of Markov Decision Processes
- Find optimal policies using Q-Learning

GAME THEORY
- Poker strategies
- Equilibriums
- Minimax Strategies
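The Q-Learning item in the Reinforcement Learning lesson content can be sketched with the standard tabular update, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max over a' of Q(s', a') - Q(s, a)). This is a minimal sketch, not the project's smartcab code: the state names, action names, rewards, and the values of `alpha` and `gamma` are assumptions chosen for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Value of the best action available from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Move the current estimate a fraction alpha toward the bootstrapped target.
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical actions and states; unseen (state, action) pairs default to 0.0.
actions = ["forward", "left", "right", "wait"]
q = defaultdict(float)

first = q_update(q, "red_light", "wait", 1.0, "green_light", actions)
second = q_update(q, "green_light", "forward", 2.0, "red_light", actions)
print(first, second)
```

Storing values in a dictionary keyed by (state, action) keeps the sketch short; a policy can then pick, for each state, the action with the highest stored value.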

Project 5: Dog Breed Classifier

Supporting Lesson Content: Deep Learning

MACHINE LEARNING TO DEEP LEARNING
- The basics of deep learning, including softmax, one-hot encoding, and cross entropy.
- Basic linear classification models such as Logistic Regression, and their associated error function.

DEEP NEURAL NETWORKS
- Review: What is a Neural Network?
- Activation functions: sigmoid, tanh, and ReLUs.
- How to train a neural network using backpropagation and the chain rule.
- How to improve a neural network using techniques such as regularization and dropout.

CONVOLUTIONAL NEURAL NETWORKS
- What is a Convolutional Neural Network?
- How CNNs are used in image recognition.

Project 6: Capstone Proposal

In this capstone project proposal, prior to completing the following Capstone Project, you will leverage what you've learned throughout the Nanodegree program to author a proposal for solving a problem of your choice by applying machine learning algorithms and techniques. A project proposal encompasses seven key points:
- The project's domain background: the field of research from which the project is derived;
- A problem statement: a problem being investigated for which a solution will be defined;
- The datasets and inputs: data or inputs being used for the problem;
- A solution statement: the solution proposed for the problem given;
- A benchmark model: some simple or historical model or result to compare the defined solution to;
- A set of evaluation metrics: functional representations for how the solution can be measured;
- An outline of the project design: how the solution will be developed and results obtained.

Project 7: Capstone Project

In this capstone project, you will leverage what you've learned throughout the Nanodegree program to solve a problem of your choice by applying machine learning algorithms and techniques. You will first define the problem you want to solve and investigate potential solutions and performance metrics. Next, you will analyze the problem through visualizations and data exploration to better understand which algorithms and features are appropriate for solving it. You will then implement your algorithms and metrics of choice, documenting the preprocessing, refinement, and postprocessing steps along the way. Afterwards, you will collect results about the performance of the models used, visualize significant quantities, and validate and justify these values. Finally, you will draw conclusions about your results, and discuss whether your implementation adequately solves the problem.