Programming Assignment 2: Neural Networks


Problem:

In this homework assignment, your task is to implement one of the most common machine learning algorithms: neural networks. You will train and test a neural network on the dataset we provide and experiment with different settings of the hyperparameters. The assignment also includes a written part that is meant to help you understand how to train a neural net; its solution will be used to test your code, since neural networks are not easy to debug.

Note: For this assignment, you should not use any machine learning libraries; you should implement the neural network learning algorithm yourself. You may use any Python standard library and the following third-party libraries:
1. numpy: for creating and manipulating arrays.
2. matplotlib: for making plots.
3. pandas or other packages for visualizing data or graphs.

Dataset

Figure 1: Sample images in MNIST

The dataset we use for this assignment is the MNIST database of handwritten digits; images of 100 sample digits are shown in Figure 1. We have split the dataset into a training set and a test set, stored in five csv.gz files, which can be downloaded from the course website:

Training samples: http://www.cs.princeton.edu/courses/archive/fall16/cos402/ex/traindigitx.csv.gz
Training labels: http://www.cs.princeton.edu/courses/archive/fall16/cos402/ex/traindigity.csv.gz
Test samples: http://www.cs.princeton.edu/courses/archive/fall16/cos402/ex/testdigitx.csv.gz
Test labels for samples in TestDigitX.csv.gz: http://www.cs.princeton.edu/courses/archive/fall16/cos402/ex/testdigity.csv.gz
More test samples (no labels are provided for this second test set): http://www.cs.princeton.edu/courses/archive/fall16/cos402/ex/testdigitx2.csv.gz

There are 50,000 training samples in TrainDigitX.csv.gz, 10,000 test samples in TestDigitX.csv.gz, and another 5,000 test samples in TestDigitX2.csv.gz. Each sample is a handwritten digit represented by a 28 by 28 greyscale pixel image. Each pixel is a value between 0 and 1, with a value of 0 indicating white. Each sample in the dataset (a row in TrainDigitX.csv.gz, TestDigitX.csv.gz, or TestDigitX2.csv.gz) is therefore a feature vector of length 784 (28 x 28 = 784). TrainDigitY.csv.gz and TestDigitY.csv.gz provide the labels for the samples in TrainDigitX.csv.gz and TestDigitX.csv.gz, respectively. The value of a label is the digit it represents; e.g., a label of value 8 indicates that the sample represents the digit 8.

Note: The data files are compressed, but you do not need to uncompress them. The loadtxt() method from numpy can read compressed or uncompressed csv files.
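For illustration, a minimal loading sketch is shown below. The file names match the downloads listed above, but the local paths and the delimiter="," argument are assumptions about how you store and read the files.

import numpy as np

# loadtxt() reads the .csv.gz files directly; no manual decompression is needed.
# Paths are assumed to point at the downloaded files.
train_x = np.loadtxt("TrainDigitX.csv.gz", delimiter=",")  # shape (50000, 784), pixel values in [0, 1]
train_y = np.loadtxt("TrainDigitY.csv.gz")                 # shape (50000,), digit labels 0-9
test_x = np.loadtxt("TestDigitX.csv.gz", delimiter=",")    # shape (10000, 784)
test_y = np.loadtxt("TestDigitY.csv.gz")                   # shape (10000,)
print(train_x.shape, train_y.shape, test_x.shape, test_y.shape)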

Part 1: Manually update weights and biases for a small neural network

Figure 2: A Small Neural Net

Before designing and writing your code, you should work through the neural network in Figure 2 by hand. The small network has two neurons in the input layer, two neurons in the hidden layer, and two neurons in the output layer. The weights and biases are marked on the figure; note that there are no biases for the neurons in the input layer. There are two training samples: X1 = (0.1, 0.1) and X2 = (0.1, 0.2). The label for X1 is 0, so the desired output for X1 in the output layer should be Y1 = (1, 0). The label for X2 is 1, so the desired output for X2 in the output layer should be Y2 = (0, 1). You will update the weights and biases of this small neural net using stochastic gradient descent with backpropagation, using a batch size of 2 and a learning rate of 0.1. You should update all the weights and biases only once, MANUALLY. Show your work and results. Then use this as a test case for the code you implement in Part 2: train this network for 3 epochs and output the weights and biases after each epoch. These updated weights and biases should be produced by running your code. It is imperative that you do this part yourself to make sure you have fully understood backpropagation.

Part 2: Implement a neural network learning algorithm

Your task is to implement a neural network learning algorithm that creates a neural network of a specified size and runs stochastic gradient descent on a cost function over the given training data. The network you will work with in this assignment has 3 layers: one input layer, one hidden layer, and one output layer. You should name your main file neural_network.py; it accepts seven arguments, and the grader will run your code on the command line in the following manner:

>python neural_network.py NInput NHidden NOutput TrainDigitX.csv.gz TrainDigitY.csv.gz TestDigitX.csv.gz PredictDigitY.csv.gz

Your code should train a neural net (NInput: number of neurons in the input layer, NHidden: number of neurons in the hidden layer, NOutput: number of neurons in the output layer) using the training set TrainDigitX.csv.gz and TrainDigitY.csv.gz, then make predictions for all the samples in TestDigitX.csv.gz and output the labels to PredictDigitY.csv.gz. You should set the default number of epochs to 30, the default mini-batch size to 20, and the default learning rate to 3.

Note: you can use the numpy.loadtxt() and numpy.savetxt() methods to read from or write to a csv.gz file.

The nonlinearity used in your neural net should be the basic sigmoid function:

\sigma_{w,b}(x) = \frac{1}{1 + e^{-(w \cdot x + b)}}
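As a small illustration (the function names are my own), the sigmoid above and its derivative, which backpropagation will need, can be written to operate elementwise on numpy arrays:

import numpy as np

def sigmoid(z):
    # Elementwise logistic function sigma(z) = 1 / (1 + exp(-z)),
    # applied to the pre-activation z = w . x + b.
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Derivative sigma'(z) = sigma(z) * (1 - sigma(z)), used by backpropagation.
    s = sigmoid(z)
    return s * (1.0 - s)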

We will be using mini-batches to train the neural net for several epochs. Mini-batches are just the training dataset divided randomly into smaller sets, which are used to approximate the gradient. The main steps of training a neural net using stochastic gradient descent are:

Assign random initial weights and biases to the neurons. Each initial weight or bias is a random floating-point number drawn from the standard normal distribution (mean 0 and variance 1).

For each training example in a mini-batch, use backpropagation to calculate a gradient estimate, which, as you saw in class, consists of the following steps:
1. Feed forward the input to get the activations of the output layer.
2. Calculate the derivatives of the cost function for that input with respect to the activations of the output layer.
3. Calculate the errors for all the weights and biases of the neurons using backpropagation.

Update the weights (and biases) using stochastic gradient descent:

w \leftarrow w - \frac{\eta}{m} \sum_{i=1}^{m} \mathrm{error}^w_i

where m is the number of training examples in a mini-batch, error_i^w is the error of weight w for input i, and \eta is the learning rate.

Repeat this for all mini-batches. Repeat the whole process for the specified number of epochs. At the end of each epoch, evaluate the network on the test data and display its accuracy.

For this first part of the assignment, use the quadratic cost function:

C(w, b) = \frac{1}{2n} \sum_{i=1}^{n} \| f(x_i) - y_i \|^2

(w: weights; b: biases; n: number of test instances; x_i: the i-th test instance vector; y_i: the test label vector, i.e. the ideal output of the trained neural network: if the label for x_i is 8, then y_i is [0,0,0,0,0,0,0,0,1,0]; f(x): the function the neural network guesses, which maps an input vector x to a label vector f(x).)
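To make the procedure above concrete, here is a minimal sketch of a 3-layer sigmoid network trained with mini-batch stochastic gradient descent and the quadratic cost just defined. It is only an illustration, not a required structure: the class name Network, the helper names, and the per-example Python loop are my own choices; only the initialization, the backpropagation steps, the update rule, and the per-epoch evaluation follow the description above.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def one_hot(label, n_output):
    # Turn a digit label (e.g. 8) into the target vector [0,...,0,1,0] of length n_output.
    t = np.zeros(n_output)
    t[int(label)] = 1.0
    return t

class Network:
    def __init__(self, n_input, n_hidden, n_output):
        # Initial weights and biases drawn from the standard normal distribution (mean 0, variance 1).
        self.w1 = np.random.randn(n_hidden, n_input)
        self.b1 = np.random.randn(n_hidden)
        self.w2 = np.random.randn(n_output, n_hidden)
        self.b2 = np.random.randn(n_output)

    def feedforward(self, x):
        a1 = sigmoid(self.w1 @ x + self.b1)      # hidden-layer activations
        return sigmoid(self.w2 @ a1 + self.b2)   # output-layer activations

    def backprop(self, x, t):
        # Gradient estimate for one training example under the quadratic cost (1/2) * ||a2 - t||^2.
        z1 = self.w1 @ x + self.b1
        a1 = sigmoid(z1)
        z2 = self.w2 @ a1 + self.b2
        a2 = sigmoid(z2)
        delta2 = (a2 - t) * sigmoid_prime(z2)               # output-layer error
        delta1 = (self.w2.T @ delta2) * sigmoid_prime(z1)   # hidden-layer error
        return np.outer(delta1, x), delta1, np.outer(delta2, a1), delta2

    def accuracy(self, xs, ys):
        # Predicted digit = index of the largest output activation.
        hits = sum(int(np.argmax(self.feedforward(x))) == int(y) for x, y in zip(xs, ys))
        return hits / len(xs)

    def train(self, train_x, train_y, test_x, test_y, epochs=30, batch_size=20, eta=3.0):
        n = len(train_x)
        for epoch in range(epochs):
            order = np.random.permutation(n)          # shuffle, then slice into mini-batches
            for start in range(0, n, batch_size):
                batch = order[start:start + batch_size]
                sums = [np.zeros_like(p) for p in (self.w1, self.b1, self.w2, self.b2)]
                for i in batch:
                    grads = self.backprop(train_x[i], one_hot(train_y[i], len(self.b2)))
                    for s, g in zip(sums, grads):
                        s += g
                m = len(batch)
                # Update rule: w <- w - (eta / m) * sum_i error_i^w (and likewise for biases).
                self.w1 -= (eta / m) * sums[0]
                self.b1 -= (eta / m) * sums[1]
                self.w2 -= (eta / m) * sums[2]
                self.b2 -= (eta / m) * sums[3]
            print("epoch %d: test accuracy %.4f" % (epoch + 1, self.accuracy(test_x, test_y)))

# Example usage (assumes the arrays loaded in the earlier sketch):
# net = Network(784, 30, 10)
# net.train(train_x, train_y, test_x, test_y)

Collecting the per-epoch accuracies in a list also makes the accuracy-vs-epoch plots requested in the tasks below straightforward to produce with matplotlib.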

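For the predictions that Part 2 asks you to write to PredictDigitY.csv.gz, a sketch along these lines may help. It assumes a trained network object net and the test_x array from the sketches above (hypothetical names), and relies on the fact that savetxt() compresses automatically when the file name ends in .gz.

import numpy as np

# Assumes `net` and `test_x` exist as in the earlier sketches.
# The predicted digit for a sample is the index of the largest output activation.
predictions = [int(np.argmax(net.feedforward(x))) for x in test_x]
np.savetxt("PredictDigitY.csv.gz", predictions, fmt="%d")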
You should do the following tasks with your neural net:

1. Create a neural net of size [784, 30, 10]. This network has three layers: 784 neurons in the input layer, 30 neurons in the hidden layer, and 10 neurons in the output layer. Train it on the training data for 30 epochs, with a mini-batch size of 20 and η = 3.0, and test it on the test data (TestDigitX.csv.gz). Make plots of test accuracy vs epoch. Report the maximum accuracy achieved. Now run your code again with the second test set (TestDigitX2.csv.gz) and output your predictions to PredictDigitY2.csv.gz. In addition to your code, you should also upload both PredictTestY.csv.gz and PredictTestY2.gz to CS Dropbox.

2. Train new neural nets of the original specifications (the specifications in 1) but with η = 0.001, 0.1, 1.0, 10, 100. Plot test accuracy vs epoch for each η on the same graph. Report the maximum test accuracy achieved for each η. (Remember to create a new neural net each time so that it starts learning from scratch.)

3. Train new neural nets of the original specifications but with mini-batch sizes of 1, 5, 10, 20, 100. Plot maximum test accuracy vs mini-batch size. Which one is the slowest? Which one achieves the maximum test accuracy?

4. Try different hyperparameter settings (number of epochs, η, mini-batch size, etc.). Report the maximum test accuracy you achieved and the settings of all the hyperparameters.

Note: You should try your implementation on the small network given in Part 1 before running your code on the MNIST dataset. When you run your code on the small neural net, you should initialize the weights and biases as given in Figure 2 and use the two samples and the same learning rate and batch size as when you did it by hand in Part 1. Then train this network for 3 epochs and output the weights and biases after each epoch. (Make sure that the weights and biases output by the learned net after one epoch are the same as those you updated manually.)

Part 3: An alternate cost function

Now replace the quadratic cost function by the cross-entropy cost function:

C(w, b) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \ln f(x_i) + (1 - y_i) \ln\left(1 - f(x_i)\right) \right]

If your code is well structured, you only need to modify one function slightly to account for the change in the cost function. Train a neural net with the original specifications. What is the maximum test accuracy achieved?
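One common way the small modification plays out, assuming sigmoid output units as specified in Part 2, is sketched below (the function names are my own). With the cross-entropy cost, the sigma'(z) factor in the output-layer error cancels, so only the line that computes the output-layer delta changes.

import numpy as np

def cross_entropy_cost(a, t):
    # Per-example cross-entropy, summed over the output units:
    # -sum_j [ t_j * ln(a_j) + (1 - t_j) * ln(1 - a_j) ].
    # nan_to_num guards against 0 * ln(0) when an activation saturates at 0 or 1.
    return -np.sum(np.nan_to_num(t * np.log(a) + (1 - t) * np.log(1 - a)))

def output_delta(a, t):
    # With sigmoid outputs and the cross-entropy cost, the output-layer error is simply
    # (a - t): the sigmoid_prime factor used with the quadratic cost cancels out.
    return a - t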

Part 4: L2 regularization (optional, for bonus points)

Use an L2 regularizer on the weights to modify the cost function:

L2: \quad C(w, b) = C_0 + \frac{\lambda}{2n} \sum_{w} w^2

Train a neural net with the original specifications. What is the maximum test accuracy achieved with the L2 regularizer by a neural net of the original specifications?
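As a sketch of how the regularizer changes training, assuming the mini-batch update written earlier: the gradient of the penalty term (lambda/2n) * sum_w w^2 with respect to a weight is (lambda/n) * w, so each weight shrinks by a factor (1 - eta*lambda/n) before the usual gradient step, while biases are updated as before. The function below is illustrative; the parameter names are my own.

import numpy as np

def l2_weight_update(w, grad_sum, eta, m, lam, n):
    # One mini-batch update of a weight array under C = C_0 + (lam / (2n)) * sum_w w^2.
    # grad_sum: summed per-example gradients of the unregularized cost C_0 for this batch,
    # m: mini-batch size, n: number of training samples, eta: learning rate, lam: lambda.
    return (1.0 - eta * lam / n) * w - (eta / m) * grad_sum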

What and how to turn in:

1. Turn in hard copies in class on the due date: a printout of all your Python scripts, plus your answers and plots for the questions in all sections. Plots should be made with Python code or other software.

2. Upload your code and predictions to the CS dropbox by the due date, using this DropBox link: http://dropbox.cs.princeton.edu/cos402 F2016/Programming Assignment2. Upload all your Python scripts and the two prediction files: PredictTestY.csv.gz and PredictTestY2.gz. You should only turn in uncompressed .py files. All code should be working and well documented. If appropriate, include a readme.txt file explaining briefly how your code is organized, what data structures you are using, or anything else that will help the graders understand how your code works.