CIS680: Vision & Learning Assignment 2.a: Gradient manipulation. Due: Oct. 16, 2018 at 11:59 pm

Instructions

This is an individual assignment. Individual means each student must hand in their own answers, and each student must write their own code for the homework. It is permissible for students to collaborate in solving problems. To help you actually learn the material, what you write down must be your own work, not copied from any other individual. You must also list the names of the students (maximum two) you collaborated with.

There is no single answer to most problems in deep learning, therefore the questions will often be underspecified. You need to fill in the blanks and submit a solution that solves the (practical) problem. Document the choices (hyperparameters, features, neural network architectures, etc.) you made in the write-up. The assignment describes each task at a high level; you are expected to work out how to complete it in the programming framework of your choice. While the text of the assignment should be sufficient to understand the task, you are welcome to read the references, which describe the concepts used in more detail.

All the code should be written in Python. You should use either TensorFlow or PyTorch to complete this homework. The CIFAR-10 dataset can be downloaded from [1] and the MNIST dataset can be downloaded from [6]. PyTorch and Keras include Dataset classes for both datasets; you are free to use them if you want.

You must submit your solutions online on Canvas. You should submit 3 folders with code, one for each part. Submit your code compressed into a single ZIP file named <penn key>.zip. Jupyter notebooks are acceptable. Submit your PDF report to a separate assignment called HW2 PDF Submission. Note that you should include all results (answers, figures) in your report.

Introduction

In this homework, we continue to explore the mathematical tools on which deep learning is based, while also moving towards real-world network architectures. We will train simple CNNs on the CIFAR-10 and MNIST datasets, experiment with gradient approximation techniques, and use gradients to create adversarial images.

1 Data Pre-processing and Augmentation (30%)

Large amounts of image data are essential for good performance of deep learning methods on computer vision tasks. However, there are simple data-augmentation techniques that let you artificially enlarge an existing dataset to get even better performance. In this part, you will train a CNN on a complex but small dataset and experience how image pre-processing plays an important role in the performance of the network. (Illustrative sketches of this network and of one possible augmentation pipeline are given after Table 1.)

1. (12%) Train a network with the architecture shown in Table 1 using the raw images of CIFAR-10. Hint: You may start with the demo code in the lecture Practical Guide. Change the maximum number of training iterations to 10,000 and the number of steps per epoch to 100 (with batch size 100). Also, be mindful of what's fed into the network.

2. (6%) Train the same network, but instead of feeding raw images, normalize the images to zero mean and unit standard deviation. Explain the results compared to the previous question.

3. (6%) Train the same network, but in addition, flip the images randomly (with 50% chance) during training (before image normalization). Note that you should not flip the images during evaluation. Explain the difference of the results compared to the previous question.

4. (6%) Train the same network, but in addition, pad the images with 4 zero pixels on each side (after normalization) and crop a random 32x32 region of the images during training. Note that you should not flip/pad/crop images during evaluation. Explain the difference of the results compared to the previous question.

Layers             Hyper-parameters
Convolution 1      Kernel size = (5, 5, 32), SAME padding. Followed by BatchNorm and ReLU.
Pooling 1          Average operation. Kernel size = (2, 2). Stride = 2. Padding = 0.
Convolution 2      Kernel size = (5, 5, 32), SAME padding. Followed by BatchNorm and ReLU.
Pooling 2          Average operation. Kernel size = (2, 2). Stride = 2. Padding = 0.
Convolution 3      Kernel size = (5, 5, 64), SAME padding. Followed by BatchNorm and ReLU.
Pooling 3          Average operation. Kernel size = (2, 2). Stride = 2. Padding = 0.
Fully Connected    Output channels = 64. Followed by BatchNorm and ReLU.
Fully Connected    Output channels = 10. Followed by Softmax.

Table 1: Network architecture for part 1.
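
One possible PyTorch reading of Table 1 is sketched below. The class name and the decision to fold the final Softmax into the cross-entropy loss are illustrative choices, not requirements of the assignment.

```python
import torch.nn as nn

class Part1CNN(nn.Module):
    """One interpretation of Table 1 for 3x32x32 CIFAR-10 inputs."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, padding=2),   # Convolution 1 (SAME padding)
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.AvgPool2d(kernel_size=2, stride=2),         # Pooling 1
            nn.Conv2d(32, 32, kernel_size=5, padding=2),   # Convolution 2
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.AvgPool2d(kernel_size=2, stride=2),         # Pooling 2
            nn.Conv2d(32, 64, kernel_size=5, padding=2),   # Convolution 3
            nn.BatchNorm2d(64), nn.ReLU(),
            nn.AvgPool2d(kernel_size=2, stride=2),         # Pooling 3
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 64),                     # Fully Connected (64 channels)
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.Linear(64, num_classes),                    # Fully Connected (10 channels)
            # Softmax is omitted here; nn.CrossEntropyLoss applies it internally.
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

As a reference for items 2-4, here is a minimal sketch of a training-time augmentation pipeline, assuming PyTorch and torchvision. The per-channel mean/std values are commonly quoted CIFAR-10 estimates (any statistics computed from the training set would do), the horizontal flip is one interpretation of "flip the images randomly", and per-channel normalization is one of several reasonable readings of "zero mean, unit standard deviation".

```python
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Commonly quoted per-channel CIFAR-10 statistics (illustrative values).
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD = (0.2470, 0.2435, 0.2616)

# Training: random flip (item 3), then normalization (item 2),
# then zero-padding by 4 pixels and a random 32x32 crop (item 4).
# RandomCrop runs after Normalize here, so the padding happens after normalization.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD),
    transforms.RandomCrop(32, padding=4),
])

# Evaluation: normalization only -- no flipping, padding, or cropping.
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])

train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                         download=True, transform=train_transform)
test_set = torchvision.datasets.CIFAR10(root='./data', train=False,
                                        download=True, transform=test_transform)

train_loader = DataLoader(train_set, batch_size=100, shuffle=True)
test_loader = DataLoader(test_set, batch_size=100, shuffle=False)
```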

2 Binary networks (35%)

Binary neural networks (BNNs, [5]) are neural networks in which some of the computation is binarized. This can be beneficial from several perspectives, including faster computation, lower power consumption, and a regularization effect. In this question, you have to implement a network that has binary activation values: either +1 or -1. You will use the Sign function as the activation function:

    x_b = \mathrm{Sign}(x) = \begin{cases} +1 & \text{if } x \ge 0, \\ -1 & \text{otherwise.} \end{cases}    (1)

The gradient of the Sign function is zero almost everywhere, which makes it impossible to train a BNN with gradient descent. Instead, a straight-through gradient approximator can be used [2, 4]:

    \frac{\partial\, \mathrm{Sign}(x)}{\partial x} \approx \mathbf{1}_{-1 \le x \le 1},    (2)

where \mathbf{1} is the indicator function. In other words, the approximation of the gradient is one if the pre-activation value is within the -1 to +1 range, and zero otherwise.

1. (10%) Train a CNN with the architecture in Table 2 on the MNIST dataset. Normalize the images to the (-1, 1) range before feeding them into the network. Report training and testing curves. You should be able to reach 99% accuracy.

Layers             Hyper-parameters
Convolution 1      Kernel size = (3, 3, 32), Padding = 1 (SAME), ReLU activation.
Convolution 2      Kernel size = (3, 3, 64), Stride = 2, Padding = 1, ReLU activation.
Convolution 3      Kernel size = (3, 3, 128), Stride = 2, Padding = 1, ReLU activation.
Convolution 4      Kernel size = (3, 3, 128), Stride = 2, Padding = 1, ReLU activation.
Convolution 5      Kernel size = (3, 3, 128), Stride = 2, Padding = 1, ReLU activation.
Fully Connected    Output channels = 100. ReLU activation.
Fully Connected    Output channels = 10. Softmax activation.

Table 2: Network architecture for part 2.

2. (20%) Implement the Sign activation function and the straight-through gradient estimator. For this, you will need to implement a custom gradient function (see the sketch at the end of this section). Hint: MySign.backward() in PyTorch, and MySignGrad() in TensorFlow. In TensorFlow you will have to use the @tf.RegisterGradient("MySignGrad") decorator.

3. (5%) Modify the CNN from part 1 of this question to use Sign instead of ReLU in all layers except the output layer. Report the training and testing accuracy plots of the resulting BNN. You should be able to reach comparable accuracy. Why does the approximate gradient that we use make sense for training a neural network?
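
For reference, here is a minimal PyTorch sketch of the Sign activation with the straight-through gradient approximator from Eqs. (1)-(2). The class name MySign follows the hint in item 2; a TensorFlow solution would analogously register a custom gradient.

```python
import torch

class MySign(torch.autograd.Function):
    """Sign activation with a straight-through gradient approximator, Eqs. (1)-(2)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Sign(x) = +1 if x >= 0, -1 otherwise.
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass the incoming gradient where |x| <= 1, and zero it elsewhere.
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

# Usage inside a model's forward pass, in place of ReLU:
#   h = MySign.apply(self.conv1(x))
```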

3 Adversarial Images (35%)

In this part you will see how you can use the gradients of the network to generate adversarial images. Using these images, which look almost identical to the originals, you will be able to fool different neural networks. You will also see that these images affect other neural networks as well, exposing a security issue of CNNs that malicious users can take advantage of. An example is shown in Figure 1. You are encouraged to read the relevant papers [3, 7] before solving this part.

Figure 1: An adversarial example demonstrated in [3].

1. (10%) Use the trained network from question 2 to generate adversarial images under constraints. The constraints are:

(a) You are not allowed to erase parts of the image, i.e., I_pert ≥ I at each pixel location.
(b) The perturbed image has to take valid values, i.e., -1 ≤ I_pert ≤ 1.

The algorithm works as follows (a code sketch of the full loop is given at the end of this section):

(a) Let I be a test image from your dataset that you want to perturb and that is classified correctly by the network. Let I_ε be the perturbation, which you should initialize with zeros.

(b) Feed I_pert = I + I_ε into the network.

(c) Calculate the loss given the ground-truth label y_gt. Let the loss be L(x, y | θ), where θ are the learned weights.

(d) Using backpropagation, compute the gradients of the loss with respect to the perturbed image, i.e., ∇_{I_pert} L(I_pert, y_gt | θ); since I_pert = I + I_ε, these are also the gradients with respect to the perturbation, ∇_{I_ε} L(I_pert, y_gt | θ).

(e) Turn the gradient into a small perturbation and add it to the current perturbation, i.e., I_ε = I_ε + ε · sign(∇_{I_ε} L(I_pert, y_gt | θ)), where ε is a small constant of your choice.

(f) Repeat (b)-(e) until the network classifies the input image I_pert as an arbitrary wrong category with confidence (probability) of at least 90%.

Generate 2 examples of adversarial images. Describe the difference between the adversarial images and the original images.

2. (10%) For a test image from the dataset, choose a target label y_t that you want the network to classify your image as, and compute a perturbed image. Note that this is different from what you are asked in part 1, because you want your network to believe that the image has a particular label, not just to misclassify the image. You need to modify the loss function appropriately and then perform gradient descent as before. You should still use the constraints from part 1.

3. (10%) Repeat part 1, with the additional constraint that the perturbation has to be binary. You should use the binary activation from the previous question for this part.

4. (5%) Use the adversarial images you generated in the previous parts and feed them into the network from question 2. What do you observe?
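
To make the loop in item 1 concrete, here is a minimal sketch, assuming PyTorch, MNIST images scaled to [-1, 1], and a model that returns unnormalized logits (if your network ends in a Softmax layer, adapt the loss computation accordingly). The names generate_adversarial, eps, and max_iters are illustrative, not prescribed by the assignment.

```python
import torch
import torch.nn.functional as F

def generate_adversarial(model, image, label, eps=0.01, max_iters=100):
    """Perturb `image` until the model assigns >= 90% confidence to a wrong class."""
    model.eval()
    perturbation = torch.zeros_like(image)                  # (a) I_eps initialized to zeros

    for _ in range(max_iters):
        perturbation.requires_grad_(True)
        perturbed = image + perturbation                     # (b) I_pert = I + I_eps
        logits = model(perturbed.unsqueeze(0))
        loss = F.cross_entropy(logits, label.view(1))        # (c) loss for the ground truth

        probs = F.softmax(logits, dim=1).squeeze(0)
        pred = probs.argmax().item()
        if pred != label.item() and probs[pred] >= 0.9:      # (f) stop: confidently wrong
            break

        model.zero_grad()
        loss.backward()                                      # (d) gradient w.r.t. the input
        with torch.no_grad():
            perturbation = perturbation + eps * perturbation.grad.sign()   # (e)
            perturbation = perturbation.clamp(min=0)         # constraint (a): never erase
            # Constraint (b): keep the perturbed image inside the valid [-1, 1] range.
            perturbation = (image + perturbation).clamp(-1, 1) - image
        perturbation = perturbation.detach()

    return (image + perturbation).clamp(-1, 1)
```

For the targeted attack in item 2, one option is to descend on the loss of the target label y_t (rather than ascend on the ground-truth loss) while keeping the same constraints.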

References

[1] CIFAR-10. https://www.cs.toronto.edu/~kriz/cifar.html.

[2] Y. Bengio, N. Léonard, and A. Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.

[3] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[4] G. Hinton, N. Srivastava, and K. Swersky. Neural networks for machine learning.

[5] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio. Binarized neural networks. In Advances in Neural Information Processing Systems, pages 4107-4115, 2016.

[6] Y. LeCun, C. Cortes, and C. Burges. MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010.

[7] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations. arXiv preprint arXiv:1610.08401, 2016.