Solving Higgs Boson Machine Learning Challenge using Neural Networks


Varun Thumbe [13773]
Satya Prakash P [14610]
Indian Institute of Technology, Kanpur
Mentor: Purushottam Kar

Abstract

This project is an attempt to apply the class of machine learning techniques widely known as neural networks to the Higgs Boson Machine Learning Challenge posed by CERN on Kaggle.com. The data provided on the website is incomplete, with several missing values, which makes the task of classifying events as signal or background difficult.

Introduction:

Binary classification is a well-known problem solved using several machine learning techniques such as Support Vector Machines, Decision Trees, Regularized Greedy Forests, Bayesian classification and Neural Networks. Neural networks are machine learning models built from algorithms inspired by their biological counterparts. They have been shown to be impressive at solving difficult machine learning tasks and are widely used in speech and image recognition. Their drawback, however, is the need for large amounts of training data and computation to train them. When sufficient data is not present, it often becomes difficult to get competent results.

Motivation:

A particle physics experiment named ATLAS is being conducted by CERN using the Large Hadron Collider. While conducting experiments, data of larger particles decaying into smaller ones is obtained, and it becomes important for scientists to make sense of the collected data. This task is difficult because the signal of a large particle (the Higgs boson) decaying into smaller particles (tau particles) becomes embedded in background noise. In this task, we are required to classify events into tau-tau decay or background. An event is a crossing of accelerated bunches, which results in the production of hundreds of millions of proton-proton collisions. Winning teams used a multitude of classifiers and decided the output by averaging the computed probabilities. On building our own neural networks, we recognized that we did not have the computational resources and time required to train a large number of neural networks.

Dataset:

Link to the page containing the data: https://www.kaggle.com/c/higgs-boson/data

The data is of two types: a training set of 250,000 events and a test set of 550,000 events, each with 30 feature columns. Several entries were meaningless or could not be computed; all such values were assigned -999.0.

Theory:

Feedforward Neural Networks:

Feedforward neural networks allow signals to travel in only one direction, from input to output. The output of a layer does not affect that same layer; in other words, there is no feedback. For feedforward networks the error can be easily computed and used to train the network with methods such as stochastic gradient descent with backpropagation. We used feedforward neural networks in our project.

Image source: http://neuralnetworksanddeeplearning.com/images/tikz1.png
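For illustration, a minimal NumPy sketch of such a feedforward pass (the project itself was implemented in Theano, and the layer sizes here are hypothetical):

    import numpy as np

    def sigmoid(z):
        # Logistic activation, applied elementwise
        return 1.0 / (1.0 + np.exp(-z))

    def feedforward(x, weights, biases):
        # Propagate the input forward through each layer in turn;
        # there is no feedback, so one pass from input to output suffices.
        a = x
        for w, b in zip(weights, biases):
            a = sigmoid(w @ a + b)
        return a

    # Hypothetical sizes: 30 input features, one hidden layer of 30 neurons, 2 outputs
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((30, 30)), rng.standard_normal((2, 30))]
    biases = [rng.standard_normal(30), rng.standard_normal(2)]
    print(feedforward(rng.standard_normal(30), weights, biases))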

Feedback Neural Networks:

In these kinds of neural networks, the output of a layer can affect preceding layers and hence can affect itself. These networks are difficult to model and become complicated even with a small number of neurons.

Image source: http://www.psych.utoronto.ca/users/reingold/courses/ai/cache/neural6.gif

Backpropagation:

The equations for backpropagation, in the notation of neuralnetworksanddeeplearning.com, are as follows:

    \delta^L = \nabla_a C \odot \sigma'(z^L)
    \delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l)
    \partial C / \partial b^l_j = \delta^l_j
    \partial C / \partial w^l_{jk} = a^{l-1}_k \, \delta^l_j

Here, \odot is the symbol for the Hadamard product of two vectors and \sigma is the output function, which in this case is the sigmoid function for sigmoid neurons. The sigmoid function is:

    \sigma(z) = \frac{1}{1 + e^{-z}}

The output of a neuron is given by:

    a = \sigma(w \cdot x + b)
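As a rough illustration of how these rules are applied at the output layer, a NumPy sketch assuming a quadratic cost (this is not the project's actual cost or implementation):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(z):
        s = sigmoid(z)
        return s * (1.0 - s)

    def output_layer_gradients(a_prev, w, b, y):
        # Forward pass through the last (sigmoid) layer
        z = w @ a_prev + b
        a = sigmoid(z)
        # delta^L = (a - y) * sigma'(z): Hadamard product, quadratic cost assumed
        delta = (a - y) * sigmoid_prime(z)
        # dC/dw^L_jk = a^(L-1)_k * delta^L_j ; dC/db^L = delta^L
        return np.outer(delta, a_prev), delta

    # Tiny illustrative call with hypothetical sizes
    a_prev = np.array([0.3, 0.8, 0.1])
    w = np.zeros((2, 3)); b = np.zeros(2)
    y = np.array([1.0, 0.0])
    grad_w, grad_b = output_layer_gradients(a_prev, w, b, y)
    print(grad_w, grad_b)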

Stochastic Gradient Descent:

When the training data is large, it becomes time-consuming to iterate over all of it, so we used stochastic gradient descent to address this problem. The equations used to update the weight vectors and biases of feedforward neural networks are:

    w \rightarrow w' = w - \eta \, \nabla_w C
    b \rightarrow b' = b - \eta \, \nabla_b C

Here, C is the cost function, w the weight vector, b the bias vector and \eta is the learning rate. Clearly we need the gradient \nabla C. In stochastic gradient descent, we estimate it by instead computing \nabla C_x for a small sample of randomly chosen training inputs:

    \nabla C \approx \frac{1}{m} \sum_{j=1}^{m} \nabla C_{x_j}

Averaging over this sample gives an estimate of the true gradient, which speeds up gradient descent and hence learning.
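A minimal sketch of one mini-batch update under these equations (illustrative NumPy only; grad_fn here uses a toy squared-error cost rather than the network's actual cost):

    import numpy as np

    def sgd_update(w, b, grad_fn, batch, eta):
        # Average per-example gradients over the randomly chosen mini-batch
        grads = [grad_fn(w, b, x, y) for x, y in batch]
        grad_w = sum(g[0] for g in grads) / len(batch)
        grad_b = sum(g[1] for g in grads) / len(batch)
        # Gradient-descent step: w -> w - eta*grad_w, b -> b - eta*grad_b
        return w - eta * grad_w, b - eta * grad_b

    def grad_fn(w, b, x, y):
        # Gradient of the simple cost 0.5*(w.x + b - y)^2, used only for this demo
        err = w @ x + b - y
        return err * x, err

    rng = np.random.default_rng(0)
    w, b = rng.standard_normal(3), 0.0
    batch = [(rng.standard_normal(3), 1.0) for _ in range(8)]
    w, b = sgd_update(w, b, grad_fn, batch, eta=0.1)
    print(w, b)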

Fully Connected Layers:

As the name suggests, in these layers all the neurons across two adjacent layers are connected to each other, as shown in the image. In our project, we tried fully connected layers with 30 and 50 neurons. We tried stacking several fully connected layers, but found that the weight vectors of the layers near the input then become less susceptible to change, a consequence of how the backpropagation algorithm is devised. It also became apparent that training a larger number of layers requires a greater number of epochs.

Image source: http://cs231n.github.io/assets/nn1/neural_net2.jpeg

Softmax Layer:

Softmax layers use the softmax function to produce their output values:

    a^L_j = \frac{e^{z^L_j}}{\sum_k e^{z^L_k}}

Using this, we can estimate the probability that a test instance belongs to a specific class as P(y = j \mid x) \approx a^L_j. Since our challenge required us to construct a binary classifier, we had only two softmax neurons in our output layer.
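A small sketch of the softmax computation for a two-neuron output layer (illustrative NumPy; stabilized by subtracting the maximum before exponentiating):

    import numpy as np

    def softmax(z):
        # Subtract the maximum for numerical stability; the outputs sum to 1
        e = np.exp(z - np.max(z))
        return e / e.sum()

    # Hypothetical two-neuron output layer for the signal/background decision
    z_out = np.array([1.3, -0.4])
    probs = softmax(z_out)
    print(probs)           # estimated class probabilities
    print(probs.argmax())  # predicted class (0 or 1)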

Dropout:

Neural networks are prone to overfitting as the number of epochs increases. To prevent our neural network from overfitting, we used dropout: we assign a probability p to each node, and this probability decides whether or not that node will be used during training. It has been found that a dropout value of 0.5-0.6 tends to give good results.

Implementation of Neural Network:

We built our neural networks for the project using the Theano library. The code was run on a computer with a 2.6 GHz dual-core processor and an NVIDIA GeForce GT 720M 625 MHz GPU. We would like to thank deeplearning.net and neuralnetworksanddeeplearning.com for their excellent text and documentation, which made it possible for us to implement our own neural networks.
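For illustration, a minimal NumPy sketch of a dropout layer of the kind described above (this is the inverted-dropout variant and is not the Theano code we actually ran):

    import numpy as np

    def dropout(a, p, training=True, rng=np.random.default_rng(0)):
        # Keep each node with probability p during training and rescale by 1/p
        # (inverted dropout), so no rescaling is needed at test time.
        if not training:
            return a  # every node is used when testing
        mask = rng.random(a.shape) < p
        return a * mask / p

    activations = np.array([0.2, 0.9, 0.5, 0.7])
    print(dropout(activations, p=0.6))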

Methodology and Experiments:

[Flowchart: Hard and Soft Features -> Removal of Hard Features -> Addition of Derivative Features -> Different Network Configurations -> Dropout -> L2 Regularization -> Cross-validation]

1. To prevent our neural network from learning any unwanted biases, we first randomly shuffled the data before feeding it to the network for training.

2. We used lib-svm on the provided data to check whether there are any discernible hard and easy features. Hard features are those with which training a classifier does not give good accuracy. We found that a few of the features were barely useful, giving accuracies of about 60 %, while the best features gave us 65 %. We then trained our neural network with only the easiest features and with all the features: with only the easy features we got an accuracy of just 64 %, whereas with all the features we got 65.5 %. Across different permutations, we always found that the network learned best with a greater number of features.

3. We normalized our data using the mean and standard deviation computed over all the training data, and this increased accuracy by about 4 %.

4. The above results motivated us to add new derived features computed from the existing ones. To construct them, we used the fact that the recorded data should be invariant under rotations in the x-y plane and under reflection along the z-axis, since the collisions occur along the z-axis in the Large Hadron Collider. On doing so, we did not find a significant change in accuracy, only an improvement of around 1 % (a sketch of the normalization and of one such derived feature follows this list).

5. We tried different layer configurations with as many as 3 hidden layers, but did not get any significant increase in accuracy on the test data.

6. We then implemented dropout with different dropout probabilities and reached 71 % when using it. To further guard against overfitting, we tried L2 regularization of the weight parameters and biases in the cost function, both with and without dropout. We achieved 71.8 % on the training data with a lambda of 0.5 and a dropout probability of 0.6.

7. We now plan to improve our results further by cross-validating our data and by training a small number of neural networks and averaging their results.
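As an illustration of steps 3 and 4, a hedged NumPy sketch of the normalization and of one rotation-invariant derived feature; the handling of the -999.0 placeholders and the choice of angle columns are assumptions made for the example, not necessarily what we did:

    import numpy as np

    def standardize(X, missing=-999.0):
        # Z-score each column using training-set mean and standard deviation,
        # skipping the -999.0 placeholders when computing the statistics.
        X = X.astype(float)
        for j in range(X.shape[1]):
            col = X[:, j]
            valid = col != missing
            mu, sigma = col[valid].mean(), col[valid].std()
            col[valid] = (col[valid] - mu) / (sigma + 1e-12)
        return X

    def delta_phi(phi1, phi2):
        # Azimuthal angle difference wrapped to [-pi, pi]; unchanged by a common
        # rotation about the z (beam) axis, hence a candidate derived feature.
        d = phi1 - phi2
        return (d + np.pi) % (2 * np.pi) - np.pi

    X_train = np.array([[1.2, -2.8], [0.3, 3.0], [-999.0, 1.0]])
    print(standardize(X_train))
    print(delta_phi(np.array([3.0, -2.9]), np.array([-3.0, 2.9])))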

References

Adam-Bourdarios, Claire, et al. (2014). "Learning to discover: the Higgs boson machine learning challenge." http://higgsml.lal.in2p3.fr/documentation

Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.

Srivastava, R. K., Masci, J., Kazerounian, S., Gomez, F., & Schmidhuber, J. (2013). Compete to compute. In Advances in Neural Information Processing Systems (pp. 2310-2318).

Wang, H., & Raj, B. (2015). A survey: Time travel in deep learning space: An introduction to deep learning models and how deep learning models evolved from the initial ideas. arXiv preprint arXiv:1510.04781.

Gold, S., & Rangarajan, A. (1996). Softmax to softassign: Neural network algorithms for combinatorial optimization. Journal of Artificial Neural Networks, 2(4), 381-399.

Neural Networks and Deep Learning. http://neuralnetworksanddeeplearning.com

Deep Learning. http://deeplearning.net/tutorials/