Solving Higgs Boson Machine Learning Challenge using Neural Networks


Varun Thumbe [13773], Satya Prakash P [14610]
Indian Institute of Technology, Kanpur
Mentor: Purushottam Kar

Abstract

This project is an attempt to apply the class of machine learning techniques widely known as Neural Networks to solve the Higgs Boson Machine Learning Challenge posed by CERN on Kaggle.com. The data provided on the website is incomplete, with several missing values, which makes the task of classifying events as signal or background difficult.

Introduction:

Binary classification is a well-known problem that has been solved using several machine learning techniques, such as Support Vector Machines, Decision Trees, Regularized Greedy Forests, Bayesian Classification and Neural Networks. Neural Networks are machine learning models developed using algorithms inspired by their biological counterparts. They have been shown to perform impressively on difficult machine learning tasks and are widely used in speech and image recognition. Their drawback, however, is the need for large amounts of training data and computation to train them. When sufficient data is not available, it is often difficult to get competitive results.

Motivation:

A particle physics experiment named ATLAS is being conducted by CERN using their Large Hadron Collider. During the experiments, data on larger particles decaying into smaller ones is recorded, and scientists need to make sense of the collected data. This is difficult because the signal of a large particle (the Higgs boson) decaying into smaller particles (tau particles) is embedded in background noise. In this task, we are required to classify events as tau tau decay or background. An event is a crossing of accelerated bunches which results in production of hundreds of millions of proton-proton collisions.

Winning teams used a multitude of classifiers and decided the output by averaging the computed probabilities. On building our own neural networks, we recognized that we did not have the computational resources and time required to train a large number of neural networks.

Dataset:

Link to page containing data:

The data is of two types. There is a training set of events and 30 feature columns, and a test set of events and 30 feature columns. Several entries were meaningless or could not be computed. All such values were assigned a fixed placeholder value.

Theory:

Feedforward Neural Networks:

Feedforward Neural Networks allow signals to travel in one direction only, i.e. from input to output. The output of a layer does not affect that same layer; in other words, there is no feedback. For feedforward networks, the error can be computed easily and used to train the network with methods such as stochastic gradient descent with backpropagation. We used feedforward neural networks in our project.

Feedback Neural Networks:

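In this kind of neural network, the output of a layer can affect preceding layers and hence can affect itself. These neural networks are difficult to model and become complicated with even a small number of neurons.

Backpropagation:

The equations for backpropagation are as follows:

\[ \delta^L = \nabla_a C \odot \sigma'(z^L) \]
\[ \delta^l = \left( (w^{l+1})^T \delta^{l+1} \right) \odot \sigma'(z^l) \]
\[ \frac{\partial C}{\partial b^l_j} = \delta^l_j \]
\[ \frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \, \delta^l_j \]

Here, $\odot$ is the symbol for the Hadamard product of two vectors and $\sigma$ is the output function, which in this case is the sigmoid function for sigmoid neurons. $C$ is the cost function, $z^l$ and $a^l$ are the weighted input and the activation of layer $l$, $\delta^l$ is the error of layer $l$, and $L$ denotes the output layer. The sigmoid function works as follows:

\[ \sigma(z) = \frac{1}{1 + e^{-z}} \]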
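As a rough illustration of how the backpropagation equations turn into code, the following dependency-free Python sketch computes the gradients for one training example in a small sigmoid network. It is not the project's Theano code: the quadratic cost $C = \frac{1}{2}\lVert a^L - y \rVert^2$, the list-of-lists weight layout and the function names `forward` and `backprop` are assumptions made for the example.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    s = sigmoid(z)
    return s * (1.0 - s)


def forward(weights, biases, x):
    """Return the weighted inputs z^l and activations a^l of every layer.

    weights[l][j][k] connects input k of layer l to its neuron j.
    """
    activations, zs = [x], []
    for w, b in zip(weights, biases):
        z = [sum(wjk * ak for wjk, ak in zip(row, activations[-1])) + bj
             for row, bj in zip(w, b)]
        zs.append(z)
        activations.append([sigmoid(zj) for zj in z])
    return zs, activations


def backprop(weights, biases, x, y):
    """Gradients of the cost for a single training example (x, y).

    Assumption: quadratic cost C = 0.5 * ||a^L - y||^2.
    """
    zs, activations = forward(weights, biases, x)
    # Output error: delta^L = (a^L - y) * sigma'(z^L), elementwise.
    delta = [(a - t) * sigmoid_prime(z)
             for a, t, z in zip(activations[-1], y, zs[-1])]
    grad_w, grad_b = [], []
    for l in range(len(weights) - 1, -1, -1):
        # dC/db_j = delta_j and dC/dw_jk = a_k (previous layer) * delta_j.
        grad_b.insert(0, delta)
        grad_w.insert(0, [[d * a for a in activations[l]] for d in delta])
        if l > 0:
            # Propagate the error one layer back:
            # delta_prev = (w^T delta) * sigma'(z_prev), elementwise.
            delta = [sum(weights[l][j][k] * delta[j]
                         for j in range(len(delta)))
                     * sigmoid_prime(zs[l - 1][k])
                     for k in range(len(zs[l - 1]))]
    return grad_w, grad_b
```

Comparing one entry of `grad_w` against a finite-difference estimate of the cost is a quick way to check such an implementation.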

The output of a neuron for an input vector $x$ is given by:

\[ a = \sigma(w \cdot x + b) \]

Stochastic Gradient Descent:

When the training data is large, it becomes time consuming to iterate over all of it for every update. We therefore used stochastic gradient descent to mitigate this problem. Below are the equations used to update the weight vectors and biases of feedforward neural networks.

\[ w \rightarrow w - \eta \frac{\partial C}{\partial w}, \qquad b \rightarrow b - \eta \frac{\partial C}{\partial b} \]

Here, $C$ is the cost function, $w$ the weight vector, $b$ the bias vector and $\eta$ the learning rate. It is clear that we need the gradient $\nabla C$. In stochastic gradient descent, we estimate it by instead computing $\nabla C_x$ for a small sample of randomly chosen training inputs $x$. By averaging over this sample we get an estimate of the true gradient $\nabla C$, which helps speed up gradient descent and hence learning.

Fully Connected Layers:

As the name suggests, in this type of neural network all the neurons in two adjacent layers are connected to each other, as shown in the image. In our project, we tried fully connected layers with 30 and 50 neurons. We tried stacking several fully connected layers, but found that the weight vectors of the layers near the input layer then become less susceptible to change, as a result of how the backpropagation algorithm works. Also, it became apparent that in order to train a larger number of layers, we need a greater number of epochs.
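The stochastic gradient descent update described above can be sketched in a few lines of plain Python. To keep the example short it trains a single sigmoid neuron rather than a full network; the cross-entropy cost, the function names and the hyperparameters `eta`, `batch_size` and `epochs` are illustrative assumptions, not the settings used in the project.

```python
import math
import random


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def example_gradient(w, b, x, y):
    """Gradient of the per-example cost C_x for one sigmoid neuron.

    Assumption: cross-entropy cost, for which dC_x/dz = a - y.
    """
    a = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    err = a - y
    return [err * xi for xi in x], err


def sgd(data, w, b, eta, batch_size, epochs, rng):
    """Mini-batch SGD: w -> w - eta * (average of grad C_x over the batch)."""
    data = list(data)  # work on a copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad_w = [0.0] * len(w)
            grad_b = 0.0
            for x, y in batch:
                gw, gb = example_gradient(w, b, x, y)
                grad_w = [g + gi for g, gi in zip(grad_w, gw)]
                grad_b += gb
            m = len(batch)
            # Average over the mini-batch to estimate the true gradient.
            w = [wi - eta * g / m for wi, g in zip(w, grad_w)]
            b = b - eta * grad_b / m
    return w, b
```

Each update touches only `batch_size` examples, so one pass over the data performs many cheap updates instead of a single expensive one.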

Softmax Layer:

Softmax layers are layers which use the softmax function to produce their output values. The output of the softmax layer is decided using the softmax function:

\[ a_j = \frac{e^{z_j}}{\sum_k e^{z_k}} \]

where $z_j$ is the weighted input of output neuron $j$. Using this, we can estimate the probability that a test instance $x$ belongs to a specific class $j$ as follows:

\[ P(y = j \mid x) = a_j \]

Since our challenge required us to construct a binary classifier, we only had two softmax neurons in our output layer.

Dropout:

Neural networks become prone to overfitting as the number of epochs increases. To prevent our neural network from overfitting, we used dropout. We assign a probability p to each node. This probability decides whether or not that node will be used in a given training pass. It has been found that a dropout value of tends to give good results.

Implementation of Neural Network:

We built our neural networks for the project using the Theano library. The code was run on a computer with a 2.6 GHz dual-core processor and an NVIDIA GeForce GT 720M 625 MHz GPU. We would like to thank deeplearning.net and neuralnetworksanddeeplearning.com for their excellent text and documentation, which made it possible for us to implement our own neural networks.

Methodology and Experiments:

1. To prevent our neural network from learning any unwanted biases, we first randomly shuffled our data before feeding it to the network for training.
2. We first ran lib-svm on the provided data to find out whether there are any discernible hard and easy features. Hard features are those for which training a classifier does not give good accuracies. We found that a few of the features were barely useful, giving accuracies of 60 %, and that the best features gave us 65 %. We trained our neural network with the easiest features and with all the features. With only the easy features we got an accuracy of only 64 %, while with all the features we got an accuracy of 65.5 %. On trying different permutations, we always found that the network learned best with a greater number of features.
3. We normalized our data using the mean and standard deviation computed over all the training data, and found that this increased accuracy by 4 %.
4. The above results motivated us to try adding new derivative features computed from the existing features. To develop these derivative features, we used the fact that the recorded data should be invariant along the x-y plane and the z-axis, since the collisions of the particles occur along the z-axis in the Large Hadron Collider. On doing so, we did not find a significant change in accuracy, only an improvement of around 1 %.
5. We tried different configurations of layers with as many as 3 hidden layers, but we did not get any significant increase in accuracy on the testing data.
6. We then implemented dropout with different dropout probabilities. We reached 71 % on using dropout. In order to prevent over-fitting, we tried L2 regularization of the weight parameters and biases in the cost function, in the presence and absence of dropout. We achieved 71.8% on the training data with a lambda of 0.5 and dropout probability of

Figure (experiment pipeline): Hard and Soft Features; Removal of Hard Features; Addition of Derivative Features; Different Network Configurations; Dropout; L2 Regularization; Cross-validation.

We now plan to further improve our results by cross-validating our data and by implementing a small number of neural networks and averaging their results.
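The two regularizers from step 6 can be sketched as follows. This is a minimal illustration, not the project's Theano implementation: it assumes the common "inverted" form of dropout, in which `p_drop` is the probability of dropping a node and surviving activations are rescaled during training, and an L2 term of the form $\frac{\lambda}{2n} \sum w^2$ over the weights. Biases could be added to the sum in the same way. The function names and argument names are hypothetical.

```python
import random


def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout on one layer's activations.

    Assumption: p_drop is the probability of dropping a node. During
    training each activation is zeroed with probability p_drop and the
    survivors are divided by (1 - p_drop), so the expected value of each
    activation is unchanged. At test time the layer is left untouched.
    """
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def l2_penalty(weights, lam, n_train):
    """L2 term (lam / (2 * n_train)) * sum of squared weights.

    weights is a list of layers, each a list of rows of floats.
    The returned value is added to the unregularized cost.
    """
    total = sum(w * w for layer in weights for row in layer for w in row)
    return lam * total / (2.0 * n_train)


# Usage: drop half of a hidden layer's activations during training.
rng = random.Random(0)
hidden = [0.2, 0.9, 0.5, 0.7]
dropped = dropout(hidden, p_drop=0.5, rng=rng)
```

With this form of dropout no extra scaling is needed when the trained network is evaluated, because the rescaling already happened during training.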

References

Adam-Bourdarios, Claire, et al. "Learning to discover: the Higgs boson machine learning challenge." URL lal. in2p3. fr/documentation (2014).

Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:

Srivastava, R. K., Masci, J., Kazerounian, S., Gomez, F., & Schmidhuber, J. (2013). Compete to compute. In Advances in Neural Information Processing Systems (pp ).

Wang, H., & Raj, B. (2015). A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas. arXiv preprint arXiv:

Gold, Steven, and Anand Rangarajan. "Softmax to softassign: Neural network algorithms for combinatorial optimization." Journal of Artificial Neural Networks 2.4 (1996):

Neural networks and deep learning

Deep Learning

Deeplearning.net/tutorials/
