International Journal of Computer Engineering and Applications, Volume XI, Issue IX, September 2017, www.ijcea.com, ISSN 2321-3469

A CASE STUDY ON TENSORFLOW AND ARTIFICIAL NEURAL NETWORKS

Vivekanandan B 1, Hemadarshini M V 2
1 AVIN Systems Private Limited, Bangalore, India
2 Department of Mathematics, Mount Carmel College, Bangalore, India

ABSTRACT: A number of machine learning software packages are in use today. In this paper, we look at the implementation of artificial neural networks using TensorFlow, a machine learning library developed by Google. TensorFlow has been used to solve two image classification problems: classification of the digits of the MNIST database and classification of the traffic signs of the GTSRB database. The MNIST digits could be classified with 99% accuracy after 20000 iterations, with every 500 iterations taking about 25 seconds. The GTSRB traffic signs could be classified with about 70% accuracy, with every iteration taking about 9 minutes. Both neural networks were trained and tested on a 2.3 GHz CPU with 4 GB RAM. It has been noted that CPUs are fast enough to implement neural networks on a small scale; for real-world applications, neural networks need considerably more parameters, which are more conveniently trained and run on GPUs. It has also been noted that TensorFlow handles convolutional neural networks (CNNs) in an efficient manner.

Keywords: TensorFlow, Artificial Intelligence, Neural Networks, Machine Learning, Image Classification

[1] INTRODUCTION

Artificial intelligence has started playing a role in our daily lives. In particular, artificial neural networks have been used to solve a number of complex problems such as object recognition, image classification, speech recognition, text classification and natural language processing. Convolutional neural networks (CNNs) are well suited to image tasks such as object detection, segmentation and image classification [1].

Implementations of CNNs have outperformed human performance in certain simple cases of image classification [2]. Deep convolutional neural networks like AlexNet [3] have given state-of-the-art performance in image classification in the ImageNet LSVRC-2010 contest. Object detection and classification is an important sub-problem in the field of autonomous vehicle research. Hierarchical CNNs have been used to classify traffic sign images and have performed well [4]. CNNs have also been used for end-to-end learning for self-driving cars [5] and for road detection [6].

In this paper, a simple convolutional neural network built with TensorFlow [7] is described for classifying images of handwritten digits, with visualizations done in TensorBoard. A second case, the classification of traffic signs, has also been carried out using TensorFlow. In both cases, the performance of the network as well as its runtime on the CPU has been determined.

[2] TENSORFLOW

TensorFlow is a machine learning library developed by Google. The distinctive feature of TensorFlow is that all data objects are stored as tensors, including scalar values, which are treated as tensors of rank zero. In image processing, greyscale images are stored as rank-two tensors (also referred to as arrays) and color images are stored as rank-three tensors. This makes the implementation of convolutional neural networks simple. The entire flow of computations in a TensorFlow program is stored as a computational graph, which runs within a session object, as illustrated in the sketch below. For complex machine learning programs, the modules can also be distributed across two or more CPUs and GPUs. TensorBoard is a visualization tool which helps users interpret the performance of the machine learning algorithm used.
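As an illustration of the graph/session separation described above, the following is a minimal sketch in the TensorFlow 1.x API that was current when this paper was written (later versions replace tf.Session with eager execution); the variable names are ours.

import numpy as np
import tensorflow as tf

# Graph construction: no computation happens here, only graph nodes are created.
scale = tf.constant(2.0)                             # a rank-zero tensor (scalar)
images = tf.placeholder(tf.float32, [None, 28, 28])  # a batch of greyscale images
scaled = images * scale                              # an operation node in the graph

# Computation: the graph runs only inside a session object.
with tf.Session() as sess:
    batch = np.ones((2, 28, 28), dtype=np.float32)
    out = sess.run(scaled, feed_dict={images: batch})
    print(out.shape)  # (2, 28, 28)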

[3] DIGIT CLASSIFICATION

Digit classification is a benchmark problem in image classification. A 3-layer convolutional neural network has been developed in TensorFlow to solve it, using the MNIST database [8] (Modified National Institute of Standards and Technology database), which consists of greyscale images of handwritten digits. MNIST is a standard benchmark for image classification: several image-classifying neural networks are first tested on MNIST before being tested on their actual datasets. The MNIST database consists of a training set of 55000 images and a testing set of 10000 images. Each image has a standard size of 28 x 28 pixels and carries a label indicating the correct classification. The task of the neural network is to classify each image of the dataset, i.e., to predict which digit each image corresponds to. TensorBoard, a visualization tool for TensorFlow, has been used to interpret the performance of the network.

[3.1] NETWORK ARCHITECTURE

The network is a convolutional neural network in which each layer is connected to only a fraction of the neurons of the next layer. The first layer is a convolution layer with ReLU (Rectified Linear Unit) activation, followed by a max pooling layer. The second layer is similar to the first but with fewer filters (activation maps). This is followed by a fully connected layer and the output layer. The output is the network's prediction of which digit was input.

[3.2] NETWORK PARAMETERS

Initialization: Weights have been initialized from a normal distribution so that cases of zero gradients are avoided.

Pooling: Max pooling has been used, in which each activation map is down-sampled by taking the maximum of the values in the pooling window.

Loss function: Softmax cross entropy [9] has been used. The cross entropy is given by

H = -Σ_i y'_i log(y_i)

where y'_i is the true distribution and y_i is the predicted distribution; the softmax is the normalized exponential function applied to the network outputs. The loss function used for training is the softmax cross entropy.

Optimizer: The ADAM (adaptive moment estimation) optimizer [10] has been used. Given an initial learning rate, this optimizer updates each gradient step with the help of momentum estimates and thus outperforms the simple gradient descent optimizer.

Stochastic training: Instead of training the network with the entire training set in each iteration, smaller random batches are used for each iteration. This reduces the training time considerably without degrading performance by a large amount.

[3.3] TENSORBOARD SUMMARIES

TensorBoard has been used to interpret the performance of the network. Summaries are records of chosen quantities which are written to an event file; when TensorBoard is pointed at a particular event file, all the summaries written in that file can be visualized. For the digit classification program, summaries of loss and accuracy have been taken every 100 iterations. The computational graph of the neural network has also been visualized, and images of the trained filters of both hidden layers have been added. A sketch of how such a network, its loss and its summaries can be put together follows.
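For concreteness, the following sketch assembles a network of the kind described in Sections [3.1]-[3.3] in the TensorFlow 1.x API. The filter counts (32 and 16), the fully connected layer width (128), the learning rate and the batch size are assumptions, as the paper does not state them; only the overall structure (two conv/ReLU/max-pool layers, softmax cross entropy, ADAM, mini-batches, scalar summaries) follows the text.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

def weight(shape):
    # initialization from a (truncated) normal distribution, as in Section [3.2]
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias(n):
    return tf.Variable(tf.constant(0.1, shape=[n]))

def conv_pool(inp, shape):
    # convolution + ReLU activation, followed by 2x2 max pooling
    act = tf.nn.relu(tf.nn.conv2d(inp, weight(shape), strides=[1, 1, 1, 1],
                                  padding='SAME') + bias(shape[-1]))
    return tf.nn.max_pool(act, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

x = tf.placeholder(tf.float32, [None, 784])
y_true = tf.placeholder(tf.float32, [None, 10])

h1 = conv_pool(tf.reshape(x, [-1, 28, 28, 1]), [5, 5, 1, 32])  # first layer
h2 = conv_pool(h1, [5, 5, 32, 16])                             # second layer, fewer filters
flat = tf.reshape(h2, [-1, 7 * 7 * 16])
fc = tf.nn.relu(tf.matmul(flat, weight([7 * 7 * 16, 128])) + bias(128))
logits = tf.matmul(fc, weight([128, 10])) + bias(10)

# softmax cross entropy loss and the ADAM optimizer (Section [3.2])
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits))
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
accuracy = tf.reduce_mean(tf.cast(
    tf.equal(tf.argmax(logits, 1), tf.argmax(y_true, 1)), tf.float32))

# TensorBoard summaries of loss and accuracy (Section [3.3])
tf.summary.scalar("loss", loss)
tf.summary.scalar("accuracy", accuracy)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter("logs", sess.graph)  # also records the graph
    for step in range(20000):
        # stochastic training: a small random batch per iteration
        bx, by = mnist.train.next_batch(50)
        sess.run(train_step, feed_dict={x: bx, y_true: by})
        if step % 100 == 0:
            writer.add_summary(sess.run(merged, feed_dict={x: bx, y_true: by}), step)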

Figure 1. Summary of input samples as seen in TensorBoard.

Figure 2. Computational graph of the neural network as visualized using TensorBoard.

[3.4] RESULTS AND DISCUSSION

Training time: It has been noted that every 500 iterations take about 25 seconds on the CPU. To reach a high accuracy of about 99%, the network had to be trained for 20000 iterations, which took 40 minutes on the CPU.

Figure 3. TensorBoard summary of accuracy history through 2500 iterations when trained from scratch.

Accuracy: As shown in [Figure-3], the accuracy increases drastically over the first hundred iterations and then fluctuates between 0.8 and 1. After a few hundred iterations the fluctuations become smaller, and the accuracy settles at 97% by the end of 2500 iterations.

Figure 4. TensorBoard summary of loss history through 2500 iterations when trained from scratch.

Loss: As seen in [Figure-4], the loss evolves inversely to the accuracy. As discussed above, a very good accuracy of about 99% could be reached when the neural network was trained from scratch for 40 minutes; on a GPU, the training time would be much shorter.

TensorBoard gives a dynamic representation of the entire computational graph, as shown in [Figure-2]. This eases the debugging of the neural network in case of misconnections. The filters, i.e. the weights of the trained neural network, can also be visualized using TensorBoard.

However, even after visualization the neural network tends to remain a black box with respect to its hidden layers, as is evident from [Figure-5] and [Figure-6]: what each of these filters is trying to find in the image is hard to understand.

Figure 5. Filters learned in the first layer by the neural network after training for 2500 iterations.

Figure 6. Filters learned in the second layer by the neural network after training for 2500 iterations.

[4] TRAFFIC SIGN CLASSIFICATION

Traffic sign classification is a sub-problem of image understanding in the domain of autonomous vehicles. Once a traffic sign has been detected it has to be classified, so that a corresponding decision can be taken by the self-driving vehicle. For example, if the detected traffic sign indicates a speed limit of 70 mph, corrective action has to be taken to bring the vehicle within the speed limit. The GTSRB [11] (German Traffic Sign Recognition Benchmark) database has been used. It consists of 32 x 32 x 3 images (3-channel color images) in 43 classes or categories; the training set consists of about 39000 images and the testing set of about 12000 images.

The network architecture is similar to the digit classification architecture, but has more hidden layers. It also incorporates dropout [12] after the fully connected layers so that the network generalizes better, as in the sketch below. The training of the network is done with the entire training set for each iteration.
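A minimal sketch of dropout after a fully connected layer in the TensorFlow 1.x API; the layer width and the keep probability of 0.5 are assumptions, since the paper does not state its dropout rate.

import tensorflow as tf

# stand-in for the output of a fully connected layer (name and width assumed)
fc = tf.placeholder(tf.float32, [None, 128])
keep_prob = tf.placeholder(tf.float32)            # fed at run time
fc_drop = tf.nn.dropout(fc, keep_prob=keep_prob)  # dropout after the fully connected layer

# train with keep_prob = 0.5 (assumed rate); evaluate with keep_prob = 1.0,
# so that the full network is used at test time.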

[4.1] RESULTS AND DISCUSSION

Training time: It has been observed that every iteration takes about 9 minutes on the CPU, which makes training slow. A test accuracy of about 70% was achieved after 45 iterations (epochs), which took about 7 hours of training on the CPU. GPUs can be used to train the network in a shorter time.

Accuracy: An accuracy of about 70% has been achieved. For a higher accuracy, data augmentation [13] can be done. Data augmentation is the process of expanding the given dataset by making minor changes to its images, such as small tilts and rotations (see the sketch below). However, if data augmentation is done and the dataset is expanded, the training time becomes longer.

As discussed earlier, training a neural network on a CPU takes more time, and could even take several weeks for the more complex neural networks developed for real-world applications. Therefore, GPUs can be used for faster training and testing [14].
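The following is a minimal sketch of this kind of augmentation, assuming the images are held in a channel-last NumPy array and using SciPy for the rotation; the rotation range of plus or minus 10 degrees is an assumption.

import numpy as np
from scipy.ndimage import rotate

def augment_with_rotations(images, labels, max_angle=10.0):
    """Double the dataset by adding one slightly rotated copy of each image."""
    rotated = np.stack([
        rotate(img, np.random.uniform(-max_angle, max_angle),
               reshape=False, mode='nearest')  # keep the 32 x 32 x 3 shape
        for img in images])
    return np.concatenate([images, rotated]), np.concatenate([labels, labels])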

[5] CONCLUSION

TensorFlow is an easy-to-use machine learning library, as it provides a number of built-in functions which effectively hide the complex mathematics behind their implementation. Since the entire computational graph is known to TensorFlow, it automatically implements the backpropagation algorithm for tuning the weights during training. TensorBoard, which comes along with TensorFlow, proves to be a handy tool for visualizing the computational graph and the evolution of performance parameters such as accuracy; it even gives a glimpse of the hidden layers of the network, such as the filters. One disadvantage of TensorFlow is that it keeps the construction and the computation of the computational graph separate, which makes debugging hard. This delinking between construction and computation also makes the development cycle slow.

In general, neural networks require enormous processing power during both training and testing, which makes them slow on CPUs. CPUs can be used for implementing small-scale neural networks, but they become slow even for slightly complex networks with more than two hidden layers, all the more so when a huge dataset is used. In such cases, it is better to go for GPUs.

REFERENCES

[1] Ashwin Bhandare, Maithili Bhinde, Pranav Gokhale and Rohan Chandvarkar, Applications of Convolutional Neural Networks. International Journal of Computer Science and Information Technologies, 2016. https://ijcsit.com/docs/volume%207/vol7issue5/ijcsit20160705014.pdf
[2] Li Wan, Matthew Zeiler, Yann LeCun, et al., Regularization of Neural Networks using DropConnect. In: ICML'13: Proceedings of the 30th International Conference on Machine Learning, 2013. yann.lecun.com/exdb/publis/pdf/wan-icml-13.pdf
[3] Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 2012.
[4] Samer Hijazi, Rishi Kumar and Chris Rowen, Using Convolutional Neural Networks for Image Recognition. Cadence white paper (cnn_wp.pdf), https://ip.cadence.com
[5] Mariusz Bojarski, Davide Del Testa, et al., End to End Learning for Self-Driving Cars. arXiv:1604.07316v1 [cs.CV], 2016.
[6] Jake Stolee, Yuan Wang, A Survey of Machine Learning Techniques for Road Detection. https://www.cs.toronto.edu/~jstolee/projects/road_detection.pdf
[7] Martin Abadi, Paul Barham, et al., TensorFlow: A System for Large-Scale Machine Learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation, USENIX Association, pp. 265-283, 2016.
[8] Yann LeCun, Corinna Cortes and Christopher J.C. Burges, The MNIST Database of Handwritten Digits. http://yann.lecun.com/exdb/mnist/
[9] R.A. Dunne, N.A. Campbell, On the Pairing of the Softmax Activation and Cross-Entropy Penalty Functions and the Derivation of the Softmax Activation Function. In: Proceedings of the 8th Australian Conference on Neural Networks, Melbourne, pp. 181-185, 1997.
[10] Diederik P. Kingma, Jimmy Lei Ba, Adam: A Method for Stochastic Optimization. arXiv:1412.6980v9 [cs.LG], 2017.
[11] J. Stallkamp, M. Schlipsing, J. Salmen, Man vs. Computer: Benchmarking Machine Learning Algorithms for Traffic Sign Recognition. Neural Networks, Vol. 32, pp. 323-332, Aug 2012.
[12] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 2014.
[13] Sebastien C. Wong, Adam Gatt, Victor Stamatescu, Understanding Data Augmentation for Classification: When to Warp? arXiv:1609.08764v2, 2016.
[14] Zhongwen Luo, Hongzhi Liu, Xincai Wu, Artificial Neural Network Computation on Graphic Process Unit. In: Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, IJCNN 2005, Vol. 1, pp. 622-626, 2005.