Pricing Football Players using Neural Networks


Sourya Dey
Final Project Report, Neural Learning and Computational Intelligence
April 2017, University of Southern California

Abstract:
We designed a multilayer perceptron neural network to predict the price of a football (soccer) player using data on more than 15,000 players from the football simulation video game FIFA 2017. The network was optimized by experimenting with different activation functions, numbers of neurons and layers, learning rate and its decay, Nesterov momentum based stochastic gradient descent, L2 regularization, and early stopping. Various aspects of neural network training were explored simultaneously and their trade-offs investigated. Our final model achieves a top-5 accuracy of 87.2% among 119 pricing categories, and places any footballer within 6.32% of his actual price on average.

Introduction:
Football (or soccer, if you're in North America) is the most widespread team sport in the world [1]. Apart from international football, the bulk of football matches take place at the level of domestic clubs. Most countries have an established football league where clubs representing different geographical regions compete against one another. There are no restrictions on the players who may represent a particular club. For example, the current starting lineup for Chelsea, a club based in London, has 2 players each from Brazil, Spain, France and Belgium, and 1 each from Nigeria, England and Serbia.

One of the most intriguing aspects of football, and possibly the biggest headache of a football manager, is buying and selling players for appropriate prices. The price of a player is a function of his technical abilities, such as ball control and tackling, but it also depends heavily on physical attributes such as speed and stamina, due to the high energy levels required in the modern game. Factors such as a player's age and reputation on the international stage also influence the price. To date, there is no standardized way to price a football player, and some star players, such as Paul Pogba, have recently switched clubs for prices in excess of $100 million [2]. The question which therefore arises is: what is the right price for a football player?

This project uses data from FIFA 2017 to construct a pricing model for football players. FIFA 2017 is a football simulation video game developed by Electronic Arts (EA) Sports, which holds official licenses for most of the major footballing teams and players in the world [3]. The complete roster in the game includes 17,240 players, each of whom has attributes as mentioned above and a preset price. All these attributes and prices are updated on a weekly basis according to real-life performances, which allows FIFA 2017 to keep its data fresh [4]. The initial data for each player is gathered from a team of 9,000 data reviewers comprising managers, scouts and season ticket holders [5]. They rate each player in 300 categories and provide EA Sports with the data, which is then fed into a weighting algorithm to compute the final attributes used in the game. In rare cases, EA Sports bumps the rating of a certain player up or down on a subjective basis.

This project builds a neural network which accepts player attributes as input and computes a price. A neural network is a machine learning technique where layers of neurons perform computations and update internal parameters based on training data. Supervised learning [6] with stochastic gradient descent [7] has been used for this project. There are 41 input features: 37 on a scale of 0-99, age on a scale of 16-43, and 3 on a scale of 1-5 stars [8]. Goalkeeping features have not been used; consequently, the model does not work for goalkeepers. There are 119 pricing categories, the lowest being $43,000 and the highest $36,700,000. The prices occurring in-game are quantized to specific values; for example, a few hundred players are priced at $1,100,000. Taking a cue from how the MNIST dataset [9] is split, 10,914 players have been used for training, 1,926 for validation, and 2,500 for test.

This project experiments with different activation functions [10] for different network layers, the number of hidden layers and the number of neurons in each, appropriate values for the learning rate and its annealing, weight regularization using the L2 norm, stochastic gradient descent using different batch sizes, Nesterov momentum based parameter updates in gradient descent, and early stopping of training to prevent overfitting [11], [12]. The final top-5 accuracy obtained is 87.2%.

Network Experiments:

Activation Functions:
Any activation function can be used for the hidden layers, such as the Rectified Linear Unit (ReLU), hyperbolic tangent (tanh) or sigmoid. The ideal output is one-hot encoded in 119 categories, so the activation function for the output layer needs to lie between 0 and 1, leading to a choice between the squashing sigmoid function and the softmax probability density. Experiments led me to pick ReLU activation for all the hidden layers and softmax for the output, as in the sketch below.

[Figure: Percentage accuracy vs. number of hidden neurons (0-4500) for four activation combinations: Sigmoid+Softmax, Tanh+Softmax, ReLU+Softmax, ReLU+Sigmoid.]
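The activation comparison in the figure might be scripted along these lines (a minimal sketch under assumed names, not the author's code; a single hidden layer stands in for the full network, and data loading is omitted):

```python
# Compare the four activation combinations from the figure above on a
# one-hidden-layer model over the 41 input features and 119 price categories.
from keras.models import Sequential
from keras.layers import Dense

def build_model(hidden_act, output_act, n_hidden=2000):
    model = Sequential()
    model.add(Dense(n_hidden, activation=hidden_act, input_dim=41))
    model.add(Dense(119, activation=output_act))
    model.compile(optimizer='sgd', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

combos = [('sigmoid', 'softmax'), ('tanh', 'softmax'),
          ('relu', 'softmax'), ('relu', 'sigmoid')]
models = {combo: build_model(*combo) for combo in combos}
```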

Number of Neurons in the 1st Hidden Layer:
A trend wasn't clearly discernible. Maximum accuracy is obtained with 3900 neurons, but after other parameters are varied, 2000 neurons is a better choice.

[Figure: Percentage accuracy vs. number of neurons in the 1st hidden layer (0-5000).]

Learning Rate:
I initially varied the learning rate logarithmically, then switched to linear variations to fine-tune it. A value of 0.01 was chosen.

[Figure: Percentage accuracy vs. learning rate (1.0E-06 to 1.0E-02, logarithmic scale).]
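The coarse-to-fine search (the logarithmic sweep above, then the linear sweep shown next) might be scripted as follows. This is a sketch, not the author's code: the training data arrays are assumed to exist, and the `lr` keyword follows the older Keras API.

```python
# Coarse-to-fine learning rate search: train briefly at each candidate rate
# and compare validation accuracies.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

def val_accuracy_at(lr, x_tr, y_tr, x_val, y_val):
    # Train a small stand-in model for a few epochs at this learning rate.
    model = Sequential([Dense(2000, activation='relu', input_dim=41),
                        Dense(119, activation='softmax')])
    model.compile(optimizer=SGD(lr=lr),  # `lr` kwarg per the older Keras API
                  loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_tr, y_tr, epochs=5, batch_size=20, verbose=0)
    return model.evaluate(x_val, y_val, verbose=0)[1]

coarse_lrs = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2]   # logarithmic sweep
fine_lrs = np.linspace(0.008, 0.3, 10)        # linear sweep near the best value
```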

[Figure: Percentage accuracy vs. learning rate (0.008-0.300) for four sizes of the 2nd hidden layer; the color-coded numbers (200, 1000, 2000, 3000) are the number of neurons in the 2nd hidden layer.]

Number of Neurons in Subsequent Hidden Layers:
No trend was clearly observed for the 2nd hidden layer. For the 3rd, fewer neurons gave better performance. The final numbers chosen were 1500 and 500. This gives a final network configuration of [41, 2000, 1500, 500, 119].

[Figure: Percentage accuracy vs. number of neurons in the 2nd hidden layer (500-2500).]
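In Keras, this final configuration might look like the following (a sketch using recent API names; the report's actual code may differ):

```python
# The final [41, 2000, 1500, 500, 119] multilayer perceptron:
# ReLU hidden layers, softmax output over the 119 price categories.
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(2000, activation='relu', input_dim=41),  # 1st hidden layer
    Dense(1500, activation='relu'),                # 2nd hidden layer
    Dense(500, activation='relu'),                 # 3rd hidden layer
    Dense(119, activation='softmax'),              # price categories
])
```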

[Figure: Percentage accuracy vs. number of neurons in the 3rd hidden layer (500-2500).]

Learning Rate Decay (Annealing):
As the network learns, the cost function gets minimized according to the gradient descent algorithm. With a small learning rate, the network takes a long time to learn. With a large learning rate, there is a danger of oscillating about the minimum point instead of settling in it. To mitigate both issues, it is beneficial to pick a large learning rate to begin with and reduce (anneal) it with every epoch [13]. The rule followed is:

η_t = η_0 / (1 + kt)

where η_0 is the initial learning rate, η_t is the learning rate after t epochs, and k is the annealing coefficient. From experiments, I picked k = 0.001.
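This schedule maps directly onto a Keras callback (a sketch; the epoch index supplied by Keras stands in for t in the formula above):

```python
# η_t = η_0 / (1 + k·t), applied once per epoch.
from keras.callbacks import LearningRateScheduler

eta0 = 0.01   # initial learning rate chosen above
k = 0.001     # annealing coefficient chosen above

anneal = LearningRateScheduler(lambda epoch: eta0 / (1.0 + k * epoch))
# Used as: model.fit(..., callbacks=[anneal])
```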

[Figure: Percentage accuracy vs. learning rate decay coefficient k (1.0E-05 to 1.0E-02, logarithmic scale).]

Nesterov Momentum:
Ordinary gradient descent is akin to simple harmonic motion, where the restoring force on the pendulum is proportional to its position. The update is given as:

w_(t+1) = w_t − η ∂(Cost)/∂w_t

In reality, the motion is damped by air resistance proportional to the velocity of the pendulum. This analogy also applies to friction affecting a ball rolling down a hill. The update then becomes (Δw being the update from the previous step):

w_(t+1) = (w + μΔw)_t − η ∂(Cost)/∂w_t

This is shown in the left figure below. Nesterov momentum updates, on the other hand, first compute the new position of the ball and take the derivative with respect to that, as shown in the right figure [12]. The corresponding update is:

w_(t+1) = (w + μΔw)_t − η ∂(Cost)/∂(w + μΔw)_t

where μ is a hyperparameter which has to be less than 1.

[Figure: classical momentum (left) vs. Nesterov momentum (right) update directions, after [12].]
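The three update rules can be contrasted on a toy quadratic cost (a NumPy sketch, not the author's code):

```python
import numpy as np

def grad(w):
    # Gradient of the toy cost 0.5 * ||w||^2
    return w

eta, mu = 0.01, 0.99
w = np.array([1.0, -2.0])
dw = np.zeros_like(w)       # Δw: the previous update, initially zero

# Plain gradient descent: step from w along -∂(Cost)/∂w
w_plain = w - eta * grad(w)

# Classical momentum: drift by μΔw, gradient still taken at w
dw_classical = mu * dw - eta * grad(w)
w_classical = w + dw_classical

# Nesterov momentum: gradient taken at the look-ahead point w + μΔw
dw_nesterov = mu * dw - eta * grad(w + mu * dw)
w_nesterov = w + dw_nesterov
```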

I picked a value of 0.99.

[Figure: Percentage accuracy vs. momentum coefficient μ (0.8, 0.9, 0.99).]

L2 Regularization:
Performance of the network can be improved by penalizing high values of weights, so that no particular weight gets out of hand and adversely affects the network. This is done by adding the following term to the existing cost:

Extra Cost = λ Σ w²

I picked λ = 5 × 10⁻⁴.

[Figure: Percentage accuracy vs. L2 parameter λ (1.0E-07 to 1.0E-03, logarithmic scale).]
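In Keras, this penalty can be attached per layer (a sketch; `kernel_regularizer` is the recent API name, older versions used `W_regularizer`):

```python
# Adds λ·Σw² over this layer's weights to the training cost.
from keras.layers import Dense
from keras.regularizers import l2

lam = 5e-4  # λ chosen above
layer = Dense(2000, activation='relu', input_dim=41,
              kernel_regularizer=l2(lam))
```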

Early Stopping:
Overtraining is a common issue in neural networks. There comes a point when the network is learning the specific data rather than general features. As a result, training accuracy keeps on improving, but validation and test accuracy suffer, as shown in [14]. Training should be stopped when validation accuracy doesn't increase for a certain number of epochs, which I picked to be 10.

Results:
The final network parameters chosen were:
- Network configuration = [41, 2000, 1500, 500, 119]
- ReLU activation for all hidden layers, softmax for the output
- L2 regularization coefficient λ = 0.0005
- Learning rate = 0.01, annealing coefficient k = 0.001
- Nesterov momentum coefficient μ = 0.99
- Minibatch size for stochastic gradient descent = 20

Top-1 accuracy (or simply, accuracy) was the metric chosen for network optimization. This means that a test sample is correctly classified only if the predicted output class matches the actual output class exactly. Since there are 119 output classes, this metric fails to give excellent results. For an application such as pricing an item, it is more important to predict a price close to the actual value than to get an exact match. Note that the output neurons are arranged in ascending order of price, so if the neuron for the predicted class is in the close vicinity of the neuron for the actual class, the prediction is satisfactory. Based on this, top-3 and top-5 accuracy metrics can be defined, where the predicted neuron is at an absolute distance of no more than 1 and 2, respectively, from the actual neuron. Another metric used was Average Percentage Error (APE) in price, defined as:

APE = (Avg over all test samples of |True price − Predicted price| / True price) × 100

The test results of the final trained network using these metrics are:

[Table of test results on these metrics; the top-5 accuracy is 87.2% and the APE is 6.32% (see Abstract).]
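The early stopping criterion above corresponds to a standard Keras callback (a sketch; the monitored metric name varies by Keras version):

```python
# Halt training once validation accuracy has not improved for 10 epochs.
from keras.callbacks import EarlyStopping
stop = EarlyStopping(monitor='val_acc', patience=10)
# Used as: model.fit(..., validation_data=(x_val, y_val), callbacks=[stop])
```

Since the output neurons are ordered by price, the distance-based accuracies and APE reduce to a few lines of NumPy (an illustrative sketch, not the author's code; `pred_class` and `true_class` are integer category indices, and `prices` maps an index to its dollar value):

```python
import numpy as np

def top_k_accuracy(pred_class, true_class, max_dist):
    # top-3 accuracy uses max_dist=1; top-5 accuracy uses max_dist=2
    return np.mean(np.abs(pred_class - true_class) <= max_dist)

def average_percentage_error(pred_class, true_class, prices):
    true_p = prices[true_class]
    pred_p = prices[pred_class]
    return np.mean(np.abs(true_p - pred_p) / true_p) * 100
```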

Conclusion:
Neural networks are extremely versatile machine learning tools that can learn features and use this knowledge to make predictions. This project demonstrates their capability by solving a pertinent real-world problem: pricing football players in the transfer market, one of the leading contentious issues among football players, managers, agents, owners and, of course, fans, as they wait with bated breath for new talent to potentially arrive at their club before the transfer window closes. The neural network model used here employs several machine learning techniques, such as regularization, annealing and momentum-based descent, and places a footballer within 6.32% of his actual price. The model does not take goalkeepers into account, and fails to predict prices of outlying star players such as Lionel Messi, whose price tag is estimated to be well in excess of $100 million [15]. The problem of vanishing gradients in a deep network has not been investigated, which opens the door to potential improvements such as using different learning rates for each layer.

Technical Details:
The project uses Keras [16], a machine learning library for Python, with Theano [17] as the backend. Complete datasets and code for this project are available at https://github.com/souryadey/footballerprice.git

Acknowledgements:
I would like to acknowledge my professor Bart Kosko for giving me the opportunity to perform this project. I would also like to acknowledge the website sofifa.com for providing player attributes from FIFA 2017 in a user-friendly fashion. Finally, thanks to Nitin Kamra for allowing me to use some of his code written for CSCI567 Machine Learning, a course offered by USC in Fall 2016.

Bibliography:
[1] Encyclopedia Britannica, "Football," 2 February 2017. [Online]. Available: https://www.britannica.com/sports/football-soccer.
[2] ESPN FC, "Paul Pogba completes record transfer to Manchester United from Juventus," 8 August 2016. [Online]. Available: http://www.espnfc.us/story/2915734/paul-pogba-completes-record-move-to-manchester-united-from-juventus.
[3] EA Sports, "FIFA 17 - ALL LEAGUES & TEAMS IN FIFA 17," 16 September 2016. [Online]. Available: https://www.easports.com/fifa/news/2016/fifa-17-leagues-and-teams.
[4] EA Sports, "FIFA 17 Ratings Refresh," [Online]. Available: https://www.easports.com/fifa/ratings.
[5] A. Lindberg, "FIFA 17's player ratings system blends advanced stats and subjective scouting," ESPN FC, 26 September 2016. [Online]. Available: http://www.espnfc.us/blog/espn-fc-united-blog/68/post/2959703/fifa-17-player-ratings-system-blends-advanced-stats-and-subjective-scouting.
[6] Wikipedia, "Supervised learning," [Online]. Available: https://en.wikipedia.org/wiki/supervised_learning.
[7] Wikipedia, "Stochastic gradient descent," [Online]. Available: https://en.wikipedia.org/wiki/stochastic_gradient_descent.
[8] SOFIFA, [Online]. Available: http://sofifa.com/players/top.
[9] Y. LeCun, C. Cortes and C. J. C. Burges, "THE MNIST DATABASE of handwritten digits," [Online]. Available: http://yann.lecun.com/exdb/mnist/.
[10] Wikipedia, "Activation function," [Online]. Available: https://en.wikipedia.org/wiki/activation_function.
[11] M. Nielsen, "Neural Networks and Deep Learning," January 2017. [Online]. Available: http://neuralnetworksanddeeplearning.com/index.html.
[12] Stanford University, "CS231n: Convolutional Neural Networks for Visual Recognition," [Online]. Available: http://cs231n.github.io/neural-networks-3/.
[13] Y. Bryan, "Bryan's Notes for Big Data & Career," [Online]. Available: http://blog.bryanbigdata.com/2014/11/algorithm-stochastic-gradient_4.html.
[14] R. Cartenet, "Which signals do indicate that the convolutional neural network is overfitted?," [Online]. Available: https://www.quora.com/which-signals-do-indicate-that-the-convolutional-neural-network-is-overfitted.
[15] Independent, "Lionel Messi to Chelsea: Barcelona star could cost 500m in total - but only Manchester United and Real Madrid could afford him," 9 January 2015. [Online]. Available: http://www.independent.co.uk/sport/football/transfers/lionel-messi-to-chelsea-barcelona-star-could-cost-500m-in-total-and-only-manchester-united-and-real-9967992.html.
[16] F. Chollet, "Keras: Deep Learning library for Theano and TensorFlow," [Online]. Available: https://keras.io/.
[17] Université de Montréal, "Theano: Welcome," [Online]. Available: http://deeplearning.net/software/theano/.