LEARNING AGENTS IN ARTIFICIAL INTELLIGENCE PART I


Journal of Advanced Research in Computer Engineering, Vol. 5, No. 1, January-June 2011, pp. 1-5
Global Research Publications, ISSN: 0974-4320

LEARNING AGENTS IN ARTIFICIAL INTELLIGENCE PART I

JOSEPH FETTERHOFF* AND MORTEZA MARZJARANI*

1. INTRODUCTION

It seems as though man has always been obsessed with the creation of life. The concept of a thinking, feeling life form suddenly being created, whether through evolution or through intelligent design, is fascinating and nearly impossible to comprehend with our current knowledge. Perhaps that is why many in the field of Information Technology have begun studying the science of Artificial Intelligence.

Artificial Intelligence has many branches, but all of them lead to the same goal: creating a machine that is capable of thinking and learning for itself. Of course, there are many steps between now and that ultimate goal, but the dream remains the same. Think of it like this: if we work toward creating a thinking machine, we come that much closer to understanding our own mind and our own life. In the end, that is what science is all about. We are always trying to understand ourselves and the world around us, and Artificial Intelligence, with its many subfields, is just one more field of study that brings us closer to doing just that.

Before diving into any one of the many topics in the field of Artificial Intelligence, however, you must first have at least a general understanding of what Artificial Intelligence means. One of the best and simplest definitions comes from John McCarthy of the Computer Science Department at Stanford University. Simply put, "It is the science and engineering of making intelligent machines" [1]. Of course, this leaves us with the question of what constitutes intelligence. McCarthy explains that intelligence is the computational part of the ability to achieve goals. In this sense, it is a being's, or machine's, ability to achieve a goal through computation.
Computation does not simply mean crunching numbers as we usually think of it. Rather, it is the ability to analyze a situation based on given factors and predict the best course of action to achieve the overall goal. Using the example of a video game, a computer-generated character ultimately has the goal of defeating the human player. In this sense, its intelligence would be measured by its ability to analyze the player's actions, the digital environment that surrounds it, and the actions of any other computer-generated characters in order to achieve its goal. Of course, this example illustrates a computer imitating a human. While this shows a level of intelligence, it fails to recognize that a computer does not have to mimic a human being to be considered intelligent. It is important to note that a machine can be intelligent without understanding human interaction, as long as it can achieve goals based on computational analysis.

Now, the question becomes what that definition truly entails. Though there are many ways to perceive intelligence, it is clear that intelligence is almost synonymous with the ability to learn and to use what is learned for some form of gain. The gain, in this sense, is basically the development or improvement of a task that the machine can complete. However, it is not that simple. In this study, we will look into several methods of machine learning. Though there are many more out there, and there will be many more to come, the basic models examined here provide a good background to machine learning in general: Linear Regression, a very simple tool that machines can use, followed by Decision Trees, Squashed Linear Functions, Bayesian Classifiers, and Neural Networks. Each has its own history, its own algorithms, and its own pros and cons.

* Computer Science and Information Systems Department, Saginaw Valley State University, University Center, MI 48710, USA.
Each one, however, is important to understanding the current methods of machine learning as well as what the future might hold.

2. BACKGROUND

One of the most important areas of research in the field of Artificial Intelligence is a machine's ability to learn. Learning may seem like a very human concept, especially after stating that a machine does not have to mimic a human to be intelligent. When we think of learning, we often think of a more abstract idea that involves going to school, taking classes, and being taught how to perform a task. However, we learn every day, whether we are being taught or simply through our daily experiences. Likewise, it is possible for a machine to learn a new process or improve its performance on tasks it is already capable of. Before we can explore the different techniques of machine learning, we first have to understand some of the basic principles, such as how we measure success or failure, what constitutes a task, and the difference between online and offline learning, to name a few. This is the backbone of machine learning.

To begin with, this study focuses specifically on supervised learning. Supervised learning is a method of machine learning in which the machine is given a set of training examples, where each example is an input and output pair. As would be expected, the other extreme in machine learning is called unsupervised learning. In that case, no classification is indicated within the examples, and the machine must recognize the correlations and patterns in the data it collects. However, this could itself be the topic of a whole report, so we will not go into detail here.

The aim of supervised learning is for the machine to accurately predict the output for a given input based on the training examples it has processed [2]. Each input in the examples can be separated into classifications or categories. From these classifications, the learner can produce models (many of which will be described later) that represent the probabilities and distribution of the data, so that a hypothesis can be formed to predict the outcome for future inputs [3].

Think of it as school for machines. In school, the teacher gives examples for students to complete. With each example, students become better able to recognize patterns and understand the whole picture. The same is true in supervised machine learning. Data is collected from the examples given to the machine, and that data is then used to create a model that fits the examples while reducing the risk of error. The more data that is collected, the more accurate the model becomes, and the more likely it becomes that the machine will predict the outcome correctly for future input values. This concept is the backbone of the rest of this study.

However, there is much more to supervised learning than such a simple definition. For example, how is a task even defined? We say that learning involves gaining the ability to perform a task, performing it better, or expanding on the ability to perform it. But what constitutes a task in the first place? The dictionary defines the word as a piece of work that has been assigned to a person.
This is not far from the definition in the digital world. However, it can be narrowed down even further. In the world of Artificial Intelligence, a task is simply the objective to be completed by the machine. Generally speaking, this refers to classifying data, modeling data, and finally predicting outputs [2]. For example, the training set given to the machine is associated with a task: the machine's task is to identify the pattern in the input data in correlation to the output data, and to set up some form of model to represent that pattern. Therefore, when we say that learning involves improving at a task, what is really implied is that the machine is becoming more accurate in its interpretation of the given classifications and modeling, and ultimately better at predicting the final output value. It is also important to note that tasks are defined in unsupervised learning as well. In that case, the classifications are not already known, and the machine must classify the data elements without help or supervision, in contrast to supervised learning, in which the classifications of the data are already known [2]. Regardless, it can be seen clearly that tasks are a very important part of machine learning.

Another important distinction to make in supervised learning is the difference between online and offline learning. Online is such a common term in today's society that the concept is not difficult to understand. Nonetheless, it is important to note in the study of Artificial Intelligence. First of all, we usually think of online as it pertains to computer networking. In that sense, the terms online and offline refer to a machine's connection status: online means that the machine is connected to the network, while offline means that it has been disconnected. Online and offline have different meanings in Artificial Intelligence and machine learning.
When a machine learns offline, the learner is given its training examples before it is ever required to act on that information. Online learning, on the other hand, requires the learning machine to receive training examples as it acts [2]. This means that the machine must already have a representation of the data formulated before it receives a new example. Not only that, it must be able to update the old model after the new data has been received. Even more complex is a version of online learning called active learning, in which it is actually the machine's job to analyze which examples would be useful and then act to acquire them [2]. Though each method has its own unique properties, both online and offline learning are considered supervised, because both involve analyzing data that has been classified before it reaches the learning agent.

The last thing that should be covered, before diving into the different models and representations a machine can use to learn, is how success or failure is measured. Machine learning has already been defined as a machine's ability to improve its performance at a task. By that very definition, we have to judge whether a machine's learning was a failure or a success by finding a way to measure its performance. This does not mean that we have to monitor the agent's performance on the training examples. Rather, it is more important to look at its ability to predict the output for new examples. Think of a classroom: it is not important for the student to understand the material as soon as the teacher presents it. All that truly matters is that the student can perform well on the exams and projects. Likewise, a common way of measuring success in machine learning is to use two sets of data. The first is a set of training examples, while the second can be thought of as the exam [2].
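This train-and-test scheme can be sketched in a few lines of code. The tiny threshold "learner" and the data below are hypothetical stand-ins invented for illustration, not anything from the paper:

```python
# Measuring success on a held-out "exam": the score is the percentage
# of test examples whose output the learned hypothesis predicts correctly.

def accuracy(predict, test_set):
    """Percentage of (input, output) pairs predicted correctly."""
    correct = sum(1 for x, y in test_set if predict(x) == y)
    return 100.0 * correct / len(test_set)

# Hypothetical hypothesis learned from some training set:
# inputs below 5 are class 0, the rest are class 1.
learned = lambda x: 0 if x < 5 else 1

# Held-out test set; the last example is deliberately one the
# hypothesis gets wrong, so the score is 4 out of 5.
test_set = [(1, 0), (2, 0), (6, 1), (9, 1), (4, 1)]
print(f"{accuracy(learned, test_set):.0f}% accurate")  # 80% accurate
```

The point is that `test_set` plays no part in forming `learned`; it only grades it afterward.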
The machine's ability to predict the correct value for each example in the test set is easily calculated as the percentage of accurate predictions. The measurement of a machine's success in learning is a simple concept, but it cannot be forgotten when it comes to designing algorithms for the representation of data.

The groundwork for machine learning can be found in the topics covered here. The concepts of performance measurement, the difference between online and offline learning, and even the definition of a task are certainly not complex by any means. However, without at least a basic understanding of them, it is not possible to move forward and build more knowledge about an ever-expanding field.

3. APPLYING THE CONCEPT

It is simple to understand the core concepts of Artificial Intelligence. Online learning, supervised learning, measures of success or failure: these are all abstract terms that can be understood at a high level. Obviously, Artificial Intelligence itself is not that simple. No form of intelligence is simple. The human brain is an unbelievably complex network of neurons and electrical signals that allows us to think and feel. While this has not yet been matched in a machine, steps have been taken and algorithms developed that allow a machine to maintain data patterns and improve performance.

There are several models that a computer system can use to represent the training data it receives in a manner that allows accurate predictions of future output values. Some of them are simple, like Linear Regression and its relative, the squashed linear function. More complex models followed, such as Decision Trees and Bayesian Classifiers. Finally, the most complex model this study will cover is, in a way, a simplified model of the human brain: a Neural Network. There is a history behind each model, and each has its own unique algorithm as well as benefits and drawbacks to its implementation.

4. LINEAR REGRESSION

Possibly the simplest method of data representation used by computing machines is Linear Regression. The basic concept of Linear Regression lies in the idea that some sets of data will show a linear correlation when the data points are plotted on a chart. This means that the data points are not random: as the input value increases, the output value either increases, decreases, or stays consistently at the same level. The important part is that there is some correlation between the input value and the output value that allows the output value to be predicted.
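Whether such a correlation actually exists can itself be checked numerically before any line is fit. A minimal sketch using Pearson's correlation coefficient, which is near +1 or -1 for strongly linear data and near 0 for random scatter; the heights here are invented for illustration:

```python
import math

# Pearson's r: quantifies how strongly inputs and outputs move together
# before we commit to a linear model of the relationship.

def pearson_r(xs, ys):
    """Correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical father/son heights (inches), loosely linear by design.
fathers = [64, 66, 68, 70, 72, 74]
sons    = [66, 66, 69, 71, 72, 75]
print(round(pearson_r(fathers, sons), 2))  # strongly positive
```

A value this close to 1 suggests a best-fit line will predict well; a value near 0 warns that it will not.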
The concept has existed for centuries, but the name for the phenomenon was not coined until the late 1800s, by Francis Galton [4]. Galton was a biologist who, while studying the relationship between the heights of fathers and their sons, noticed that a son's height depended linearly on his father's height. Looking at Figure 1, we can see that the taller the father was, the taller the son was likely to be. Of course, there are always odd cases in any set of data, as can be seen in the same chart: some fathers 75 inches tall had sons who were less than 70 inches, while others who were less than 65 inches tall had sons of almost 75 inches. For the most part, however, the data points show that the taller the father is, the taller his son will be.

Figure 1: Image from lecture by Czellar [4]

This much can be seen simply by looking at the chart. However, the concept alone is not enough hard evidence to be of any value to a machine. In order to provide a solid representation of data that can be used to predict future outcomes, we need a line of best fit. The concept here goes back to the early 1800s and Karl Friedrich Gauss. Though he did not actually publish his work, he is credited with developing a mathematical method called least squares [5]. This method uses a formula to find the line that represents the data with the smallest margin of error possible, meaning that the line could not be altered in any way without increasing the average error over the data points in the set. The use of this method allows the analyst to predict an outcome for a given input value and be fairly certain that there will be little to no error in the estimate.

Clearly, the concept has been in place for centuries and has proven its uses. However, that raises the question of how it applies to Artificial Intelligence. Looking at the work of Gauss, we see that a line of best fit can be calculated and implemented to predict future outcomes.
So, in the field of Artificial Intelligence, that means that if a machine is given a set of inputs and their respective outputs, a line of best fit can be used to estimate the output value of any future input value based on a simple linear equation. That brings us to the formula for finding the line of best fit; after all, the machine has to compute it before it can use it. The simplest way for a computer to calculate the output is to use a line in the form y = mx + b, where y is the output, m is the slope of the line, x is the input value, and b is the intercept. This is a basic concept from math. However, it gets more complicated when the line must be calculated from points that do not necessarily fall on it. Therefore, we use the least squares method to minimize the errors:

    m = (n Σxy − (Σx)(Σy)) / (n Σx² − (Σx)²)

With this method we get

    b = (Σy − m(Σx)) / n

This gives the computer a numerical value for the slope and the intercept, everything needed to generate a linear graph [6]. To give an example with real numbers, consider the following data set:

    x      y      xy     x²
    2      1      2      4
    4      3      12     16
    3      2      6      9
    5      4      20     25
    Sums:  14     10     40     54

Figure 2: Chart plotted using the data points above
Figure 3: Random scatter plot

Plugging these sums into the formulas above, we get

    m = (4(40) − (14)(10)) / (4(54) − (14)²) = 20/20 = 1
    b = (10 − (1)(14)) / 4 = −1

That makes the line of best fit y = (1)x − 1, or simply y = x − 1.

What does this mean? If a computer implements this model using the training set of data, it will calculate the formula for the line of best fit that minimizes the errors for the dataset. Then, when it is given an input for which the output value is unknown, it can plug that input in for x and calculate y. This gives the machine a fairly accurate estimate of the output for data in which there is a linear correlation between input and output.

This model, of course, should only be used where it is appropriate. Like all methods of estimation and prediction, it has its benefits and its drawbacks. One of the biggest benefits of this method is that it is simple to implement. From a programming standpoint, the Linear Regression model is 100% calculation: there is a simple mathematical formula, known for centuries, on which predictions can be based. Though the formula may take a little while to calculate by hand, a computer can do it in the blink of an eye, using a few variables and a couple of loops to calculate the sums, then plugging in the values to produce a solution. Furthermore, it is capable of predicting a numerical value (we'll get into why this is important when we talk about other models).
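The few-variables-and-loops implementation can be sketched as follows. This is an illustrative helper written for this discussion, not code from the paper; only the data set is taken from the worked example:

```python
# Least-squares fit, following the formulas in the text:
#   m = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
#   b = (Sy - m*Sx) / n

def least_squares(points):
    """Return slope m and intercept b of the best-fit line for (x, y) pairs."""
    n = len(points)
    sx  = sum(x for x, _ in points)
    sy  = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = (sy - m * sx) / n
    return m, b

# The training set from the worked example in the text.
data = [(2, 1), (4, 3), (3, 2), (5, 4)]
m, b = least_squares(data)
print(m, b)  # 1.0 -1.0, i.e. y = x - 1

# Predicting the output for a new, unseen input value.
predict = lambda x: m * x + b
print(predict(7))  # 6.0
```

Running this reproduces the hand calculation exactly: after training, the machine stores only m and b, and every future prediction is a single multiply-and-add.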
As Galton noticed in his observations, if there is a correlation between an input and an output, this model can be used to predict the output value for any input number that is real (as opposed to an imaginary number like i). As long as the input is real, the best-fit line will represent it somewhere on the graph, and an output can be calculated.

Unfortunately, not every dataset has a linear correlation between its input and output. Sometimes the data points, especially when there are only a small number in the set, will produce a scatter plot that seems completely random. If we use the Linear Regression model and incorporate a line of best fit, the formula will still create a line, but that line may not be of any use in predicting future outcomes. For example, Figure 3 could very easily be a set of data points in a sample. A line of best fit could be computed for this set, but it would not accurately represent the data. Instead, it would produce a line that attempts to find a correlation in a data set where there isn't any, and as a result, any predictions would be completely invalid.

The other concern with this method is that a linear function can produce an output value greater than 1. For examples like Galton's, this is not a problem. However, if we have a set of data where the only possible prediction is yes or no (1 or 0), we cannot use a linear function, because it is possible to get a response greater than 1 or less than 0 based on the inputs. That does not render the method useless, of course; it just means that it is not the one and only go-to model.

REFERENCES

[1] J. McCarthy (2007), Artificial Intelligence: Basic Questions. HTML. Available http://www-formal.stanford.edu/jmc/whatisai/node1.html
[2] D. Poole, et al. (2010), Artificial Intelligence: Foundations of Computational Agents. Published by Cambridge University Press.
[3] S. B. Kotsiantis (2007), Supervised Machine Learning: A Review of Classification Techniques. PDF. Available http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.95.9683
[4] V. Czellar. Lecture 7: Simple Linear Regression. PDF. Available https://studies2.hec.fr/jahia/webdav/site/hec/shared/sites/czellarv/acces_anonyme/lecture7statprev.pdf

[5] E. Weisstein (2007), Gauss, Karl Friedrich. HTML. Available http://scienceworld.wolfram.com/biography/gauss.html
[6] S. Waner (2008), Linear and Exponential Regression. HTML. Available http://www.zweigmedia.com/realworld/calctopic1/regression.html
[7] W. Varey (2003), Unlimited Growth: How to Sustain Success.
[8] I. M. Graham (1991), Research and Development in Expert Systems VIII. Published by Cambridge University Press.
[9] M. F. Triola. Bayes Theorem. PDF. Available http://faculty.washington.edu/tamre/bayestheorem.pdf
[10] I. Russel. Neural Networks Module. HTML. Available http://uhaweb.hartford.edu/compsci/neural-networks-history.html
[11] K. M. Fauske. Example: Neural Network. HTML. Available http://www.texample.net/tikz/examples/neural-network/
[12] Multilayer Perceptron Neural Networks. Available http://www.dtreg.com/mlfn.htm