THE DESIGN OF A LEARNING SYSTEM Lecture 2
Challenge: Design a Learning System for Checkers What training experience should the system have? A design choice with great impact on the outcome. Choice #1: Direct training examples: a set of board states, each paired with the correct move. Choice #2: Indirect training: a set of recorded games, where the correctness of each move is inferred from the result of the game. This requires credit assignment: deciding how good or bad each individual move was, given only the final outcome.
Challenge: Design a Learning System for Checkers How much interaction should there be between the system and the supervisor? Choice #1: No freedom. The supervisor provides all training examples. Choice #2: Semi-free. The supervisor provides training examples, but the system also constructs its own examples and asks the supervisor questions in cases of doubt. Choice #3: Total freedom. The system learns to play completely unsupervised. How daring should the system be in exploring new boards?
Challenge: Design a Learning System for Checkers Which training examples? There is a huge number of possible games; there is no time to try them all. The system should learn from examples resembling those it will encounter in the future. For example, if the goal is to beat humans, it should do well in the situations that humans reach when they play (this is hard to achieve in practice).
So far: Choosing the Training Experience
Determine Type of Training Experience: Games against experts / Games against self / Table of correct moves / etc.
Next step?
The Wikipedia "Get to Philosophy" game, and its analogue for Machine Learning: Get to Math. We start from a high-level definition of the problem we want to solve, and progressively reduce it to something more mathematical.
What should be learned exactly? Let's get in the mind of a player. The computer program already knows the legal moves; it should learn how to choose the best one. In other words, the computer should learn a hidden function. Perhaps the function should be V : Board → value, assigning a numerical value to every board state. Knowing the value of each board can be used to pick the best move: choose the move that leads to the board with the highest value. So far, in two steps, we've reduced the original problem to that of learning the function V.
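The move-selection step above can be sketched in a few lines. This is a minimal sketch, not the lecture's code: `legal_moves`, `apply_move`, and the toy "game" below are illustrative assumptions, and `V` stands for whatever board-evaluation function we end up learning.

```python
# Hypothetical sketch: pick the move whose resulting board scores highest
# under a board-evaluation function V. legal_moves/apply_move are assumed
# helpers; boards and moves here are just integers for illustration.

def choose_best_move(board, legal_moves, apply_move, V):
    """Return the legal move leading to the successor board with highest V."""
    return max(legal_moves(board), key=lambda m: V(apply_move(board, m)))

# Toy example: a "board" is an integer, a "move" adds its value to it.
board = 0
moves = lambda b: [1, 2, 3]
apply_m = lambda b, m: b + m
V = lambda b: -abs(b - 2)          # boards closest to 2 are best

print(choose_best_move(board, moves, apply_m, V))  # → 2
```

Note that the learning problem is entirely inside `V`: once a good evaluation function is available, move selection is a simple argmax over legal moves.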
Let's attempt to define the function V. 1. If b is an end-of-game win board state: V(b) = 100. 2. If b is an end-of-game lose board state: V(b) = -100. 3. If b is an end-of-game draw board state: V(b) = 0. 4. If b is not a final board state, then V(b) = V(b′), where b′ is the best final board state that can be achieved from b, assuming optimal play. With this definition, every board b has a value of 100, -100, or 0, but to compute it we would need to explore all possibilities down to the final states. So this is a non-operational definition, because we can't work with it in practice. Finding an operational definition may even be impossible. In practice, we will look for an operational definition of an approximation to V.
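To see why the definition is non-operational, it helps to run it on a game small enough to exhaust. The sketch below assumes a hand-made three-level toy game tree (not checkers), and simplifies "optimal play" to always maximizing our own value; a full two-player treatment would alternate max and min between the players.

```python
# Sketch of the non-operational definition of V on a toy game tree.
# The tree, leaf outcomes, and single-player max are all assumptions
# made for illustration.

WIN, LOSE, DRAW = 100, -100, 0

tree = {                      # node -> list of successor nodes
    "start": ["a", "b"],
    "a": ["win1", "draw1"],
    "b": ["lose1"],
}
outcome = {"win1": WIN, "draw1": DRAW, "lose1": LOSE}

def V(b):
    """Exhaustive definition: value of the best reachable final board."""
    if b in outcome:                        # end-of-game board state
        return outcome[b]
    return max(V(s) for s in tree[b])       # simplified "optimal play"

print(V("start"))   # → 100: optimal play reaches the winning leaf
```

On this tiny tree the recursion is instant; on checkers the same recursion would have to expand an astronomically large game tree, which is exactly what makes the definition unusable in practice.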
A break for notation. People in math really like using hats to denote approximations: if V is the true function, then V̂ (read "V-hat") denotes an approximation to V.
Let's attempt to define V̂, the approximation to the function V. It could be: 1. A large table with boards b and their values, or 2. An artificial neural network that implements it (imagine a neuron for every cell of the board, firing when a piece is present, with many connections between neurons), or 3. A polynomial function of predefined board features, e.g. a linear function
V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6
where:
x1 = number of black pieces
x2 = number of red pieces
x3 = number of black kings
x4 = number of red kings
x5 = number of black pieces under threat
x6 = number of red pieces under threat
There is a trade-off between very expressive representations, which can get very close to V but are nearly non-operational, and simpler representations, which are efficiently computable but less accurate.
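Choice 3 is cheap to compute. As a sketch, assume a board has already been reduced to its feature vector (x1..x6); extracting these features from an actual checkers position is a separate, assumed step. The example weights below are made up for illustration.

```python
# Sketch of choice 3: V-hat as a linear function of six board features.
# A "board" here is already its feature vector [x1..x6]; the weights
# include a leading bias term w0.

def v_hat(features, weights):
    """V-hat(b) = w0 + w1*x1 + ... + w6*x6."""
    w0, *w = weights
    return w0 + sum(wi * xi for wi, xi in zip(w, features))

features = [12, 12, 0, 0, 1, 2]   # e.g. an early position, made up
weights = [0.0, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]

print(v_hat(features, weights))   # 12 - 12 + 0 - 0 - 0.5 + 1.0 → 0.5
```

Evaluating a board now costs seven multiplications instead of a game-tree search, which is the whole point of trading expressiveness for operationality.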
The reduction steps so far
Determine Type of Training Experience: Games against experts / Games against self / Table of correct moves / etc.
Determine Target Function: Board → Move / Board → Value / etc.
Determine Representation of Learned Function: Polynomial / Neural Network / Linear Function of Six Features / etc.
Estimating training values 1. Recall that the training experience is indirect. We work only with the outcomes of games, i.e. the end-of-game board states. So at least we know the value of V_train for these final boards; for the earlier boards we can initialize the estimates arbitrarily. 2. Learning has a significant feedback component. When a new game trace comes in, the system uses it to update V_train using the previously learned approximation V̂. This makes sense: in learning, we build upon previous knowledge. 3. Let's make a wish: the training value of each non-final board should be given by
V_train(b) ← V̂(Successor(b))
where Successor(b) is the next board in the trace at which it is again the program's turn to move. Is this reasonable? In the first training step, all previous-to-last boards of the game get a value among 100, -100, and 0, which reflects certainty towards the end of the game.
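The rule V_train(b) ← V̂(Successor(b)) can be sketched as a single pass over one recorded game trace. Everything below is an illustrative assumption: boards are just labels, `v_hat` is a stand-in evaluator given as a lookup, and for simplicity each board's successor is taken to be the next board in the trace.

```python
# Sketch of estimating training values along one game trace: the final
# board gets its known outcome, and every earlier board b gets
# V_train(b) = v_hat(successor of b), bootstrapping from the current
# approximation.

def training_values(trace, final_value, v_hat):
    """trace: boards b1..bn (bn is end-of-game). Returns {board: V_train}."""
    values = {trace[-1]: final_value}       # end-of-game value is known
    for b, successor in zip(trace, trace[1:]):
        values[b] = v_hat(successor)        # wish: V_train(b) = V-hat(succ(b))
    return values

trace = ["b1", "b2", "b3"]
v_hat = lambda b: {"b1": 0.0, "b2": 5.0, "b3": 90.0}[b]

print(training_values(trace, 100, v_hat))
# → {'b3': 100, 'b1': 5.0, 'b2': 90.0}
```

Note how certainty flows backwards: b2's training value (90.0) is anchored near the known final outcome, while b1's depends on the still-rough estimate of b2.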
Why can't the new V̂ simply take the values of V_train? Because the values we just assigned to V_train most probably don't obey the form of V̂:
V̂(b) = w0 + w1·x1 + ... + w6·x6
No matter how we pick the weights w_i, we won't be able to match the values of V_train exactly. However, we can pick values for the w_i so that the new V̂ is as close as possible to V_train. To put it mathematically, we will try to find a V̂ that minimizes the squared error
E = Σ over training examples <b, V_train(b)> of (V_train(b) − V̂(b))²
Least Mean Squares (LMS): a classic problem in numerical analysis with a known solution that we can use to update the w_i.
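The LMS update for this objective adjusts each weight in proportion to the prediction error times the corresponding feature: w_i ← w_i + η·(V_train(b) − V̂(b))·x_i. The sketch below is a toy instance with made-up data and a made-up learning rate η, just to show the error shrinking under repeated updates.

```python
# Sketch of the LMS weight update for the linear representation:
#   w_i <- w_i + eta * (V_train(b) - V_hat(b)) * x_i
# eta and the toy training example are illustrative assumptions.

def v_hat(x, w):
    """Linear evaluator: w[0] is the bias, w[1:] pair with features x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def lms_update(x, v_train, w, eta=0.01):
    """One LMS step on a single training example (x, v_train)."""
    error = v_train - v_hat(x, w)
    w = list(w)
    w[0] += eta * error                      # bias term uses x0 = 1
    for i, xi in enumerate(x, start=1):
        w[i] += eta * error * xi
    return w

x, target = [2.0, 1.0], 10.0                 # one toy example
w = [0.0, 0.0, 0.0]
for _ in range(500):                         # repeated steps shrink the error
    w = lms_update(x, target, w, eta=0.05)

print(round(v_hat(x, w), 3))                 # → 10.0
```

The rule is intuitive: if V̂ underestimates V_train, every weight whose feature is positive gets nudged up; if it overestimates, nudged down; features equal to zero leave their weights untouched.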
The reduction steps so far
Determine Type of Training Experience: Games against experts / Games against self / Table of correct moves / etc.
Determine Target Function: Board → Move / Board → Value / etc.
Determine Representation of Learned Function: Polynomial / Neural Network / Linear Function of Six Features / etc.
Determine Learning Algorithm: Gradient Descent / Linear Programming / etc.
FINAL DESIGN
The Final Design: four modules and the data flowing between them.
Experiment Generator: takes the current hypothesis V̂ and outputs a new problem (an initial board state).
Performance System: the player; uses V̂ to play out the game and outputs a solution trace (the game history).
Critic: takes the trace and outputs training examples <b1, V_train(b1)>, <b2, V_train(b2)>, ...
Generalizer: takes the training examples and outputs an updated hypothesis V̂.
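The four modules can be wired into one loop. Only the data flow below comes from the design; every concrete function plugged in (the toy "game", the counting "learner") is an illustrative assumption standing in for the real components.

```python
# Sketch of the final design as a training loop. Each module is passed in
# as a function; the toy instantiations below are assumptions made purely
# to have something runnable.

def train(n_games, experiment_generator, performance_system, critic,
          generalizer, hypothesis):
    for _ in range(n_games):
        problem = experiment_generator(hypothesis)       # new initial board
        trace = performance_system(problem, hypothesis)  # play out a game
        examples = critic(trace)                         # <b, V_train(b)> pairs
        hypothesis = generalizer(examples, hypothesis)   # updated V-hat
    return hypothesis

# Toy instantiation: hypothesis is just a counter of examples seen.
exp_gen = lambda h: "initial-board"
perf = lambda p, h: [p, "mid", "end"]                # 3-board "game"
critic = lambda trace: [(b, 0.0) for b in trace]
gener = lambda ex, h: h + len(ex)                    # toy "learning"

final = train(3, exp_gen, perf, critic, gener, hypothesis=0)
print(final)   # → 9: three games, three training examples each
```

The key structural point survives even in the toy version: the hypothesis produced by the Generalizer feeds back into the Experiment Generator and Performance System on the next iteration, which is the feedback loop described earlier.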
Would it play well? Probably not, because the representation we chose is too simple. However, a better representation for V̂ could produce a very good system: the overall design is similar to that of a very successful learning system for backgammon, in which V̂ is a neural network. In any case, this is just one possible design. There are others, for instance: 1. Store many training examples (boards/moves), and for every new board find the most similar stored board and use it to determine the move [nearest-neighbor search]. 2. Train many systems, not just one; have them play against each other, and choose the best for the next generation [genetic algorithms]. 3. Try to imitate a more human-like approach: understanding of the move based on explanations [explanation-based learning].
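Alternative design 1 replaces learning a function with looking up stored experience. A minimal sketch, assuming boards are compared by Euclidean distance on feature vectors (any similarity measure would do) and that the stored memory below is made-up data:

```python
# Sketch of the nearest-neighbor alternative: store (features, move)
# pairs and reuse the move of the most similar stored board.

import math

def nearest_move(board, memory):
    """memory: list of (features, move). Return move of the closest board."""
    _, move = min(memory, key=lambda fm: math.dist(fm[0], board))
    return move

memory = [
    ([12, 12, 0], "advance"),        # made-up stored positions
    ([3, 1, 2], "push-for-king"),
]

print(nearest_move([4, 2, 2], memory))  # → push-for-king
```

Note the trade-off relative to the V̂ design: no training phase is needed, but every query must scan (or index) the stored examples, and the system cannot generalize beyond a sensible similarity measure.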
General Perspectives and Issues in Machine Learning In general, a learning algorithm searches a large (potentially infinite) set of hypotheses, trying to find the one that best fits the data. The various techniques of Machine Learning are, in most cases, different ways of representing hypotheses, and different representations suit different tasks. The class will review these representations and explain how they exploit the underlying structure of different problems.
General Perspectives and Issues in Machine Learning
What algorithms exist for learning general target functions from specific training examples? In what settings will particular algorithms converge to the desired function, given sufficient training data? Which algorithms perform best for which types of problems and representations?
How much training data is needed to achieve a given level of confidence in the learned hypothesis? Are there general theoretical bounds?
Can prior knowledge be useful during training, even when it is only approximate?
How should we choose the training experiences? What is the relationship between this strategy and the complexity of the learning problem?
Is it possible to automate the process of picking target functions for learning? Is it possible to have a system that doesn't have a fixed hypothesis representation, but keeps changing it in response to its performance?
Next Lecture: Chapter 2, Concept Learning and General-to-Specific Ordering. A learning algorithm searches a large (potentially infinite) set of hypotheses, trying to find one that best fits the data. We will view learning as a search in a space of hypotheses, and present several algorithms for performing that search.