Introduction to Computational Neuroscience

A. The Brain as an Information Processing Device
Jackendoff (Consciousness and the Computational Mind, MIT Press, 1990) argues that we can put off questions about the "phenomenological mind" and concern ourselves with the relation between the brain and the "computational mind". The brain is viewed as an information processing device. The mind is viewed as a collection of functions. The principal function of the brain is to process information.

Kosslyn & Koenig (Wet Mind: The New Cognitive Neuroscience, Free Press, 1995) propose a Cognitive Neuroscience Triangle: Behavior is at the top of the triangle because the goal is to explain specific abilities which can only be revealed by behavior. Studies in cognitive psychology and linguistics deepen understanding of the phenomena to be explained.

According to this view, cognitive neuroscience seeks to understand the nature of the information processing by which the brain produces the observed behavior. Computational analyses & computer models are used to understand how the brain can produce cognitive behaviors. Since the language of information processing is derived from computers, we must use key concepts from this language to specify how the brain processes information.

B. Historical Perspective
The modern history of artificial intelligence can be traced back to the 1940's. There are two complementary approaches to the field that both go back to those early days. The Serial Symbol Processing approach began in the 1940's, when the architecture of the modern digital computer was designed by John Von Neumann and others. They were heavily influenced by the work of Alan Turing on finite computing machines. A Turing Machine operates by following a finite list of instructions for carrying out a logical operation. The Von Neumann computer follows this theme. It:
a) performs one operation at a time
b) operates by an explicit set of instructions
c) distinguishes explicitly between stored information & the operations that manipulate information.

The Parallel Distributed Processing (PDP) approach has its roots in the work of Donald Hebb. In 1949, Hebb constructed a theoretical framework for the representation of short-term and long-term memory in the nervous system. The functional unit in Hebb's theory is the Nerve Cell Assembly (NCA): a population of mutually excitatory neurons that, when excited together, becomes functionally linked. He also introduced the Hebbian learning rule: when unit A and unit B are simultaneously excited, the strength of the connection between them is increased. Hebb's work built on the earlier work of McCulloch and Pitts, who proposed a simple model of a neuron as a binary threshold unit. The model neuron computes a weighted sum of its inputs from other units, and outputs a one or a zero according to whether this sum is above or below a threshold. McCulloch & Pitts (1943) proved that an assembly of such neurons is capable in principle of universal computation, if the weights are chosen suitably. This means that such an assembly could in principle perform any computation that an ordinary digital computer can.
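As a concrete illustration, the sketch below (Python/NumPy, not part of the original notes; the weights, threshold, task, and learning rate are illustrative assumptions) implements a McCulloch-Pitts binary threshold unit and the Hebbian strengthening of a connection between co-active units.

```python
import numpy as np

def threshold_unit(inputs, weights, threshold):
    """McCulloch-Pitts unit: output 1 if the weighted input sum reaches the threshold, else 0."""
    return 1 if np.dot(inputs, weights) >= threshold else 0

def hebbian_update(weights, pre, post, lr=0.1):
    """Hebb's rule: increase a weight when the pre- and postsynaptic units are active together."""
    return weights + lr * pre * post

# A two-input unit wired (by hand) to compute logical AND
w = np.array([1.0, 1.0])
print(threshold_unit(np.array([1, 1]), w, threshold=1.5))  # 1: both inputs active
print(threshold_unit(np.array([1, 0]), w, threshold=1.5))  # 0: weighted sum below threshold

# Hebbian strengthening: only the connection from the active input grows
print(hebbian_update(w, pre=np.array([1.0, 0.0]), post=1.0))  # [1.1, 1.0]
```

Choosing the weights by hand is exactly the limitation that Rosenblatt's learning algorithm, discussed next, was meant to remove.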

A leading proponent of the PDP approach was Frank Rosenblatt, who developed the concept of the perceptron: a single-layer feedforward network of linear threshold units without feedback. The work focused on the problem of determining appropriate weights for particular computational tasks. For the single-layer perceptron, Rosenblatt developed a learning algorithm: a method for changing the weights iteratively, based on error signals, so that a desired computation was performed.
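A minimal sketch of such an error-driven weight update for a single linear threshold unit follows (Python/NumPy, not from the original notes); the logical-OR task, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np

def perceptron_train(X, targets, lr=0.1, epochs=20):
    """Adjust weights iteratively, in proportion to the error signal, until the desired mapping is produced."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, targets):
            y = 1 if np.dot(w, x) + b >= 0 else 0  # linear threshold unit
            w += lr * (t - y) * x                  # error signal (t - y) drives the weight change
            b += lr * (t - y)
    return w, b

# Logical OR is linearly separable, so the iterative rule settles on correct weights
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = np.array([0, 1, 1, 1])
w, b = perceptron_train(X, targets)
print([1 if np.dot(w, x) + b >= 0 else 0 for x in X])  # [0, 1, 1, 1]
```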

The properties of perceptrons were carefully analyzed by Minsky & Papert in their 1969 book "Perceptrons". They showed that Rosenblatt's learning algorithm only applied to those problems that the network structure is capable of computing, and that some elementary computations could not be done by the single-layer perceptron. The simplest example was the exclusive-or (XOR) problem: the output unit turns on if one or the other of two input lines is on, but not when neither or both are on. Rosenblatt believed that multi-layer structures could overcome the limitations of the simple perceptrons, but he could not discover a learning algorithm for determining the weights necessary to implement a given calculation.
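To make the limitation concrete, the self-contained sketch below (illustrative only, using the same simple rule as the previous sketch) applies the single-layer update to the XOR truth table; because the four cases are not linearly separable, no weight setting it reaches classifies all of them correctly.

```python
import numpy as np

# XOR truth table: the output is on when exactly one input is on
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 1, 1, 0])

w, b = np.zeros(2), 0.0
for _ in range(1000):                                  # single-layer perceptron rule
    for x, target in zip(X, t):
        y = 1 if np.dot(w, x) + b >= 0 else 0
        w += 0.1 * (target - y) * x
        b += 0.1 * (target - y)

preds = [1 if np.dot(w, x) + b >= 0 else 0 for x in X]
print(preds, "errors:", sum(p != v for p, v in zip(preds, t)))  # at least one case remains wrong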

Minsky & Papert's analysis of the limitations of one-layer networks suggested to many in the fields of artificial intelligence and cognitive psychology that perceptron-like computational devices were not useful. This put a damper on the PDP approach, and the late 1960's and most of the 1970's were dominated by the serial processing approach & the Von Neumann computer. However, after many grandiose claims for the serial symbolic processing approach were not fulfilled, there was a resurgence of interest in PDP models in the late 1970's. It was realized that, although Minsky & Papert were exactly correct in their analysis of the one-layer perceptron, their analysis did not extend to multi-layer networks nor to systems with feedback loops. Much of the resurgence of the PDP approach was fueled by the discovery of the backward error propagation algorithm, which fulfilled Rosenblatt's dream of a general learning algorithm for multi-layer networks.

The PDP approach has gained a wide following since the early 1980's. Many neuroscientists believe that it embodies principles that are more neurally realistic than the serial symbolic approach. Because PDP models are thought to work like brain regions, they are often called Artificial Neural Networks.

C. Properties of Artificial Neural Networks
1. Neural networks are organized as layers of units. A feedforward network has an input layer, an output layer, and one or more hidden layers. In recurrent networks, there are excitatory or inhibitory connections from output units back to earlier units, e.g. input units. The feedback modulates the input.

2. Each unit has an output, which is its activity level, and a threshold, which is a level that must be exceeded by the sum of its inputs for the unit to fire (give an output). The output of each unit (neuron) conveys information (defined in the most general sense as a reduction of uncertainty) by its activity.

3. Connections between units can be excitatory or inhibitory. Each connection has a weight which measures the strength of the influence of one unit on another.
4. Inputs and outputs to the network are patterns of activity.

5. A neural network is trained by teaching it to produce certain output when given certain input. The backward error propagation training technique:
(1) randomize the weights
(2) present an input pattern
(3) compare the output with the desired output (i.e. compute the error)
(4) slightly adjust the weights to reduce the error
(5) repeat (2) - (4)
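A minimal sketch of steps (1)-(5) for a small two-layer network follows (Python/NumPy, not from the original notes; the XOR task, network size, learning rate, and iteration count are illustrative assumptions). It shows that, with a hidden layer and backward error propagation, the mapping a single-layer perceptron cannot learn becomes learnable.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Input patterns and desired outputs (XOR)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# (1) randomize the weights of a 2-4-1 network
W1, b1 = rng.normal(0.0, 1.0, (2, 4)), np.zeros(4)
W2, b2 = rng.normal(0.0, 1.0, (4, 1)), np.zeros(1)

lr = 0.5
for _ in range(20000):
    # (2) present the input patterns and compute the output
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # (3) compare the output with the desired output (the error)
    err = y - T
    # (4) slightly adjust the weights to reduce the error,
    #     propagating the error signal backward through the layers
    dy = err * y * (1 - y)
    dh = (dy @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ dy
    b2 -= lr * dy.sum(axis=0)
    W1 -= lr * X.T @ dh
    b1 -= lr * dh.sum(axis=0)
    # (5) repeat (2)-(4)

print(np.round(y.ravel(), 2))  # typically close to the XOR targets 0, 1, 1, 0
```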

6. The trained network functions as an associative memory: it relates patterns from the input domain to corresponding patterns in the output domain.
7. The pattern of weights on the internal connections of the network can be considered to be a representation: it represents the combinations of input features that identify output patterns.

8. Input patterns are informationally interpretable in that they have a specific impact on a network's output, i.e. they contain information that is used to direct the network behavior in one way or another.
9. The network can be considered to perform a systematic mapping of input space to output space: it preserves a specific relationship between properties of input and output. The mapping is determined by the connections within the network.
10. Networks perform computations: informationally interpretable, systematic mappings from input to output.
11. The set of input/output mappings performed by a network is the function it computes.

12. The systematic relationship between input and output may be described by a rule. (The computational system does not actually contain rules, but is connected in such a way that it can be described by rules.)
13. Networks can be linked together so that the output of one network is fed as input to another network.
14. A set of connected networks comprising a larger network may be called a processing subsystem (a subset of the entire nervous system).
15. Processing subsystems carry out algorithms. An algorithm is a set of computational steps that operates on a particular input to produce a particular output. The algorithm can be described without specifying all of the internal steps that are required.
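As a minimal sketch of item 13 (illustrative only; the layer sizes and random weights are assumptions, not details from the notes), two feedforward mappings can be chained so that the output pattern of one network serves as the input pattern to the next, forming a small processing subsystem.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def make_network(weights, biases):
    """Return a function computing one network's input-pattern-to-output-pattern mapping."""
    return lambda pattern: sigmoid(pattern @ weights + biases)

rng = np.random.default_rng(1)
net_a = make_network(rng.normal(size=(4, 3)), np.zeros(3))  # maps 4-element patterns to 3-element patterns
net_b = make_network(rng.normal(size=(3, 2)), np.zeros(2))  # maps 3-element patterns to 2-element patterns

# Item 13: the output of one network is fed as the input to another
subsystem = lambda pattern: net_b(net_a(pattern))
print(subsystem(np.array([1.0, 0.0, 1.0, 0.0])))
```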

D. Concept of Levels
1. Levels of organization
Levels are based on structural organization at different scales, e.g. molecular, synaptic, neuronal, network, systems. Each structural level has its own corresponding functional description. A description of an algorithm, i.e. how a task is accomplished, may be provided at many different levels. Cognitive phenomena may be associated with a variety of levels.

2. Levels of processing
Levels are based on pathway distance from sensory receptors, measured by the number of intervening synapses from the periphery. This does not imply that information only flows from the periphery to more central regions; there are many pathways from higher to lower levels. Also, it is not possible to assign levels outside of sensory systems.

3. Levels of analysis
Levels are based on classes of questions that can be asked. A good example of this view of levels comes from David Marr (Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. New York: Freeman, 1982, pp. 29-61). Marr identified three levels of analysis:
a) abstract problem analysis
b) algorithm
c) physical implementation

E. Identifying Networks in the Brain
Because of the complexity of the brain, a pragmatic strategy is needed for determining which functions are elementary, and identifying specific networks in the brain that perform those functions. Two basic approaches to this problem are described in reference to levels of analysis: top-down and bottom-up.

1. The Marr computational analysis is an example of a top-down approach. It begins with a behavior to be explained & logically analyzes the information processing steps that would be needed to produce it.
a) abstract problem analysis: a computational theory is constructed of a subsystem which performs a complex input/output mapping. At this level, the questions are: what is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?
b) algorithm: how can the computational theory be implemented; what is the representation for the input and output, and what is the algorithm for the transformation?
c) physical implementation: how can the representation & algorithm be realized physically (in the brain or in a machine)?

These levels were proposed by Marr to be independently solvable. His approach has a functionalist flavor: it might be interpreted to imply that neuroscience is irrelevant for understanding cognition. These levels may be independent in a formal sense, i.e. an algorithm can be specified without reference to the physical implementation, and many useful algorithms are created in this way. However, algorithms that are relevant to human cognition may require consideration of neurobiological constraints. There are many conceivable solutions to the problem of how a cognitive operation could be accomplished. Furthermore, some functions may be too difficult to understand without considering neurobiology. The assumption of cognitive neuroscience is that neurobiological data provide essential constraints on computational theories, and that they provide an efficient means of narrowing the search space for computational solutions.

2. An example of a bottom-up approach, proposed by Freeman, is the theory of Katchalsky sets, which is based on analysis at the neuronal level.
K0 set: network with common input source, common sign of output, and no interconnections
KIe set: network with common input source, common sign of output, dense mutual excitation
KIi set: network with common input source, common sign of output, dense mutual inhibition
KII set: network with dense interconnections between two KI sets
KIII set: combination of KII sets

CECN attempts a synthesis of top-down and bottom-up approaches. It is concerned with the implementation of high-level cognitive functions, so it is top-down, but it tries to incorporate fundamental constraints from neuroscience in the Leabra simulator, so it is also bottom-up.