Object Identification in Dynamic Images Based on the Memory-Prediction Theory of Brain Function Marek Bundzel, PhD. Technical University Košice Dept. of Cybernetics and Artificial Intelligence marek.bundzel@tuke.sk www.ai-cit.sk
This work was supported by a research fellowship of the Japan Society for the Promotion of Science, 2008-2010. Host: Prof. Shuji Hashimoto SHALAB, Department of Applied Physics Waseda University, 3-4-1 Okubo Shinjuku, Tokyo 169-8555, Japan Link to the paper: Object Identification in Dynamic Images Based on the Memory-Prediction Theory of Brain Function http://www.scirp.org/journal/paperinformation.aspx?paperid=3336
Intro
The aim is to create a visual recognition system for a mobile robot. The training process is somewhat similar to the way a child learns about the world. The child sees things and learns about the existence of various objects on its own; no adult trains a child to see. It can recognize the objects again even though it does not know their names or functions. Later, an adult can tell the child the names of some objects of interest (put a human-readable label on them) so they can be referred to in communication. Analogously, the system first collects visual data recorded at a steady framerate while moving around various objects. Unsupervised learning is then applied to identify the entities comprising the environment, and a human operator can assign names to the entities found.
In 2004, Jeff Hawkins presented a memory-prediction theory of brain function in his book On Intelligence, and later used it to create the Hierarchical Temporal Memory (HTM) model. Several of the concepts described in the theory are applied here in a computer vision system for a mobile robot application. NuPIC: http://www.numenta.com/archives/software.php (HTM demo)
Memory-Prediction Theory of Brain Function
The underlying idea is that the brain is a mechanism for predicting the future, and that hierarchical regions of the brain predict their future input sequences. The theory is motivated by the observation that the mammalian neocortex is remarkably uniform in appearance and structure. Principally, the same hierarchical structures are used for a wide range of behaviors, and if necessary, the regions of the neocortex normally used for one function can learn to perform a different task. Adults who were born deaf process visual information in regions that normally become auditory regions; blind adults use what is normally the visual cortex to read Braille, although Braille involves touch. The memory-prediction framework provides a unified basis for thinking about the adaptive control of complex behavior.
Assumptions
1. Patterns from different senses are equivalent inside the brain
2. The same biological structures are used to process the sensory inputs
3. A single principle or algorithm underlies the processing of the patterns
Discovering frequent temporal sequences is essential for the functioning of the brain. Patterns coming from different senses are structured in both space and time. What Hawkins considers one of the most important concepts is that the cortex's hierarchical structure stores a model of the hierarchical structure of the world. The tasks are:
1. Discover causes in the world
2. Infer causes of a novel input
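As a loose illustration of what "discovering frequent temporal sequences" can mean computationally, the following sketch counts fixed-length subsequences in a symbol stream and keeps the recurring ones. The function name, window length, and threshold are illustrative and are not part of the original system.

```python
from collections import Counter

def frequent_sequences(symbols, length=3, min_count=3):
    """Count fixed-length subsequences in a symbol stream and keep
    the ones that recur often enough to be treated as 'causes'."""
    counts = Counter(
        tuple(symbols[i:i + length])
        for i in range(len(symbols) - length + 1)
    )
    return {seq: n for seq, n in counts.items() if n >= min_count}

# Toy stream: the pattern a-b-c repeats four times, so it is the
# only subsequence that survives the threshold.
stream = list("abcxabcyabczabc")
print(frequent_sequences(stream))
```

A real node would run such counting over discretized sensory patterns arriving frame by frame, but the principle, keeping only sequences that recur, is the same.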
Cortical Hierarchy
Discovering Causes
The sensory input changes, but the cause persists for some time. The world is interfaced through one or more senses.
Hierarchical Temporal Memory
A machine learning model developed by Jeff Hawkins and Dileep George of Numenta, Inc. that models some of the structural and algorithmic properties of the neocortex, using an approach somewhat similar to Bayesian networks. HTMs are claimed to be biomimetic models of cause inference in intelligence.
Hierarchy of Computational Nodes
- Top level: dog, face, car...
- Middle levels: causes of intermediate complexity
- Bottom level: edge, line, corner...
Nodes send their beliefs to their parent nodes.
Hierarchy in HTM Provides a Mechanism for Covert Attention
Switching pathways on and off, thus limiting what the HTM sees. This is needed to recognize objects in complex scenes and to find an object in the scene.
Node Operation: Visual Example
- Spatial patterns: common patterns are remembered; uncommon patterns are ignored.
- Temporal patterns (over time): common sequences are assigned to a cause; uncommon sequences are ignored.
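The node operation above can be sketched as two steps: a spatial step that maps each input to the closest remembered pattern (ignoring uncommon ones), and a temporal step that assigns a remembered sequence of pattern names to a cause. This is a minimal toy, assuming Euclidean distance and a fixed threshold; the function names and parameters are hypothetical.

```python
import numpy as np

def spatial_pool(patterns, memory, threshold=0.5):
    """Map each input pattern to the index of the closest remembered
    pattern; patterns too far from anything remembered are ignored (-1)."""
    names = []
    for p in patterns:
        dists = [np.linalg.norm(p - m) for m in memory]
        best = int(np.argmin(dists))
        names.append(best if dists[best] <= threshold else -1)
    return names

def temporal_group(names, known_sequences):
    """Assign a stream of spatial-pattern names to a known cause if it
    matches a remembered sequence, otherwise report 'ignored'."""
    key = tuple(n for n in names if n != -1)
    return known_sequences.get(key, "ignored")
```

For example, with remembered patterns (0, 0) and (1, 1) and the known sequence {(0, 1): "cause_A"}, the input stream (0.1, 0.0), (0.9, 1.0), (5, 5) pools to the names [0, 1, -1] and is assigned to "cause_A".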
Why is hierarchy important?
- Shared representations lead to generalization and storage efficiency.
- The prototype-and-transformation method fails because, for most real-world objects, the prototypical image cannot be identified and the number of possible transformations is nearly unlimited.
- HTM does not store a prototype; it matches inputs to previously seen patterns, a piece at a time and in a hierarchy.
- HTM fails if a new object is not made up of previously learned sub-objects (rarely a limitation in the real world).
- After sufficient initial training, most new learning occurs in the upper levels of the HTM hierarchy.
- Most real-world environments have spatial and temporal structure, and both are hierarchical in nature.
- Hierarchical representation affords a mechanism for attention.
Why is time necessary to learn?
- Each node learns common sequences, therefore it must be presented with sequences.
- For many applications it is the only way: e.g. understanding language, music, touch...
- For some applications it is beneficial (e.g. inference with static images).
- Generalization: the outside and the inside of a watermelon do not represent two different causes; during the progression from seeing the outside to seeing the inside, the cause "watermelon" persists.
- The role of supervision: faster training by imposing a prior expectation on the top-level node(s).
Algorithms
Comparison to Similar Models
The proposed system is closely related to other models based on the memory-prediction framework. It can be considered an HTM with a sequence memory (the original implementations of HTM searched for frequent itemsets, not episodes). Neither system provides a means of storing durations. The proposed system is specialized for a real-time computer vision application on a mobile robot. This required implementing a mechanism for minimizing the influence of the robot's changing velocity on storing and recalling the frequent episodes, and using relatively fast methods. In contrast to HTM, the system does not supervise learning on the top level; a human-machine interface for labeling the categories of the identified objects is used instead. In other words, the proposed system is allowed to isolate objects on its own.
Supporting Software and Structures
Camera case: the cameras are mounted 100 mm above the ground, 50 mm apart; designed for a robot encountering small objects; field of vision 61x48 degrees.
Data Stream
Gabor-filtered images, 2 fps, 320x240 pixels; 1140 images, 60% used for training; 1.7x2.1 m arena.
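A minimal sketch of what the Gabor filtering stage does: a Gaussian envelope modulating an oriented cosine grating, convolved over the frame to emphasize edges at a given orientation. The kernel size and parameter values here are illustrative, not taken from the paper, and the convolution is a naive loop just for demonstration.

```python
import numpy as np

def gabor_kernel(size=9, sigma=2.0, theta=0.0, wavelength=4.0):
    """Build a real-valued Gabor kernel: a Gaussian envelope modulating
    a cosine grating oriented at angle theta (an edge detector)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

def filter_image(img, kernel):
    """Naive 'valid' 2-D convolution, enough for a demo frame."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out
```

In practice a bank of such kernels at several orientations would be applied to each 320x240 frame before the responses are passed to layer 0.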
Hierarchy 32x16 pixels
Assign Names to Objects
Unidentified: 29% on the train set, 40% on the test set.
Problems
The problems with applying the described system on a mobile robot largely concern the balance that must be struck between the speed of the robot's movements, the framerate at which the images are taken, the dimensions of the crops at layer 0, and the degree of discretization of the input space (the number of clusters).
Discussion
- One of the main problems of the system is the large number of user-set parameters.
- Like HTM, the system can be modified for supervised learning.
- The algorithm matching BS with the frequent episodes E searches for an absolute match.
- Predictions and feedback are yet to be implemented; the brain presumably includes information on the organism's own movements when making predictions about future inputs.
- In addition to learning new sequences, forgetting should also be considered.
- There is no mechanism for covert attention.
- Information from the separate cameras could be integrated at higher levels of the hierarchy, rather than directly in layer 0.
- The strain on the human operator is high.
Future Work
- Processing large datasets
- Increased framerate and resolution (side view?)
- Supervised learning
- Parcelizing the image + covert attention (localization of objects in the scene)
- Using feedback
- Incremental learning
- Forgetting sequences
- Searching sequences for an approximate match
- Using information on own movements
- More sophisticated human-system interaction
Thank you for your attention.
Robot Control
The assumption: we have a robust method for recognizing objects in static images and in sequences of images (time-based inference on a moving robot).
- Stereovision: a step up from 2D to 3D; an additional source of information relevant to the nature of objects
- Attention mechanism: enables the system to identify objects in the scene and to calculate their positions
- MPF model of the world + human input + a rule-based system for robot control
- Supported by odometry and sonars
Model of the World
- The robot does not have any source of information except its own sensors (stereovision, odometry, sonars, laser distance meter, etc.)
- There is no outer localization and positioning system; the robot has to rely on what it perceives.
- The model of the world is neither static nor accurate.
- Information about object constellations is used together with odometry-estimated x, y coordinates.
- A probabilistic approach is taken to the localization of the robot.
Idea: World Exploration
The starting position is the origin of the coordinate system.
Idea: World Exploration
- Search for other objects nearby
- Estimate the x, y coordinates of the objects
- Perform triangulation: measure the distances to the objects and calculate the angles in the triangles
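The exploration steps above can be sketched as follows. The geometry is standard (an object's position from the robot's pose, measured distance, and relative bearing; the law of cosines for the triangle angles), but the function names and signatures are hypothetical, not from the paper.

```python
import math

def object_position(robot_x, robot_y, robot_heading, distance, bearing):
    """Estimate an object's x, y from the robot's pose, the measured
    distance, and the bearing of the object relative to the heading."""
    angle = robot_heading + bearing
    return (robot_x + distance * math.cos(angle),
            robot_y + distance * math.sin(angle))

def triangle_angles(a, b, c):
    """Angles of a triangle from its three side lengths (law of cosines).
    Returns the angles opposite sides a, b, and c, in radians."""
    alpha = math.acos((b * b + c * c - a * a) / (2 * b * c))
    beta = math.acos((a * a + c * c - b * b) / (2 * a * c))
    return alpha, beta, math.pi - alpha - beta
```

For instance, a robot at the origin heading along the x axis that measures an object 1 m away at a bearing of 90 degrees places it at roughly (0, 1).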
Idea: Localization
The robot has found 3 objects, and there are two possible constellations stored. Calculate the most likely position using the known distances to the objects and the odometry-estimated x, y.
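One plausible way to disambiguate the two candidate constellations, assuming Gaussian error models for both the distance measurements and the odometry estimate, is to score each candidate position and pick the best. The sigmas and function name are illustrative, not taken from the paper.

```python
import math

def most_likely_position(candidates, landmarks, measured_dists, odo_xy,
                         sigma_meas=0.1, sigma_odo=0.5):
    """Score each candidate robot position by how well it explains the
    measured distances to the landmarks and how close it is to the
    odometry estimate; return the best-scoring (lowest-error) candidate."""
    def score(pos):
        # Squared distance-measurement residuals, weighted by sigma.
        err = sum(
            ((math.dist(pos, lm) - d) / sigma_meas) ** 2
            for lm, d in zip(landmarks, measured_dists)
        )
        # Penalty for disagreeing with the odometry estimate.
        err += (math.dist(pos, odo_xy) / sigma_odo) ** 2
        return err
    return min(candidates, key=score)
```

With three landmarks, two geometrically possible robot positions typically remain; the odometry term breaks the tie in favor of the position consistent with the robot's own motion estimate.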