A FIRST APPROACH TO LEARNING A MODEL OF TRAFFIC SIGNS USING CONNECTIONIST AND SYNTACTIC METHODS


Miguel SAINZ and Alberto SANFELIU
Instituto de Cibernética, Universidad Politécnica de Catalunya - CSIC
e-mail: sainz@ic.upc.es, sanfeliu@ic.upc.es

Abstract

A system to learn and recognize traffic signs is described. The system combines neural-network image processing with syntactic methods. The learning process is based on representing traffic signs by means of a grammar, which is inferred from a set of positive and negative samples. The recognition of traffic signs in a scene is done in two steps. First, the sign is located in the scene by a connectionist segmentation method. Second, the sign is coded and analysed to determine which traffic sign it is. The system has so far been tested successfully only for the first step; the second step is currently under development.

1 Introduction

During the last few years much research effort has been devoted to autonomous vehicle navigation using digital image processing. Most of it has been aimed at road boundary detection and obstacle avoidance, and several very robust and reliable systems have been implemented. Many different techniques have been applied, as shown in [1] and [2], but lately the use of neural networks has produced promising results because of their robustness and computational simplicity ([5], [4]). Besides road boundary detection and obstacle avoidance, another aspect of autonomous vehicle navigation is traffic sign detection and recognition. Only a few systems have been designed for that purpose; some of them are described in [3].

The main purpose of our research is to develop a system for learning and recognizing traffic signs using neural networks and syntactic methods. We have studied the use of automatic learning to see to what extent a learning process can replace the development of a specific method for every problem. We use two levels of learning: segmentation learning based on neural networks, and model learning based on grammatical inference. Both levels are explained in the following section. The recognition process uses the results of the learning process as inputs for identifying the traffic signs. In this paper we describe the learning and recognition processes and discuss the results obtained so far.

2 The learning process

The learning process consists of two levels: the segmentation learning process and the model learning process. We use a 512x512-pixel color image obtained from the TV camera on the vehicle. The analog signal of the TV camera is digitized by an 8-bit A/D converter into three channels corresponding to the red, green and blue color components.

The segmentation learning process is used to learn the different classes of pixels that will be used to segment the scene in the recognition process. In order to reduce the amount of data in the 512x512-pixel images, we work with 4x4-pixel windows at this level. A human operator decides how many different labels the system will consider and then marks and labels some areas in a set of images (these areas will be called segmentation areas). He then selects from those images several positive samples, which are used by the segmentation learning module. For the traffic sign recognition problem we have considered the following five classes: road, road lines, sky, grass and traffic signs. The last label is used to locate the traffic sign in the scene.

Figure 1: Segmentation learning process.

The segmentation module consists of a three-layered neural net. This net has 48 inputs, corresponding to a 4x4-pixel, 3-color window of the image, and one output for each considered label (5 in our case). The number of neurons in each layer is set by the operator; after several tests we have set it to 10 neurons in the first layer and 10 in the hidden layer, while the output layer has one neuron per output label. This net is trained by the back-propagation method using the set of samples from the segmentation areas selected by the human operator.

Once the net is trained, we perform a validation test over a set of test images to check the learning performance. At this point the human operator can modify the samples or the net parameters to improve the learning of the segmentation module. When the segmentation learning level is completed, the operator can proceed to the model learning level.
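The segmentation net described above (48 inputs from a 4x4-pixel, 3-color window, two layers of 10 neurons, and one sigmoid output per label, trained by back-propagation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the learning rate, weight initialization and the toy "bright sky vs. dark road" training windows are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SegmentationNet:
    """48 inputs (4x4 window x 3 channels) -> 10 -> 10 -> 5 labels."""

    def __init__(self, n_in=48, n_h1=10, n_h2=10, n_out=5, lr=0.5):
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_h1))
        self.W2 = rng.normal(0.0, 0.5, (n_h1, n_h2))
        self.W3 = rng.normal(0.0, 0.5, (n_h2, n_out))
        self.lr = lr

    def forward(self, x):
        self.h1 = sigmoid(x @ self.W1)
        self.h2 = sigmoid(self.h1 @ self.W2)
        return sigmoid(self.h2 @ self.W3)

    def train_step(self, x, t):
        out = self.forward(x)
        # Back-propagate squared-error deltas through the sigmoid layers.
        d3 = (out - t) * out * (1.0 - out)
        d2 = (d3 @ self.W3.T) * self.h2 * (1.0 - self.h2)
        d1 = (d2 @ self.W2.T) * self.h1 * (1.0 - self.h1)
        n = len(x)
        self.W3 -= self.lr * (self.h2.T @ d3) / n
        self.W2 -= self.lr * (self.h1.T @ d2) / n
        self.W1 -= self.lr * (x.T @ d1) / n
        return float(np.mean((out - t) ** 2))

# Toy stand-in for the operator-labelled windows: bright "sky"
# windows versus dark "road" windows, with one-hot target labels
# (the label order here is an assumption).
X = np.vstack([rng.uniform(0.8, 1.0, (20, 48)),
               rng.uniform(0.0, 0.2, (20, 48))])
T = np.zeros((40, 5))
T[:20, 2] = 1.0          # assumed label 2: "sky"
T[20:, 0] = 1.0          # assumed label 0: "road"

net = SegmentationNet()
losses = [net.train_step(X, T) for _ in range(500)]
pred = net.forward(X).argmax(axis=1)
```

In the paper the samples come from the operator-marked segmentation areas, and the trained net is then validated on held-out test images before the model learning level starts.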

At this level the operator marks the areas in the scenes where the model to be learned is located. These areas will be called model areas, and they form the set of positive samples for model learning. Before the learning process starts it is necessary to preprocess the image. This preprocessing has three parts: optimization of the areas, normalization of their sizes, and coding of the contents of the sample areas into symbols.

The sizes of the model areas are normalized in order to get the same amount of information from any of the different image areas. We have set an arbitrary size of 50x50 pixels because this is the average size of the traffic sign samples. We code each pixel of the model area into one of the following four symbols: red (R), white (W), black (B) and any remaining color ($). These four symbols are the four primitives of the grammar. We perform a linear transformation of the R, G, B channels in order to intensify the red and white colors, and we classify the pixels by a standard histogram-based thresholding. After the coding, a morphological process is applied to improve the shape of the traffic sign by removing holes and smoothing the contour. The information inside the traffic sign (the speed limit, etc.) is also removed and replaced by white pixels; this information will be taken into account once the system has been completely tested. This is done because it is desired to learn only the shape of the sign (round, triangular or square).

Figure 2: Traffic sign primitive extraction.

At this point we are able to extract the primitive chains by reading the primitives from the coded samples. Now the operator may introduce some negative samples into the sample set, and then the learning process of the model begins. The methodology used is that of active grammatical inference learning described in [6] and [7]. After an augmented regular expression (a context-sensitive grammar) is inferred, a validation test is applied to evaluate how good the system is. Here the operator may restart the model learning level, changing the samples and the learning parameters. Once the two learning levels are completed, the results are transferred to the recognition system.
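The coding step can be sketched as follows. This is a simplified illustration in which the paper's linear colour transform and histogram-based thresholding are replaced by fixed thresholds (an assumption); only the mapping of each pixel onto the four primitives R, W, B and $ is kept.

```python
import numpy as np

def code_pixel(r, g, b, hi=170, lo=85):
    """Map one RGB pixel to a grammar primitive. The hi/lo cut-offs
    are placeholder assumptions, not the paper's learned thresholds."""
    if r > hi and g < lo and b < lo:
        return "R"                       # red
    if r > hi and g > hi and b > hi:
        return "W"                       # white
    if r < lo and g < lo and b < lo:
        return "B"                       # black
    return "$"                           # any remaining colour

def code_area(rgb):
    """Code an (H, W, 3) uint8 model area into a chain of primitives,
    read row by row (the primitive chain fed to grammatical inference)."""
    h, w, _ = rgb.shape
    return "".join(code_pixel(*rgb[y, x]) for y in range(h) for x in range(w))

# Tiny 2x2 example: red, white, black and an "other" (green) pixel.
area = np.array([[[255, 0, 0], [255, 255, 255]],
                 [[0, 0, 0], [0, 128, 0]]], dtype=np.uint8)
chain = code_area(area)   # -> "RWB$"
```

In the actual system the coded 50x50 area is further cleaned by morphological processing before the chain is extracted.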

3 The recognition process

The recognition of a traffic sign in a road scene is divided into three steps. The first step is the location of the traffic sign. We use a segmentation module consisting of a pre-trained neural net to segment the image (see Figure 3). A morphological process is then applied to remove noise and fill gaps, and the system looks for all the objects labelled as traffic sign by the neural net that are located in the right half of the scene. The objects found are analysed by applying morphological processing and shape contour extraction methods. The smallest square window that contains the traffic sign candidate is located, and the system proceeds to the second step.

Figure 3: Scene segmentation process.

In the second step the system optimizes the size of the window and codes the inside of the square window into symbols. After this coding, the size of the traffic sign candidate is normalized and noise removal processes are applied to clean up the window.

The third step is the recognition of the traffic sign. It is divided into two phases. First, the system recognizes the shape of the traffic sign by computing a distance measure between the extracted symbol chain and each inferred grammar of the traffic sign models; in this phase an error-correcting parser is used. The traffic sign is identified as the class with the lowest distance, provided this distance is below a threshold. At this point we know the shape of all the traffic signs found in the scene. The next phase is to analyse the symbol inside the sign. Once we know both the shape and the symbol, the system can identify the traffic sign.

4 Results

In this section we show some examples of road scene segmentation and of traffic sign coding into symbols. On the left side of Figure 4 we can see two road scenes; on the right side are the two segmented road scenes. They are labelled from black to white, with 6 labels corresponding to unknowns (black), grass, blue sky, road, white lines and traffic signs (white).

Figure 4: Neural net segmentation results.

As can be seen, the segmentation process gives very good results without using morphological processes. The noise level is very low and can easily be reduced further by applying noise removal techniques. Traffic sign detection becomes very easy in these low-noise segmented images. In Figure 5 we can see the traffic signs from the scenes and the results of coding them into grammar symbols. The traffic signs have different sizes, but they are normalized during the coding process. This normalization has to be improved because it introduces noise and shape distortions in the images.

Figure 5: Traffic sign coding.
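The shape-recognition phase of Section 3 ranks the candidate's primitive chain by its distance to each inferred traffic sign model and rejects matches above a threshold. As a minimal stand-in for the error-correcting parser over the inferred grammars, the sketch below measures plain Levenshtein distance against one prototype chain per model; the prototype chains and the threshold value are illustrative assumptions, not taken from the paper.

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance between two chains."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def classify_sign(chain, models, threshold=4):
    """Return the model whose prototype chain is closest to `chain`,
    or None when even the best match exceeds the rejection threshold."""
    best = min(models, key=lambda name: edit_distance(chain, models[name]))
    return best if edit_distance(chain, models[best]) <= threshold else None

# Illustrative prototype chains for two sign shapes (assumed, not
# from the paper) and an input chain with one corrupted primitive.
models = {"round": "$RRWWRR$", "triangular": "$$RWWR$$"}
result = classify_sign("$RRWWRB$", models)   # one substitution from "round"
```

An error-correcting parser generalizes this idea: instead of comparing against fixed prototype strings, it finds the cheapest sequence of edit operations that makes the chain derivable by each inferred grammar.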

5 Conclusions

At this time, only the segmentation learning process has been completed. Our system is able to segment a road scene and find the traffic sign. Segmentation with neural networks gives very good results for labelling colored images. We have tested our neural nets on different road scenes (with obstacles or shadows on the road, noisy images, etc.) and the system has proved to be very robust. The system is also able to locate the traffic sign and code it into symbols with a very small amount of noise.

We are presently developing the model learning level. Traffic sign coding into symbols has been achieved, and we are now adapting the grammatical inference methodology to the two-dimensional problem. As shown in [7], this methodology gives good results in one dimension; we are evaluating how good the results are with 2D image inputs.

References

[1] Charles Thorpe and Takeo Kanade. 1987 Year End Report for Road Following at Carnegie Mellon, CMU-RI-TR-88-4. The Robotics Institute, Carnegie Mellon University, April 1988.

[2] Graefe, V., Blöchl, B. Visual Recognition of Traffic Situations for an Intelligent Automatic Copilot. PROMETHEUS Workshop, Proceedings of the 5th Workshop, Munich, 1991, pp. 98-108.

[3] Austermeier, H., Büker, U., Mertsching, B., Zimmermann, S. Analysis of Traffic Scenes by Using the Hierarchical Structure Code. Advances in Structural and Syntactic Pattern Recognition, Proc. of the International Workshop on Structural and Syntactic Pattern Recognition, Bern, Switzerland, August 1992. Bunke and Wang, eds., Series in Machine Perception and Artificial Intelligence, Vol. 5, pp. 561-570.

[4] Català, A., Grau, A., Morcego, B., Fuertes, J.M. A Neural Network Texture Segmentation System for Open Road Vehicle Guidance. Proc. of the Intelligent Vehicles '92 Symposium, pp. 247-252, 1992.

[5] Pomerleau, D.A. ALVINN: An Autonomous Land Vehicle in a Neural Network, Technical Report CMU-CS-89-107. School of Computer Science, Carnegie Mellon University, 1989.

[6] R. Alquézar, A. Sanfeliu. A hybrid connectionist-symbolic approach to regular grammatical inference based on neural learning and hierarchical clustering. Grammatical Inference and Applications, Proc. of the Second Int. Colloquium, ICGI-94, Alicante (Spain), September 1994, R.C. Carrasco, J. Oncina, eds., Springer Verlag, Lecture Notes in Artificial Intelligence 862, pp. 203-211.

[7] A. Sanfeliu, R. Alquézar. Active grammatical inference: a new learning methodology. Proc. of the IAPR Int. Workshop on Structural and Syntactic Pattern Recognition, SSPR'94, Nahariya (Israel), October 4-6, 1994.

[8] Shun-Ichi Amari. Mathematical Foundations of Neurocomputing. Proc. of the IEEE, Vol. 78, No. 9, September 1990, pp. 1443-1462.

[9] Fu, K.S. Syntactic Pattern Recognition and Applications. Prentice-Hall, 1982.