CSC 400 Mini Proposal: A Universal Text Detection & Recognition System


Xiong Zhang
September 2015

1 Project Summary

Text, whether printed or handwritten, is a natural way for people to communicate and is widely used in daily life. As more of this text is digitized, a robust engine is needed to help people retrieve and use the resulting data. However, existing technologies mostly target narrow scenarios and often perform poorly when given inputs from other sources. To address this, the project aims to provide a universal solution for text detection & recognition in natural scenes. It involves both academic research and engineering development: in addition to pushing forward the research frontier, we will build a robust system.

The project will have 6-9 students working for 18 months. By the end of the project, a robust text detection & recognition system will be available to both developers and end users. Large-scale data sets will also be collected, which will benefit the research community, and novel applications based on the system will be investigated. Since one of the main goals of this project is to provide an entry point for people to easily access text images, collaborations with other research areas (web search, product recommendation, etc.) are welcome.

2 Project Description

2.1 Background

Text is everywhere. It is one of the most natural ways for humans to communicate. From historical documents and handwritten notes to whiteboards, street scenery, and restaurant menus (see Figure 1), text appears throughout our daily life. As more and more text is captured as digital images, this large amount of information could, with proper storage, indexing, and retrieval, have a large impact on people's lives and productivity.

Figure 1: Natural scene text samples [1]

However, we generally lack the means to achieve this goal. People have worked on the research and development of text detection and recognition systems for many years, and practical systems for restricted conditions have been developed, some of them commercial (e.g. automatic check readers, mail address readers) [2, 3, 4]. Still, there is no robust text detection & recognition system that can handle universal text image inputs.

2.2 Objectives

In this project, we want to:

- Build a universal text detection & recognition system that provides robust indexing and searching of arbitrary text images.
- Push forward the state of the art in text detection.
- Push forward the state of the art in text recognition.
- Provide novel applications of the text detection & recognition system.

2.3 Relation to Longer-Term Goals

Text detection & recognition lies at the intersection of many active research fields (e.g. image processing, pattern recognition, machine learning, natural language processing), so research on this project matches the general research interests of our lab. By building and maintaining a system like this, we can keep up the momentum of our contributions to these research areas. Deploying the system and seeking novel applications based on it may also open up commercial opportunities. Our long-term goal is to keep this project actively involved with both the research community and industry, and to bring more people into it.

2.4 Relation to the Present State of Knowledge

A lot of research has been done in this area in past years. For example, text detection technologies are available for both printed and handwritten text [5, 6], and those for printed text are more robust than those for handwritten text. However, these technologies usually lead to separate systems, each built on techniques specially tailored to one kind of data (printed or handwritten). Universal solutions for both kinds of data are still immature. The same is true of the recognition phase. In this project we therefore want to leverage techniques from different parts of the field to arrive at a universal solution for data from different sources. Among the possibilities, deep learning techniques [7] are one candidate, because their successful application in many AI fields (speech recognition, face recognition, etc.) shows their strength in integrating deep structures of data from different sources.

3 Research Plan

The project will have three main teams:

- Text detection team, which mainly focuses on research and development of the text detection module.
- Text recognition team, which mainly focuses on research and development of the text recognition module.
- System integration and deployment team, which focuses on integrating the modules from the first two teams into one system and deploying it for external access and testing. This team may also be responsible for developing novel applications based on the end-to-end system.
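
To make the division of work among the three teams concrete, the following is a minimal sketch of how a detection module and a recognition module might be composed by the integration team. It is an illustration only, not the proposed design: the names `Box`, `TextResult`, `crop`, `run_pipeline`, `toy_detect`, and `toy_recognize` are hypothetical, the image is a plain list of pixel rows, and the toy detector and recognizer are trivial stand-ins for the real research modules.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: an "image" is a list of pixel rows (0 = background, nonzero = ink).
Image = List[List[int]]


@dataclass(frozen=True)
class Box:
    """Axis-aligned region: rows [top, bottom), columns [left, right)."""
    top: int
    left: int
    bottom: int
    right: int


@dataclass(frozen=True)
class TextResult:
    """One detected region together with the recognizer's output for it."""
    box: Box
    text: str


# Hypothetical module interfaces: each team delivers one callable.
Detector = Callable[[Image], List[Box]]
Recognizer = Callable[[Image], str]


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by `box` out of the image."""
    return [row[box.left:box.right] for row in image[box.top:box.bottom]]


def run_pipeline(image: Image, detect: Detector, recognize: Recognizer) -> List[TextResult]:
    """Integration step: detect regions, then recognize each cropped region."""
    return [TextResult(box, recognize(crop(image, box))) for box in detect(image)]


def toy_detect(image: Image) -> List[Box]:
    """Stand-in detector: one full-width box per run of consecutive rows containing ink."""
    boxes: List[Box] = []
    width = len(image[0]) if image else 0
    start = None
    for r, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = r
        elif not has_ink and start is not None:
            boxes.append(Box(start, 0, r, width))
            start = None
    if start is not None:
        boxes.append(Box(start, 0, len(image), width))
    return boxes


def toy_recognize(patch: Image) -> str:
    """Stand-in recognizer: reports the ink pixel count instead of real text."""
    return f"ink={sum(1 for row in patch for p in row if p)}"


if __name__ == "__main__":
    page = [
        [0, 0, 0],
        [1, 1, 0],
        [0, 1, 0],
        [0, 0, 0],
        [1, 0, 0],
    ]
    for result in run_pipeline(page, toy_detect, toy_recognize):
        print(result)
```

Keeping the detector and recognizer behind plain callables is one possible way to let the first two teams iterate independently while the integration code stays unchanged; the actual interfaces would be decided during the project.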

Each team will have 2-3 graduate students as its main researchers and developers. Short-term visiting scholar and temporary student positions are also available. Note that the system integration team may be formed only after interesting research results come out of the first two teams.

We plan to spend 18 months on this project. More specifically:

1. Month 1 to 3: survey phase; try existing technologies for both text detection and recognition, and build a baseline system
2. Month 4 to 9: main research phase; propose and implement ideas to improve the baseline system
3. Month 10 to 12: iteration phase; tune system parameters and run benchmark comparisons
4. Month 13 to 17: system integration
5. Month 18: external testing and finalizing

Note that data collection will start on day 1 and continue throughout the project.

4 Broader Impacts of the Proposed Work

The text detection & recognition system can serve as an entry point for many other Internet services. For example, web search, product recommendation, and user needs mining can all build on the system's recognition results. This will create opportunities in many areas, in both research and industrial applications. In addition, once the service is deployed, the large amount of data uploaded by users (under a proper user agreement) can be valuable for future research and development. Selected data sets could even be used as standard benchmark sets.

5 Required Resources

The following resources are needed:

- Computing machines (servers, workstations)
- Data sets (either bought from other sources or collected by ourselves)
- Funding for other uses (travel, conference registration, etc.)

References

[1] J. Feild, "Improving text recognition in images of natural scenes," 2014.
[2] N. Gorski, V. Anisimov, E. Augustin, O. Baret, and S. Maximov, "Industrial bank check processing: the A2iA CheckReader™," International Journal on Document Analysis and Recognition, vol. 3, no. 4, pp. 196-206, 2001.
[3] A. Kaltenmeier, T. Caesar, J. M. Gloger, and E. Mandler, "Sophisticated topology of hidden Markov models for cursive script recognition," in Proceedings of the Second International Conference on Document Analysis and Recognition, pp. 139-142, IEEE, 1993.
[4] S. N. Srihari, "Handwritten address interpretation: a task of many pattern recognition problems," International Journal of Pattern Recognition and Artificial Intelligence, vol. 14, no. 5, pp. 663-674, 2000.
[5] D. Chen, J.-M. Odobez, and H. Bourlard, "Text detection and recognition in images and video frames," Pattern Recognition, vol. 37, no. 3, pp. 595-608, 2004.
[6] Y. Li, Y. Zheng, and D. Doermann, "Detecting text lines in handwritten documents," in 18th International Conference on Pattern Recognition (ICPR 2006), vol. 2, pp. 1030-1033, IEEE, 2006.
[7] A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber, "A novel connectionist system for unconstrained handwriting recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, pp. 855-868, 2009.