CSC 400 Mini Proposal: A Universal Text Detection & Recognition System

Xiong Zhang
September 2015

1 Project Summary

As a natural way for people to communicate, text (printed or handwritten) is widely used in everyday life. As this text is increasingly digitized, a robust engine is needed to help people retrieve and make use of the large amount of resulting data. However, existing technologies mostly focus on narrow scenarios and often perform poorly when given inputs from other sources. To address this, the project aims to provide a universal solution for text detection and recognition in natural scenes. It involves both academic research and engineering development: in addition to pushing forward the research frontier, we will build a robust system.

The project will have 6-9 students working for 18 months. By the end of the project, a robust text detection and recognition system will be available to both developers and end users. Large-scale data sets will also be collected, which will benefit the research community, and novel applications based on the system will be investigated. Since one of the main goals of this project is to provide an entry point for people to easily access text images, potential collaborations with other research areas (web search, product recommendation, etc.) are also welcome.

2 Project Description

2.1 Background

Text is one of the most natural ways for humans to communicate. From historical documents and handwritten notes to whiteboards, street scenery, and restaurant menus (see Fig. 1), it appears throughout our daily lives. As more and more text is digitized into images, this large amount of information, if properly stored, indexed, and retrieved, could have a large impact on people's lives and productivity.
Figure 1: Natural scene text samples [1]

However, we usually lack ways to achieve this goal. People have worked on the research and development of text detection and recognition systems for many years, and practical systems for restricted conditions have been developed, some of them commercial (e.g. automatic check readers, mail address readers, etc.) [2, 3, 4]. Even so, there is still no robust text detection and recognition system that can handle arbitrary text image inputs.

2.2 Objectives

In this project, we want to:

- Build a universal text detection and recognition system that provides robust indexing and searching for arbitrary text images.
- Push forward the state of the art in text detection.
- Push forward the state of the art in text recognition.
- Provide novel applications of the text detection and recognition system.

2.3 Relation to Longer-Term Goals

As text detection and recognition lies at the intersection of many active research fields (e.g. image processing, pattern recognition, machine learning, natural language processing, etc.), research on this project matches the general
research interests in our lab. By building and maintaining a system like this, we can keep up the momentum of contributing to these research areas. Deploying the system and seeking novel applications based on it may also open up commercial opportunities. Our long-term goal is to keep this project actively involved in both the research community and industry, and to involve more people.

2.4 Relation to the Present State of Knowledge

In past years a lot of research has been done in this area. For example, text detection technologies are available for both printed and handwritten text [5, 6], and those for printed text are more robust than those for handwritten text. However, these technologies often lead to separate systems, with techniques specially tailored to specific kinds of data (printed or handwritten). Universal solutions for both kinds of data are still immature. The same is true of the recognition phase. In this project we therefore want to leverage techniques from different parts of the field to come up with a universal solution for data from different sources. Among the possibilities, deep learning techniques [7] are one candidate, because their successful application in many AI fields (speech recognition, face recognition, etc.) shows their strength in integrating deep structures of data from different sources.

3 Research Plan

In this project we will have three main teams:

- Text detection team, which mainly focuses on research and development of the text detection module.
- Text recognition team, which mainly focuses on research and development of the text recognition module.
- System integration and deployment team, which focuses on integrating the modules from the first two teams into one system and deploying it for external access and testing. This team may also be responsible for developing novel applications based on the end-to-end system.
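The split into a detection module, a recognition module, and an integration layer suggests a simple module boundary. The following is a minimal sketch in Python of how the three pieces might fit together, not a design fixed by this proposal. All names (Detector, Recognizer, TextRegion, read_text, and the toy functions) are hypothetical, and the toy detector and recognizer are placeholders that only make the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# All names below are hypothetical; the proposal does not define an API.
Image = Sequence[Sequence[int]]   # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]   # (left, top, right, bottom), right/bottom exclusive

Detector = Callable[[Image], List[Box]]   # detection module: image -> text regions
Recognizer = Callable[[Image], str]       # recognition module: cropped region -> text


@dataclass
class TextRegion:
    box: Box
    text: str


def crop(image: Image, box: Box) -> Image:
    """Cut the sub-image covered by box out of image."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in image[top:bottom]]


def read_text(image: Image, detect: Detector, recognize: Recognizer) -> List[TextRegion]:
    """Integration step: run detection, then recognition on each detected region."""
    return [TextRegion(box, recognize(crop(image, box))) for box in detect(image)]


# Toy stand-ins so the sketch runs; real modules would come from the two research teams.
def toy_detect(image: Image) -> List[Box]:
    """Report one box around all non-zero pixels, or nothing if the image is blank."""
    rows = [y for y, row in enumerate(image) if any(row)]
    cols = [x for row in image for x, value in enumerate(row) if value]
    if not rows:
        return []
    return [(min(cols), min(rows), max(cols) + 1, max(rows) + 1)]


def toy_recognize(region: Image) -> str:
    """Placeholder: describe the region size instead of reading characters."""
    return f"{len(region[0])}x{len(region)} region"


image = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
regions = read_text(image, toy_detect, toy_recognize)
```

Keeping the two modules behind plain callables would let the detection and recognition teams swap in new models independently, while the integration team depends only on the shared region format.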
Each team will have 2-3 graduate students as the main researchers and developers. Short-term visiting scholar and temporary student positions are also available. Note that the system team may be formed only after interesting research results come out of the first two teams.

We plan to spend 18 months on this project. More specifically:

1. Month 1 to 3: survey phase; evaluate existing technologies for both text detection and recognition, and build a baseline system
2. Month 4 to 9: main research phase; propose and implement ideas to improve the baseline system
3. Month 10 to 12: iteration phase; tune parameters inside the system and run benchmark comparisons
4. Month 13 to 17: system integration
5. Month 18: external testing and finalization

Note that data collection will start on day 1 and continue throughout the project.

4 Broader Impacts of the Proposed Work

The text detection and recognition system can serve as an entry point for many other Internet services. For example, web search, product recommendation, and user needs mining can all build on the recognition results of the system. This will provide opportunities in many areas, in both research and industry applications. Also, once the service is deployed, the large amount of data uploaded by users (under a proper user agreement) can be valuable for future research and development. Selected data sets could even be used as standard benchmark sets.

5 Required Resources

The following resources are needed:

- Computing machines (servers, workstations)
- Data sets (may be purchased from other sources or collected by ourselves)
- Funding for other uses (travel, conference registration, etc.)

References

[1] J. Feild, "Improving text recognition in images of natural scenes," 2014.
[2] N. Gorski, V. Anisimov, E. Augustin, O. Baret, and S. Maximov, "Industrial bank check processing: the A2iA CheckReader™," International Journal on Document Analysis and Recognition, vol. 3, no. 4, pp. 196-206, 2001.
[3] A. Kaltenmeier, T. Caesar, J. M. Gloger, and E. Mandler, "Sophisticated topology of hidden Markov models for cursive script recognition," in Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on, pp. 139-142, IEEE, 1993.
[4] S. N. Srihari, "Handwritten address interpretation: a task of many pattern recognition problems," International Journal of Pattern Recognition and Artificial Intelligence, vol. 14, no. 05, pp. 663-674, 2000.
[5] D. Chen, J.-M. Odobez, and H. Bourlard, "Text detection and recognition in images and video frames," Pattern Recognition, vol. 37, no. 3, pp. 595-608, 2004.
[6] Y. Li, Y. Zheng, and D. Doermann, "Detecting text lines in handwritten documents," in Pattern Recognition, 2006. ICPR 2006. 18th International Conference on, vol. 2, pp. 1030-1033, IEEE, 2006.
[7] A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber, "A novel connectionist system for unconstrained handwriting recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, no. 5, pp. 855-868, 2009.