Analysis of Error Count Distributions for Improving the Postprocessing Performance of OCCR
Yue-Shi Lee and Hsin-Hsi Chen
Department of Computer Science and Information Engineering
National Taiwan University
Taipei, Taiwan, R.O.C.
{leeys,
Submitted on 23 July, 1996, Revised on 17 November, 1996 and Accepted on 29 November, 1996

Abstract

Contextual language processing plays an important role in the postprocessing of OCR. Its effects have been demonstrated by many proposed systems, and in general it performs well. However, its performance is not as good as expected when the test data contain more unseen context, e.g., proper nouns such as personal names and organizational names. This paper addresses the importance of analyzing the error count distributions before applying the language models. According to the analysis, more than 50% of errors can be eliminated and more than 90% of the processing time can be saved on average, based on the Markov character bigram model.

Keywords: Contextual Language Processing, Error Count Distributions, Image Processing, Markov Model, OCCR, Unseen Context

1 Introduction

To improve the interface with computers, the development of input devices such as optical character recognition (OCR) devices and speech recognition (SR) devices is expected. An OCR device is a good choice when printed documents are provided. However, optical Chinese character recognition (OCCR) is an extremely challenging task because of the multiple fonts, the complex character shapes and the very large vocabulary (see footnote 1). Because misrecognitions in image processing are hard to avoid, contextual postprocessing (language processing) of the recognition is indispensable both for reducing the recognition errors caused by the preprocessing (image processing) and for saving time in human proofreading.

Contextual language processing for the postprocessing of OCR is not new. Shyu, et al.
[1] apply the word-lattice-based Markov character bigram model suggested by [2] to an OCCR system. Chou and Chang [3] use a Markov word unigram model and a confusion matrix to decide the most plausible characters. Chang and Chen [4] combine a noisy channel model and a language model to implement the postprocessing of OCCR. Araki, et al. [5-6] propose a selective error-correction method to detect and correct erroneous characters in Japanese text input through an OCR. Shinghal [7] and Sinha and Prasada [8] propose approaches for English.

The purpose of contextual language processing is to find, for each image character, the most plausible candidate, i.e., the one with the maximum likelihood probability. The above approaches report that it plays an important role and is highly effective in the postprocessing of OCR systems. In general, it performs well. However, its performance is not as good as expected when the test data contain more unseen context, e.g., proper nouns such as personal names and organizational names. Besides, some frequently used characters are always selected by the language models, although they may be wrong in some cases. Therefore, if we can predict which image characters have been recognized correctly by the image processing module, the above problems can be alleviated. That is, it is important to make an analysis before applying the language models.

This paper is organized as follows. Section 2 presents our OCCR system. Section 3 introduces the language models used in this paper. Section 4 describes the analysis methods and demonstrates the proposed methods. Section 5 gives the concluding remarks.

2 System Description

The proposed system shown in Fig. 1 consists of two major modules: (1) the Image Processing Module (Preprocessing) and (2) the Language Processing Module (Postprocessing). The image processing module contains three submodules: (1) Image Segmentation, (2) Feature Extraction, and (3) Feature Matching.
An optical scanner scans the printed document and converts it into an image document. The image segmentation submodule segments the entire image into blocks and classifies each block as a text, graphic or picture block. The text blocks are then further segmented into individual character blocks. Each character block stands for an image character. Fig. 2(a) shows a simplified example of an image document. After the image segmentation submodule is applied, the segmented image document is as shown in Fig. 2(b). Once the image characters have been segmented properly, the feature extraction submodule extracts the features from each image character. In the feature matching stage, the extracted features of each individual image character are matched against a feature database to recognize the character. The top ten candidates for each image character, which form a candidate set, are generated for the subsequent language processing. Fig. 3 shows the candidates for each image character of Fig. 2(b).

1 There are about 13,000 characters in Chinese.
[Fig. 1. Block Diagram of the Proposed System. Labels in the diagram: Image Document; Image Processing Module (Preprocessing) with Image Segmentation, Feature Extraction and Feature Matching; Image base; Dictionary; Feature base; Language Processing Module (Postprocessing) with Analysis of Error Count Distributions and Markov Character Bigram Model; Character Bigram Table; Character Unigram Table; Text Document.]

[Fig. 2. An Example of Image Segmentation: (a) image document, (b) segmented character blocks.]

[Fig. 3. The Top Ten Candidates for Each Image Character of Fig. 2(b).]

In Fig. 3, a number follows each candidate. The number is the error count between the current image character and the image character stored in the image database, computed from the features. The lower the error count, the more similar the two image characters are. Thus, the first candidate of each image character is the most plausible candidate according to the image processing module.

The error count can be used to calculate the probability of each candidate given the image character. Given the k-th image character $i_k$, the matching score of the j-th candidate $c_j$, written $SCORE_j$, is defined from the error counts following [9-10].

[The defining equation of $SCORE_j$ is not recoverable from this transcription.]

Based on the definition of $SCORE_j$, the probability of the j-th candidate $c_j$ given the image character $i_k$ is calculated by normalizing over the ten candidates in the candidate set.

$$P_{IP}(c_j \mid i_k) = \frac{SCORE_j}{\sum_{m=1}^{10} SCORE_m}$$

During the language processing stage, the analysis of error count distributions and the Markov character bigram model are applied together to deal with the recognition errors caused by the image processing module and to yield the final text document. This paper focuses on the postprocessing of the recognition, and especially on the analysis of error count distributions.
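As a rough illustration of how error counts could be turned into candidate probabilities, the sketch below normalizes per-candidate scores over one candidate set. The score function `inverse_score` is a hypothetical stand-in: it only preserves the property that a lower error count yields a higher score, and it is not the definition taken from [9-10]. The function names and the sample error counts are likewise assumptions made for illustration.

```python
from typing import Callable, Dict


def inverse_score(error_count: int) -> float:
    """Hypothetical score: lower error count -> higher score (not the paper's definition)."""
    return 1.0 / (error_count + 1)


def candidate_probabilities(
    error_counts: Dict[str, int],
    score_fn: Callable[[int], float] = inverse_score,
) -> Dict[str, float]:
    """Normalize the scores of one candidate set so that they sum to 1."""
    scores = {cand: score_fn(ec) for cand, ec in error_counts.items()}
    total = sum(scores.values())
    return {cand: s / total for cand, s in scores.items()}


# Made-up candidate set for one image character (candidate -> error count).
example = {"A": 2400, "B": 2900, "C": 3100}
probs = candidate_probabilities(example)
best = max(probs, key=probs.get)  # the candidate with the lowest error count
```

Whatever score function is plugged in, the candidate with the lowest error count keeps the highest probability as long as the score decreases with the error count.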
3 Language Models in OCCR

The problem of OCCR can be defined as how to convert a sequence of image characters $I$ into the corresponding sequence of characters $\hat{C}$ correctly based on the language models. In this paper, a statistical Markov character bigram model is adopted to improve the recognition rate. Let $I = \langle i_1, i_2, i_3, \ldots, i_n \rangle$ be an image character string and $C = \langle c_1, c_2, c_3, \ldots, c_n \rangle$ be one of the possible character strings. Here, $c_i$ denotes one of the characters in the i-th candidate set. The conversion can be formulated as follows.

$$\hat{C} = \arg\max_{C} P_{IP}(C \mid I) \times P_{LP}(C) \qquad (1)$$

The former probability, $P_{IP}(C \mid I)$, is produced by the image processing module, and the latter probability, $P_{LP}(C)$, is calculated by the language processing module. If the contextual information, i.e., $P_{LP}(C)$, is ignored, the above formula becomes as follows.

$$\hat{C} = \arg\max_{C} P_{IP}(C \mid I)$$

$P_{IP}(C \mid I)$ is defined as follows.

$$P_{IP}(C \mid I) = \prod_{j=1}^{n} P_{IP}(c_j \mid i_j) \qquad (2)$$

The definition of $P_{IP}(c_j \mid i_j)$, i.e., the probability of candidate $c_j$ given the image character $i_j$, is described in Section 2. By using Formula 2, the first candidate, which has the lowest error count, is always selected as the result. If more than one candidate has the same error count, the most frequently used character is selected as the result through dictionary lookup (see Fig. 1). Similarly, if $P_{IP}(C \mid I)$ is ignored in Formula 1, the formula becomes as follows.

$$\hat{C} = \arg\max_{C} P_{LP}(C) \qquad (3)$$

In this paper, $P_{LP}(C)$ is simplified as a Markov character bigram model, shown below.

$$P_{LP}(C) = \prod_{i=1}^{n+1} P(c_i \mid c_{i-1})$$

In this formula, $c_0$ and $c_{n+1}$ mark the beginning and the ending of the character string, respectively. According to the above formulas, the preliminary results are shown in Table 1.

Table 1. The Preliminary Results

Article   Correct Rate for Formula 1   Correct Rate for Formula 2   Correct Rate for Formula 3
1         [...]                        98.48%                       88.75%
2         [...]                        96.57%                       88.56%
3         [...]                        97.00%                       90.19%
4         [...]                        97.28%                       89.47%
5         [...]                        96.51%                       87.89%
6         [...]                        94.76%                       89.16%
Total     97.84%                       96.83%                       88.97%

In these experiments, an unsegmented Chinese newspaper corpus is adopted as the training data for the Markov character bigram probabilities. It includes approximately 360,000 sentences (about 4,000,000 characters). The test data (6 articles) are scanned from the Liberty Times and include 237 sentences (2457 characters).

Table 1 makes it clear that using only the contextual information (Formula 3) to select the most plausible candidate brings no advantage in these experiments. This is because the image processing module already performs very well, and because the test data (news) contain many proper nouns, such as personal names and organizational names, which are difficult for the language models to resolve. Besides, some frequently used characters are always selected by Formula 3, although they may be wrong in some cases. Because Formula 1 combines $P_{IP}(C \mid I)$ and $P_{LP}(C)$, these effects are alleviated. However, they still have some influence. The subsequent analysis will demonstrate this point.

The preliminary results for Formula 2 are discussed in detail, and the statistical information is shown in Table 2.

Table 2. The Statistical Information of the Preliminary Results for Formula 2
Columns: Article; Correctly Recognized Image Characters; Wrongly Recognized Image Characters; Correct within the Top Ten Candidates. Last row: Total.
[...]

In the above table, Article 1 has 5 image characters wrongly recognized by using Formula 2. That is, the first candidate is not the correct result for these five image characters. However, 4 of the 5 can be found within the top ten candidates. From Table 2, 84.62% (66 of 78) of the wrongly recognized image characters can be recovered to the correct ones by using the characters within the top ten candidates. This is encouraging, provided that the contextual information can be successfully applied to the wrongly recognized positions.
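The decision rule of Formula 1 can be sketched as a Viterbi search over the candidate sets, combining the per-position image probabilities with the character bigram probabilities. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `decode`, the boundary symbols, the probability floor for unseen bigrams, and the toy data are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

BOS, EOS = "<s>", "</s>"  # assumed symbols standing for c_0 and c_{n+1}
FLOOR = 1e-8              # assumed probability for unseen bigrams


def decode(
    candidate_sets: List[Dict[str, float]],
    bigram: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the character string maximizing P_IP(C|I) * P_LP(C).

    candidate_sets[k] maps each candidate of position k to P_IP(c | i_k) > 0;
    bigram maps (previous, current) to P(current | previous).
    """
    def lp(prev: str, cur: str) -> float:
        return math.log(bigram.get((prev, cur), FLOOR))

    # best[c] = (log-probability of the best path ending in c, that path)
    best = {BOS: (0.0, [])}
    for cands in candidate_sets:
        nxt = {}
        for cur, p_ip in cands.items():
            score, path = max(
                (s + lp(prev, cur), p) for prev, (s, p) in best.items()
            )
            nxt[cur] = (score + math.log(p_ip), path + [cur])
        best = nxt
    # Close the string with the end marker.
    _, path = max((s + lp(prev, EOS), p) for prev, (s, p) in best.items())
    return path


# Toy data: the image module prefers "x" at position 2, the bigram model "y".
sets = [{"a": 0.6, "b": 0.4}, {"x": 0.55, "y": 0.45}]
bigram = {
    (BOS, "a"): 0.5, (BOS, "b"): 0.5,
    ("a", "x"): 0.1, ("a", "y"): 0.9,
    ("b", "x"): 0.5, ("b", "y"): 0.5,
    ("x", EOS): 0.5, ("y", EOS): 0.5,
}
result = decode(sets, bigram)
```

In the toy data, picking the first candidate at every position (Formula 2) would give "a", "x", while the combined rule returns "a", "y" because the bigram probability of "y" after "a" outweighs the small image-score advantage of "x".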
Tables 3 and 4 show the detailed statistical information of the preliminary results for Formulas 1 and 3, respectively.

Table 3. The Detailed Statistical Information of the Preliminary Results for Formula 1
Columns: Article; Correct; Wrong; CC; CW; WC; WW; Net Gain. Last row: Total.
[...]
Table 4. The Detailed Statistical Information of the Preliminary Results for Formula 3
Columns: Article; Correct; Wrong; CC; CW; WC; WW; Net Gain. Last row: Total.
[...]

In these two tables, Correct (Wrong) denotes the number of correctly (wrongly) recognized image characters (see footnote 2). Columns 4, 5, 6 and 7 indicate the performance changes from the image processing module (preprocessing) to the language processing module (postprocessing). They can be classified into four types: Correct-to-Correct (CC), Correct-to-Wrong (CW), Wrong-to-Correct (WC) and Wrong-to-Wrong (WW). In the CW type, an image character that is correctly recognized by the image processing module is changed to a wrong one by the language processing module. In the WC type, a wrongly recognized character is recovered to the correct one by the language processing module. In the CC type, no characters are changed. In the WW type, a wrongly recognized character is not changed or is changed to another wrong one. The performance of the language processing module can be evaluated as the net gain, defined as follows.

Net Gain = WC - CW

In Table 4, the Net Gains are all negative. This reveals that the language processing module cannot be effectively applied to the OCCR application when $P_{IP}(C \mid I)$ is ignored. However, the Net Gains of Articles 1 and 2 in Table 3 are also negative, even though $P_{IP}(C \mid I)$ is combined with $P_{LP}(C)$. In Table 3 (4), 32 (247) image characters that are correctly recognized by the image processing module are changed to wrong ones by the language processing module. On the other hand, 57 (54) image characters that are wrongly recognized by the image processing module are recovered to the correct ones by the language processing module. Because Table 2 shows that at most 66 wrongly recognized characters can be recovered by the language processing module, the language processing module performs well at these wrongly recognized positions.
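The four transition types and the net gain can be computed mechanically once the ground truth, the preprocessing output and the postprocessing output are aligned position by position. The following is a small sketch of that bookkeeping; the function names and the toy strings are assumptions for illustration, while the totals used at the end (CW = 32, WC = 57 for Formula 1 and CW = 247, WC = 54 for Formula 3) are the ones quoted in the text.

```python
from collections import Counter
from typing import Mapping, Sequence


def transition_counts(
    truth: Sequence[str], before: Sequence[str], after: Sequence[str]
) -> Counter:
    """Count CC, CW, WC and WW transitions from preprocessing to postprocessing."""
    counts = Counter({"CC": 0, "CW": 0, "WC": 0, "WW": 0})
    for t, b, a in zip(truth, before, after):
        key = ("C" if b == t else "W") + ("C" if a == t else "W")
        counts[key] += 1
    return counts


def net_gain(counts: Mapping[str, int]) -> int:
    """Net Gain = WC - CW."""
    return counts["WC"] - counts["CW"]


# Toy example: one character is repaired (WC) and one is damaged (CW).
toy = transition_counts(truth="abcde", before="abxdy", after="abcxy")
toy_gain = net_gain(toy)

# Totals quoted in the text for Formula 1 and Formula 3.
gain_formula_1 = net_gain({"WC": 57, "CW": 32})
gain_formula_3 = net_gain({"WC": 54, "CW": 247})
```

With the quoted totals, the net gain is 25 for Formula 1 and -193 for Formula 3, which matches the drop from 78 errors (Formula 2) to 53 errors and the rise to 271 errors, respectively.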
Therefore, if we can predict which positions have been correctly recognized by the image processing module, the first candidate can be selected as the result at those positions. The other candidates (from the second candidate to the tenth candidate) can be removed from the candidate set and will not be tried by the language processing module. In this way, the Net Gain can be turned into a positive value and the benefit of the language processing module can be shown. In the next section, we describe how to predict whether a position is correctly or wrongly recognized by the image processing module.

2 Correct = CC + WC, Wrong = CW + WW

4 Analysis of Error Count Distributions

To decide which image characters have been recognized correctly by the image processing module, the only information that we can use is the error count of each candidate. In this paper, an image character is assumed to be correctly recognized by the image processing module based on the following two hypotheses.

(1) The error count of the first candidate in the candidate set must be less than a threshold value A.
(2) The difference of the error count between the first candidate and the second candidate in the candidate set must be greater than a threshold value B.

Table 5 shows the error count distribution for the first hypothesis.

Table 5. The Error Count Distribution for the First Hypothesis
Columns: The Range of the Error Count for the First Candidate; Correctly Recognized Image Characters; Wrongly Recognized Image Characters.
[...]

In Table 5, Column 1 indicates the range of the error count for the first candidate. Column 2 (3) indicates the number of image characters that are correctly (wrongly) recognized by the image processing module under the condition in Column 1.
For example, there are 407 correctly recognized image characters and 6 wrongly recognized image characters when the error counts of their first candidates fall in the range that starts at 2500. In Table 5, we can see that an image character is always correctly recognized by the image processing module if the error count of its first candidate is less than 2500, i.e., threshold value A. That is, a total of 234 positions can be correctly detected. At these positions, the first candidate can be selected as the correct result and the other candidates are not tried by the language processing module. However, this hypothesis alone yields only a small improvement: 234 positions are 9.52% of the 2457 positions (image characters) in total.

Table 6 shows the error count distribution for the second hypothesis. In Table 6, Column 1 indicates the range of the difference of the error count between the first candidate and the second candidate in the candidate set. Column 2 (3) indicates the number of correctly (wrongly) recognized image characters under the condition in Column 1. For example, there are only 136 correctly recognized image characters and 63 wrongly recognized image characters when the difference of the error count between the first candidate and the second candidate is less than 200, i.e., threshold value B.
Table 6. The Error Count Distribution for the Second Hypothesis
Columns: The Range of the Difference of the Error Count; Correctly Recognized Image Characters; Wrongly Recognized Image Characters.
[...]

That is, if we assume that the first candidate is correct when the difference of the error count is greater than 200, a total of 2258 (2457 - 199) positions are identified. Of these, 2243 are correct and 15 (78 - 63) are wrong. That is, these 15 wrongly recognized characters are identified as correct under this hypothesis. However, only 199 (136 + 63) positions have to be considered further. It is clear that this analysis is useful because most of the characters are identified correctly in advance. Tables 7 and 8 show the experimental results for Formulas 1 and 3 based on the two hypotheses.

Table 7. The Experimental Results for Formula 1 Based on Two Hypotheses
Columns: A; B; Correct; Wrong. Last row: Total.
[...]

Table 8. The Experimental Results for Formula 3 Based on Two Hypotheses
Columns: A; B; Correct; Wrong. Last row: Total.
[...]

In Tables 7 and 8, A and B denote the threshold values for Hypothesis I and Hypothesis II, respectively. The performance in these two experiments improves substantially. For example, the original Markov character bigram model based on Formula 1 (3) makes 53 (271) errors. After the analysis, the recognition errors are reduced to 30 (53) under the threshold values A = [...] and B = [...]. That is, 43.40% (80.44%) of the errors are eliminated by the analysis.

The threshold values A and B in Hypotheses I and II depend heavily on the quality of the printed documents. They do not depend on the type and domain of the context. Another 7 printed documents are also scanned for testing. The experimental results are shown in Tables 9 and 10.

Table 9. The Experimental Results before the Analysis
Columns: Correctly Recognized; Wrongly Recognized; Correctly Recognized; Wrongly Recognized. Last row: Total.
[...]

Table 10. The Experimental Results after the Analysis
Columns: Correctly Recognized; Wrongly Recognized; Correctly Recognized; Wrongly Recognized. Last row: Total.
[...]

The threshold values A and B are set to 2500 and 300, respectively. It is clear that the experimental results are similar to the previous ones.
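The two hypotheses amount to a cheap filter that runs before the language model and shrinks each trusted candidate set to its first candidate. The sketch below shows one possible reading. It assumes that a position is accepted when either hypothesis holds, since the text does not spell out how the two are combined; the function names and the sample candidate sets are hypothetical, and the default thresholds are the values A = 2500 and B = 300 quoted for the second test set.

```python
from typing import List, Tuple

Candidate = Tuple[str, int]  # (character, error count)


def is_confident(cands: List[Candidate], a: int = 2500, b: int = 300) -> bool:
    """Check the two hypotheses on a candidate set sorted by ascending error count.

    Assumption: the position is trusted when EITHER hypothesis holds.
    """
    if len(cands) == 1:
        return True                      # nothing left to choose from
    first, second = cands[0][1], cands[1][1]
    if first < a:                        # Hypothesis I
        return True
    return second - first > b            # Hypothesis II


def prune(
    candidate_sets: List[List[Candidate]], a: int = 2500, b: int = 300
) -> List[List[Candidate]]:
    """Keep only the first candidate wherever the image result is trusted."""
    return [c[:1] if is_confident(c, a, b) else c for c in candidate_sets]


# Made-up candidate sets (already sorted by error count).
sets = [
    [("A", 2100), ("B", 2150)],  # Hypothesis I holds: 2100 < 2500
    [("C", 3000), ("D", 3500)],  # Hypothesis II holds: 3500 - 3000 > 300
    [("E", 3000), ("F", 3100)],  # neither holds: left to the language model
]
pruned = prune(sets)
sizes = [len(c) for c in pruned]
```

Positions reduced to a single candidate no longer branch in the bigram search, which is where the time saving reported below would come from under this reading.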
Without the analysis, the correct rates for Formulas 1 and 3 are 96.81% and 86.49%, respectively. With the analysis, the correct rates for Formulas 1 and 3 are 98.40% and 97.54%, respectively. That is, 50.00% and 81.82% of the errors are eliminated by the analysis for Formulas 1 and 3, respectively. Besides, processing time is also saved after applying the analysis. Without the analysis, the processing speed is 1.67 characters per second. With the analysis, the processing speed becomes [...] characters per second on a PC-486/DX. That is, the analysis saves 92.26% of the time on average.

5 Concluding Remarks

A standard approach to reducing the recognition errors caused by the preprocessing, i.e., image processing, is to use corpus-based language models in the postprocessing, i.e., language processing. This paper proposes the analysis of error count distributions to alleviate the problems caused by contextual language processing. The experimental results show that the analysis can eliminate more than 50% of the errors and save more than 90% of the time on average, based on the Markov character bigram model. Besides, this simple but effective analysis can also be applied to other natural language applications such as speech recognition [2] and handwriting recognition [9,10,11].

References

[1] K.H. Shyu, et al., "An OCR Based Translation System between Simplified and Complex Chinese," Computer Processing of Chinese and Oriental Languages, Vol. 9, No. 1.
[2] L.S. Lee, et al., "Golden Mandarin (II) - An Improved Single-Chip Real-Time Mandarin Dictation Machine for Chinese Language with Very Large Vocabulary," Proceedings of ICASSP.
[3] B.H. Chou and J.S. Chang, "The Language Models in Optical Chinese Character Recognition," Proceedings of ROCLING V.
[4] J.S. Chang and S.D. Chen, "The Postprocessing of Optical Character Recognition Based on Statistical Noisy Channel and Language Model," Proceedings of PACLIC.
[5] T. Araki, S. Ikehara, et al., "An Evaluation of a Method to Detect and Correct Erroneous in Japanese Input through an OCR Using Markov Models," Proceedings of Applied Natural Language Processing.
[6] T. Araki, S. Ikehara, et al., "An Evaluation to Detect and Correct Erroneous Wrongly Substituted, Deleted and Inserted in Japanese and English Sentences Using Markov Models," Proceedings of COLING.
[7] R. Shinghal, "A Hybrid Algorithm for Contextual Text Recognition," Pattern Recognition, Vol. 16, No. 2.
[8] R.M.K. Sinha and B. Prasada, "Visual Text Recognition Through Contextual Processing," Pattern Recognition, Vol. 21, No. 5.
[9] H.J. Lee, C.H. Tung and C.H. Chang Chien, "A Markov Model in Handwritten Chinese Text Recognition," Proceedings of ICDAR.
[10] C.H. Tung and H.J. Lee, "Increasing Character Recognition Accuracy by Detection and Correction of Erroneously Identified," Pattern Recognition, Vol. 27, No. 9.
[11] C.H. Chang, "Word Class Discovery for Postprocessing Chinese Handwriting Recognition," Proceedings of COLING.
More informationEvaluation of Usage Patterns for Web-based Educational Systems using Web Mining
Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl