Optical Character Recognition Domain Expert Approximation Through Oracle Learning

Optical Character Recognition Domain Expert Approximation Through Oracle Learning
Joshua Menke, NNML Lab, BYU CS
josh@cs.byu.edu
March 24, 2004

Optical Character Recognition (OCR)
Optical character recognition (OCR): given an image of a character, output the corresponding letter (e.g., an image of an R yields the letter R).

OCR with ANNs
Artificial neural networks (ANNs) are powerful, adaptive machine learning models. Trained for OCR, they map a character image to its letter (e.g., R) with 98%+ accuracy.
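
As a concrete illustration of this kind of classifier, here is a minimal sketch assuming scikit-learn, with its small digits dataset standing in for the talk's letter images; the dataset, layer size, and other settings are illustrative choices, not the network used in the talk.

```python
# A minimal sketch of an ANN OCR classifier, assuming scikit-learn. The digits
# dataset stands in for the talk's letter images; the layer size and other
# settings are illustrative, not the network used in the talk.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)  # 8x8 character images, flattened to vectors
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ann = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=0)
ann.fit(X_train, y_train)            # standard backpropagation training
print("test accuracy:", ann.score(X_test, y_test))
```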

Problem: Varying Noise
The amount of noise in a given image can vary for the same letter, yielding two domains: noisy and clean.

Varying Noise: Common Solution
Train one ANN (ANN_mixed) on clean and noisy images mixed together.
Problem: the noisy regions of the domain are harder to approximate. The ANN learns the easier, clean images first, then continues training to learn the noisy regions. In doing so it can overfit the clean domain, lowering overall accuracy.

The Domain Experts
ANN_clean trains on and recognizes clean images; ANN_noisy trains on and recognizes noisy images. Separating clean and noisy training avoids overfitting to the clean images.
Problem: choosing the right ANN given a new letter.
Solutions*: train a separate ANN to distinguish clean from noisy letters, or apply both ANNs and choose the one with the higher confidence (see the sketch below).
*Both are difficult to do in practice.
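
The second workaround, confidence-based selection, might look like the following sketch; ann_clean and ann_noisy are assumed to be scikit-learn-style classifiers exposing predict_proba, and the function name is hypothetical.

```python
# A sketch of confidence-based selection between the two domain experts,
# assuming ann_clean and ann_noisy are scikit-learn-style classifiers with
# predict_proba; the function name is hypothetical.
def pick_expert_prediction(image, ann_clean, ann_noisy):
    """Run both domain experts and keep the prediction whose top output is larger."""
    p_clean = ann_clean.predict_proba(image.reshape(1, -1))[0]
    p_noisy = ann_noisy.predict_proba(image.reshape(1, -1))[0]
    if p_clean.max() >= p_noisy.max():
        return ann_clean.classes_[p_clean.argmax()]
    return ann_noisy.classes_[p_noisy.argmax()]
```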

The Oracle Learning Process
Originally used to create reduced-size ANNs:
1. Obtain the oracle: a large ANN.
2. Label the data with the oracle.
3. Train the oracle-trained network (OTN): a small ANN.

The Oracle Learning Process
Step 1: obtain the most accurate ANN regardless of size (ANN_large, trained on the training data).

The Oracle Learning Process
Step 2: use the trained oracle (ANN_large) to relabel the training data with its own outputs.

The Oracle Learning Process
Step 3: use the relabeled training set to train a simpler ANN (ANN_small), with the oracle's outputs as the new targets.
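
The three steps can be sketched end to end as follows; this is an illustrative reconstruction assuming scikit-learn, with the digits data, layer sizes, and regression-style fit to soft targets chosen for brevity rather than taken from the paper.

```python
# A minimal sketch of the three oracle-learning steps, assuming scikit-learn.
# The digits data, layer sizes, and regression-style fit to soft targets are
# illustrative choices, not the setup used in the paper.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier, MLPRegressor

X, y = load_digits(return_X_y=True)

# 1. Obtain the oracle: the most accurate ANN regardless of size.
oracle = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=500, random_state=0)
oracle.fit(X, y)

# 2. Relabel the training data with the oracle's own outputs (soft targets in [0, 1]).
soft_targets = oracle.predict_proba(X)

# 3. Train the smaller oracle-trained network (OTN) to reproduce those outputs.
otn = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
otn.fit(X, soft_targets)

# The OTN classifies by taking the largest of its approximated outputs.
otn_labels = otn.predict(X).argmax(axis=1)
print("OTN agreement with the oracle:", np.mean(otn_labels == oracle.predict(X)))
```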

Domain Expert Approximation Through Oracle Learning: Bestnets
We introduce the bestnets method: use oracle learning [7] to train a single ANN to approximate the behavior of ANN_clean on clean images and of ANN_noisy on noisy images. Successful approximation gives ANN_bestnets: the accuracy of ANN_clean on clean images, the accuracy of ANN_noisy on noisy images, an implicit ability to distinguish between clean and noisy, and no fear of overfitting, since overfitting the oracles is desirable.

Prior Work
Approximation: Menke et al. [7, 6] introduced oracle learning; Domingos [5] approximated a bagging [1] ensemble with decision trees [8]; Zeng and Martinez [9] approximated a bagging ensemble with an ANN; Craven and Shavlik approximated an ANN with rules [3] and with trees [4]. Bestnets approximates domain experts, which is novel.
Varying noise: mostly unrelated work, which either assumes a single type of noise, varies the noise but trains and tests each level separately, or assumes knowledge about the type of noise (SNR, etc.), which is not always realistic.

Bestnets Method for OCR
Three steps (sketched in code below):
1. Obtain the oracles, in this case two: find the best ANN for clean-only images (ANN_clean) and the best ANN for noisy-only images (ANN_noisy).
2. Relabel the images with the oracles: relabel the clean images with ANN_clean's outputs and the noisy images with ANN_noisy's outputs.
3. Train a single ANN (ANN_bestnets) on the relabeled images.
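
Under the same scikit-learn assumptions as the oracle-learning sketch above, the three steps with two oracles might look like this; train_bestnets and the X_clean/y_clean/X_noisy/y_noisy names are hypothetical, and both sets are assumed to cover the same letter classes so the experts' output columns line up.

```python
# A minimal sketch of the bestnets method, assuming scikit-learn. Names are
# hypothetical; both training sets are assumed to contain the same letter
# classes so the experts' predict_proba columns align.
import numpy as np
from sklearn.neural_network import MLPClassifier, MLPRegressor

def train_bestnets(X_clean, y_clean, X_noisy, y_noisy):
    # 1. Obtain the oracles: the best ANN for each domain.
    ann_clean = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500).fit(X_clean, y_clean)
    ann_noisy = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500).fit(X_noisy, y_noisy)

    # 2. Relabel each domain with its own expert's outputs.
    targets_clean = ann_clean.predict_proba(X_clean)
    targets_noisy = ann_noisy.predict_proba(X_noisy)

    # 3. Train a single ANN on the combined, relabeled images.
    X_all = np.vstack([X_clean, X_noisy])
    T_all = np.vstack([targets_clean, targets_noisy])
    ann_bestnets = MLPRegressor(hidden_layer_sizes=(128,), max_iter=500).fit(X_all, T_all)
    return ann_clean, ann_noisy, ann_bestnets
```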

Note About Output Targets
The OCR ANNs have an output for every letter we'd like to recognize. Given an image, the output corresponding to the correct letter should have a higher value than the other outputs; these values range between 0 and 1. To train an ANN to do this, every incorrect output is trained toward 0 and the correct one toward 1. With oracle learning, instead of training to 0-1 targets, the OTN trains to reproduce whatever its oracles output, which is always more relaxed (greater than 0 or less than 1). This may be easier to learn, according to Caruana [2].

Bestnets Process
Train the domain experts: ANN_noisy on the noisy training images and ANN_clean on the clean training images.

Bestnets Process
Use the trained experts to relabel the training data with their own outputs: ANN_noisy relabels the noisy training images and ANN_clean relabels the clean training images.

Bestnets Process
Use the relabeled training set to train a single ANN (ANN_bestnets) on the relabeled clean and noisy training images, with the experts' outputs as the new targets.

Example: Original Training Image
Image: a noisy image of the letter R.
Target: all 0's except for the output corresponding to R, which is 1.
Domain: noisy.

Example: Getting the Oracle's Outputs
Feeding the image to ANN_noisy yields its output vector: < 0.2, 0.3, 0.13, ..., R = 0.77, ..., 0.44 >.

Example: Resulting Training Image
Image: the same noisy R.
Target: < 0.2, 0.3, 0.13, ..., R = 0.77, ..., 0.44 >
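
A tiny illustration of how the target for this example changes under oracle learning; the soft-target numbers below are placeholders echoing the slide's example vector, not real ANN_noisy outputs.

```python
# Illustrative only: how the noisy R's training target changes under oracle
# learning. The soft-target numbers are placeholders echoing the slide's
# example vector, not real ANN_noisy outputs.
import numpy as np

letters = [chr(c) for c in range(ord("A"), ord("Z") + 1)]
r = letters.index("R")

# Original 0-1 target: all 0's except the output corresponding to R.
hard_target = np.zeros(len(letters))
hard_target[r] = 1.0

# Oracle-relabeled target: ANN_noisy's own output vector for the same image.
soft_target = np.array([0.2, 0.3, 0.13] + [0.25] * (len(letters) - 4) + [0.44])
soft_target[r] = 0.77

# ANN_bestnets is trained toward soft_target instead of hard_target.
```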

Experiment
1. Train ANN_clean on only the clean images.
2. Train ANN_noisy on only the noisy images.
3. Relabel the clean letter set's output targets with ANN_clean's outputs.
4. Relabel the noisy letter set's output targets with ANN_noisy's outputs.
5. Train a single ANN (ANN_bestnets) on the relabeled images from both sets.
6. Train a standard ANN_mixed on both clean and noisy images with standard 0-1 targets.

Initial Results

ANN1           ANN2           Data set   Difference   p-value
ANN_clean      ANN_mixed      Clean        0.0307     < 0.0001
ANN_noisy      ANN_mixed      Noisy        0.0092     < 0.0001
ANN_bestnets   ANN_mixed      Mixed        0.0056     < 0.0001
ANN_clean      ANN_bestnets   Clean        0.0298     < 0.0001
ANN_noisy      ANN_bestnets   Noisy       -0.0011       0.1607

p-values are from a McNemar test comparing the two classifiers in each row on a test set.
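
For reference, the exact McNemar p-value used above can be computed from the discordant pairs (test examples that exactly one of the two classifiers gets right); the sketch below assumes SciPy and uses made-up counts, not the paper's data.

```python
# A sketch of the exact McNemar test behind the p-values above, assuming SciPy.
# b and c are the discordant counts (test examples that exactly one of the two
# classifiers gets right); the numbers below are made up, not the paper's data.
from scipy.stats import binom

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the discordant-pair counts."""
    n, k = b + c, min(b, c)
    # Under the null hypothesis the discordant pairs split 50/50 between the classifiers.
    return min(1.0, 2.0 * binom.cdf(k, n, 0.5))

print(mcnemar_exact_p(b=40, c=12))  # small p-value: the two classifiers differ
```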

Conclusion and Future Work
Conclusion: the bestnets-trained ANN improves over standard (mixed) training and retains the performance of ANN_noisy.
Future work: increase the improvement, focusing on the clean images, and investigate why the method works (Caruana [2]: the relaxed targets may be easier to learn).

References
[1] L. Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
[2] Rich Caruana, Shumeet Baluja, and Tom Mitchell. Using the future to sort out the present: Rankprop and multitask learning for medical risk evaluation. In David S. Touretzky, Michael C. Mozer, and Michael E. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8, pages 959–965, Cambridge, MA, 1996. The MIT Press.
[3] Mark Craven and Jude W. Shavlik. Learning symbolic rules using artificial neural networks. In Paul E. Utgoff, editor, Proceedings of the Tenth International Conference on Machine Learning, pages 73–80, San Mateo, CA, 1993. Morgan Kaufmann.
[4] Mark W. Craven and Jude W. Shavlik. Extracting tree-structured representations of trained networks. In David S. Touretzky, Michael C. Mozer, and Michael E. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8, pages 24–30, Cambridge, MA, 1996. The MIT Press.
[5] Pedro Domingos. Knowledge acquisition from examples via multiple models. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 98–106, San Francisco, 1997. Morgan Kaufmann.
[6] Joshua Menke and Tony R. Martinez. Simplifying OCR neural networks through oracle learning. In Proceedings of the 2003 International Workshop on Soft Computing Techniques in Instrumentation, Measurement, and Related Applications. IEEE Press, 2003.
[7] Joshua Menke, Adam Peterson, Michael E. Rimer, and Tony R. Martinez. Neural network simplification through oracle learning. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN'02), pages 2482–2497. IEEE Press, 2002.
[8] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
[9] Xinchuan Zeng and Tony Martinez. Using a neural network to approximate an ensemble of classifiers. Neural Processing Letters, 12(3):225–237, 2000.