What is Machine Learning

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (T. Mitchell)

Successful applications of machine learning:
- Speech recognition (L. Rabiner, B.-H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993)
- Optical character recognition (ICDAR - International Conference on Document Analysis and Recognition)
- Learning to drive an autonomous vehicle (DARPA Grand Challenge)
- Game playing (IBM's Deep Blue)

Components of a well-posed learning problem:
- a task to be addressed by the system (e.g. recognizing handwritten characters)
- a performance measure to evaluate the learned system (e.g. number of misclassified characters)
- training experience to train the learning system (e.g. labeled handwritten characters)

Designing a machine learning system
1. Formalize the learning task
2. Collect data
3. Extract features
4. Choose a class of learning models
5. Train the model
6. Evaluate the model

Formalize the learning task

Define the task that should be addressed by the learning system (e.g. recognizing handwritten characters from images). A learning problem is often composed of a number of related tasks, e.g.:
- Segment the image into words and each word into characters.
- Identify the language of the writing.
- Classify each character into the language alphabet.

Choose an appropriate performance measure for evaluating the learned system (e.g. number of misclassified characters), as in the sketch below.
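As a concrete illustration of the (task T, performance P, experience E) triple, this minimal Python sketch evaluates a character classifier by counting misclassified characters; the data and classifier outputs are illustrative placeholders, not part of the original notes.

    # Minimal sketch of the performance measure P of a well-posed problem.
    # Task T: classify handwritten characters
    # P:      number (or fraction) of misclassified characters
    # E:      labeled character examples

    def misclassifications(predictions, labels):
        """Count how many predicted characters differ from the true ones."""
        assert len(predictions) == len(labels)
        return sum(p != y for p, y in zip(predictions, labels))

    # Illustrative data: true labels and the output of some hypothetical classifier.
    labels      = ['a', 'b', 'c', 'a', 'd']
    predictions = ['a', 'b', 'a', 'a', 'd']

    errors = misclassifications(predictions, labels)
    print(f"misclassified: {errors}/{len(labels)} "
          f"(error rate {errors / len(labels):.2f})")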

Collect data

A set of training examples needs to be collected in machine-readable format. Data collection is often the most cumbersome part of the process, implying manual intervention, especially in labeling examples for supervised learning. Recent approaches to the problem of data labeling try to exploit the much cheaper availability of unlabeled data (semi-supervised learning).

Extract features

A relevant set of features needs to be extracted from the data in order to provide inputs to the learning system. Prior knowledge is usually necessary in order to choose the appropriate features for the task at hand (a feature-extraction sketch follows at the end of this section):
- Too few features can miss relevant information, preventing the system from learning the task with reasonable performance.
- Including noisy features can make the learning problem harder.
- Too many features can require a number of examples greater than those available for training.

Choose learning model class

A simple model like a linear classifier is easy to train but insufficient for non-linearly separable data. An overly complex model can memorize noise in the training data, failing to generalize to new examples.
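To make the feature-extraction step concrete, here is a small Python sketch that turns a binary character image into four simple features (the ink density of each image quadrant); the choice of features and the array shapes are illustrative, not those of the original course.

    import numpy as np

    def quadrant_density_features(image):
        """Extract 4 simple features from a binary character image:
        the fraction of 'ink' pixels in each quadrant."""
        h, w = image.shape
        quadrants = [
            image[:h // 2, :w // 2], image[:h // 2, w // 2:],
            image[h // 2:, :w // 2], image[h // 2:, w // 2:],
        ]
        return np.array([q.mean() for q in quadrants])

    # Illustrative input: a random 16x16 binary "character image".
    rng = np.random.default_rng(0)
    image = (rng.random((16, 16)) > 0.7).astype(float)
    print(quadrant_density_features(image))  # 4 values in [0, 1]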

Train model

Training a model implies searching through the space of possible models (aka hypotheses) within the chosen model class. Such a search typically aims at fitting the available training examples well according to the chosen performance measure. However, the learned model should perform well on unseen data (generalization), and not simply memorize the training examples (overfitting). Different techniques can be used to improve generalization, usually by trading off model complexity with training set fitting.

Evaluate model

"Entia non sunt multiplicanda praeter necessitatem" (Entities are not to be multiplied without necessity) - William of Occam (Occam's razor)

The learned model is evaluated according to its ability to generalize to unseen examples. Evaluation can provide insights into the model's weaknesses and suggest directions for refining or modifying it. Evaluation can imply comparing different models/learners in order to decide on the best performing one. The statistical significance of observed differences between the performance of different models should be assessed with appropriate statistical tests. A sketch contrasting training and test performance follows at the end of this section.

Supervised learning

The learner is provided with a set of input/output pairs (xᵢ, yᵢ) ∈ X × Y. The learned model f : X → Y should map input examples into their outputs (e.g. classify character images into the character alphabet). A domain expert is typically involved in labeling input examples with the corresponding outputs.

Unsupervised learning

The learner is provided with a set of input examples xᵢ ∈ X, with no labeling information. The learner models the training examples, e.g. by grouping them into clusters according to their similarity.

Semi-supervised learning

The learner is provided with a set of input/output pairs (xᵢ, yᵢ) ∈ X × Y. A (typically much bigger) additional set of unlabeled examples xᵢ ∈ X is also provided. As in supervised learning, the learned model f : X → Y should map input examples into their outputs. Unlabeled data can be exploited to improve performance, e.g. by forcing the model to produce similar outputs for similar inputs, or by allowing it to learn a better internal representation of the examples (see the label-propagation sketch below).
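The following sketch makes the generalization/overfitting distinction concrete: it trains a simple linear classifier and a deliberately unconstrained decision tree on the same data, then compares training and test accuracy. It uses scikit-learn, which is an assumption of this sketch (the original notes name no library); the dataset and models are illustrative choices.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    # Handwritten digit images: a classic character-recognition dataset.
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=0)

    models = {
        "linear classifier": LogisticRegression(max_iter=5000),
        "unpruned decision tree": DecisionTreeClassifier(random_state=0),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        print(f"{name:25s} train={model.score(X_train, y_train):.3f} "
              f"test={model.score(X_test, y_test):.3f}")
    # A large train/test gap signals overfitting: the model memorized the
    # training examples instead of generalizing to unseen ones.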

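As a minimal illustration of exploiting unlabeled data, the sketch below hides most of the labels of a dataset (marking them -1, scikit-learn's convention for unlabeled points) and lets label propagation spread the few known labels to similar inputs. The specific algorithm and dataset are illustrative assumptions, not methods prescribed by the notes.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.semi_supervised import LabelSpreading

    X, y = load_digits(return_X_y=True)

    # Keep only ~10% of the labels; mark the rest as unlabeled (-1).
    rng = np.random.default_rng(0)
    y_partial = y.copy()
    unlabeled = rng.random(len(y)) > 0.1
    y_partial[unlabeled] = -1

    # Label propagation: similar inputs are pushed toward similar outputs.
    model = LabelSpreading(kernel='knn', n_neighbors=7)
    model.fit(X, y_partial)

    accuracy = (model.transduction_[unlabeled] == y[unlabeled]).mean()
    print(f"accuracy on the unlabeled examples: {accuracy:.3f}")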
Reinforcement learning

The learner is provided with a set of possible states S and, for each state, a set of possible actions A, each of which moves it to a next state. In performing action a from state s, the learner receives an immediate reward r(s, a). The task is to learn a policy that chooses, for each state s, the action a maximizing the overall reward (including future moves). The learner has to deal with the problem of delayed rewards coming from future moves, and with the trade-off between exploitation and exploration. Typical applications include movement policies for robots and sequential scheduling problems in general. A tabular Q-learning sketch follows at the end of this section.

Supervised learning tasks

Classification:
- binary: assign an example to one of two possible classes (often a positive and a negative one), e.g. digit vs. non-digit character.
- multiclass: assign an example to one of n > 2 possible classes, e.g. assign a digit character to one of {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.
- multilabel: assign an example to a subset (of size m ≤ n) of the possible classes, e.g. predict the topics of a text.

Regression: assign a real value to an example, e.g. predict the biodegradation rate of a molecular compound under aerobic conditions.

Ordinal regression or ranking: order a set of examples according to their relative importance/quality with respect to the task, e.g. order emails according to their urgency.

Unsupervised learning tasks

Dimensionality reduction: reduce the dimensionality of the data while maintaining as much information as possible, e.g. principal component analysis (PCA), random projections.

Clustering: cluster data into homogeneous groups according to their similarity, e.g. cluster genes according to their expression levels.

Novelty detection: detect novel examples which differ from the distribution of a given set of data, e.g. recognize anomalous network traffic indicating a possible attack.
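The sketch below is a minimal tabular Q-learning agent on a toy chain of states, showing the immediate reward r(s, a), the delayed-reward update, and epsilon-greedy exploration. The environment and hyperparameters are invented for illustration; the notes themselves do not specify an algorithm.

    import random

    # Toy chain environment: states 0..4, actions 0 (left) / 1 (right).
    # Reaching state 4 yields reward 1; every other move yields 0.
    N_STATES, ACTIONS, GOAL = 5, (0, 1), 4

    def step(state, action):
        nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
        return nxt, 1.0 if nxt == GOAL else 0.0

    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration

    for episode in range(500):
        s = 0
        while s != GOAL:
            # Exploration/exploitation trade-off: epsilon-greedy action choice.
            a = random.choice(ACTIONS) if random.random() < epsilon \
                else max(ACTIONS, key=lambda a: Q[s][a])
            s2, r = step(s, a)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2

    # Greedy policy per state: converges to "always move right" ([1, 1, 1, 1]).
    print([max(ACTIONS, key=lambda a: Q[s][a]) for s in range(GOAL)])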

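To ground the unsupervised learning tasks above, this short sketch chains dimensionality reduction (PCA) with clustering (k-means), ignoring the labels entirely; scikit-learn and the digits dataset are again illustrative assumptions.

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    X, _ = load_digits(return_X_y=True)   # labels deliberately ignored

    # Dimensionality reduction: keep the two directions of largest variance.
    X_reduced = PCA(n_components=2).fit_transform(X)

    # Clustering: group the reduced examples into 10 homogeneous clusters.
    clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X_reduced)
    print(X.shape, "->", X_reduced.shape, "; first cluster ids:", clusters[:10])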
Probabilistic Reasoning

- Reasoning in the presence of uncertainty
- Evaluating the effect of a certain piece of evidence on other related variables
- Estimating probabilities and relations between variables from a set of observations

Choice of Learning Algorithms

The appropriate approach depends on the information available:
- Full knowledge of the probability distributions of the data: Bayesian decision theory.
- Form of the probabilities known, parameters unknown: parameter estimation from training data (see the sketch below).
- Form of the probabilities unknown, training examples available: discriminative methods, which do not model the input data (as generative methods do) but learn a function predicting the desired output given the input.
- Form of the probabilities unknown, training examples unavailable (only inputs): unsupervised methods, which cluster examples by similarity.
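As a minimal example of the "form known, parameters unknown" case, the sketch below estimates the mean and variance of a Gaussian by maximum likelihood from observed data; the data are synthetic and purely illustrative.

    import numpy as np

    # Assume the form of the distribution is known (Gaussian) but its
    # parameters (mu, sigma) are not: estimate them from training data.
    rng = np.random.default_rng(0)
    sample = rng.normal(loc=2.0, scale=0.5, size=1000)  # synthetic observations

    # Maximum-likelihood estimates for a Gaussian:
    mu_hat = sample.mean()                      # mu^  = (1/N) * sum(x_i)
    var_hat = ((sample - mu_hat) ** 2).mean()   # s^2  = (1/N) * sum((x_i - mu^)^2)

    print(f"estimated mu={mu_hat:.3f}, sigma={var_hat ** 0.5:.3f}  (true: 2.0, 0.5)")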