Machine Learning in Patent Analytics: Binary Classification for Prioritizing Search Results

Anthony Trippe
Managing Director, Patinformatics, LLC
Patent Information Fair & Conference
November 10, 2017, Tokyo, Japan

INTRODUCTION TO MACHINE LEARNING

What is machine learning?

Machine learning, a branch of statistical learning, is about the construction and study of systems that can learn from data. For example, a machine learning system could be trained on email messages to learn to distinguish between spam and non-spam messages. After learning, it can then be used to classify new email messages into spam and non-spam folders.

The core of machine learning deals with representation and generalization. Representation of data instances and functions evaluated on these instances are part of all machine learning systems. Generalization is the property that the system will perform well on unseen data instances; the conditions under which this can be guaranteed are a key object of study in the subfield of computational learning theory, also referred to as statistical learning theory.

Note: Taken from https://en.wikipedia.org/wiki/machine_learning

There are different types of Machine Learning

Machine learning tasks are typically classified into three broad categories, depending on the nature of the learning "signal" or "feedback" available to a learning system. These are:

Supervised learning: The computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs.

Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).

Reinforcement learning: A computer program interacts with a dynamic environment in which it must perform a certain goal (such as driving a vehicle), without a teacher explicitly telling it whether it has come close to its goal. Another example is learning to play a game by playing against an opponent.

Note: Taken from https://en.wikipedia.org/wiki/machine_learning

Machine Learning can be used to solve many problems

Another categorization of machine learning tasks arises when one considers the desired output of a machine-learned system:

In classification, inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one or more (multi-label classification) of these classes. This is typically tackled in a supervised way. Spam filtering is an example of classification, where the inputs are email (or other) messages and the classes are "spam" and "not spam".

In clustering, a set of inputs is to be divided into groups. Unlike in classification, the groups are not known beforehand, making this typically an unsupervised task.

Density estimation finds the distribution of inputs in some space.

Dimensionality reduction simplifies inputs by mapping them into a lower-dimensional space. Topic modeling is a related problem, where a program is given a list of human language documents and is tasked to find out which documents cover similar topics.

Note: Taken from https://en.wikipedia.org/wiki/machine_learning

CLASSIFICATION AND SUPPORT VECTOR MACHINES

Classification is a supervised Machine Learning task

In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. An algorithm that implements classification, especially in a concrete implementation, is known as a classifier.

Classification can be thought of as two separate problems:

Binary classification, the better understood task, involves only two classes.

Multiclass classification involves assigning an object to one of several classes. Since many classification methods have been developed specifically for binary classification, multiclass classification often requires the combined use of multiple binary classifiers.

Note: Taken from https://en.wikipedia.org/wiki/statistical_classification

A Support Vector Machine (SVM) is an algorithm used for classification

Suppose some given data points each belong to one of two classes, and the goal is to decide which class a new data point will be in. In the case of support vector machines, a data point is viewed as a p-dimensional vector (a list of p items), and we want to know whether we can separate such points with a (p - 1)-dimensional hyperplane. This is called a linear classifier.

There are many hyperplanes that might classify the data. One reasonable choice as the best is the one that represents the largest separation, or margin, between the two classes.

Note: Taken from https://en.wikipedia.org/wiki/support_vector_machine
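To make the idea of "largest margin" concrete, the following minimal sketch compares two candidate hyperplanes on toy 2-dimensional data. It does not train an SVM; an actual SVM finds the maximum-margin hyperplane by solving an optimization problem. The data points and the two candidate lines are invented for illustration.

```python
import math

def decision(w, b, x):
    """Signed score of point x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labeled point to the hyperplane.

    The value is negative if some point lies on the wrong side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * decision(w, b, x) / norm for x, y in zip(points, labels))

# Toy data with p = 2, so a separating hyperplane is a line (hypothetical values).
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, +1, +1]

# Two hypothetical hyperplanes; both separate the classes.
candidates = {
    "A": ((1.0, 1.0), -5.5),  # the line x + y = 5.5
    "B": ((1.0, 0.0), -2.5),  # the line x = 2.5
}

margins = {
    name: geometric_margin(w, b, points, labels)
    for name, (w, b) in candidates.items()
}
best = max(margins, key=margins.get)  # the candidate with the larger margin
```

Both lines classify every toy point correctly, but line A leaves more room between the classes, which is the property an SVM optimizes for.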

APPLICATION TO PATENT SEARCHING AND ANALYSIS

When building a collection consider recall vs. precision

Information retrieval or searching effectiveness is traditionally described in terms of two measures, recall and precision. These are defined as follows:

Recall: how much of the useful information has my search retrieved?

Precision: how much of the information that I have retrieved is useful?

Precision and recall are normally opposed to one another, such that an increase in recall is usually accompanied by a drop in the level of precision.
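The two measures can be written out in a few lines. This is a minimal sketch using the standard set-based definitions; the document IDs and the two example searches are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # useful documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document IDs, for illustration only.
relevant = {"D1", "D2", "D3", "D4"}
narrow_search = {"D1", "D2"}
broad_search = {"D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8"}

narrow = precision_recall(narrow_search, relevant)  # (1.0, 0.5)
broad = precision_recall(broad_search, relevant)    # (0.5, 1.0)
```

The narrow search retrieves only useful documents but misses half of them, while the broad search finds everything useful at the cost of retrieving as much noise as signal.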

Precision and recall should be considered separately

I would like to suggest that, when it comes to patent searching, it might be more productive to separate precision and recall so that they can be maximized independently. It might be more productive to begin by creating methods that produce high recall without regard to precision. Once this is accomplished, the results can be ranked using different methods to improve precision and manage the way the results are shared with the searcher. Instead of expecting a single method to do both, it would be useful to the patent searching community if the process were done stepwise to maximize the value to the user.

Using Binary Classification for precision

Binary classification provides a means for categorizing large collections of patent documents into the references that are likely to be of highest interest to the information professional, and those that are likely not related but were still retrieved in a broad search.

A training set will be made up of references that are highly relevant to the interests of the analyst.

In training the classifier, the analyst will need to identify documents that are off-topic as well, so the classifier can establish a hyperplane that will distinguish between the two categories.
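As a rough illustration of training on positive and negative examples, the sketch below fits a linear SVM on bag-of-words counts and scores unseen text. Everything here is an assumption for illustration: the short texts stand in for patent documents, the model is trained by plain subgradient descent on the hinge loss with no bias term, and the talk does not say which tool or features were actually used.

```python
from collections import Counter

def vectorize(text):
    """Bag-of-words: map each lowercase token to its count."""
    return Counter(text.lower().split())

def score(weights, features):
    """Linear score; positive means the on-topic side of the hyperplane."""
    return sum(weights.get(tok, 0.0) * n for tok, n in features.items())

def train_linear_svm(examples, epochs=20, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss (no bias term)."""
    weights = {}
    for _ in range(epochs):
        for text, label in examples:
            feats = vectorize(text)
            violated = label * score(weights, feats) < 1.0
            # Regularization step: shrink all weights slightly.
            for tok in weights:
                weights[tok] *= 1.0 - lr * lam
            # Hinge-loss step: only examples inside the margin move the weights.
            if violated:
                for tok, n in feats.items():
                    weights[tok] = weights.get(tok, 0.0) + lr * label * n
    return weights

# Hypothetical training set: +1 = on topic, -1 = off topic.
training = [
    ("wearable band activity sensor tracks steps", +1),
    ("wristband monitors heart rate and sleep activity", +1),
    ("bluetooth headset with noise cancelling microphone", -1),
    ("wireless speaker audio playback bluetooth", -1),
]

weights = train_linear_svm(training)
on_topic = score(weights, vectorize("wearable wristband sensor for sleep tracking"))
off_topic = score(weights, vectorize("bluetooth speaker with microphone"))
```

The sign of each score gives the predicted category, and its magnitude can be used to rank documents, so the most confidently on-topic references surface first.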

PRIORITIZING FITNESS BAND PATENTS USING A SUPPORT VECTOR MACHINE

A practical example of putting these ideas into practice

Identifying Jawbone fitness band patents

Several years ago, the author developed an interest in wearable fitness monitors and began using this field as an example when exploring machine learning methods and the problem of recall and precision in patent data collections.

Two of the major companies working in the space at the time were Aliphcom (doing business as Jawbone) and Nike. Both organizations sell other products and have extensive patent portfolios, which cover their fitness monitors as well as many additional items.

Searching worldwide, several hundred patent documents are assigned to Aliphcom. Of these, more than 100 are associated with their personal fitness band, based on a previous analysis conducted using a manual method of classification. Ten of these documents were used to represent the positive examples in the training set.

The Aliphcom portfolio also contains patent documents associated with Bluetooth headsets and speakers. Ten documents associated with these items were identified as the negative examples.

Identifying Jawbone fitness band patents

After only three training rounds, a classifier was created that correctly classified all but one of the Aliphcom documents into those covering the personal fitness monitor and those covering the remainder of the company's products.

The one exception, together with its equivalent family members, was a recently published document that dealt with a new application of the product line.

All in all, with minimal effort, a result with greater than 95% precision was achieved.

Finding Nike FuelBand patents: a little more challenging

11,126 worldwide patent documents from Nike were submitted to an SVM based on the model built for the Jawbone Up fitness band patents. An initial training collection of 20 documents was created.

As one might expect, the initial use of this classifier did not produce stellar results. Many of the patents previously associated with the Nike FuelBand using traditional searching methods did not score well with the classifier.

Finding Nike FuelBand patents

This situation was remedied by selecting more relevant and irrelevant documents and retraining the classifier. After three generations of training, the classifier had successfully scored ~85% of the Nike documents accurately.

It still scored some of the originally discovered documents poorly but, frankly, many of these were associated more with the Nike + iPod sensor system than they were with the FuelBand.

Conversely, the classifier identified several Nike families that were not discovered using a reasonable traditional search.

Conclusions

Recall should be maximized before attempting to increase precision.

Classification methods, especially Support Vector Machines, can be used to score records for relevance.

Even with very large collections, where recall has been optimized at the expense of precision, Machine Learning methods can be used to identify the most relevant documents.

With 3-5 rounds of training, a relatively high degree of precision can be achieved.

Documents that score very high or very low can readily be accepted as relevant or irrelevant, respectively.

Manual review, if desired, can be done on a much smaller collection rather than the entire collection, saving a tremendous amount of time.
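The last two points suggest a simple triage step on the classifier scores. The sketch below is one way this could look; the thresholds, document IDs, and scores are all hypothetical, and suitable cut-offs would depend on the classifier and the collection.

```python
def triage(scored_docs, accept_at=1.0, reject_at=-1.0):
    """Split documents into confident and uncertain buckets by score.

    Scores at or above accept_at are taken as relevant, scores at or below
    reject_at as irrelevant, and everything in between goes to manual review.
    """
    buckets = {"relevant": [], "review": [], "irrelevant": []}
    ranked = sorted(scored_docs.items(), key=lambda kv: kv[1], reverse=True)
    for doc_id, s in ranked:
        if s >= accept_at:
            buckets["relevant"].append(doc_id)
        elif s <= reject_at:
            buckets["irrelevant"].append(doc_id)
        else:
            buckets["review"].append(doc_id)
    return buckets

# Hypothetical classifier scores, for illustration only.
scores = {"P1": 2.3, "P2": 1.4, "P3": 0.2, "P4": -0.3, "P5": -1.8, "P6": -2.5}
buckets = triage(scores)
```

Only the documents in the "review" bucket need a human reader, which is where the time saving over reviewing the whole collection comes from.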

Contact Us

+1.614.787.5237
tony@patinformatics.com