Machine Learning in Patent Analytics: Binary Classification for Prioritizing Search Results
Anthony Trippe, Managing Director, Patinformatics, LLC
Patent Information Fair & Conference, November 10, 2017, Tokyo, Japan
INTRODUCTION TO MACHINE LEARNING
What is machine learning?
Machine learning, a branch of Statistical Learning, is about the construction and study of systems that can learn from data. For example, a machine learning system could be trained on email messages to learn to distinguish between spam and non-spam messages. After learning, it can then be used to classify new email messages into spam and non-spam folders.
The core of machine learning deals with representation and generalization. Representation of data instances and functions evaluated on these instances are part of all machine learning systems. Generalization is the property that the system will perform well on unseen data instances; the conditions under which this can be guaranteed are a key object of study in the subfield of computational learning theory, also referred to as statistical learning theory.
Note: Taken from https://en.wikipedia.org/wiki/machine_learning
There are different types of Machine Learning
Machine learning tasks are typically classified into three broad categories, depending on the nature of the learning "signal" or "feedback" available to a learning system. These are:
Supervised learning: The computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs.
Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).
Reinforcement learning: A computer program interacts with a dynamic environment in which it must perform a certain goal (such as driving a vehicle), without a teacher explicitly telling it whether it has come close to its goal. Another example is learning to play a game by playing against an opponent.
Note: Taken from https://en.wikipedia.org/wiki/machine_learning
Machine Learning can be used to solve many problems
Another categorization of machine learning tasks arises when one considers the desired output of a machine-learned system:
In classification, inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one or more (multi-label classification) of these classes. This is typically tackled in a supervised way. Spam filtering is an example of classification, where the inputs are email (or other) messages and the classes are "spam" and "not spam".
In clustering, a set of inputs is to be divided into groups. Unlike in classification, the groups are not known beforehand, making this typically an unsupervised task.
Density estimation finds the distribution of inputs in some space.
Dimensionality reduction simplifies inputs by mapping them into a lower-dimensional space. Topic modeling is a related problem, where a program is given a list of human language documents and is tasked to find out which documents cover similar topics.
Note: Taken from https://en.wikipedia.org/wiki/machine_learning
CLASSIFICATION AND SUPPORT VECTOR MACHINES
Classification is a supervised Machine Learning task
In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. An algorithm that implements classification, especially in a concrete implementation, is known as a classifier.
Classification can be thought of as two separate problems:
Binary classification, the better understood task, involves only two classes.
Multiclass classification involves assigning an object to one of several classes. Since many classification methods have been developed specifically for binary classification, multiclass classification often requires the combined use of multiple binary classifiers.
Note: Taken from https://en.wikipedia.org/wiki/statistical_classification
A Support Vector Machine (SVM) is an algorithm used for classification
Suppose some given data points each belong to one of two classes, and the goal is to decide which class a new data point will be in. In the case of support vector machines, a data point is viewed as a p-dimensional vector (a list of p items), and we want to know whether we can separate such points with a (p − 1)-dimensional hyperplane. This is called a linear classifier.
There are many hyperplanes that might classify the data. One reasonable choice as the best is the one that represents the largest separation, or margin, between the two classes.
Note: Taken from https://en.wikipedia.org/wiki/support_vector_machine
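As a rough illustration of the margin idea, the sketch below compares two candidate separating lines on a toy two-dimensional data set and prefers the one whose closest point is farthest away. The data points and the two candidate lines are invented for illustration. A real SVM solves an optimization problem over all possible hyperplanes instead of comparing a fixed list.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin(w, b, points, labels):
    """Smallest label-weighted distance; positive only if every point is on its correct side."""
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))

# Toy data with p = 2, so a separating hyperplane is a line (p - 1 = 1 dimension).
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Two hypothetical candidate lines, written as (w, b) for w.x + b = 0.
candidates = {
    "x1 + x2 = 5.5": ((1.0, 1.0), -5.5),
    "x1 = 3": ((1.0, 0.0), -3.0),
}

margins = {name: margin(w, b, points, labels) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins)
print("Preferred separator:", best)
```

Both lines classify every point correctly, but the diagonal line leaves more room between the classes, which is the property an SVM looks for.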
APPLICATION TO PATENT SEARCHING AND ANALYSIS
When building a collection, consider recall vs. precision
Information retrieval or searching effectiveness is traditionally described in terms of two measures, recall and precision. These are defined as follows:
Recall: how much of the useful information has my search retrieved?
Precision: how much of the information that I have retrieved is useful?
Precision and recall are normally opposed to one another, such that an increase in recall is usually accompanied by a corresponding drop in precision.
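The two measures can be computed directly from the set of retrieved documents and the set of relevant documents. The following is a minimal sketch. The document identifiers and the two example searches are made up to show the usual trade-off.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a search result against a known relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical collection: four documents are truly relevant.
relevant = {"P1", "P2", "P3", "P4"}

# A narrow search finds only relevant documents but misses half of them.
narrow_search = ["P1", "P2"]
# A broad search finds every relevant document but also retrieves noise.
broad_search = ["P1", "P2", "P3", "P4", "X1", "X2", "X3", "X4"]

print(precision_recall(narrow_search, relevant))  # (1.0, 0.5)
print(precision_recall(broad_search, relevant))   # (0.5, 1.0)
```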
Precision and recall should be considered separately
I would like to suggest that, when it comes to patent searching, it might be more productive to separate precision and recall so that they can be maximized independently.
It might be more productive to begin by creating methods that produce high recall without regard to precision. Once this is accomplished, the results can be ranked using different methods to improve precision and manage the way the results are shared with the searcher.
Instead of expecting a single method to do both, it would be useful to the patent searching community if the process were done stepwise to maximize the value to the user.
Using Binary Classification for precision
Binary classification provides a means for categorizing large collections of patent documents into the references that are likely to be of highest interest to the information professional, and those that are likely not related but were still retrieved in a broad search.
A training set will be made up of references that are highly relevant to the interests of the analyst.
In training the classifier, the analyst will need to identify documents that are off-topic as well, so the classifier can establish a hyperplane that will distinguish between the two categories.
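To make the training step concrete, here is a small sketch of a linear classifier trained on positive and negative text examples. It assumes a bag-of-words representation and a simple subgradient-descent trainer for the hinge loss, which is one common way to fit a linear SVM but is not necessarily what any particular patent analytics tool uses. The example snippets, function names, and parameter values are all hypothetical.

```python
from collections import Counter

def featurize(text):
    """Bag-of-words term counts for one document (lower-cased, whitespace-split)."""
    return Counter(text.lower().split())

def train_linear_svm(docs, labels, epochs=50, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularized hinge loss; returns (weights, bias)."""
    w, b = Counter(), 0.0
    for _ in range(epochs):
        for x, y in zip(docs, labels):
            s = sum(w[t] * c for t, c in x.items()) + b
            for t in list(w):          # regularization: shrink every weight slightly
                w[t] *= 1.0 - lr * reg
            if y * s < 1.0:            # example is misclassified or inside the margin
                for t, c in x.items():
                    w[t] += lr * y * c
                b += lr * y
    return w, b

def relevance_score(w, b, text):
    """Signed score: larger values mean the text looks more like the positive examples."""
    return sum(w[t] * c for t, c in featurize(text).items()) + b

# Invented training snippets standing in for on-topic and off-topic documents.
positives = ["wearable band with activity sensor",
             "wristband tracking steps and sleep",
             "fitness band measuring heart rate"]
negatives = ["bluetooth headset with noise cancellation",
             "wireless speaker audio playback",
             "headset microphone for voice calls"]

docs = [featurize(t) for t in positives + negatives]
labels = [1] * len(positives) + [-1] * len(negatives)
w, b = train_linear_svm(docs, labels)

on_topic = relevance_score(w, b, "wearable wristband activity tracking")
off_topic = relevance_score(w, b, "bluetooth speaker audio")
print("on-topic score:", on_topic)
print("off-topic score:", off_topic)
```

The learned weights and bias together define the separating hyperplane. Unseen documents are then ranked by their signed score instead of being read one by one.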
PRIORITIZING FITNESS BAND PATENTS USING A SUPPORT VECTOR MACHINE
A practical example of putting these ideas into practice
Identifying Jawbone fitness band patents
Several years ago, the author developed an interest in wearable fitness monitors and began using this field as an example when exploring machine learning methods and the problem of recall and precision in patent data collections.
Two of the major companies working in the space at the time were Aliphcom (doing business as Jawbone) and Nike.
Both organizations sell other products and have extensive patent portfolios, which cover their fitness monitors as well as many additional items.
Searching worldwide, several hundred patent documents are assigned to Aliphcom.
Of these, more than 100 are associated with their personal fitness band, based on a previous analysis conducted using a manual method of classification.
Ten of these documents were used to represent the positive examples in the training set.
The Aliphcom portfolio also contains patent documents associated with Bluetooth headsets and speakers. Ten documents associated with these items were identified as the negative examples.
Identifying Jawbone fitness band patents (continued)
After only three training rounds, a classifier was created that correctly classified all but one of the Aliphcom documents into those covering the personal fitness monitor and those covering the remainder of the company's products.
The one misclassified document and its equivalent family members were recently published and dealt with a new application of the product line.
All in all, with minimal effort, a result with greater than 95% precision was achieved.
Finding Nike FuelBand patents: a little more challenging
11,126 worldwide patent documents from Nike were submitted to an SVM based on the model built for the Jawbone Up fitness band patents.
An initial training collection of 20 documents was created.
As one might expect, the initial use of this classifier did not produce stellar results.
When patents associated with the Nike FuelBand were identified using traditional searching methods, many of them did not score well with the classifier.
Finding Nike FuelBand patents (continued)
This situation was remedied by selecting more relevant and irrelevant documents and retraining the classifier.
After three generations of training, the classifier had successfully scored ~85% of the Nike documents accurately.
It still scored some of the originally discovered documents poorly, but frankly, many of these were associated more with the Nike + iPod sensor system than they were with the FuelBand.
Conversely, the classifier identified several Nike families that were not discovered using a reasonable traditional search.
Conclusions
Recall should be maximized before attempting to increase precision.
Classification methods, especially Support Vector Machines, can be used to score records for relevance.
Even with very large collections, where recall has been optimized at the expense of precision, Machine Learning methods can be used to identify the most relevant documents.
With 3-5 rounds of training, a relatively high degree of precision can be achieved.
Documents that score very high or very low can readily be accepted as relevant or irrelevant, respectively.
Manual review, if desired, can be done on a much smaller collection instead of the entire collection, saving a tremendous amount of time.
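The idea of accepting high scores, rejecting low scores, and reviewing only what falls in between can be sketched as a simple triage step. The thresholds, document identifiers, and scores below are assumptions chosen for illustration. In practice the cut-offs would be set by inspecting how a given classifier's scores are distributed.

```python
def triage(scored_docs, accept_at=1.0, reject_at=-1.0):
    """Split (doc_id, score) pairs into accepted, rejected, and needs-review lists."""
    accepted, rejected, review = [], [], []
    # Highest-scoring documents first, so each list is already ranked.
    for doc_id, s in sorted(scored_docs, key=lambda pair: pair[1], reverse=True):
        if s >= accept_at:
            accepted.append(doc_id)
        elif s <= reject_at:
            rejected.append(doc_id)
        else:
            review.append(doc_id)
    return accepted, rejected, review

# Hypothetical classifier scores for six documents from a broad, high-recall search.
scores = [("D1", 2.3), ("D2", -1.8), ("D3", 0.2),
          ("D4", 1.1), ("D5", -0.4), ("D6", -2.5)]

accepted, rejected, review = triage(scores)
print("accept:", accepted)  # ['D1', 'D4']
print("review:", review)    # ['D3', 'D5']
print("reject:", rejected)  # ['D2', 'D6']
```

Only the documents in the review list need a manual look, which is where the time saving comes from.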
Contact Us
+1.614.787.5237
tony@patinformatics.com