A New Approach to Recommender Systems

Similar documents
Python Machine Learning

A Case Study: News Classification Based on Term Frequency

Probabilistic Latent Semantic Analysis

Australian Journal of Basic and Applied Sciences

Assignment 1: Predicting Amazon Review Ratings

Learning From the Past with Experiment Databases

Lecture 1: Machine Learning Basics

Twitter Sentiment Classification on Sanders Data using Hybrid Approach

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

CS Machine Learning

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks

Linking Task: Identifying authors and book titles in verbose queries

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur

arxiv: v1 [cs.cl] 2 Apr 2017

Rule Learning With Negation: Issues Regarding Effectiveness

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Human Emotion Recognition From Speech

Training a Neural Network to Answer 8th Grade Science Questions Steven Hewitt, An Ju, Katherine Stasaski

Activities, Exercises, Assignments Copyright 2009 Cem Kaner 1

(Sub)Gradient Descent

Reducing Features to Improve Bug Prediction

The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X

OPTIMIZATION OF TRAINING SETS FOR HEBBIAN-LEARNING-BASED CLASSIFIERS

Calibration of Confidence Measures in Speech Recognition

Universiteit Leiden ICT in Business

Model Ensemble for Click Prediction in Bing Search Ads

Software Maintenance

Mining Association Rules in Student's Assessment Data

*Net Perceptions, Inc West 78th Street Suite 300 Minneapolis, MN

CS 446: Machine Learning

Rule Learning with Negation: Issues Regarding Effectiveness

Active Learning. Yingyu Liang Computer Sciences 760 Fall

ADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY DOWNLOAD EBOOK : ADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY PDF

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models

POS tagging of Chinese Buddhist texts using Recurrent Neural Networks

Learning Methods for Fuzzy Systems

A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation

Exploration. CS : Deep Reinforcement Learning Sergey Levine

Word Segmentation of Off-line Handwritten Documents

Switchboard Language Model Improvement with Conversational Data from Gigaword

The Good Judgment Project: A large scale test of different methods of combining expert predictions

Axiom 2013 Team Description Paper

Speech Emotion Recognition Using Support Vector Machine

Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape

Self Study Report Computer Science

Product Feature-based Ratings for Opinion Summarization of E-Commerce Feedback Comments

MMOG Subscription Business Models: Table of Contents

Knowledge Transfer in Deep Convolutional Neural Nets

What is a Mental Model?

Lecture 1: Basic Concepts of Machine Learning

TIMSS ADVANCED 2015 USER GUIDE FOR THE INTERNATIONAL DATABASE. Pierre Foy

Notes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1

PhD project description. <Working title of the dissertation>

Probability and Statistics Curriculum Pacing Guide

Georgetown University at TREC 2017 Dynamic Domain Track

Organizational Knowledge Distribution: An Experimental Evaluation

The Moodle and joule 2 Teacher Toolkit

Online Updating of Word Representations for Part-of-Speech Tagging

A Comparison of Two Text Representations for Sentiment Analysis

Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages

Laboratorio di Intelligenza Artificiale e Robotica

Language Acquisition Fall 2010/Winter Lexical Categories. Afra Alishahi, Heiner Drenhaus

P. Belsis, C. Sgouropoulou, K. Sfikas, G. Pantziou, C. Skourlas, J. Varnas

CSL465/603 - Machine Learning

A study of speaker adaptation for DNN-based speech synthesis

On the Combined Behavior of Autonomous Resource Management Agents

Parsing of part-of-speech tagged Assamese Texts

Designing a Rubric to Assess the Modelling Phase of Student Design Projects in Upper Year Engineering Courses

ISFA2008U_120 A SCHEDULING REINFORCEMENT LEARNING ALGORITHM

The Effect of Extensive Reading on Developing the Grammatical Accuracy of the EFL Freshmen at Al Al-Bayt University

Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models

Using dialogue context to improve parsing performance in dialogue systems

Generative models and adversarial training

Modeling function word errors in DNN-HMM based LVCSR systems

MYCIN. The MYCIN Task

Lecture 10: Reinforcement Learning

University of Groningen. Systemen, planning, netwerken Bosman, Aart

Comment-based Multi-View Clustering of Web 2.0 Items

CLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH

EDCI 699 Statistics: Content, Process, Application COURSE SYLLABUS: SPRING 2016

Welcome to the session on ACCUPLACER Policy Development. This session will touch upon common policy decisions an institution may encounter during the

Attributed Social Network Embedding

DOCTORAL SCHOOL TRAINING AND DEVELOPMENT PROGRAMME

Detecting English-French Cognates Using Orthographic Edit Distance

USING A RECOMMENDER TO INFLUENCE CONSUMER ENERGY USAGE

Cambridge NATIONALS. Creative imedia Level 1/2. UNIT R081 - Pre-Production Skills DELIVERY GUIDE

OCR for Arabic using SIFT Descriptors With Online Failure Prediction

Deep search. Enhancing a search bar using machine learning. Ilgün Ilgün & Cedric Reichenbach

AUTOMATED TROUBLESHOOTING OF MOBILE NETWORKS USING BAYESIAN NETWORKS

Speech Recognition at ICSI: Broadcast News and beyond

The Karlsruhe Institute of Technology Translation Systems for the WMT 2011

Artificial Neural Networks written examination

Testing A Moving Target: How Do We Test Machine Learning Systems? Peter Varhol Technology Strategy Research, USA

Bootstrapping Personal Gesture Shortcuts with the Wisdom of the Crowd and Handwriting Recognition

Feature Selection based on Sampling and C4.5 Algorithm to Improve the Quality of Text Classification using Naïve Bayes

Universidade do Minho Escola de Engenharia

Houghton Mifflin Online Assessment System Walkthrough Guide

Modeling function word errors in DNN-HMM based LVCSR systems


Li, Haochong Shen

Correspondence to: Li <wl336@stanford.edu>, Haochong Shen <haochong@stanford.edu>. Project Report of CS229 Machine Learning, Fall 2017. Copyright 2017 by the author(s).

Abstract

In this project, we seek a new way to build a recommender system. We chose the problem of matching professors with one's research interest as a testing ground. Specifically, we gathered all publications of Stanford artificial intelligence professors and used supervised learning to train our model. We also used unsupervised learning to generate a baseline reference to verify our model's output in real use cases.

1. Introduction

Recommender systems have been widely adopted everywhere in our lives. News websites push news on the topics you selected. Streaming websites recommend new movies and TV series based on your demographic information and past viewing history. Online shopping websites suggest items you might be interested in purchasing based on your shopping habits. Most prominently, online advertising personalizes advertisements for you using various heuristics. Nevertheless, there are still many areas that could use the help of a recommender system but where no suitable one has been implemented yet. Also, most work on recommender systems, aside from improving quality and accuracy, focuses on cold start, scalability, and similar concerns.

In this project, we focus on a specific problem that represents a larger class of problems: matching one's research interest with professors. The problem differs from other recommender problems in several ways. First, choosing a professor to pursue an advanced degree with is usually a one-time decision. Second, the cost of a wrong choice is very high. Third, due to the nature of research, each professor's focus can be very different even within the same or a similar discipline. These properties make this type of problem unique; it cannot be solved by directly applying news, video, or shopping recommender systems.

We propose a new approach to the problem. Our algorithm uses a single classifier as the recommender. We chose three traditional machine learning models, multinomial naive Bayes (MNB), support vector machine (SVM), and logistic regression, to compare as the classifier, and we also attempted a recurrent neural network (RNN). We trained the classifier with professor names as labels and transformed the professors' publications into feature vectors as model input. The end user inputs their past papers, statement of purpose, or a short description of their interests and receives a professor recommendation as output.

2. Analysis of other approaches

2.1. Manual Search

Most people trying to find the right professor for an advanced degree rely on online resources. Some do an extensive web search with popular search engines such as Google, Baidu, and Bing, using specific keywords from their area of interest. Alternatively, they might start from reputable universities, scan through department directories, and narrow down the professors with matching fields of study. This reduces their selection to tens of professors. They might then read through each professor's publications to see if there is a matching interest. Such manual search is very time-consuming, tedious, and often not exhaustive.

There are thousands of institutions globally, and each institution has multiple professors working in similar areas. Collectively there are thousands of publications, or even more, in any one area of study. It is virtually impossible to examine all of them, or even to find all of them. This is where selection bias comes in: one may tend to focus on more famous professors and schools and ignore lesser-known but better alternatives, or favor domestic schools over foreign ones and miss a better match.

2.2. Collaborative Filtering

One common approach to recommender systems is collaborative filtering. Collaborative filtering methods are usually based on collecting a significant record of users' behaviors, choices, and preferences, and then predicting an end user's preference for a specific item from the choices or preferences of other users who are similar to that end user, i.e., who made similar choices or expressed similar preferences on other items. The advantage of this type of recommender is that it requires no understanding of the items it recommends: since the recommendation is based on user input, it does not need to parse any content or understand the meaning of the item. However, such methods make a very strong assumption that people who agreed and made similar choices in the past will make similar choices in the future, and will like the same kinds of items they liked before. This assumption works well for some problems but not for others. In our problem of matching research interests to professors, people's focus usually diverges as they go deeper into an area: one may start with general machine learning, move into deep learning, then choose deep reinforcement learning, and finally land on a specific problem within it.

Another issue is the well-known cold start problem. This type of system requires a large amount of existing data on other users to give acceptable recommendations. A news, video, or e-commerce website has the opportunity to gather user input for a long period before rolling out its recommender system; there is no easy way to gather comparable input from people searching for professors. Problems of sparsity and scalability also apply to varying extents: a professor usually has a very limited number of students, so it is hard to establish correlations from such a limited number of samples.
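
For reference, the user-based collaborative filtering mechanism described above can be sketched in a few lines: given a user-item rating matrix, an unseen item is scored for a user by a similarity-weighted average of other users' ratings. The matrix below is entirely made up for illustration; this is not part of our system, since we argue this family of methods does not fit our problem.

```python
import numpy as np

# Toy user-item rating matrix (rows = users, columns = items); 0 means "not rated".
# The numbers are made up purely to illustrate user-based collaborative filtering.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 2],
    [1, 2, 5, 4],
], dtype=float)

def cosine(u, v):
    # Cosine similarity between two users' rating vectors.
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def predict(user, item):
    # Similarity-weighted average of the ratings other users gave this item.
    others = [o for o in range(ratings.shape[0]) if o != user and ratings[o, item] > 0]
    if not others:
        return 0.0
    sims = np.array([cosine(ratings[user], ratings[o]) for o in others])
    if sims.sum() == 0:
        return 0.0
    return float(sims @ ratings[others, item] / sims.sum())

# ~2.2: pulled toward user 1's low rating, since user 0 resembles user 1 most.
print(predict(user=0, item=2))
```
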
2.3. Content-Based Recommender

Another common approach is content-based filtering. A traditional content-based recommender uses keywords to describe items, and a user profile is built to indicate preferences over those keywords. This approach, though very effective in some cases, loses a lot of finer-grained information. In the problem of matching professors with one's research interest, for example, multiple professors might carry the same labels, e.g. supervised-learning or deep-learning, while focusing on different sub-domains for which no appropriate tag exists. A recommender using simple keyword tags will therefore not yield much better results than a web search.

Newer recommender systems extract features from the item content itself; a widely used representation is tf-idf. To create a user profile, the system still needs two types of information: a model of the user's preferences and the user's interactions with the recommender system. The system builds a content-based profile of the user as a weighted vector of the features extracted from the items. The weights represent how important each feature is to the user and can be inferred from rated content vectors using a number of techniques. The simplest approach is an average of all rated item vectors; other approaches use machine learning techniques, and direct feedback from the user can further adjust the weights on certain attributes.

A key issue with content-based filtering is the system's ability to handle cross-domain content: recommendations are usually confined to the same type of content. This is not a problem in our use case. In fact, our approach is built upon the content-based idea; the main issue we need to solve is how to build the user profile efficiently, or bypass this step entirely.
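
To make the profile-building step concrete, here is a minimal sketch of the "average of rated item vectors" recipe mentioned above, assuming scikit-learn tooling: items are tf-idf vectors, the user profile is the mean of the vectors of items the user liked, and candidates are ranked by cosine similarity to that profile. The item texts and the liked indices are invented for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical item descriptions; in our setting these would be paper abstracts.
items = [
    "deep learning for image recognition",
    "convex optimization and duality",
    "reinforcement learning for robotic control",
    "semantic parsing of natural language questions",
]
liked = [0, 2]  # indices of items the user rated positively (made up)

vectorizer = TfidfVectorizer()
item_vectors = vectorizer.fit_transform(items)

# Simplest profile: the average of the liked items' tf-idf vectors.
profile = np.asarray(item_vectors[liked].mean(axis=0))

# Rank all items by cosine similarity to the profile.
scores = cosine_similarity(profile, item_vectors).ravel()
ranking = scores.argsort()[::-1]
print([items[i] for i in ranking])
```
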

2.4. Hybrid Recommender

Newer recommender systems combine both approaches described above; mixing collaborative filtering and content-based filtering can be more effective in some cases. A hybrid system can be implemented in several ways: by creating separate content-based and collaborative predictions and then combining them with some heuristic, or by adding the capabilities of one type of recommender onto the other. Since collaborative filtering will not work for our problem, we chose not to pursue this path.

3. Dataset and Features

3.1. Dataset

Our dataset is obtained from the Scopus database. We selected 10 Stanford computer science professors who are actively working in artificial intelligence and collected their publications from 2005 onward, for a total of 615 papers. Table 1 shows the distribution.

Table 1. Professor and Published Paper Distribution

Professor        Teaching   Publications
Andrew Ng        CS229      126
Chris Manning    CS224N     109
Dan Boneh        CS229      90
Doug James       CS205A     34
Fei-fei Li       CS231N     24
Jeannette Bohg   CS223A     24
Mike Genesereth  CS157      23
Percy Liang      CS221      36
Silvio Savarese  CS231A     100
Stefano Ermon    CS228      46

Due to limitations of the database, and to reduce training time and model size, we kept only the abstract of each publication, on the assumption that the abstract is usually the most informative part and summarizes the entire publication well. This also minimized the need for preprocessing, as abstracts usually contain little beyond plain text. We reformatted the data into a tab-delimited format of title, abstract text, year, and author. Below is one sample from our data.

Smart Forms	"We present Smart Forms, an innovative web forms technology for easy creation, maintenance, and evaluation of user-friendly web forms, especially the ones that must implement complex laws, regulations, or business policies. In order to provide cognitive assistance to end users during form-filling, Smart Forms have built-in mechanisms for visual feedback, restriction of selectable values, and automatic form filling. Smart Forms can be created and maintained easily by declaratively configuring rather than procedurally programming these mechanisms. We also present the Smart Forms Editor which assists a Smart Form creator in creating data-driven form UI, editing, testing and verifying form rules, and testing and debugging a form. Copyright 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved."	2016	mike

3.2. Features

We cannot directly use text as input to machine learning models, so we build a transformer to vectorize the input text. We preprocess the text to remove stop words and perform word stemming, a common step that reduces the number of uninformative features and captures more correlations between features. We then fit a transformer that maps the input text to a term frequency-inverse document frequency (tf-idf) vector with various n-gram choices, and use it as the final input to the classifier. The detailed description and final feature choice are presented in the following sections.

4. Methods

The main problem with traditional systems is the need to build user profiles. We propose to use only one main building block as the recommender: both user input and item content go through the same pipeline, so we can skip building user profiles and feed user input directly to the recommender. The recommender is made up of two functional blocks, the transformer and the classifier. The transformer turns input text into feature vectors for the classifier, and the classifier outputs the recommendation directly as a scoring vector; the top-scored item is used as the output.
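
A rough sketch of this two-block design is given below. The library names, placeholder abstracts, and the choice of logistic regression here are illustrative assumptions, not the exact implementation; the real preprocessing and hyperparameters are described in the next subsections.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Placeholder training data: abstracts labeled with the professor who wrote them.
abstracts = [
    "we study deep learning methods for large scale question answering",
    "we present smart forms a declarative web technology for form creation",
]
professors = ["andrew", "mike"]

# Transformer (text -> tf-idf vector) followed by classifier (vector -> professor scores).
recommender = Pipeline([
    ("transformer", TfidfVectorizer(stop_words="english")),
    ("classifier", LogisticRegression(max_iter=1000)),
])
recommender.fit(abstracts, professors)

def recommend(user_text, top_k=3):
    # Score every professor for the user's text and return the top-k.
    scores = recommender.predict_proba([user_text])[0]
    ranked = sorted(zip(recommender.classes_, scores), key=lambda p: -p[1])
    return ranked[:top_k]

print(recommend("I am interested in reinforcement learning for robotics"))
```
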
4.1. Feature Extraction

To extract features from the text input, we build the aforementioned transformer; Table 2 lists the parameter combination that produced the best results.

Table 2. Feature Selection

Feature        Parameter
N-gram         2
max df         0.8
min df         2
max features   20,000
tf-idf         yes
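
If the transformer were implemented with scikit-learn and NLTK (the report does not pin down the tools), Table 2 would roughly correspond to the configuration below. Interpreting "N-gram 2" as unigrams plus bigrams and using the Porter stemmer are assumptions.

```python
import re
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer, ENGLISH_STOP_WORDS

stemmer = PorterStemmer()

def preprocess(text):
    # Lowercase, drop stop words, and stem the remaining tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(stemmer.stem(t) for t in tokens if t not in ENGLISH_STOP_WORDS)

# Parameters from Table 2; ngram_range=(1, 2) assumes "N-gram 2" means up to bigrams.
vectorizer = TfidfVectorizer(
    ngram_range=(1, 2),
    max_df=0.8,          # ignore terms appearing in more than 80% of abstracts
    min_df=2,            # ignore terms appearing in fewer than 2 abstracts
    max_features=20000,
)

# abstracts = [...]  # the real abstract corpus from Section 3
# X = vectorizer.fit_transform(preprocess(a) for a in abstracts)
```
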

4.2. Model Training

We chose Multinomial Naive Bayes (MNB), Support Vector Machine (SVM), and Logistic Regression (LR) as the classifier block. These models were chosen because they are widely used in language processing. For MNB and SVM, we tuned hyperparameters to achieve the best train-dev accuracy. For SVM we also tested multiple kernels and found the linear kernel to perform best; higher-dimensional kernels gave very poor results due to the limited size of our training set.

4.3. Testing and Baseline

For initial testing, we used other abstracts from each professor's publications, applied the same transformation, and used the professor's name as the target. This shows whether the model has learned enough to differentiate between professors, which is a relatively good indicator of how well the recommender will perform: if the user were the author of such an abstract and used it as input, the model should match it with the paper's real author. For real-use testing, we need a baseline reference to check whether our outputs are reasonable, so we built a k-means clustering algorithm to extract topics from the publications and generate a visual summary of each professor's focus across topics.
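
A minimal sketch of such a baseline, assuming scikit-learn tooling and an arbitrary topic count (the report does not state the number of clusters, and the LDA component used for Figure 1 is omitted): cluster the tf-idf vectors of all abstracts into topics, then count how each professor's papers are distributed over the clusters.

```python
from collections import Counter, defaultdict
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def topic_profile(abstracts, authors, n_topics=8):
    # Cluster abstracts into topics and count each professor's papers per topic.
    X = TfidfVectorizer(stop_words="english", max_features=20000).fit_transform(abstracts)
    topics = KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit_predict(X)
    profile = defaultdict(Counter)
    for author, topic in zip(authors, topics):
        profile[author][topic] += 1
    return profile  # e.g. {"andrew": Counter({3: 40, 1: 30, ...}), ...}

# `abstracts` and `authors` would be the 615 abstracts and their professor labels.
```
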

5. Results and Discussion

Tables 3 and 4 report the test accuracies of our initial testing, using the professors' other publications as input. We achieved relatively good overall accuracy. We can also see a bias problem, since we achieved better accuracy on the train-dev set; this is understandable and shows our model has room for improvement. Notably, multiple professors collaborated on some papers, and our training target does not account for this scenario. Also, abstract text alone might not be detailed enough to differentiate professors with very similar focus.

Table 3. Overall Testing Accuracy

                  Logistic Regression   MNB     SVM
Accuracy          83.0%                 83.0%   83.0%
Micro Precision   83.0%                 83.0%   83.0%
Macro Precision   77.9%                 89.3%   78.2%
Micro Recall      83.0%                 83.0%   83.0%
Macro Recall      77.7%                 80.1%   77.7%
Micro F1          83.0%                 83.0%   83.0%
Macro F1          77.2%                 81.3%   77.2%

Figure 1 shows the output of our unsupervised learning algorithm, i.e., each professor's contribution to each area of study.

Figure 1. Professor weight in each topic extracted by k-means clustering and LDA.

We also fed the following paragraph into the model:

INPUT: Reinforcement learning is the machine learning area of most interest to me, because it is inspired by behaviorist psychology. It is concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Due to its generality, it can be applied to many areas, such as game theory, control theory, operations research, information theory, and simulation-based optimization.

OUTPUT: andrew

Compared with the unsupervised learning results, this is a very accurate recommendation.

6. Future Works

The RNN failed to train within our limited time frame, and it is the first thing we would like to investigate given more time. GloVe and word2vec capture word relationships and could potentially give the model even better accuracy. We would also extend our target label to a multi-label setting to account for multiple authors of the same paper. We would further like to expand our dataset: full publications should give more fine-grained information to improve accuracy, and more professors from multiple institutions and in various areas of study should be included to test the scalability of our model and investigate potential interdisciplinary recommendations. We should also develop a better overall testing method and set a clearer optimization target to ensure our model improves in the right direction.
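
As a pointer for the multi-label extension mentioned above, one hypothetical shape of it (not part of the current system) is to binarize the author sets and wrap the base classifier in a one-vs-rest scheme; all data and names below are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Placeholder data: each paper may list several of the ten professors as authors.
abstracts = ["joint work on deep learning for vision", "logic programming for smart forms"]
author_sets = [["andrew", "fei-fei"], ["mike"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(author_sets)  # one indicator column per professor

model = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("ovr", OneVsRestClassifier(LogisticRegression(max_iter=1000))),
])
model.fit(abstracts, Y)
print(mlb.classes_, model.predict(abstracts))
```
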

Table 4. Testing Accuracy for each Professor

                               Logistic Regression       MNB                       SVM
Professor         Samples      Precision   F1-score      Precision   F1-score      Precision   F1-score
Andrew Ng         11           72.7%       72.7%         70.0%       66.7%         80.0%       76.2%
Chris Manning     9            69.2%       81.8%         69.2%       81.8%         64.3%       78.3%
Dan Boneh         8            100.0%      100.0%        100.0%      100.0%        100.0%      100.0%
Doug James        3            100.0%      100.0%        100.0%      100.0%        100.0%      100.0%
Fei-fei Li        3            0.0%        0.0%          100.0%      50.0%         0.0%        0.0%
Jeannette Bohg    2            100.0%      100.0%        100.0%      66.7%         100.0%      100.0%
Mike Genesereth   2            50.0%       50.0%         66.7%       80.0%         50.0%       50.0%
Percy Liang       3            100.0%      80.0%         100.0%      80.0%         100.0%      80.0%
Silvio Savarese   8            87.5%       87.5%         87.5%       87.5%         87.5%       87.5%
Stefano Ermon     4            100.0%      100.0%        100.0%      100.0%        100.0%      100.0%
Average/Total     53           79.7%       80.7%         85.4%       82.1%         80.4%       80.8%

7. Contribution

Table 5. Contribution of each Team Member
Item: Project Proposal, Project Milestone, Supervised, Unsupervised, RNN, Project Poster, Final Report
Contributor: Haochong, Haochong, Haochong