Predicting Yelp Ratings Using User Friendship Network Information


Wenqing Yang (wenqing), Yuan Yuan (yuan125), Nan Zhang (nanz)
December 7, 2015

1 Introduction

With the spread of B2C businesses, many product and service providers need both evaluation and prediction of customers' feedback. For example, Yelp has a five-star quality rating system for restaurants as well as review text, which generates a large volume of explicit and implicit user data. Consequently, many meaningful research questions can be answered using Yelp's datasets.

In this project, we attempt to predict the rating a user will give to a restaurant listed on Yelp using Yelp's Challenge Dataset. Being able to predict the rating a user assigns to a restaurant is helpful when trying to build better recommendation systems on Yelp. We approach the problem from a social network analysis perspective by incorporating Yelp user-user friendship networks in our predictions, and we test whether the additional network information improves the accuracy of the rating predictions.

2 Literature Review

With the vast amount of information on products and businesses available to users online, there is increasing interest in developing recommender systems that provide users with personalized recommendations on items. These systems usually work by predicting the numeric ratings users give to products or businesses, and in general they belong to one of two types: content-based methods or collaborative filtering methods.

Content-based methods compare how similar a target item is to items that the user has rated before and give a predicted rating based on the user's previous ratings. Mooney and Roy determine the similarity between books by mining the text in book descriptions on Amazon.com and then recommend similar books to users [5]. Sarwar, Karypis, Konstan and Riedl compare different methods of computing item similarity and different methods of producing predictions from the computed similarities [8]. Pazzani and Billsus allow users to provide a profile of webpages that they find interesting and then revise this profile by comparing the similarity between text on webpages [6].

On the other hand, collaborative filtering methods rely on the assumption that users similar to each other tend to like the same items or tend to give similar ratings. Koren, Bell and Volinsky, the winners of the Netflix Prize Contest, summarize the application and flexibility of matrix factorization techniques used in recommender systems, and they describe how to use singular value decomposition (SVD), regularization, stochastic gradient descent and alternating least squares to tackle missing data problems [3]. McAuley and Leskovec use latent factor models to uncover hidden dimensions in review ratings and Latent Dirichlet Allocation to uncover the hidden dimensions in review text [4]. Yu et al. develop an algorithm to recommend web communities to users, and they address the sparsity problem in traditional collaborative filtering methods by generating latent links between communities and members using latent topic associations [7].

There have also been attempts to improve traditional recommender systems by taking into consideration the social relations among users. He and Chu present a social network-based recommender system (SNRS) that incorporates the influence from both immediate friends and distant friends of a user [1]. They test their recommender system on Yelp's dataset and find that SNRS performs better than other traditional methods. Using users' contact information on Flickr, Zheng and Bao demonstrate the usefulness of users' social network structure when recommending Flickr groups to users [10]. Yang et al. focus on matching users to Yahoo services using users' contacts on Yahoo! Pulse [9]. They propose a hybrid model that combines a factor-based random walk model to explain friendship connections and a coupled latent factor model to uncover interest interactions.

Taking inspiration from this previous work, we use a latent factor model with bias terms as our baseline method for predicting user ratings of restaurants.
Since previous studies have shown that user social relations are effective at improving rating predictions, we improve our baseline model by adding information about users' friends into the model. Intuitively, it is reasonable to add user-user interaction because people often go to restaurants with friends, so their friends' preferences will influence their own preferences to some extent. However, not all friends' opinions are equal, and depending on how friends are involved in the Yelp friendship network, their opinions may be thought of as more or less reliable. Taking this into consideration, we further weight friends' ratings by their degree centrality.

3 Data Summary

3.1 Description

The dataset we choose to work with is the Yelp Challenge Dataset. Compiled for researchers and students to explore a wide variety of topics on Yelp, the Challenge Dataset includes 1.6 million reviews and ratings, 481,000 business attributes, a social network of 366,000 users with a total of 2.9 million social edges, and aggregated check-ins over time for each of the 61,000 businesses. The businesses included in the dataset are located in the U.K., Germany, Canada and the U.S. This dataset is particularly suitable for our purposes, since in addition to user ratings of businesses, it also provides information on which users are friends with each other on Yelp. The data is available for download via the Yelp Dataset Challenge website in the form of .json files (http://www.yelp.com/dataset_challenge).

Since Yelp is best known for its reviews of restaurants, we only explore restaurants in the U.S. and leave out the other business types for our project. After applying these filters, we end up with 21892 businesses that are identified as restaurants and 269231 users that have posted a total of 990627 reviews at these restaurants (Table 1).

Table 1: Data Statistics
Number of users          269231
Number of restaurants    21892
Number of reviews        990627
Average review rating    3.0

3.2 Network Properties and Visualization

To construct the Yelp user-user friendship network, we let each user be a node and add an undirected edge between two users if they are friends with each other on Yelp. Summary statistics of the Yelp friendship graph are shown in Table 2, and the connected components information is shown in Table 3. From the connected components information, we can see that the network is very sparsely connected: approximately 50% of the users do not have friends (135648 of the 269231 nodes are isolated). This can also be seen from the degree distribution plotted in Figure 1. The degree distribution of nodes is extremely right-skewed, with most nodes having degree less than 120 and only 1.06% of nodes having degree more than 120. It approximately follows a power-law distribution with α = 1.44.

Table 2: Graph Statistics
Number of nodes       269231
Number of edges       986864
Alpha of power-law    1.44

Table 3: Connected Components (CC) Info
Size of CCs    Number of CCs
1              135648
2              1763
3              162
4              24
5              7
6              2
7              2
129414         1

Figure 1: Power-law degree distribution of the Yelp dataset
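The graph construction and the statistics in Tables 2 and 3 can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy user IDs and the edge list are hypothetical, and the report does not say which tools were used to compute these statistics. The sketch only uses the Python standard library.

```python
from collections import Counter, deque


def build_friend_graph(users, friendships):
    """Undirected graph as adjacency sets; every user is a node, even without friends."""
    adj = {u: set() for u in users}
    for a, b in friendships:
        # Skip self-loops and edges to users outside the filtered user set.
        if a != b and a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def component_sizes(adj):
    """Sizes of all connected components, found by breadth-first search."""
    seen = set()
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        sizes.append(size)
    return sizes


def degree_histogram(adj):
    """Map degree -> number of nodes with that degree."""
    return Counter(len(nbs) for nbs in adj.values())


# Hypothetical toy data standing in for the Yelp user and friend lists.
users = ["u1", "u2", "u3", "u4", "u5", "u6"]
friendships = [("u1", "u2"), ("u2", "u3"), ("u4", "u5")]

adj = build_friend_graph(users, friendships)
cc_table = Counter(component_sizes(adj))        # component size -> number of components
degree_hist = degree_histogram(adj)
isolated_fraction = degree_hist[0] / len(adj)   # share of users without friends
print(sorted(cc_table.items()), isolated_fraction)
```

On the real data, `cc_table` would correspond to Table 3 and `isolated_fraction` to the roughly 50% of users without friends.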

To create the visualization of the network, we filter out nodes with degree more than 120 and take a random sample of 10% of the remaining nodes. We plot the Yelp friendship network using these sampled nodes in Gephi and apply the Force Atlas 2 layout. After looking at user attributes such as the average rating users give, the number of reviews posted, the number of years as a Yelp user, the most reviewed restaurant locations and the most reviewed restaurant categories, we observe that the network shows a clustering pattern by most reviewed restaurant location. Intuitively, this makes sense: since people go to restaurants together with friends, we would expect friendship clusters to separate by location.

Figure 2: Yelp friendship network with nodes colored by location

4 Baseline Model and Results

4.1 Model

The basic model we use to predict ratings is the standard latent-factor model:

$$\hat{r}_{u,i} = \mu + a_u + b_i + q_i^T p_u$$

Here, $\hat{r}_{u,i}$ is the prediction of the rating for item $i$ by user $u$, $\mu$ is a global offset parameter, $a_u$ and $b_i$ are the user and item biases respectively, and $p_u$ and $q_i$ are the user and item factors. The system learns by minimizing the Error Sum of Squares (SSE) combined with regularization, where $\tau$ is the set of observed (user, item) pairs in the training data:

$$\min \sum_{(u,i) \in \tau} \left[ (r_{u,i} - \hat{r}_{u,i})^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 + a_u^2 + b_i^2 \right) \right]$$

Initialization: $\mu$ is given by averaging ratings, and we do not update it during iterations. $a_u$ and $b_i$ are initialized by averaging ratings and residuals. We would like $p_u$ and $q_i$ for all users $u$ and items $i$ to be such that $q_i^T p_u \in [0, 5]$, so we initialize all elements of $p$ and $q$ to random values in $[0, 5/k]$, where $k$ is the number of latent factors.

Update: We use stochastic gradient descent to perform updates according to the update equations shown below.

$$
\begin{aligned}
\epsilon_{u,i} &\leftarrow r_{u,i} - \hat{r}_{u,i} \\
a_u &\leftarrow a_u + \eta(\epsilon_{u,i} - \lambda a_u) \\
b_i &\leftarrow b_i + \eta(\epsilon_{u,i} - \lambda b_i) \\
p_u &\leftarrow p_u + \eta(\epsilon_{u,i} q_i - \lambda p_u) \\
q_i &\leftarrow q_i + \eta(\epsilon_{u,i} p_u - \lambda q_i)
\end{aligned}
\tag{1}
$$

Parameters: We read each line of the rating file from disk and update the parameters for each line. Each iteration of the algorithm reads the whole file. We set the number of iterations to 100 and the step size $\eta$ to 0.1. We then try out different values for the number of latent factors $k$ and the regularization parameter $\lambda$.

4.2 Results

Figure 3: Baseline model training and test MSE for different values of k and λ. Panels: (a) λ = 0, (b) λ = 0.2, (c) λ = 0.4, (d) λ = 0.6, (e) λ = 0.8, (f) λ = 1.
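The baseline training loop of Section 4.1 can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the authors' implementation: the function names and the toy ratings are hypothetical, ratings are held in memory instead of being streamed from disk, and the biases start at zero instead of being initialized from averaged ratings and residuals. The factor initialization uses the interval $[0, 5/k]$ as stated in the text, which keeps $q_i^T p_u$ within $[0, 25/k]$, so within $[0, 5]$ whenever $k \geq 5$.

```python
import random


def dot(x, y):
    """Inner product of two equal-length lists."""
    return sum(xi * yi for xi, yi in zip(x, y))


def predict(mu, a_u, b_i, p_u, q_i):
    """Baseline prediction: mu + a_u + b_i + q_i^T p_u."""
    return mu + a_u + b_i + dot(q_i, p_u)


def sgd_step(r, mu, a_u, b_i, p_u, q_i, eta, lam):
    """One update of Eq. (1); every right-hand side uses the old parameter values."""
    eps = r - predict(mu, a_u, b_i, p_u, q_i)
    new_a = a_u + eta * (eps - lam * a_u)
    new_b = b_i + eta * (eps - lam * b_i)
    new_p = [p + eta * (eps * q - lam * p) for p, q in zip(p_u, q_i)]
    new_q = [q + eta * (eps * p - lam * q) for p, q in zip(p_u, q_i)]
    return new_a, new_b, new_p, new_q


def train(ratings, k=5, lam=0.2, eta=0.1, n_iters=100, seed=0):
    """ratings: list of (user, item, rating) triples held in memory."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # fixed, never updated
    a = {u: 0.0 for u, _, _ in ratings}  # assumption: biases start at zero
    b = {i: 0.0 for _, i, _ in ratings}
    p = {u: [rng.uniform(0, 5 / k) for _ in range(k)] for u in a}
    q = {i: [rng.uniform(0, 5 / k) for _ in range(k)] for i in b}
    for _ in range(n_iters):
        for u, i, r in ratings:
            a[u], b[i], p[u], q[i] = sgd_step(r, mu, a[u], b[i], p[u], q[i], eta, lam)
    return mu, a, b, p, q


def mse(ratings, mu, a, b, p, q):
    """Mean squared error; every user and item must have learned parameters."""
    total = 0.0
    for u, i, r in ratings:
        err = r - predict(mu, a[u], b[i], p[u], q[i])
        total += err * err
    return total / len(ratings)


# Hypothetical toy ratings; the real input is the filtered Yelp review file.
toy = [("u1", "r1", 5), ("u1", "r2", 3), ("u2", "r1", 4), ("u2", "r3", 1), ("u3", "r2", 2)]
params = train(toy)
print("training MSE on toy data:", mse(toy, *params))
```

Evaluating a held-out split with `mse` would additionally need a rule for users or restaurants that never appear in the training set, which the report does not specify.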

We randomly split 20% of the reviews into a test set and 80% into a training set, and we investigate the performance of $k \in \{5, 10, 20, 50, 80, 100\}$ and $\lambda \in \{0, 0.2, 0.4, 0.6, 0.8, 1\}$ by comparing their mean-squared error (MSE) on the training and test sets. The results are shown in Figure 3. When $\lambda > 0$, the training and test errors are nearly the same across iterations, which suggests that the model may be underfitting; it is therefore reasonable to propose a more complex model as our improved model. The smallest MSE of 1.5974 is given by $k = 50$ and $\lambda = 0.4$.

5 Improved Model and Results

5.1 Model

For our baseline model, we consider the overall average rating, the bias term for the user and the bias term for the restaurant. Because people go to restaurants together with friends, and friends tend to evaluate a restaurant similarly since they have similar tastes and receive similar services, we can extract further useful information from the friendship network of users. Inspired by the SVD++ model in [2], we propose an improved model that takes into account the influence of friendship on users' ratings. A user will demonstrate implicit preference for restaurants that his or her friends have visited and rated. Therefore we add an additional friends term to the original free user factor $p_u$. The estimate of the rating given to restaurant $i$ by user $u$ under our improved model is:

$$\hat{r}_{u,i} = \mu + a_u + b_i + q_i^T \left( p_u + |F(u)|^{-0.5} \sum_{j \in F(u)} y_j \right)$$

Here $F(u)$ represents the friends of user $u$ who have rated restaurant $i$ before. $|F(u)|$ is the size of this set, and it works as a normalization constant. For each user $j$, we add an additional $k$-dimensional vector $y_j$. Thus the user factors are now composed of two parts: one is the free user factor $p_u$ as in the baseline model, and the other is the friend term $|F(u)|^{-0.5} \sum_{j \in F(u)} y_j$.

The cost function of the improved model is given by

$$\min \sum_{(u,i) \in \tau} \left[ (r_{u,i} - \hat{r}_{u,i})^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 + a_u^2 + b_i^2 + \sum_{j \in F(u)} \|y_j\|^2 \right) \right]$$

Initialization: $\mu$ is given by averaging ratings, and we do not update it during iterations. $a_u$ and $b_i$ are initialized by averaging ratings and residuals. We would like $p_u$ and $q_i$ for all users $u$ and items $i$ to be such that $q_i^T p_u \in [0, 5]$, so we initialize all elements of $p$ and $q$ to random values in $[0, 5/k]$, where $k$ is the number of latent factors. We initialize all elements of $y$ to 0.

Update: In each update, we compute the new parameter values from the old values. We use stochastic gradient descent, which gives the following update equations.

$$
\begin{aligned}
\epsilon_{u,i} &\leftarrow r_{u,i} - \hat{r}_{u,i} \\
a_u &\leftarrow a_u + \eta(\epsilon_{u,i} - \lambda a_u) \\
b_i &\leftarrow b_i + \eta(\epsilon_{u,i} - \lambda b_i) \\
p_u &\leftarrow p_u + \eta(\epsilon_{u,i} q_i - \lambda p_u) \\
q_i &\leftarrow q_i + \eta\left[\epsilon_{u,i} \left( p_u + |F(u)|^{-0.5} \sum_{j \in F(u)} y_j \right) - \lambda q_i\right] \\
\forall j \in F(u): \quad y_j &\leftarrow y_j + \eta\left(\epsilon_{u,i} |F(u)|^{-0.5} q_i - \lambda y_j\right)
\end{aligned}
\tag{2}
$$

Parameters: We read each line of the rating file from disk and update the parameters for each line. Each iteration of the algorithm reads the whole file. We set the number of iterations to 100 and the step size $\eta$ to 0.1. As before, we then try out different values for the number of latent factors $k$ and the regularization parameter $\lambda$.

In addition, we also consider using degree centrality to weight the new user factor $y_j$. The weighted friend term is given by

$$\left( \sum_{j \in F(u)} D_j \right)^{-0.5} \sum_{j \in F(u)} D_j \, y_j$$

where $D_j$ denotes the degree centrality of friend $j$. Our experiments find that weighting by degree centrality leads to only a small difference in prediction accuracy compared with the improved model without weighting. Therefore we omit the detailed description of the model with weighting by degree centrality here.

5.2 Results

Figure 4: Improved model training and test MSE for different values of k and λ. Panels: (a) λ = 0, (b) λ = 0.2, (c) λ = 0.4, (d) λ = 0.6, (e) λ = 0.8, (f) λ = 1.
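One stochastic update of the improved model can be sketched as follows. This is a hedged reading of Eq. (2), not the authors' code: the function names and the toy values are hypothetical, the exponent on the normalization is taken to be -0.5 as in SVD++ [2], and users with no friends who rated the restaurant are assumed to get a zero friend term. The degree-weighted variant is included under the assumption that each $y_j$ is scaled by the friend's degree centrality $D_j$ and the sum is normalized by the total weight to the power -0.5.

```python
def dot(x, y):
    """Inner product of two equal-length lists."""
    return sum(xi * yi for xi, yi in zip(x, y))


def friend_term(y, friends, k):
    """|F(u)|^(-0.5) times the sum of y_j over friends j who rated the restaurant."""
    if not friends:
        return [0.0] * k  # assumption: no friend signal when F(u) is empty
    scale = len(friends) ** -0.5
    return [scale * sum(y[j][d] for j in friends) for d in range(k)]


def weighted_friend_term(y, friends, degree, k):
    """Assumed degree-weighted variant: (sum D_j)^(-0.5) * sum D_j y_j."""
    total = sum(degree[j] for j in friends)
    if total == 0:
        return [0.0] * k
    scale = total ** -0.5
    return [scale * sum(degree[j] * y[j][d] for j in friends) for d in range(k)]


def sgd_step_friends(r, mu, a_u, b_i, p_u, q_i, y, friends, eta, lam):
    """One update of Eq. (2); all right-hand sides use the old parameter values."""
    k = len(p_u)
    f = friend_term(y, friends, k)
    user_vec = [p + fd for p, fd in zip(p_u, f)]  # p_u plus the friend term
    eps = r - (mu + a_u + b_i + dot(q_i, user_vec))
    new_a = a_u + eta * (eps - lam * a_u)
    new_b = b_i + eta * (eps - lam * b_i)
    new_p = [p + eta * (eps * q - lam * p) for p, q in zip(p_u, q_i)]
    new_q = [q + eta * (eps * v - lam * q) for q, v in zip(q_i, user_vec)]
    scale = len(friends) ** -0.5 if friends else 0.0
    new_y = {
        j: [yj + eta * (eps * scale * q - lam * yj) for yj, q in zip(y[j], q_i)]
        for j in friends
    }
    return new_a, new_b, new_p, new_q, new_y


# Hypothetical toy values: user u has four friends who rated restaurant i.
y = {"j1": [1.0, 0.0], "j2": [1.0, 0.0], "j3": [0.0, 2.0], "j4": [0.0, 0.0]}
friends = ["j1", "j2", "j3", "j4"]
step = sgd_step_friends(5, 3.0, 0.0, 0.0, [1.0, 0.0], [1.0, 1.0], y, friends, 0.1, 0.0)
print(step)
```

Only the friend vectors $y_j$ with $j \in F(u)$ are touched by a given rating, so the cost of one update grows with the number of friends who rated the restaurant rather than with the total number of users.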

With this improved model, we observe that the friends influence term improves the accuracy of rating prediction: the smallest MSE drops from 1.5974 for the baseline model to 1.4663. Again, we randomly split 20% of the reviews into a test set and 80% into a training set, and we investigate the performance of $k \in \{5, 10, 20, 50, 80, 100\}$ and $\lambda \in \{0, 0.2, 0.4, 0.6, 0.8, 1\}$ by comparing their MSE on the training and test sets. The results are shown in Figure 4. When $\lambda = 0$, the training errors are very small but the test errors are large, which indicates overfitting. As $\lambda$ increases, the gap between training errors and test errors becomes smaller. The smallest MSE of 1.4663 is given by $k = 100$ and $\lambda = 0.2$.

6 Conclusions

By comparing the results of the baseline model and the improved model, we observe that the friend term introduced in the user factors improves the prediction accuracy (smallest MSE of 1.4663 versus 1.5974). The free user factor $p_u$ represents explicit ratings given by user $u$. The friend term represents the user's implicit preference for restaurants. People tend to have similar tastes to their friends, and friends can also recommend or comment on restaurants, which influences the ratings users give to restaurants. The two terms combined therefore give us more information about a user's rating behavior.

We also try to incorporate centrality measures in the friend term, but our results show that weighting friends' ratings by degree centrality does not produce a noticeable improvement in prediction performance. This result is reasonable for the Yelp dataset because most users do not have friend information and only very few users have a lot of friends. We therefore conclude that friendship network information allows us to predict user restaurant ratings more accurately, although further differentiating between friends through weighting by degree centrality does not offer much improvement in prediction accuracy.

References

[1] He, J., & Chu, W. W. (2010). A social network-based recommender system (SNRS) (pp. 47-74). Springer US.
[2] Koren, Y. (2008, August). Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 426-434). ACM.
[3] Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, (8), 30-37.
[4] McAuley, J., & Leskovec, J. (2013, October). Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM conference on Recommender systems (pp. 165-172). ACM.
[5] Mooney, R. J., & Roy, L. (2000, June). Content-based book recommending using learning for text categorization. In Proceedings of the fifth ACM conference on Digital libraries (pp. 195-204). ACM.

[6] Pazzani, M., & Billsus, D. (1997). Learning and revising user profiles: The identification of interesting web sites. Machine learning, 27(3), 313-331.
[7] Qian, Y., Zhiyong, P., Liang, H., Ming, Y., & Dawen, J. (2012, November). A latent topic based collaborative filtering recommendation algorithm for web communities. In Web Information Systems and Applications Conference (WISA), 2012 Ninth (pp. 241-246). IEEE.
[8] Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2001, April). Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web (pp. 285-295). ACM.
[9] Yang, S. H., Long, B., Smola, A., Sadagopan, N., Zheng, Z., & Zha, H. (2011, March). Like like alike: joint friendship and interest propagation in social networks. In Proceedings of the 20th international conference on World wide web (pp. 537-546). ACM.
[10] Zheng, N., & Bao, H. (2013). Flickr group recommendation based on user-generated tags and social relations via topic model. In Advances in Neural Networks - ISNN 2013 (pp. 514-523). Springer Berlin Heidelberg.