A simple hybrid movie recommender system


Jaldert Rombouts (rombouts@ai.rug.nl)
Tessa Verhoef (tverhoef@ai.rug.nl)

Abstract

A simple hybrid movie recommender system is described that combines content based and collaborative modelling and provides an explanation for increased user acceptance. The system uses rating data from the Netflix database, which is linked to content information from the Internet Movie Database. In order to provide the user with insight into the reasoning behind a recommendation, the system creates an HTML page with a detailed explanation. Performance results are presented as well as a discussion on future improvements.

Keywords: Recommender systems; Collaborative filtering; Content based filtering; user acceptance; naive Bayes

Introduction

We live in an information society in which people are often confronted with very large amounts of data, for instance through the internet. We are asked to make choices that are almost impossible to make without additional information or guidance. Recommender systems can provide such guidance by assisting the user in the decision making process or by making the decision for the user. These systems use the enormous amount of available data in a way that users never can.

Recommender systems are already being used in a lot of different domains. GroupLens (Resnick, Iacovou, Suchak, Bergstrom, & Riedl, 1994), for instance, is a system that recommends news articles that could be of interest to the user. AbeBooks recommends books, Last.fm helps the user to find new music, and MovieLens, IMDb and Netflix recommend movies. Amazon.com is a recommender that is specialized in all three previously mentioned media.

Collaborative filtering

Varying approaches for automatic recommendation have been proposed. The oldest and most developed method is collaborative filtering. This method uses inter-user comparisons to generate new recommendations. A collaborative system consists of a database which contains the users' ratings and is augmented as the user interacts with the system over time.
Users are compared based on their ratings, and the obtained similarities and differences are used to make a recommendation.

Collaborative filtering suffers from the sparsity problem. Not all users exploit the option to rate items they have seen or used. The available rating data is therefore typically very sparse, especially when a user is new or when the system is new and people are just starting to use it. Another problem is the first-rater problem: before an item has been rated for the first time, the system will not recommend it. This problem applies to new items and obscure items, which makes the method less attractive for people with non-mainstream tastes. A virtue of collaborative filtering is that it can surprise the user with relevant items that are not explicitly similar to items in the user's profile. This so-called "outside the box" recommendation ability (Burke, 2002) is possible because it uses people-to-people correlations.

Content based methods

Another method uses content based information. Content based filtering uses item-to-item correlation to compare representations of content in an item to representations of content in items the user has rated. The similarities between items and the rating information are used to predict how much the user will like or dislike a new item. A disadvantage of this method is that it is completely dependent upon machine readable representations of items, which may be difficult to obtain. Content based methods are not able to surprise the user, because they use the feature values of the items that the user has rated and will not recommend an item that does not share any of these values. Just like collaborative filtering, content based methods also suffer from the sparsity problem.

Hybrid methods

Collaborative filtering and content based methods have varying advantages and disadvantages. A combination of the two methods therefore provides a promising extension. Different ways have been proposed and used to combine two different recommenders (Burke, 2002).
Weighting is a simple method that computes a recommendation from the results of all individual recommenders, for instance by linearly combining them or by a voting mechanism (Pazzani, 1999). This is a very straightforward method that makes easy adjustments possible, but it also uses the implicit assumption that the relative value of the different recommenders is uniform for all items, which is often not the case. As an alternative, switching techniques can be used, which use a criterion function to decide when to switch from one recommender to the other (Tran & Cohen, 2000). Basu, Hirsh, and Cohen (1998) proposed a method in which they use the information of one recommender as features for the other. The rating information from the collaborative part is used as an additional feature and content based filtering is done on the complete data set, including this new feature. Cascading different recommenders is another way to combine them, which involves a staged process. Melville, Mooney, and Nagarajan (2002), for instance, use a content-boosted collaborative filtering technique, in which they first solve the sparsity problem by making virtual data using content based information, and then use collaborative filtering on this much larger data set.

Explanation

Recommender systems are widely used, especially in the online community, but up to now the application areas of the technique have been limited to harmless decision making in the entertainment domain. For more serious issues, like booking a vacation, buying insurance or stock market decisions, people do not trust the technique enough to let it make these decisions for them. One way to overcome this distrust is to provide the user with an explanation (Herlocker, Konstan, & Riedl, 2000). In the eyes of a user, a recommender system is a black box, which makes it hard to understand why a certain decision has been made. An explanation can provide more insight into the reasoning behind the decision and also gives the user the opportunity to judge, according to this reasoning, whether to trust the decision or not. It should make the system more transparent. It has been shown experimentally that explanations increase the acceptance of both expert systems and recommender systems, or in general, decision support systems (Herlocker et al., 2000; Ye & Johnson, 1995).

In the following sections our hybrid movie recommender system is described. Collaborative filtering is linearly combined with a naive Bayesian content based approach to predict the movie preference of a user. The system provides the user with an insightful explanation that justifies the decision. First the problem domain and the system are described, then the results of experiments on the performance are reported, and a discussion follows after that.

Method

Domain

The proposed system for movie recommendation uses both collaborative filtering and content based modelling and is supposed to perform the following task: for a certain user, decide which one of two proposed movies will probably be favored by the user. Two databases are combined and used: Netflix [1], providing rating information for the collaborative part, and the Internet Movie Database (IMDb) [2] for extracting content information. The Netflix database consists of rating information of 480189 different users. The ratings range from one to five stars. One star indicates that a user does not like the movie and five stars means the user loves it.
The database contains 17770 different movies, of which 8637 are used because these movies could be linked to the content information that is available from the Internet Movie Database (IMDb). To be able to use the information from both databases, a MySQL database was created from which the information could be extracted using simple queries. Because of the intense data dependency and large calculations involved in collaborative filtering, we decided to use PyFlix [3], an off-the-shelf Python tool for accessing the Netflix dataset, which was a lot faster than our MySQL database.

[1] http://www.netflix.com/
[2] http://www.imdb.com/
[3] http://pyflix.python-hosting.com/

System

The system is a linear combination (50-50) of the predictions of the content based model and the collaborative filter discussed below.

Content based modelling

The content based part of the movie recommender is based on a naive Bayesian text classification method (Mitchell, 1997). The classifier creates a naive Bayesian model for every user, based on the content of the movies the user has rated. The content that is used consists of the keywords, genres and actors of a movie, and these features are assigned to the appropriate class (1, 2, 3, 4 or 5) based on the rating for that movie. For every feature type, a separate model is created and the predictions of these models are linearly combined into one prediction.

The number of possible feature values of the keywords and actors would be very large if all possible values were to be used, since there is a huge amount of different keywords and different actors. To keep the feature vectors manageable, only keywords that occur more than 20 times (8762 in total) and only actors that occur more than 50 times in the data set (34956 in total) are used.

We are interested in the posterior probability of a certain rating/class (c), given the observation of movie features (o) for a new movie.
Using Bayes' theorem, this can be defined as:

$$p(c \mid o) = \frac{p(c)\, p(o \mid c)}{p(o)} \tag{1}$$

Since the denominator in this equation does not depend on $c$, and we can make use of the naive assumption that every feature $f_i$ in the observation is conditionally independent of every other feature, we can rewrite the function as:

$$p(c \mid o) \propto p(c) \prod_{i=1}^{n} p(f_i \mid c) \tag{2}$$

where $n$ is the number of features. In order to classify the movie, we need to find the class with the maximum posterior probability among the five classes:

$$\mathrm{classify}(f_1, \ldots, f_n) = \arg\max_{c}\; p(C = c) \prod_{i=1}^{n} p(F_i = f_i \mid C = c) \tag{3}$$

So, the class/rating with the highest posterior probability for this movie is the predicted rating on which the system bases its recommendation.

A disadvantage of this method is that it has to estimate the prior and conditional probabilities from the data, while the data is typically sparse. Direct probability calculation can therefore often give probabilities that are zero, which is undesirable. For this reason we use Laplacian smoothing (Mitchell, 1997) to eliminate zeros from the estimated probabilities. What this comes down to is that a number of additional hallucinated examples are added. These examples are spread evenly over the possible values. In this case, every possible observation starts from a frequency of one instead of zero, and the prior frequency of a class starts from the number of possible feature values, for instance the number of possible genres. Our estimate for the prior probability of a class now becomes:

$$p(c) = \frac{n_c + n_f}{n_t} \tag{4}$$

where $n_c$ is the number of observations for class $c$, $n_f$ is the number of possible feature values and $n_t$ is the total number of observations. The estimate for the conditional probability becomes:

$$p(o \mid c) = \frac{n_o + 1}{n_c + n_f} \tag{5}$$

where $n_o$ is the frequency with which a certain observation has been encountered in association with class $c$. If the number of possible observations becomes very high, the probabilities can get very small, so in order to avoid underflow, log-likelihoods are used.

The predictions of the three models are linearly combined into one recommendation in a way that yielded the best performance. It uses 60 percent of the prediction based on keywords, 30 percent of the prediction based on actors and 10 percent of the prediction based on genres. This is mostly due to the fact that actors and genres are less often known than keywords, and because there is much less variation in the genres.

Collaborative filtering

The collaborative part of the system is heavily based on the system described by Herlocker, Konstan, and Riedl (1999), which is a neighbourhood-based approach to collaborative filtering. The approach consists of three steps:

1. Calculate the similarity of the active user [4] to other users;
2. Select users that are most similar to the active user: they form the neighbourhood;
3. Calculate a prediction based on a weighted combination of the neighbours' ratings.

Like Herlocker et al. (1999), we use the Pearson correlation coefficient to calculate the similarity between two users:

$$P_{a,u} = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{\sqrt{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)^2}\,\sqrt{\sum_{i=1}^{m} (r_{u,i} - \bar{r}_u)^2}} \tag{6}$$

$P_{a,u}$ is the similarity between the active user $a$ and another user $u$, $r_{i,j}$ is the rating that user $i$ has given to item $j$, $\bar{r}_i$ is the average rating that user $i$ has given, and $m$ is the total number of items that both users have in common.
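
The similarity of equation (6) can be illustrated with a minimal sketch. The function name and the dictionary layout (item id mapped to star rating) are hypothetical choices for this illustration, not the authors' implementation. Following the wording of the text, the mean is taken over everything a user has rated; returning 0 when the correlation is undefined is an assumption.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_u):
    """Pearson correlation over the co-rated items of two users.

    ratings_a and ratings_u map item ids to star ratings (1-5).
    Returns (similarity, number of co-rated items).
    """
    common = sorted(set(ratings_a) & set(ratings_u))
    m = len(common)
    if m == 0:
        return 0.0, 0

    # Average rating of each user, taken over all items that user rated.
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_u = sum(ratings_u.values()) / len(ratings_u)

    dev_a = [ratings_a[i] - mean_a for i in common]
    dev_u = [ratings_u[i] - mean_u for i in common]

    numerator = sum(x * y for x, y in zip(dev_a, dev_u))
    denominator = sqrt(sum(x * x for x in dev_a)) * sqrt(sum(y * y for y in dev_u))
    if denominator == 0:
        # Assumption: an undefined correlation (no variation) counts as 0.
        return 0.0, m
    return numerator / denominator, m
```

The number of co-rated items is returned alongside the correlation because any later correction for small overlaps needs it.
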
Given that the similarity only depends on the co-rated items of two users, it is possible to find a high correlation between users whose tastes are in fact not similar. For instance, if two users have both rated 500 items, but have only two co-rated items, their correlation can still be perfect if they agree on these two items. In order to compensate for this effect, we adopted significance weighting as proposed by Herlocker et al. (1999). This can be described by the following formula, where $n$ is the number of overlapping items and $\theta$ is the threshold value:

$$P_{a,u} \leftarrow \begin{cases} \dfrac{n}{\theta}\, P_{a,u} & \text{if } n < \theta; \\ P_{a,u} & \text{otherwise.} \end{cases} \tag{7}$$

We adopted a $\theta$ value of 50, which means that two users should have at least 50 overlapping items, or the correlation for this pair is reduced.

[4] This is the user for which we are calculating a prediction.

To calculate the similarities between all users, $(n^2 - n)/2$ comparisons are needed, where $n$ is the total number of users. Given the large amount of users in the Netflix set, this is not feasible on a normal computer. In order to get the number of comparisons down, we did a preselection step on the users to evaluate as neighbours. First, we only select users that have seen the item that we want to rate for the active user, and then we randomly select 30000 of these users. So, in the worst case scenario, the algorithm needs to calculate 30000 similarities. We used this value because it yielded an acceptable runtime for evaluating one (user, item) pair on our hardware. Note that we only add users with a positive nonzero correlation to the list of possible neighbours: according to Shardanand and Maes (1995), including negative correlations makes no real difference.

The described strategy for selecting possible neighbours holds another advantage: we know that all neighbours have rated the item of interest, so all can have an influence on the final prediction. If you would just select the top-n users in the naive way, there is no guarantee that these neighbours have actually rated the item of interest, in which case their vote would be lost.
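
The significance weighting and the preselection step described above can be sketched as follows. All function and parameter names are hypothetical, and `similarity_fn` stands in for whatever routine returns a correlation together with the number of co-rated items. The threshold of 50 and the sample size of 30000 come from the text; the neighbourhood size is left as a parameter.

```python
import random


def significance_weight(similarity, n_overlap, theta=50):
    """Scale a correlation down when it rests on fewer than theta co-rated items."""
    if n_overlap < theta:
        return similarity * n_overlap / theta
    return similarity


def preselect_candidates(raters_of_item, active_user, max_candidates=30000, rng=None):
    """Randomly sample candidate neighbours among the users who rated the item."""
    rng = rng or random.Random()
    pool = [user for user in raters_of_item if user != active_user]
    if len(pool) <= max_candidates:
        return pool
    return rng.sample(pool, max_candidates)


def select_neighbours(candidates, similarity_fn, k=50):
    """Keep the k candidates with the highest positive weighted similarity.

    similarity_fn(user) must return (correlation, n_overlap) with respect
    to the active user. Returns a list of (weight, user) pairs.
    """
    scored = []
    for user in candidates:
        correlation, n_overlap = similarity_fn(user)
        weight = significance_weight(correlation, n_overlap)
        if weight > 0:  # only positive, nonzero correlations are kept
            scored.append((weight, user))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Because the candidates are drawn at random, two runs for the same (user, item) pair can end up with different neighbourhoods unless the random generator is seeded.
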
The last step is combining the ratings of the neighbours into a prediction for the active user. The simplest approach would just take all ratings for the item by the neighbours, multiply each by the weight of that neighbour, and divide the total by the total weight of the neighbours. Herlocker et al. (1999) show that this approach is not very good, because it does not compensate for the fact that every user rates differently. Some users only give five stars to items that they really love, and others are not as stringent with their ratings. A way to compensate for this is to use the deviation-from-mean average for all users, as detailed in Herlocker et al. (1999). In this way, when a user always gives high ratings to items, his vote will be worth less when his rating for the item of interest is also high. The prediction algorithm consists of the average rating of the active user, which is modulated by the ratings of the neighbours (our implementation uses 50 neighbours). The algorithm calculates its prediction as follows, where the sums run over the $n$ selected neighbours:

$$p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{n} (r_{u,i} - \bar{r}_u)\, P_{a,u}}{\sum_{u=1}^{n} P_{a,u}} \tag{8}$$
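
A minimal sketch of this deviation-from-mean prediction is given below. The function name and the tuple layout are hypothetical; returning `None` when there are no usable neighbours is an assumption made here to keep the sketch self-contained.

```python
def predict_rating(active_mean, neighbours):
    """Deviation-from-mean prediction for one (user, item) pair.

    active_mean is the average rating of the active user.
    neighbours is a list of (weight, rating_for_item, neighbour_mean)
    tuples, where weight is the positive similarity to the active user.
    Returns None when there is nothing to combine.
    """
    total_weight = sum(weight for weight, _, _ in neighbours)
    if total_weight == 0:
        return None
    weighted_deviation = sum(
        weight * (rating - mean) for weight, rating, mean in neighbours
    )
    return active_mean + weighted_deviation / total_weight
```

For example, a neighbour who rates the item 5 but averages 4.8 shifts the prediction by only 0.2 times its share of the weight, whereas a neighbour who averages 3.0 and gives the same 5 shifts it by 2.0 times its share.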

If the preselection step does not yield any neighbours for the user, the algorithm returns the best rating estimate it can find for that item. This estimate is the deviation-from-mean average for all users that have rated this movie plus the average of the active user. If the algorithm does find neighbours, it calculates the prediction as discussed.

Explanation

To increase the user acceptance of the recommender system we created an explanation module that provides the user with detailed information in HTML format to justify the decision. The explanation gives the user insight into the reasoning behind the decision of both the collaborative model and the content based model, and both algorithms also indicate how certain they are of this decision in order to give the user the opportunity to judge how trustworthy the decision is. The final decision is also presented accompanied by this certainty indication. There are three grades of certainty that the system can express: not very certain, reasonably certain or certain. The implementation for this is actually almost trivial: if the difference between predictions is small, it will say that it is not very certain, and if the difference is almost a whole star, it will say that it is more confident. The thresholds for these decisions were found through manual experimentation. Figure 1 in appendix B gives an impression of the user interface.

The explanation of the content based model is provided to the user in a small story. This story tells the user which features of the two movies caused the system to recommend one movie over the other, for instance because the actors in one movie have been associated with higher movie rankings than the actors in the other movie. The difficulty with these explanations is that it is very hard to determine which type of feature is responsible for a high or a low rating. The recommender only takes the keywords, actors and genres into account, but what a person likes or dislikes about a movie might just as well be caused by the influence of a director or a certain style of filming.
Due to this credit assignment problem the explanation is purely based on the conclusions drawn by the naive Bayesian model, which do not always correspond with the real preferences of the user. The user will be able to recognize such a deviation from his or her own preferences and can decide whether to trust the recommendation or not. The credit assignment problem will be further discussed in the discussion section.

We have based the explanation of the collaborative part of our system on the work of Herlocker et al. (2000). Because the system is actually a complex mathematical model, spelling out how the algorithm works does not count as an explanation. They concluded that a simple graph showing what the neighbours had rated worked best. We implemented this feature for our explanation: it is included on the bottom right of the user interface, see figure 1 in appendix B. Because the system must give a ranking for two movies, neighbours for two movies are visible (if there are any). There are only three groups, where the middle group indicates the number of neighbours that have rated the movies with three stars. If the distribution has its center of gravity to the right, the neighbours thought that the movie was good, and if it is at the left, they thought that it was a bad movie. Note that this is not the exact information that the system uses, since some users have more weight than others and these weights are combined as shown in equation (8), but in practice this graph gives a clear impression, which almost always corresponds to the decision. You can also see on how many neighbours the decision is based, which is also some indication for the correctness of the decision of the algorithm.

Performance

The performance of the hybrid recommender was tested, as well as the performance of the individual recommenders. The task is defined as a ranking task in which the recommender has to decide which of two movies that are new for the user will be the most favorable.
The predicted rankings are compared to the actual rankings with a repeated cross-validation approach. The ranking is based on a comparison of the two predicted ratings for the movies. To provide more insight into the performance for these predictions, the estimated ratings are also tested. A Root Mean Squared Error calculation is used to determine the error in the predictions ($p$) compared to the real ratings ($r$) after $n$ predictions:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{9}$$

The performance of the individual recommenders was tested using cross-validation. In ten trials, 100 random users (who had rated at least two movies for which content information was available) were selected. Two movies were extracted from the total amount the user had rated and the models were created using the data of the remaining movies. Then the recommenders were asked to classify the two selected movies, and the predicted rankings and ratings were compared with the actual ones. Determining whether the ranking is correct is not as trivial as it seems: for instance, what is the correct answer when the user has given both test items the same rating? We decided that any ranking would count as correct in this case.

The result of the ten trials shows that the average number of correct rankings for the content based model is 86% with a standard deviation of 3.37%. The ranking performance of the collaborative model is 88% with a standard deviation of 2.65%. The predicted ratings of the content based model deviated from the actual ratings with an average RMSE of 0.87 with a standard deviation of 0.04. The rating error of the collaborative model was on average 0.88 with a standard deviation of 0.03.

In our implementation, the combined classifier was much slower than the individual classifiers, taking about 1-4 minutes per test case. This is the reason why we did not test the combined classifier as extensively as the individual classifiers. The result was a correct ranking of 88.25%, with a standard deviation of 1.5%. The average RMSE was 0.83 with a standard deviation of 0.05. This is based on four experiments with 50 tests each, so it is hard to say how this compares to the individual components.

Discussion

It is very hard to compare our results to other work, because almost all writers use different databases for training and validating their algorithm. Even if the writers have used the same database, the results can still be hard to compare because the authors have made different assumptions and their algorithms have different preconditions. In our case, for instance, both test movies need to be linked to the IMDb database in order to get a prediction. So, users have seen at least two movies, but given that the probability that we have linked a movie with IMDb is about 50%, you would expect that these users have seen more than two movies. In this way, our algorithm sidesteps one of the real problems of recommender systems: doing predictions for users that have seen very few items. It should be clear that our system is not nearly as good as that of the current leaders in the Netflix challenge [5], which have an RMSE of about 0.86!

Both our systems, and most recommender systems in general, are highly dependent on ratings that users have given for certain items. It is an important question whether these ratings can be considered reliable. The five point rating scale that is used by Netflix is an arbitrary scale and it has been shown that changing the scale's range can alter the distribution of responses (Amoo & Friedman, 2001). Excluding a middle choice, for instance, has been found to bias towards positive responses. The rating behaviour of users is also influenced by the predictions that recommender systems make (Cosley, Lam, Albert, Konstan, & Riedl, 2003), which makes it harder for the system to accurately model the users' preferences. To improve the performance of recommender systems in the future, it might be beneficial to consider alternative rating mechanisms.

The credit assignment problem that was mentioned earlier not only affects the explanation that the system gives; it is a general disadvantage of content based methods that we cannot objectively determine which part of the content was the deciding factor behind a high or a low rating. Why do people like The Godfather so much? Is it the actors, or the genre? Or do people like it because it is directed by Francis Ford Coppola and they like his style? We would never know unless we ask the user. Predictions of content based recommenders could improve if we would ask the user why they give a specific rating.

There are a few disadvantages of the neighbourhood preselection as used in the collaborative part of the system. The first is perhaps conceptual: the neighbours for both movies are not actually the same neighbours. There does not have to be any overlap between these different groups of users, while the interface gives the impression that this is the case. Another problem is that the system does not have to be consistent in a rating for a user and the same two items, because it selects random users in the preselection stage, and if there are more than 30000 users that have seen an item, there is a probability that the top-50 neighbours will not be the same. This is a trade-off that we needed to make in order to achieve a feasible runtime. Whether the predictions provided by our algorithm are worse than those based on neighbours calculated in the naive way is something that is worth looking into.

[5] http://www.netflixprize.com/leaderboard

A point for improvement lies in the way that the collaborative system calculates the correlations between two users. The solution offered by Herlocker et al. (1999) that we adopted seems more like a hack than a real solution for this problem. There probably are statistical tools that will offer a more solid foundation for this decision, e.g. where the confidence interval becomes larger if the overlap between two users becomes smaller. This will almost certainly have a positive effect on the predictions.
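
As an illustration of what such a tool could look like, the sketch below computes an approximate confidence interval for a Pearson correlation with the Fisher z-transformation. This is a standard statistical technique that we name here as one possible candidate; it is not something the system uses, and it assumes roughly bivariate normal data, which star ratings only approximate. The function name, the clamping constant and the default critical value of 1.96 (a 95% interval) are choices made for this sketch.

```python
from math import atanh, sqrt, tanh


def pearson_confidence_interval(r, n_overlap, z_crit=1.96):
    """Approximate confidence interval for a Pearson correlation.

    r is the observed correlation and n_overlap the number of co-rated
    items it is based on. With three or fewer items the interval is
    uninformative, so the full range is returned.
    """
    if n_overlap <= 3:
        return -1.0, 1.0
    # Clamp r so that atanh stays finite for perfect correlations.
    r = max(min(r, 0.999999), -0.999999)
    z = atanh(r)
    half_width = z_crit / sqrt(n_overlap - 3)
    return tanh(z - half_width), tanh(z + half_width)
```

The interval widens as the overlap shrinks, so the lower bound could, for instance, serve as a cautious neighbour weight in place of a fixed threshold.
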
Another point for improvement is the way the final predictions are calculated: if a user has a large number of neighbours for the first item and a very small number of neighbours for the second item, the votes of the latter neighbours will weigh more heavily. This is also the case when you use normal neighbourhood-based methods, in which not all neighbours have rated both items.

The combination of the classifiers was done in a linear fashion, because this is relatively easy to do by hand. However, the weights that we used for combining them are probably not the best ones. Also, it is possible that a linear combination of the classifiers is not the best way to combine them.

It also needs to be mentioned that we did not evaluate the user interface of the recommender system: we followed the suggestions of Herlocker et al. (2000) and our own intuitions about what kind of explanations users would find acceptable, but we did not test the system on real end-users. This is something that needs to be done in future research.

In conclusion, it has been shown that a simple hybrid recommender system performs reasonably well on a movie recommendation task and can also provide the user with an insightful explanation that makes the system and its separate recommenders more transparent. The advantage of using a combination of different classifiers is not only that they compensate for each other's shortcomings, but also that the combination of explanations provides a detailed description of the reasoning behind the recommendation.

References

Amoo, T., & Friedman, H. (2001, February). Do numeric values influence subjects' responses to rating scales? Journal of International Marketing and Marketing Research, 26, 41-46.
Basu, C., Hirsh, H., & Cohen, W. (1998). Recommendation as classification: Using social and content-based information in recommendation. Proceedings of the Fifteenth National Conference on Artificial Intelligence, 714-720.
Burke, R. (2002). Hybrid Recommender Systems: Survey and Experiments. User Modeling and User-Adapted Interaction, 12(4), 331-370.

Cosley, D., Lam, S., Albert, I., Konstan, J., & Riedl, J. (2003). Is seeing believing?: how recommender system interfaces affect users' opinions. Proceedings of the SIGCHI conference on Human factors in computing systems, 585-592.

Herlocker, J., Konstan, J., & Riedl, J. (1999). An algorithmic framework for performing collaborative filtering. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, 230-237.

Herlocker, J., Konstan, J., & Riedl, J. (2000). Explaining collaborative filtering recommendations. Proceedings of the 2000 ACM conference on Computer supported cooperative work, 241-250.

Melville, P., Mooney, R., & Nagarajan, R. (2002). Content-Boosted Collaborative Filtering for Improved Recommendations. Proceedings of the Eighteenth National Conference on Artificial Intelligence, 187-192.

Mitchell, C. (1997). Machine Learning. The McGraw-Hill Companies, Inc.

Pazzani, M. (1999). A Framework for Collaborative, Content-Based and Demographic Filtering. Artificial Intelligence Review, 13(5), 393-408.

Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., & Riedl, J. (1994). GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In Proceedings of ACM 1994 Conference on Computer Supported Cooperative Work (pp. 175-186). ACM.

Shardanand, U., & Maes, P. (1995). Social information filtering: algorithms for automating word of mouth. Proceedings of the SIGCHI conference on Human factors in computing systems, 210-217.

Tran, T., & Cohen, R. (2000). Hybrid Recommender Systems for Electronic Commerce. Knowledge-Based Electronic Markets, Papers from the AAAI Workshop, Technical Report WS-00, 4.

Ye, L., & Johnson, P. (1995). The Impact of Explanation Facilities on User Acceptance of Expert Systems Advice. MIS Quarterly, 19(2), 157-172.

Appendix

A. Item-item collaborative filtering

Another collaborative modelling algorithm was developed which uses item-item similarity, based on comparing user ratings for these items. The algorithm is very simple:

1. Calculate the distance between the movie of interest and all movies that the active user has seen;
2. Normalize the distances, so the largest distance equals 1;
3. Calculate the prediction based on these distances.

We use the Euclidean distance as a distance measure:

$$D(a,b) = \sqrt{\sum_{i=1}^{m} \left(r_{i,a} - r_{i,b}\right)^{2}} \tag{10}$$

where $m$ is the number of users that the two movies have in common and $r_{i,x}$ is the rating for an item $x$ by such a user $i$. Note that this approach suffers from the same problem as the other algorithm: if the overlap is small, the calculated distance will not be a realistic estimate. We adapted the significance weighting strategy as follows to work with distances instead of correlation values ($\theta = 10000$):

$$D'(a,b) = \begin{cases} D(a,b) \cdot \dfrac{\theta}{m} & \text{if } m < \theta, \\ D(a,b) & \text{otherwise.} \end{cases} \tag{11}$$

Next, all adjusted distances are normalized by dividing them by the largest distance found, so every distance is a value in the interval [0,1]. In order to calculate a prediction we select the top-15 items that are most similar to the item of interest and use a simple calculation:

$$p_{a,i} = \sum_{j=1}^{15} \bigl(1 - D(i,j)\bigr)\, r_{a,j} \tag{12}$$

where $p_{a,i}$ is the prediction for the rating that user $a$ will give the item of interest $i$, $D(i,j)$ is the normalized distance between item $i$ and the $j$-th selected item, and $r_{a,j}$ is the rating that user $a$ has given item $j$.

The performance of this algorithm was not quite as good as we expected, often scoring worse than a simple model that always returned a rating of 3 (average RMSE was 1.55, with a standard deviation of 0.10, for 5 groups consisting of 100 tests). We include it nevertheless because it is a collaborative approach that works on different data than the user-user based approach, and it may therefore prove useful when combining classifiers.
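The three steps and Equations 10 to 12 can be sketched in Python as follows. This is a minimal illustration, not the code used in the experiments: the nested-dictionary data layout, the function names, and the treatment of movies without any common users are assumptions. Equation 12 as printed is a plain weighted sum; the optional `normalize` flag, which divides by the sum of the weights so that the result stays on the rating scale, is likewise an assumption.

```python
import math
from typing import Dict, Optional

# Assumed layout: item id -> {user id -> rating}
Ratings = Dict[str, Dict[str, float]]


def adjusted_distance(ratings: Ratings, a: str, b: str, theta: int) -> float:
    """Euclidean distance over common users (Eq. 10), inflated by
    theta / m when the overlap m is smaller than theta (Eq. 11)."""
    common = ratings[a].keys() & ratings[b].keys()
    m = len(common)
    if m == 0:
        return math.inf  # assumption: no overlap means "no usable distance"
    d = math.sqrt(sum((ratings[a][u] - ratings[b][u]) ** 2 for u in common))
    return d * (theta / m) if m < theta else d


def predict(ratings: Ratings, user: str, target: str, theta: int = 10000,
            k: int = 15, normalize: bool = True) -> Optional[float]:
    """Predict the rating of `user` for `target` from the k nearest items."""
    # Step 1: distances between the target and every movie the user has rated.
    seen = [j for j in ratings if j != target and user in ratings[j]]
    dists = {j: adjusted_distance(ratings, target, j, theta) for j in seen}
    dists = {j: d for j, d in dists.items() if math.isfinite(d)}
    if not dists:
        return None
    # Step 2: normalize so that the largest distance equals 1.
    largest = max(dists.values())
    norm = {j: (d / largest if largest > 0 else 0.0) for j, d in dists.items()}
    # Step 3: weighted sum over the k most similar items (Eq. 12).
    top = sorted(norm, key=norm.get)[:k]
    total = sum((1.0 - norm[j]) * ratings[j][user] for j in top)
    if not normalize:
        return total
    weight_sum = sum(1.0 - norm[j] for j in top)  # assumed normalization
    return total / weight_sum if weight_sum > 0 else None
```

Note that the item with the largest distance always receives weight 0 after normalization, so a user who has rated only one other movie yields no normalized prediction in this sketch.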

B. Screenshot of the user interface

Figure 1: Screenshot of the user interface. The result of the combined classifier is on the top, and the explanations for the content-based part and the collaborative part are indicated as Content Based and Peer Based respectively. For reference, this prediction is for user 7 in the Netflix dataset, and compares Gigli to Braveheart.