Expert Ranking using Reputation and Answer Quality of Co-existing Users


The International Arab Journal of Information Technology, Vol. 14, No. 1, January 2017

Muhammad Faisal(1), Ali Daud(1) and Abubakr Akram(2)
(1) Department of Computer Science, International Islamic University, Pakistan
(2) Department of Computer Science, COMSATS Institute of IT, Pakistan

Abstract: Online discussion forums provide knowledge-sharing facilities to online communities. Usage of online discussion forums has increased tremendously due to the variety of services they offer and because ordinary users can ask questions and provide answers. Over time, these forums accumulate huge amounts of content. Some of the posted discussions may not contain quality content and may reflect users' personal opinions about a topic, which may contradict a relevant answer. These low-quality discussions indicate the existence of unprofessional users. Therefore, it is imperative to rank experts in online forums. Most existing expert-ranking techniques consider only a user's social network authority and content relevancy features as parameters for evaluating user expertise, but user reputation as a group member of thread repliers is not considered. In this context, a novel solution for expert ranking in online discussion forums is proposed. We propose two expert ranking techniques: the first is based on the reputation of users and their co-existing users in different threaded discussions, and the second is based on users' answer quality and their category specialty features. Furthermore, we extend the expertise rank technique with our proposed feature sets. An experimental study based on a real dataset shows that the proposed techniques perform better than existing techniques.

Keywords: Co-existing user, expertise rank, ExpRank-CRF, ExpRank-COM, ExpRank-FB, ExpRank-AQCS.

Received October 4, 2014, accepted July 9, 2015.

1. Introduction

The World Wide Web (WWW) provides an immense variety of platforms to online communities for searching topics and areas of interest.
But due to the present architecture of the WWW, it is difficult for users to find topics of their domain within a single web site. News aggregators and social media networks like Digg (http://digg.com/), Reddit (http://reddit.com) and Google news reader (http://news.google.com/) are emerging services of Web 2.0. These services help users by sharing and recommending stories and news articles. But it is difficult for users to obtain a list of topic-specific articles that are similar and relevant, because the posted articles may not be chained or linked together in a logical sequence or may not be categorized. In these news aggregators there are no reliable criteria to evaluate the competencies of the users who provide ratings to articles. On the other hand, an online discussion forum is convenient for users who are interested in thorough topic reading.

Structural analysis and visualization of social networking communities yields a better understanding of user authorities in the network, such as finding influential people. The structural properties of a community network give insight into the dynamics of a community, its evolving nature, etc. Systematic and comprehensive analysis and visualization of a social network community gives insight into the structural aspects of a social network [7].

Content quality is a major concern in online discussion communities. Due to the presence of poor-quality content, it is desirable to find topic-specific experts so that a list of experts can be recommended for user queries. Both document content and social network structures are used as the basic parameters for expert finding [8]. In one effort, the link analysis algorithm PageRank is adapted as the expertise rank algorithm [20] for online help-seeking or technical communities. The algorithm considers the reputation of the users whom a user answered in a Java discussion forum: if a user answers people who are also experts, then his rank is boosted. The Z-score measure has been recommended for finding experts who play active and cooperative roles by providing quality answers [10].
Online forums possess a hierarchical structure consisting of threads and their respective posts or replies. These forum structures possess social network characteristics which may help in expert finding. An expert ranking technique has been presented in which topic-specific threads and posts are retrieved through the query likelihood method [5]. Online discussion forums have a thread/post structure, where the thread represents a question asked or topic shared by a question-asker, whereas a post is a reply/answer or comment to that question. Topics discussed in online forums are grouped and similar in nature due to the forum structure, because all replies to a single question or all discussion opinions on a single topic follow a strict hierarchical structure. Therefore, it is easy to find a chain of relevant discussions on some topic.

In most research efforts in the expert ranking domain, only single-user authority or prestige has been considered, which includes features like the number of answers provided by a user or the social network features of a user. The reputation of users and their co-existing users has not been considered for ranking experts. A user's answer quality and answering behavior in different categories is also an effective parameter for expertise evaluation. We may achieve better expert rankings by incorporating these reputation and answer quality features.

Our contributions are as follows:

The primary objective of our work is to provide expert ranking techniques for online discussion forums. Initially, co-existing users (users who co-occur together in different threads as repliers) are extracted using the Apriori algorithm. Details are given in section 5.

Firstly, an expert ranking technique, ExpRank-CRF, is proposed. According to this technique, a user is an expert if their own reputation and their co-existing users' reputation are high. Content- and link-based attributes are used to measure the reputation. Furthermore, we extend the ExpertiseRank [20] algorithm with our proposed ExpRank-CRF. We name it ExpRank-COM. Details are given in sections 6.1 and 6.2.

Secondly, an expert ranking technique, ExpRank-FB, is proposed. According to this technique, a user is an expert if he provides quality answers in specific categories. Content relevancy and category specialty features are used to measure user expertise. Furthermore, we extend the ExpertiseRank [20] algorithm with our proposed ExpRank-FB. We name it ExpRank-AQCS. Details are given in sections 6.3 and 6.4.

2. Related Works

In this section, we introduce literature related to the expert ranking problem in online forums. We provide details of expert ranking techniques based on link- and content-based features.

An authoritative user identification technique is presented by Bouguessa et al. [3] for Yahoo Answers, based on interactions between askers and answer providers. Several link analysis techniques, such as PageRank and HITS, have been applied and analysed on this data. Crowdsourcing is the process of obtaining ideas and services from large groups of people (http://en.wikipedia.org/wiki/crowdsourcing), mostly from online communities. Using crowdsourcing, companies may benefit from the combined and collaborative efforts of experts. Schall [14] proposed a model, DSARank, for estimating the relative importance of persons based on reputation mechanisms in collaboration networks. It is a link-intensity-based ranking model for recommending relevant users. Expert finding in programming forums is a need of the community: by knowing the experts, programming questions may be forwarded to them. Li et al. [9] proposed an algorithm and a tool, G-Finder, which makes question routing decisions to experts. A thread's title, source code and content are used in mapping threads to concepts. Zhu et al. [23] proposed an expert finding framework to rank users' authority in extended-category link graphs. Initially, relevance between categories is measured by KL-divergence and a topic model. Yang et al. [19] proposed a probabilistic generative topic-expertise model for modelling discussion topics with expertise in online QA services. Furthermore, based on this model, a ranking is proposed which combines textual and link features for deriving topic-specific expertise. Kardan et al. [8] proposed a context-based link analysis algorithm for expert finding. A user may be involved in online discussions in several contexts, such as sharing topics, answering questions, or asking questions of experts. Venkataramani et al. [17] proposed an approach for expert finding in the Stack Overflow programming forum.
Technical programming terms in source code and tags associated with each query are used to mine user expertise. This model then captures expertise based on term and tag relationships. Super edge, which is an adaptation of the PageRank algorithm, is proposed. Several indexes have been suggested for super network modelling, such as an index for finding influence during information dissemination, lexical overlap between terms, etc. Zhou et al. [22] addressed the problem of directing newly posted questions to relevant area experts in online forums. A three-model framework is proposed to accomplish the task of expert finding. Language models are constructed based on experts' profiles and the thread's conversation structure. Expert re-ranking is performed using the PageRank algorithm. According to Zhang et al. [21], response time for expert-posted questions is higher than for novice-posted questions, and if an expert asks a question then it is difficult for novice users to answer it, which causes an expertise gap problem. This is due to the fact that novice users have no experience in the specific area and their knowledge level is also very low. Several expertise ranking algorithms are proposed, including z-degree, in-degree, out-degree, HITS and an adaptation of the PageRank algorithm. In a community question answering service, an expert answers a question which is relevant to his field. Pal et al. [12] proposed a probabilistic model to evaluate the existing value of a question. The main attributes for evaluating the answer quality of a question are the number of answers, votes received, answer status, author reputation and content quality. Shahzad et al. [16] used Frequent Pattern (FP) growth and fuzzy methods for sanitizing sensitive sequential patterns. First, they find frequent patterns from sequential data using monotone and anti-monotone constraints. Then they fuzzify the frequent patterns and hide those patterns that are sensitive. Adhikari et al. [1] proposed a generalized approach for mining multiple databases using local pattern analysis. Daud et al. [6] presented a temporal- and semantics-based expert finding technique. Conference influence and time information are used together in a generalized topic modelling approach for the expert finding problem. Zhu et al. [24] proposed an expert finding approach in which content and link similarities are computed to measure category relevance. Topic-link-based techniques are used to measure user authority across several categories. This is done on extended category link graphs. Riahi et al. [13] recommended a profile-based expert finding technique. These profiles are used to suggest experts for a given topic against a user query. Interest-based user ranking is done using Term Frequency and Inverse Document Frequency (tf-idf) and a language model. Omidvar et al. [11] proposed a context-based expert finding technique. Context is measured using WordNet, and user ranking is done using social network analysis techniques.

3. Problem Statement

In this section we first introduce definitions of the basic elements used in online discussion forums and then formally define the problem of user-reputation-based expert ranking. In this paper, we define:

Thread: A thread is a question asked by a user, or a topic initiated by a user to gain insight on some subject in an online forum. A thread may consist of many posts.
Post: A post is a reply or an answer provided by a user in a thread.
Co-Existing User: Users who reply or co-occur together in two or more threads.
Definition 1 (Users-reputation based Expert Ranking): Let $E=\{e_1, e_2, e_3, \ldots, e_n\}$ be the set of expert users. Let $T$ be the set of all threads in which user $U$ has participated, where $T=\{t_1, t_2, t_3, \ldots, t_m\}$ and $U=\{u_1, u_2, u_3, \ldots, u_n\}$. We say that a user $U_i$ is an expert if he has participated in thread $T_i$ as a $CE_i$ and his S-Rep($U$, $T_c$, $FI_c$, $Cont_{sim}$) and CE-Rep(S-Rep, SR(CR)) are high, where $CE_i$ represents co-existence of the user with other users, S-Rep represents the self-reputation score of a user, CE-Rep represents the reputation score of the members with whom he has co-existed in different threads, $T_c$ represents thread count and $Cont_{sim}$ represents content similarity.

4. Baseline

The PageRank [4] algorithm ranks a web page based on the quality of incoming links to that page. The more incoming links a web page has, the higher its PageRank; furthermore, if a linking page has many outgoing links, its impact is decreased. The PageRank value for a page $a$ can be expressed as:

$$PR(a) = \sum_{b \in S_u} \frac{PR(b)}{L(b)} \tag{1}$$

i.e., the PageRank value for a page $a$ depends on the PageRank values of each page $b$ contained in the set $S_u$ (the set containing all pages linking to page $a$), divided by the number $L(b)$ of links from page $b$.

Based on the PageRank idea [4], Zhang [20] proposed the Expertise Rank algorithm for online community forums. According to this algorithm, if a user A provides an answer to a question of user B, who is a domain expert, then it means that user A has more expertise than user B, because A answered an expert's question. Assume user $X$ has answered questions for users $U_1, \ldots, U_n$; then the Expertise Rank of user $X$ is given as follows:

$$ER(X) = (1-d) + d\left(\frac{ER(U_1)}{L(U_1)} + \cdots + \frac{ER(U_n)}{L(U_n)}\right) \tag{2}$$

where $ER(X)$ is the expertise rank of user $X$, $U_i$ is a user who is answered by $X$, $d$ is a damping factor which is set to 0.85, and $L(U_i)$ is defined as the total number of users who helped $U_i$. According to this idea, a user has more expertise if he replies to questions posted by expert users.
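The recurrence in Equation 2 can be sketched as a simple fixed-point iteration. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the input format (a list of answerer/asker pairs), the function name `expertise_rank`, the fixed iteration count and the toy users are all hypothetical.

```python
from collections import defaultdict


def expertise_rank(answers, d=0.85, iterations=50):
    """Iterate ER(X) = (1 - d) + d * sum(ER(U_i) / L(U_i)).

    `answers` is a list of (answerer, asker) pairs (assumed format):
    the answerer replied to a question posted by the asker.
    L(U) is the number of distinct users who helped U.
    """
    helped = defaultdict(set)   # answerer -> askers they answered
    helpers = defaultdict(set)  # asker -> users who helped them
    users = set()
    for answerer, asker in answers:
        helped[answerer].add(asker)
        helpers[asker].add(answerer)
        users.update((answerer, asker))

    rank = {u: 1.0 for u in users}  # arbitrary starting scores
    for _ in range(iterations):
        rank = {
            x: (1 - d) + d * sum(rank[u] / len(helpers[u]) for u in helped[x])
            for x in users
        }
    return rank


# Hypothetical toy forum: carol answers bob and alice, bob answers alice.
toy_answers = [("carol", "bob"), ("bob", "alice"), ("carol", "alice")]
scores = expertise_rank(toy_answers)
```

In this toy graph, alice only asks questions and so keeps the base score of 1 - d, while carol, who answers both other users, ends with the highest score.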
A user's rank is decreased if he posts too many questions in the online forum. Answer quality is not considered as a parameter for evaluating user expertise in expertise rank [20]. Furthermore, the reputation of users and their co-existing users is not taken into account. Both of these factors are effective in expert ranking. We have extended the expertise rank [20] technique with our proposed methods. In our extended methods, we set the damping factor $d$ to 0.85; we tried other values of $d$, such as 0.25 and 0.65, but they did not make a notable difference.

5. Co-existing Users Extraction

For the expert ranking problem, the first task is to extract co-existing users from threaded discussions. A general forum structure is represented as follows: Let $F=\{t_1, t_2, t_3, \ldots, t_n\}$ be the forum containing a set of threads $T=\{t_1, t_2, t_3, \ldots, t_m\}$, where $T_i$ is the set containing posts $P=\{p_1, p_2, p_3, \ldots, p_n\}$ and $P_i$ is the post or reply posted by a user in $U=\{u_1, u_2, u_3, \ldots, u_n\}$.
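As a rough illustration of this extraction step, the sketch below counts how many threads each pair of repliers shares and keeps the pairs that co-occur in two or more threads, matching the Co-Existing User definition in Section 3. It covers only pairs (frequent 2-item sets), not a full frequent item-set miner. The thread-to-repliers mapping, the function name `co_existing_pairs` and the toy data are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def co_existing_pairs(thread_repliers, min_support=2):
    """Return replier pairs that co-occur in at least `min_support` threads.

    `thread_repliers` maps a thread id to the users who replied in it
    (assumed format). The number of threads shared by a pair plays the
    role of the pair's support count.
    """
    support = Counter()
    for repliers in thread_repliers.values():
        # set() ignores repeated replies by one user in the same thread;
        # sorted() gives every pair a canonical (a, b) order.
        for pair in combinations(sorted(set(repliers)), 2):
            support[pair] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


# Hypothetical toy forum with four threads.
toy_forum = {
    "t1": ["u1", "u2", "u3"],
    "t2": ["u1", "u2"],
    "t3": ["u2", "u3", "u2"],
    "t4": ["u4"],
}
pairs = co_existing_pairs(toy_forum)
```

Here `u1` and `u3` share only thread `t1`, so that pair falls below the threshold, and `u4` never co-occurs with anyone.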

5.1. Co-Existing User Modelling

Co-existing users are defined as groups of users who reply together in several threads to user-posted questions. A group of co-existing users is modelled as follows: Let $R=\{r_1, r_2, r_3, \ldots, r_n\}$ be the group of users who replied in different threads $T=\{t_1, t_2, t_3, \ldots, t_n\}$. The group $R$ contains a set of co-existing users $CE=\{r_1.r_2.t_1, r_1.r_2.t_2, r_1.r_2.t_3, \ldots, r_1.r_k.t_m\}$, where $r$ is a replier, $t$ is a thread, $R$ is the group and $CE$ is the set of co-existing users in each group. The following co-existing user types have been found:

Co-Existing User as an Answer Provider: These users are considered expert users because they only provide answers and did not post any question. In our dataset there are few users who lie in this category.
Co-Existing User as Asker as well as Answer Provider: These users give answers to posted questions but also ask some questions. This type of user is handled by our proposed techniques.
Co-Existing User as Asker-Only: These users only post questions in forums. They may be novice people who want to get answers to their questions or to gain insight on some topic of interest.

For expert ranking, we need to extract all co-existing users from threaded discussions. For extraction, we used the Apriori algorithm, which has long been used for finding frequent item sets in transactional databases [2]. In our case the Apriori algorithm was applied on a set of 10,000 threads and their respective posts. We obtained 450 forum users who were found co-existing in different threaded discussions. The support measure was used to check how frequently users occur in different threads. The minimum obtained support was 2 and the maximum support was 22. Support and confidence measures (http://en.wikipedia.org/wiki/association_rule_learning) are defined as follows:

$$Support(X \Rightarrow Y) = \frac{|X \cup Y|}{n} \tag{3}$$

$$Confidence(X \Rightarrow Y) = \frac{Support(X \cup Y)}{Support(X)} \tag{4}$$

In some cases, asker-only users were found as answer providers in threads they initiated themselves. Their appearance in the threads may have many reasons, such as clarifying some point or appreciating the answer of some expert. Their presence is also possible due to some controversy between their point of view and that of other users.

6. Proposed Expert Ranking Techniques

Two expert ranking techniques are proposed. Furthermore, we extend the expertise rank [20] technique with our proposed expert ranking techniques.

6.1. ExpRank-CRF

ExpRank-CRF is based on co-existing user reputation features; it comprises four distinct reputation features, illustrated as follows:

6.1.1. Threads Support Count for User

The motivation behind this feature is that the more often a user co-exists as a replier or answerer in different threads, the higher the chance that he remains active overall. The support (U) of a user is defined as the percentage of threads which contain the user as co-existing. Let $T_c$ be the thread support count of each user.

$$T_c = \sum_{i=1}^{m} T_i + \sum_{i=1}^{n} CE_i \tag{5}$$

where $T_c$ is the thread count and $CE_i$ represents the set of co-existing users. Let $\psi$ be the support threshold. If $T_c \geq \psi$ for a user in $U=\{u_1, u_2, \ldots, u_n\}$, then $u_i$ is an active participant in threads $T=\{t_1, t_2, \ldots, t_m\}$, where $\psi=2$.

6.1.2. Frequent Item-Sets in which User Co-Exist

The purpose of counting total frequent item-sets is to count the frequency of a user's groupings or item-sets. Based on this feature, it is expected that the more item-sets a user appears in, the higher the chance that he is an expert. It can be formulated as follows: Let $F$ be the total frequent item-sets in which users have been co-existing. $F=\{Sup(U_i, T_i, CE_i)\}$, where $Sup(U_i, T_i, CE_i)$ is the support count of threads in which the user has been co-existing, $U_i$ is the user set, $T_i$ is the thread set and $CE_i$ is the set of co-existing users. Let $\alpha$ be the threshold value for $F$. If $Sup(U_i, T_i, CE_i) \geq \alpha$, then $(T_i, CE_i)$ is considered frequent.
Here, we set $\alpha=2$ because only those users are selected who are found co-existing in 2 or more frequent item sets.

6.1.3. Semantic Similarity Among Posts of Co-Existing Users for a Given Topic

Content quality is an effective way to evaluate a user's expertise; therefore, in our case the content quality of posts in their respective threads is considered as a feature. For co-existing users it is expected that, if their post contents in different threads are similar or nearly equal, then those users may have a common domain of interest and have expertise in that area. It is formulated as follows: Let $S$ be the set of semantic similarity scores of the co-existing users' ($C_r$) post content in their respective threads, i.e., $S=\{S_1 Cr_1, S_2 Cr_2, S_3 Cr_3, \ldots, S_n Cr_m\}$. If $S_n Cr_m \geq \beta$, then the co-existing users have the same area of expertise and have highly relevant content for a given question or topic.

Although cosine similarity has been used extensively in past research and gave good results, it only considers lexical overlap between documents. Due to this limitation, the context of the discussion's content is totally ignored. This gives rise to the polysemy problem; therefore semantic similarity techniques are preferred for evaluating content overlap between discussions. We computed semantic similarity between the post contents of different users, using the measure proposed by Leacock-Chodorow [5]. This similarity measure is based on the distance between concepts in the WordNet IS-A hierarchy.

6.1.4. Co-Existing User Reputation

The rank of users is boosted if their co-existing users have high reputation. Initially we compute the reputation of each user individually. A user's reputation is computed by adding the scores of thread support (Equation 5), frequent item sets in which they co-exist (section 6.1.2) and semantic similarity of their posts (section 6.1.3). It is illustrated as follows:

$$U_{rep} = Sup(Threads) + Count(Freq\_itemsets) + Sim(Post) \tag{6}$$

where $U_{rep}$ is the reputation score of each individual user, $Sup$ is thread support, $Count$ is the frequent item-sets count, and $Sim$ is the semantic content similarity score between co-existing users' posts for each thread in which they appeared. For computing user expertise based on their co-existing users' reputation, the $U_{rep}$ score (Equation 6) of each user is added to their co-existing users' reputation scores. It is illustrated as follows:

$$ExpRank_{CRF} = \sum_{i=1}^{n} U_{rep}(U_i, U_{ci}) + U_{rep} \tag{7}$$

ExpRank-CRF is the user's expertise score based on the reputation of their co-existing users.
Here, $U_{rep}$ is the reputation score of each user, computed in Equation 6, and $\sum_{i=1}^{n} U_{rep}(U_i, U_{ci})$ is the summed reputation score of all other users who co-exist with this user. This score is also computed from the $U_{rep}$ score (Equation 6), where $U_i$ represents a user who has participated in thread $i$ and $U_{ci}$ represents the co-existing users for thread $i$.

6.2. ExpRank-COM

ExpRank-COM is a proposed extension of the ExpertiseRank [20] algorithm with our proposed ExpRank-CRF technique (Equation 7). The notion behind ExpRank-COM is to enrich the ExpertiseRank [20] technique with our proposed ExpRank-CRF features (Equation 7). According to this technique, a user's expertise is computed not only from the total number of question-askers whom they answer but also from the reputation of question-askers who co-exist in different threads. If question-askers have high expertise and their co-existence reputation is also high, then the rank of the user who answers their questions will also be high. It is also assumed that such question-askers are of a similar domain and are actively participating in a collaborative way. So both scores (ExpRank-CRF and ExpertiseRank) of a user are combined by multiplying the user reputation score (rep) with their ExpertiseRank score; we name it ExpRank-COM. It is illustrated as follows:

$$CR(A) = (1-d) + d\left(\frac{ER(U_1)}{C(U_1)} * rep_1 + \cdots + \frac{ER(U_n)}{C(U_n)} * rep_n\right) \tag{8}$$

where $CR(A)$ is the ExpRank-COM score for user $A$, $ER(U_i)$ is the ExpertiseRank score of a user $U_i$ who is answered by user $A$, $rep_i$ is the ExpRank-CRF score of user $U_i$ computed in Equation 7, $C(U_i)$ is the total number of users who helped user $U_i$, and $d$ is the damping factor, whose value is set to 0.85.

6.3. ExpRank-FB

According to this technique, a user is an expert if he provides quality answers in topic-specific categories. In this regard the following features are proposed:

f1. Count User's Highly Similar Replies for each Thread: It is expected that user expertise will be high if the semantic similarity score between his post contents and thread titles is high.
This feature is computed for all threads in which users exist, using WordNet [5].
f2. Mention Links: It is expected that if users mention links in their post contents, their answer quality will be high, as they provide an external source to support their answers.
f3. Answer Count in each Category: It is expected that if the number of replies by a user in a specific category is high, he will be considered an expert in that domain.
f4. Mention Quotes: The existence of quotes in a user's post contents shows that they provide quality answers.
f5. Answer Count: The more answers a user provides to questions, the higher the possibility that the user is an expert.
f6. Answer Length: It is expected that if a user provides answers of good length, then he produces well-explained answers.

In order to rank experts, feature scores are added for all users. It is illustrated as:

$$ExpRank_{FB} = \sum_{i=1}^{n} (U_i, f_i) \tag{9}$$

where ExpRank-FB is the features-based expert ranking, computed by adding the answer quality and category specialty feature scores for each user, and $f_i$ is the feature score for user $U_i$.

6.4. ExpRank-AQCS

ExpRank-AQCS is a proposed extension of the ExpertiseRank [20] algorithm with our proposed ExpRank-FB technique (Equation 9). The notion behind ExpRank-AQCS is to enrich ExpertiseRank [20] by adding the answer quality and category specialty feature scores of a user to his ExpertiseRank score. According to this technique, a user's expertise is based not only on the total number of question-askers whom he answers but also on the question-askers' answer quality and category specialty scores. So both scores (ExpRank-FB and ExpertiseRank) of a user are combined by multiplying the user's answer quality and category specialty score with their ExpertiseRank score; we name it ExpRank-AQCS. It is illustrated as follows:

$$AQCS(A) = (1-d) + d\left(\frac{ER(U_1)}{C(U_1)} * f_1 + \cdots + \frac{ER(U_n)}{C(U_n)} * f_n\right) \tag{10}$$

where $AQCS(A)$ is the ExpRank-AQCS score for user $A$, $ER(U_i)$ is the ExpertiseRank of a user $U_i$ who is answered by user $A$, $f_i$ is the summed feature (answer quality, category specialty) score of that user, computed as ExpRank-FB in Equation 9, $C(U_i)$ is the total number of users who helped user $U_i$, and $d$ is the damping factor, whose value is set to 0.85.

7. Experiments

In this section we describe the dataset, performance measures and results.

7.1. Dataset

We used a public dataset of BBC message board discussions from cyberemotions (footnote 6). The BBC dataset consists of different categories including world news, UK news, media and religious topics. It covers four years of data. There were 97,946 threads and 2,592,745 posts/comments. A total of 8,000 users participated in these online discussions. For the expert ranking problem, based on our requirements, we selected forum users who provided the most replies to questions or topics. Initially, we selected users who participated in 10,000 threads. There were 500 users who participated in these threaded discussions.
Out of these 500 users, 450 co-exist in different discussions. Labelling a big dataset was a major problem; therefore, human judgments were used for labelling the dataset. For labelling purposes, Zhang et al. [20] categorized users into five expertise levels. We adapted their rating criteria for labelling users as experts in our dataset. Table 1 shows the details.

Table 1. Expertise rating levels.

Level | Category            | Description
5     | Expert              | Highly informative and can answer critical questions in a timely manner.
4     | Professional        | Can answer and discuss domain-specific topics well.
3     | User                | Can answer general questions and has some basic concepts.
1-2   | Beginner or Amateur | Just starting to know about general issues or wants to gain insight on some topic.

Because most of our dataset consisted of world news and sports topics, we took help from two human raters to label these 450 users. These raters were from the Broadcast Journalism domain.

7.2. Performance Measures

Spearman's rho and Kendall's Tau are the common correlation measures (footnote 7). However, weak orderings are not handled well by Spearman correlation (a weak ordering means that the ranking contains multiple items of which neither is preferred over the other, that is, ties). In our case we have weak orderings because multiple users were assigned the same rating score by the human raters. Kendall's Tau, on the other hand, gives equal weight to any interchange of equal distance, regardless of where it occurs [20]. We therefore selected Kendall's Tau as the better metric. Upon receiving the ratings of the 450 users from the human raters, the raters' reliability was checked by inter-rater correlation. The Kendall's Tau correlation between the two human raters was found to be 0.773, and the Spearman's rho correlation coefficient was 0.79 (p < 0.01), which is a sufficiently high rate of inter-rater correlation.

7.3. Results and Discussion

In our case we computed both Kendall's Tau and Spearman's rho correlations for both the proposed and the extended methods. The top-50 and top-100 ranked users were selected for measuring correlations.
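The evaluation described above can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it uses the tie-aware tau-b variant of Kendall's Tau and computes Spearman's rho as the Pearson correlation of average ranks, and the helper names (`kendall_tau_b`, `spearman_rho`, `top_k_correlation`) and the toy scores are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b, which accounts for tied pairs (weak orderings)."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue              # tied in both lists: counted nowhere
        if dx == 0:
            ties_x += 1           # tied only in x
        elif dy == 0:
            ties_y += 1           # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else 0.0


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y) if var_x and var_y else 0.0


def top_k_correlation(method_scores, human_ratings, k):
    """Correlate a method's scores with human ratings on its top-k users."""
    top = sorted(method_scores, key=method_scores.get, reverse=True)[:k]
    x = [method_scores[u] for u in top]
    y = [human_ratings[u] for u in top]
    return kendall_tau_b(x, y), spearman_rho(x, y)


# Toy scores and ratings, invented for illustration only.
toy_scores = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.1}
toy_ratings = {"a": 5, "b": 4, "c": 4, "d": 2}
print(top_k_correlation(toy_scores, toy_ratings, k=3))
```

In the toy ratings, users "b" and "c" share the same human rating, which is the kind of tie that motivates the preference for Kendall's Tau in Section 7.2.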
Figures 1, 2, 3 and 4 show the correlation scores for the baseline, proposed and extended methods. It is evident from these figures that the proposed and extended techniques achieve better and significant correlation scores against the human-assigned scores. This shows the strength of the proposed methods. Here, we discuss the main method comparisons.

Figure 1. Correlations for top-50 users.

6 http://www.cyberemotions.eu/data.html
7 http://en.wikipedia.org/wiki/Rank_correlation

The International Arab Journal of Information Technology, Vol. 14, No. 1, January 2017

Figure 2. Correlations for top-50 users.

Figure 3. Correlations for top-100 users.

Figure 4. Correlations for top-100 users.

ExpertiseRank vs. ExpRank-FB: Kendall's correlation between the human experts is 0.773. From Figures 1 and 3, it is evident that for both top-50 and top-100 users, the proposed ExpRank-FB method, which is based on answer quality and category specialty features, outperformed both the ExpertiseRank and ExpRank-CRF methods. This shows that answer quality and category specialty features are very effective in expert ranking. High content overlap between users shows that they are from the same domain and that their point of view on a given topic is also the same. Additionally, answering in specific categories shows their domain specificity.

ExpertiseRank vs. ExpRank-CRF: From Figures 1 and 3, it is evident that for both top-50 and top-100 users, the proposed ExpRank-CRF method performed better than ExpertiseRank. This is due to the effect of adding a user's co-existing reputation score to his self-reputation score.

ExpertiseRank vs. ExpRank-CRF + ExpRank-FB: It is evident that for both top-50 and top-100 users, the proposed hybrid method (ExpRank-CRF + ExpRank-FB) performed better, and its correlation score with the human rating is 0.758. The features proposed for this hybrid method show that user reputation, co-existing reputation and answer quality features are best suited to the expert ranking problem.

Hybrid method (ExpertiseRank + Hybrid): From Figures 2 and 4, it is evident that for both top-50 and top-100 users, the proposed Hybrid method (ExpertiseRank + Hybrid) outperformed all other methods. This is due to the fact that the characteristics of all proposed methods have been combined with the baseline ExpertiseRank method. For all methods, Spearman's rho shows relatively higher correlation scores than Kendall's Tau, but for each result it shows approximately the same ranking differences as Kendall's Tau.

8. Conclusions and Future Work

This paper proposes expert ranking techniques for online discussion forums.
These techniques consider the reputation of users and of their co-existing users in different threads, along with their answer quality and category specialty features. Although the proposed techniques show better performance, they may be further improved by incorporating the credibility of a user's content through computing n-gram similarity between thread titles and posts. Other features, like counts of nouns, verbs, stop words and non-stop words, may also be significant and may be used in identifying quality answer providers. Currently, we proposed these techniques for an online news discussion forum, but they may be extended in future to rank experts in online programming forums as well.

References

[1] Adhikari A., Ramachandrarao P., Prasad B., and Adhikari J., "Mining Multiple Large Data Sources," The International Arab Journal of Information Technology, vol. 7, no. 3, pp. 241-249, 2010.
[2] Agrawal R. and Srikant R., "Fast Algorithms for Mining Association Rules in Large Databases," in Proceedings of the 20th International Conference on Very Large Data Bases, San Francisco, pp. 487-499, 1994.
[3] Bouguessa M., Dumoulin B., and Wang S., "Identifying Authoritative Actors in Question Answering Forums: The Case of Yahoo! Answers," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Nevada, pp. 866-874, 2008.
[4] Brin S. and Page L., "The Anatomy of a Large-Scale Hypertextual Web Search Engine," Computer Networks and ISDN Systems, vol. 30, no. 1-7, pp. 107-117, 1998.
[5] Budanitsky A. and Hirst G., "Evaluating WordNet-based Measures of Lexical Semantic Relatedness," Computational Linguistics, vol. 32, no. 1, pp. 13-47, 2006.

[6] Daud A., Li J., Zhou L., and Muhammad F., "Temporal Expert Finding Through Generalized Time Topic Modelling," Knowledge-Based Systems, vol. 23, no. 6, pp. 615-625, 2010.
[7] Hua G. and Haughton D., "A Network Analysis of an Online Expertise Sharing Community," Social Network Analysis and Mining, vol. 2, no. 4, pp. 291-303, 2012.
[8] Kardan A. and Behzadi M., "Context based Expert Finding in Online Communities using Social Network Analysis," International Journal of Computer Science Research and Application, vol. 2, no. 1, pp. 79-88, 2012.
[9] Li W., Zhang C., and Hu S., "G-Finder: Routing Programming Questions Closer to the Experts," in Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, Nevada, pp. 62-73, 2010.
[10] Li Y., Liao T., and Lai C., "A Social Recommender Mechanism for Improving Knowledge Sharing in Online Forums," Information Processing & Management, vol. 48, no. 5, pp. 978-994, 2012.
[11] Omidvar A., Garakani M., and Safarpour H., "Context Based User Ranking in Forums for Expert Finding using WordNet Dictionary and Social Network Analysis," Information Technology and Management, vol. 15, no. 1, pp. 51-63, 2014.
[12] Pal A. and Konstan J., "Expert Identification in Community Question Answering: Exploring Question Selection Bias," in Proceedings of the 19th ACM International Conference on Information and Knowledge Management, Ontario, pp. 1505-1508, 2010.
[13] Riahi F., Zolaktaf Z., Shafiei M., and Milios E., "Finding Expert Users in Community Question Answering," in Proceedings of the 21st World Wide Web Conference, Lyon, pp. 791-798, 2012.
[14] Schall D., "Expertise Ranking using Activity and Contextual Link Measures," Data & Knowledge Engineering, vol. 71, no. 1, pp. 92-113, 2012.
[15] Seo J. and Croft W., "Thread-based Expert Finding," in Proceedings of the SIGIR Workshop on Search in Social Media, Boston, 2009.
[16] Shahzad F., Asghar S., and Usmani K., "A Fuzzy Based Scheme for Sanitizing Sensitive Sequential Patterns," The International Arab Journal of Information Technology, vol. 12, no. 1, pp. 60-68, 2015.
[17] Venkataramani R., Gupta A., Asadullah A., Muddu B., and Bhat V., "Discovery of Technical Expertise from Open Source Code Repositories," in Proceedings of the 22nd International Conference on World Wide Web Companion, Brazil, pp. 97-98, 2013.
[18] Wang G., Jiao J., Abrahams A., Fan W., and Zhang Z., "ExpertRank: A Topic-aware Expert Finding Algorithm for Online Knowledge Communities," Decision Support Systems, vol. 54, no. 3, pp. 1442-1451, 2013.
[19] Yang L., Qiu M., Gottipati S., Zhu F., Jiang J., Sun H., and Chen Z., "CQARank: Jointly Model Topics and Expertise in Community Question Answering," in Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, California, pp. 99-108, 2013.
[20] Zhang J., Ackerman M., and Adamic L., "Expertise Networks in Online Communities: Structure and Algorithms," in Proceedings of the 16th International Conference on World Wide Web, Canada, pp. 221-230, 2007.
[21] Zhang J., Ackerman M., Adamic L., and Nam K., "QuME: A Mechanism to Support Expertise Finding in Online Help-seeking Communities," in Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology, Newport, pp. 111-114, 2007.
[22] Zhou Y., Cong G., Cui B., Jensen C., and Yao J., "Routing Questions to the Right Users in Online Communities," in Proceedings of the 25th IEEE International Conference on Data Engineering, Shanghai, pp. 700-711, 2009.
[23] Zhu H., Cao H., Xiong H., Chen E., and Tian J., "Towards Expert Finding by Leveraging Relevant Categories in Authority Ranking," in Proceedings of the 20th ACM International Conference on Information and Knowledge Management, Scotland, pp. 2221-2224, 2011.
[24] Zhu H., Chen E., Xiong H., Cao H., and Tian J., "Ranking User Authority with Relevant Knowledge Categories for Expert Finding," World Wide Web, vol. 17, no. 5, pp. 1081-1107, 2014.

Muhammad Faisal is a PhD candidate in the Department of Computer Science at International Islamic University, Islamabad. His current research interests include information retrieval techniques for online discussion forums, movie recommender systems and mining social web data.
Ali Daud is working as Assistant Professor in the Department of Computer Science at International Islamic University, Islamabad. He obtained his PhD degree from Tsinghua University in 2010. He is head of the Data Mining and Information Retrieval Group. His current research interests include text mining, social network analysis and applications of probabilistic topic models.

AbuBakr Akram is an MS (CS) student in the Department of Computer Science at COMSATS Institute of IT, Attock. His current research interests include mining social forums data and recommender systems.