Hidden Topic Sentiment Model


Md Mustafizur Rahman, Hongning Wang
Department of Computer Science, University of Virginia, Charlottesville VA, USA
{mr4xb, hw5x}@virginia.edu

ABSTRACT

Various topic models have been developed for sentiment analysis tasks. But the simple topic-sentiment mixture assumption prohibits them from finding fine-grained dependency between topical aspects and sentiments. In this paper, we build a Hidden Topic Sentiment Model (HTSM) to explicitly capture topic coherence and sentiment consistency in an opinionated text document, so as to accurately extract latent aspects and corresponding sentiment polarities. In HTSM, 1) topic coherence is achieved by enforcing words in the same sentence to share the same topic assignment and modeling topic transitions between successive sentences; 2) sentiment consistency is imposed by constraining topic transitions via tracking sentiment changes; and 3) both topic transitions and sentiment transitions are guided by a parameterized logistic function based on the linguistic signals directly observable in a document. Extensive experiments on four categories of product reviews from both Amazon and NewEgg validate the effectiveness of the proposed model.

General Terms: Algorithms, Experimentation

Keywords: Topic modeling, aspect detection, sentiment analysis

1. INTRODUCTION

Topic models have become an important building block in sentiment analysis [18, 12, 11, 15, 31, 28]. They naturally decompose unstructured text content into topical aspects and sentiment polarities via generative modeling. The automatically identified topics and corresponding opinions provide a fine-grained understanding of opinionated text data and enable a wide range of important applications, including public opinion tracking in social media [18, 15, 12], automated recommendation in e-commerce [16], contrastive opinion summarization in political science [6], and many more.

One fundamental assumption in topic models is exchangeability, i.e., topics are infinitely exchangeable within a given document while the joint probability is invariant [3]. As a result, a common practice is to model a document as a mixture over a set of latent topics; and given the topic mixing proportion, the topic assignments over words in a document are considered independent of each other. This overly simplified assumption fails to capture the rich structure embedded in a text document: in reality, natural language text rarely consists of isolated, unrelated sentences, but rather of collocated, structured and coherent groups of sentences [10]. The existence of sentiment in an opinionated text document further complicates the topic and sentiment mixture. For example, most topic models for sentiment analysis assume the selection of topics is independent given sentiment labels over words [18, 15, 12]. However, it is very unlikely for a user to express contradictory sentiment, i.e., both positive and negative, about the same topical aspect in a document; and thus when sentiment switches, the topic should also change. Relaxing this independence assumption is expected to yield better models in terms of latent aspect identification and sentiment classification. Figure 1 illustrates this interdependency between topic assignments and sentiment polarities in a typical product review, which motivates our research in this paper.
By: Kindle Customer. Date: June 25, 2014.

"I own an ultrabook and I like it for a number of specific tasks. I especially like its portability (3 pounds with a small footprint) {portability,+} and the speed of its solid state drive {hard drive,+}. When it comes to looks you have to give it to the Inspiron {appearance,+}. It definitely has the sleek look of an ultrabook {appearance,+}. The combination of brushed aluminum with black trim, keys and bezel make for a very classy, corporate presence {appearance,+}. The fit and finish are first rate {appearance,+}. However, the sound sucks {sound,-}. I have owned 10 notebook and laptop computers over the past two decades and this Inspiron has the worst sound of any before it {sound,-}. It is weak, tinny and what low end it has is muddy and indistinct {sound,-}. While we've all come to expect pretty lousy sound from notebooks, this is subpar even considering those low standards {sound,-}."

Figure 1: A review of a laptop from Amazon (review RQ4YYC5BXD). Topical aspects and sentiment polarities are manually labeled in superscripts with different colors on each sentence.

Three important observations can be made from the sample review document annotated in Figure 1. First, topic assignments over words in a document are not a simple mixture; instead, words in close proximity tend to share the same topic, i.e., topic coherence. Second, sentiment polarities expressed toward the same topical aspect tend to be consistent, i.e., sentiment consistency. We should note that this observation does not contradict the fact that a user might have mixed judgments about an item within a review document, e.g., appreciating the appearance but disliking the sound quality of the ultrabook in our motivating example. Sentiment consistency suggests that a user tends to give the same opinion about a particular topical aspect, rather than expressing contradictory assessments over it. This adds another dimension of regularity to topic assignments over words in an opinionated text document: when sentiment switches, the topic assignment should also switch. Last but not least, there are clear linguistic cues indicating the transition of sentiment and topics between successive sentences. For example, conjunctions like "however" and "nonetheless" imply a switch of sentiment in the current sentence, while an increased overlap of content words suggests unaltered topic and sentiment assignments between two adjacent sentences.

Some solutions have been developed to realize topic coherence, i.e., assigning words in a sentence to the same topic [12] and modeling topic transitions among successive sentences [8, 29]. Linguistic cues, e.g., POS tagging [31] and metadata [19], have also been exploited to guide topic generation. But the exchangeability assumption is still being made when modeling the compound of topic and sentiment in a document [18, 15, 12]: topics are modeled as simple mixtures under sentiment labels. This renders erroneous posterior inference results that assign opposite sentiment labels to the same topical aspects in a document, and inevitably leads to suboptimal performance in downstream sentiment analysis tasks.

In this work, we propose to explicitly model topic coherence and sentiment consistency in an opinionated text document so that we can accurately extract latent aspects and corresponding sentiment polarities. Specifically, we introduce a hidden Markov model into topic modeling and name our solution the Hidden Topic Sentiment Model (HTSM). In HTSM, topics are modeled as a compound of latent aspects and sentiment polarities. Topic coherence is achieved by enforcing words in the same sentence to share the same topic assignment and modeling topic transitions between successive sentences [8]. Sentiment consistency is imposed by constraining topic transitions via tracking sentiment changes: once the sentiment assignment changes, a new topic has to be sampled for the current sentence. Both topic transitions and sentiment transitions are guided by a parameterized logistic function based on linguistic signals directly observable in a document, e.g., cosine similarity and POS tag overlap between adjacent sentences. A customized forward-backward algorithm is developed to perform efficient posterior inference for HTSM. The model configuration, including both the word distributions under topics and the topic/sentiment transitions, is learned in a fully unsupervised manner via expectation maximization. The formalization of HTSM is so flexible that partially annotated documents, e.g., user-provided pros and cons, can be easily incorporated for more accurate model estimation.

Extensive experiments are performed on four categories of product reviews crawled from both Amazon and NewEgg to validate the effectiveness of the proposed model. A set of state-of-the-art topic models for sentiment analysis are employed as baselines to compare the quality of learned topics, the accuracy of sentiment classification, and the utility of aspect-based contrastive summarization from our HTSM model. In summary, our contributions in this paper are as follows:

- We develop a unified topic model to explicitly capture topic coherence and sentiment consistency in opinionated text documents. It provides more accurate extraction of latent topics and sentiment polarities.
- Our flexible modeling assumption enables both unsupervised and semi-supervised estimation of model parameters.
- We performed extensive experimental comparisons on different data sets under various application scenarios. The promising performance confirms the value of modeling the dependence between sentiment and topic in sentiment analysis.

2. RELATED WORK

The wide coverage of topics and abundance of opinions in social media make it a gold mine for discovering public opinions on all sorts of topics [22]. Significant research effort has been devoted to building statistical topic models to mine user-generated opinion data. Following the notion proposed in Mimno and McCallum's work [19], we can categorize most existing topic models for sentiment analysis as upstream models and downstream models. Upstream models assume that in order to generate a word in a document, one needs to first decide the sentiment polarity of this word and then sample the topic assignment for this word accordingly. In contrast, downstream models assume the sentiment label is determined by the topic assignment in parallel to the text content. Our proposed solution falls into the category of upstream models.

One typical upstream model is the Topic-Sentiment Model (TSM) proposed in [18]. TSM is constructed based on the pLSA model [9]: in addition to assuming a corpus consists of a set of latent topics with neutral sentiment, TSM introduces two additional sentiment models, one for positive and one for negative sentiment. A new concept called theme is introduced in TSM for document modeling, and a theme is modeled as a compound of three components (neutral topic words, positive words and negative words) in each document. However, this kind of division cannot capture the interrelation between topic and sentiment, given that a document is still modeled as an unordered bag of words; and TSM also suffers from the same problems as pLSA, e.g., overfitting, and can hardly generalize to unseen documents.

Several follow-up works try to address the limitations of TSM from different perspectives. Based on the LDA model [3], Lin and He proposed a joint sentiment/topic model (JST) for sentiment analysis [15]. In JST, the combination of topic and sentiment is modeled as a Cartesian product between a set of topic models and sentiment models. Accordingly, each document exhibits distinct topic mixtures under different sentiment categories in JST. To improve topic coherence, Jo and Oh extended JST by enforcing words in a single sentence to share the same topic and sentiment label in their Aspect and Sentiment Unification Model (ASUM) [12]. Zhao et al. introduced the Maximum Entropy LDA model (MaxEnt-LDA) to control the sampling of words from a background topic, aspect-specific topics and opinion-specific topics in [31]. Both JST and ASUM strongly depend on sentiment seed words to differentiate sentiment categories. MaxEnt-LDA depends on a set of manually labeled training sentences with background, aspect and opinion words to estimate the maximum entropy model beforehand. Moreover, the simple sentiment-topic mixture assumption prevents all the aforementioned models from recognizing sentiment consistency, i.e., they may sample the same aspect assignment under different sentiment categories in a document.

Downstream models reverse the generation assumption between sentiment labels and topic assignments, and provide some flexibility in modeling sentiment, e.g., continuous opinion ratings can also be modeled [17, 28, 25]. However, downstream models usually assume the sentiment labels are observable, which thus limits their applications in sentiment analysis.
Another line of related work introduces Markov models into topic modeling. The Aspect-HMM model [2] combines pLSA with a hidden Markov model [23] to perform document segmentation over text streams. However, Aspect-HMM separately estimates topics on the training set and depends on heuristics to infer the transitional relations between topics. HMM-LDA [7] distinguishes short-range syntactic dependencies from long-range semantic dependencies among the words in each document. But in HMM-LDA, only the latent variables for the syntactic classes are treated as a locally dependent sequence, while latent topics are treated the same as in other topic models. The Hidden Topic Markov Model (HTMM) [8] is the model most similar to ours. HTMM captures topic coherence by assuming words in one sentence share the same topic assignment and modeling topic transitions between successive sentences. However, HTMM loosely models the transition between topics as a binary relation: keep the same assignment as the previous sentence, or draw a new one with a certain probability. It ignores sentiment consistency in a document: when sentiment switches, the topic assignments should also switch. Our HTSM constrains topic transitions via tracking sentiment changes; and linguistic cues directly observable from adjacent sentences are leveraged to guide topic and sentiment transitions.

3. METHODOLOGY

In this section, we describe the proposed Hidden Topic Sentiment Model and discuss how it captures topic coherence and sentiment consistency simultaneously within an opinionated text document. Efficient posterior inference is performed via a customized forward-backward algorithm, and an Expectation-Maximization algorithm is utilized to estimate the model parameters in both unsupervised and semi-supervised settings.

3.1 Definition of Terminologies

We first specify the notations and definitions of aspect, sentiment and topic used in this paper. Denote a set of review text documents about a particular type of entities, e.g., product reviews, as $D = \{d_1, d_2, \ldots, d_{|D|}\}$, where each document $d$ consists of $m_d$ sentences. We assume there is a shared set of aspects that attract reviewers' interest; they can be defined as follows:

Definition (Aspect) An aspect of a particular entity is characterized by a set of words, which present a semantically coherent theme of discussion. An aspect can be indexed by a discrete random variable taking values from $A = \{a_1, a_2, \ldots, a_{|A|}\}$. For example, words such as "price," "value," and "worth" describe the price aspect of a product.

Besides describing the aspects, users also express their personal attitudes toward those aspects in their review documents, e.g., favoring the price aspect or criticizing the customer service aspect in product reviews. The expressed attitude is defined as sentiment.

Definition (Sentiment) Sentiment represents a user's emotional feelings about a particular entity. It can be denoted by a discrete random variable taking values from $S = \{s_1, s_2, \ldots, s_{|S|}\}$, e.g., positive or negative. In text documents, sentiment can be determined from content words. For example, "love" and "wonderful" indicate positive sentiment, and "terrible" and "regret" indicate negative sentiment.

In this paper, topic is defined as a compound of latent aspect and sentiment polarity. For example, in tablet reviews, potential topics could include a positive aspect about battery life and a negative aspect about customer service. Formally, topic is defined as follows:

Definition (Topic) A topic is a compound of latent aspect and sentiment polarity in a given document collection. It can be represented as a discrete distribution over words in a fixed vocabulary. Words with high probabilities under a topic depict the corresponding aspect and sentiment.

Based on the above definitions, we strive to develop a probabilistic generative model to automatically identify topics, i.e., aspects and sentiments, from a collection of opinionated text documents. The model takes an unstructured text document as input and returns decomposed latent aspects and sentiment polarities as output.
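To make the terminology concrete, the compound nature of a topic can be captured in a tiny data structure. The following sketch is purely illustrative and uses names of our own choosing; the paper does not prescribe an implementation.

```python
from dataclasses import dataclass

import numpy as np

@dataclass(frozen=True)
class Topic:
    """A topic z_k = (a_k, s_k): a latent aspect paired with a sentiment polarity."""
    aspect: int      # index into the shared aspect set A
    sentiment: int   # 0 = negative, 1 = positive (binary polarities)

# Each topic also carries a multinomial word distribution beta_k over a fixed
# vocabulary V; the highest-probability words depict the aspect and sentiment.
vocab_size = 1400                                  # roughly the size reported in Section 4.2.2
rng = np.random.default_rng(0)
beta_k = rng.dirichlet(np.full(vocab_size, 0.01))  # beta_k ~ Dir(eta), a sparse draw
top_words = np.argsort(beta_k)[::-1][:5]           # indices of the topic's top-5 words
```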
In the following sections, we discuss the detailed model assumptions and specifications.

3.2 Hidden Topic Sentiment Model

From a linguistic analysis perspective, a document exhibits internal structure, where structural segments encapsulate semantic units that are closely related [13]. As a result, in the proposed Hidden Topic Sentiment Model (HTSM), we treat the sentence as the basic structural unit and assume all the words in a sentence share the same topic (as illustrated in our motivating example in Figure 1). Based on this, HTSM drops the simple mixture assumption employed in conventional topic models [3, 9], and explicitly models topic transitions in successive sentences via a first-order hidden Markov model.

Because in HTSM a topic is modeled as a compound of latent aspect and sentiment polarity, two factors control the transition of topics. First, once the sentiment labels switch between two consecutive sentences, a topic has to be generated for the subsequent sentence with a different aspect assignment. This enforces sentiment consistency. Second, when the sentiment labels stay intact, two adjacent sentences are assumed to be highly related: the subsequent sentence will inherit the topic assignment from the previous sentence, or select a distinct one from a document-specific topic mixture with a certain probability. This imposes topic coherence.

Formally, we assume there are $K$ topics embedded in a given collection of review documents. A topic indexed by $z_k$ has two components: $a_k$ indicates the aspect label and $s_k$ indicates the sentiment label, i.e., $z_k = (a_k, s_k)$. Topic $z_k$ is specified as a multinomial distribution over a fixed vocabulary $V$, i.e., $\{p(w \mid \beta_k)\}_{w \in V}$, where $\beta_k$ is the corresponding model parameter. To avoid overfitting, we impose a shared Dirichlet prior over $\beta_k$, i.e., $\beta_k \sim Dir(\eta)$. In this paper, to simplify our discussion, we only consider binary sentiment polarities in HTSM, i.e., $s_k \in \{0, 1\}$. But HTSM is flexible enough to model multi-class sentiment polarities, e.g., five-star rating scales [21].

In a given document $d$, the document-level topic proportion $\theta_d$ is assumed to be drawn from a shared Dirichlet distribution [3], i.e., $\theta_d \sim Dir(\alpha)$. Among the $m_d$ sentences in $d$, each sentence $t_i$ has $N_i$ words and is associated with a topic $z_i$, which is sequentially drawn from a document-specific Markov chain. Because the aspect label and sentiment polarity of sentences are unobservable, we introduce two latent variables $\tau$ and $\psi$ on each sentence to control the sampling of topics with respect to the topic coherence and sentiment consistency requirements. Specifically, $\tau_i$ and $\psi_i$ are binary random variables indicating whether there is a sentiment switch and an aspect change on sentence $t_i$ accordingly. Their combination determines the topic transition: 1) when $\tau_i = 0$ and $\psi_i = 0$, $t_i$ inherits the previous sentence's topic assignment; 2) when $\tau_i = 0$ and $\psi_i = 1$, a new topic $z_i$ is drawn from $\theta_d$, with the constraint that $s_i = s_{i-1}$ and $a_i \neq a_{i-1}$; 3) when $\tau_i = 1$ and $\psi_i = 1$, a new topic $z_i$ is sampled from $\theta_d$ with the constraint that $s_i \neq s_{i-1}$ and $a_i \neq a_{i-1}$. The combination of $\tau_i = 1$ and $\psi_i = 0$ is not allowed in HTSM, because the sentiment consistency constraint enforces an aspect change when sentiment is switched.
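The three admissible $(\tau, \psi)$ configurations translate directly into a constrained sampling rule. The following is a minimal Python sketch of that rule under the assumptions above; the function and variable names are ours, not the authors' implementation.

```python
import numpy as np

def sample_topic(prev_topic, tau, psi, topics, theta, rng):
    """Draw z_i given the switch indicators, following HTSM's transition rules.

    topics:     list of (aspect, sentiment) pairs, aligned with theta
    prev_topic: (a_{i-1}, s_{i-1}), or None for the first sentence (no constraint)
    theta:      document-level topic proportion, a length-K numpy array
    """
    if tau == 1 and psi == 0:
        raise ValueError("forbidden: a sentiment switch forces an aspect change")
    if tau == 0 and psi == 0:
        return prev_topic                      # inherit the previous sentence's topic

    a_prev, s_prev = prev_topic if prev_topic is not None else (None, None)
    if tau == 0:   # psi == 1: keep the sentiment, change the aspect
        ok = [k for k, (a, s) in enumerate(topics)
              if a != a_prev and (s_prev is None or s == s_prev)]
    else:          # tau == 1, psi == 1: change both sentiment and aspect
        ok = [k for k, (a, s) in enumerate(topics)
              if a != a_prev and (s_prev is None or s != s_prev)]

    p = theta[ok] / theta[ok].sum()            # renormalize theta over admissible topics
    return topics[rng.choice(ok, p=p)]
```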
To capitalize on the linguistic features directly observable in document content, e.g., overlapping sentence content indicating intact topic assignments, we use parameterized logistic functions to define the generation probabilities of $\tau$ and $\psi$ in each sentence. The aspect transition feature function $f_a(d, i)$ takes document $d$ and sentence $t_i$ as input, and outputs an $l$-dimensional feature vector describing aspect change. Accordingly, $f_s(d, i)$ generates a $p$-dimensional feature vector describing sentiment switch. Hence, we define

$$p(\tau_i = 1 \mid d, \sigma) = \frac{1}{1 + \exp\left(-\sigma^T f_s(d, i)\right)} \quad (1)$$

$$p(\psi_i = 1 \mid d, \epsilon) = \frac{1}{1 + \exp\left(-\epsilon^T f_a(d, i)\right)} \quad (2)$$

where $\sigma$ and $\epsilon$ are the corresponding feature weights for modeling sentiment switch and aspect change. The detailed specifications of $f_a(d, i)$ and $f_s(d, i)$ and the feature weight estimation procedures will be discussed in Section 3.4.

Putting the above assumptions together, the generative process of a document postulated in HTSM can be described as follows:

1. For every topic $z$, draw $\beta_z \sim Dir(\eta)$.
2. For each review document $d \in D$:
   (a) Draw topic proportion $\theta_d \sim Dir(\alpha)$.
   (b) For each sentence $t_i$, $i = 1, 2, \ldots, m_d$:
      i. Sample $\tau_i \sim p(\tau_i \mid d, \sigma)$; set $\tau_i = 1$ when $i = 1$.
      ii. Sample $\psi_i \sim p(\psi_i \mid d, \epsilon)$; set $\psi_i = 1$ when $\tau_i = 1$.
      iii. Sample $z_i$ by:
         - $z_i = z_{i-1}$ if $\tau_i = 0, \psi_i = 0$;
         - $z_i \sim Mult(\theta_d)$ s.t. $a_i \neq a_{i-1}, s_i = s_{i-1}$ if $\tau_i = 0, \psi_i = 1$;
         - $z_i \sim Mult(\theta_d)$ s.t. $a_i \neq a_{i-1}, s_i \neq s_{i-1}$ if $\tau_i = 1, \psi_i = 1$.
      iv. Sample each word $w_n$ in $t_i$: $w_n \sim Mult(\beta_{z_i})$.

To make the above generative process consistent at every sentence in a document, we define $a_0 = \emptyset$ and $s_0 = \emptyset$, so that there is no constraint when sampling the topic for the first sentence in a document. Using the language of graphical models, this generative process can be visualized in Figure 2.

[Figure 2: Graphical model representation of the Hidden Topic Sentiment Model. Dark and light circles represent observable and latent random variables, and plates denote repetitions. Solid arrows encode dependency relations and dashed arrows denote the generation of transition features.]

Conditioned on the model parameters $(\alpha, \beta, \epsilon, \sigma)$, the joint probability of sentences and latent topics in document $d$ is thus given by

$$p(z, \theta_d, \psi, \tau, w_1, \ldots, w_{N_i} \mid \alpha, \beta, \epsilon, \sigma) = p(\theta_d \mid \alpha) \prod_{i=1}^{m_d} \Big[ p(\tau_i \mid d, \sigma)\, p(\psi_i \mid d, \epsilon)\, p(z_i \mid z_{i-1}, \tau_i, \psi_i, \theta_d) \prod_{n=1}^{N_i} p(w_n \mid \beta_{z_i}) \Big] \quad (3)$$

The above joint distribution differentiates HTSM from conventional topic models for sentiment analysis, which are built on simple topic mixture assumptions. Due to the sequential generation of topic assignments for sentences from a Markov chain, HTSM is no longer invariant to permutations of words or sentences in a document. Documents in which successive sentences share coherent topics are more likely than any random shuffling of the same sentences. This leads to linearly coherent topic inference in a document: successive sentences tend to share similar topics, rather than fluctuating assignments. More importantly, sentiment consistency is especially emphasized in HTSM: at every sentence of a document, one needs to first determine whether to keep the sentiment polarity of the previous sentence; if not, a new topic with a different aspect label and sentiment polarity needs to be sampled. This avoids assigning contradictory sentiment polarities to the same aspect in a document. To the best of our knowledge, no existing topic models achieve such regularity over topic assignments.
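To make the generative story concrete, here is a minimal simulation sketch of step 2, reusing the `sample_topic` helper sketched earlier and treating the feature functions as black boxes; all names are illustrative rather than the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generate_document(m, topics, beta, alpha, sigma, eps, features_s, features_a,
                      sent_lens, rng):
    """Generate m sentences: switch indicators, topics, then words (Eqs 1-3)."""
    theta = rng.dirichlet(alpha)               # step 2(a): topic proportion
    z_prev, sentences = None, []
    for i in range(m):
        # steps 2(b)i-ii: tau is forced to 1 for the first sentence, psi to 1 when tau = 1
        tau = True if i == 0 else bool(rng.random() < sigmoid(sigma @ features_s(i)))
        psi = True if tau else bool(rng.random() < sigmoid(eps @ features_a(i)))
        # step 2(b)iii: constrained topic draw
        z = sample_topic(z_prev, int(tau), int(psi), topics, theta, rng)
        # step 2(b)iv: emit the sentence's words from the topic's word distribution
        k = topics.index(z)
        words = rng.choice(len(beta[k]), size=sent_lens[i], p=beta[k])
        sentences.append((z, words))
        z_prev = z
    return sentences
```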
3.3 Posterior Inference

The latent variables of interest in HTSM are the sentence-level topic assignments $z$ and the document-level topic proportion $\theta_d$. The aspect switch indicators $\psi$ and sentiment switch indicators $\tau$ can be easily decoded from the topic assignment sequence $z$. However, due to the coupling between the continuous random variable $\theta_d$ and the discrete random variables $z$, exact inference in HTSM is computationally infeasible. In this paper, we develop a coordinate-ascent-based solution to perform approximate posterior inference. In a given document, $\theta_d$ can first be randomly initialized from its prior distribution $Dir(\alpha)$. With known $\theta_d$, exact inference for $(z, \psi, \tau)$ can be efficiently performed via the forward-backward algorithm [23].

Because of the special design of our Markov chain, the generic forward-backward algorithm can be customized to greatly reduce its computational complexity in HTSM. In particular, we treat the combination of $(z_i, \psi_i, \tau_i)$ at sentence $t_i$ as the latent state in our Markov chain for document $d$, and derive the corresponding transition function as

$$p(z_i, \psi_i, \tau_i \mid z_{i-1}, \theta_d, d, \epsilon, \sigma) = p(z_i \mid z_{i-1}, \theta_d, \psi_i, \tau_i)\, p(\psi_i \mid d, \epsilon)\, p(\tau_i \mid d, \sigma) \quad (4)$$

in which $p(\psi_i \mid d, \epsilon)$ and $p(\tau_i \mid d, \sigma)$ can be pre-computed beforehand, since they are invariant during inference. And the first term on the right-hand side of Eq (4) has a simple linear structure:

$$p(z_i \mid z_{i-1}, \theta_d, \psi_i, \tau_i) = \begin{cases} 1 & \text{if } \tau_i = 0, \psi_i = 0, z_i = z_{i-1} \\ \theta_{z_i} \text{ s.t. } a_i \neq a_{i-1}, s_i = s_{i-1} & \text{if } \tau_i = 0, \psi_i = 1 \\ \theta_{z_i} \text{ s.t. } a_i \neq a_{i-1}, s_i \neq s_{i-1} & \text{if } \tau_i = 1, \psi_i = 1 \\ 0 & \text{otherwise} \end{cases} \quad (5)$$

This enables us to maintain a blockwise transition matrix and reduce the quadratic computational complexity of standard forward and backward computations to linear in HTSM. After one round of forward-backward computation, the posterior of $\theta_d$ can be computed from the expected frequency of words assigned to a topic that is drawn from the document-specific topic proportion, rather than inherited from a previous sentence. More specifically,

$$\theta_{d,z} \propto \sum_{i=1}^{m_d} \sum_{n=1}^{N_i} p(z_i = z, \psi_i = 1 \mid d) + \alpha_z - 1 \quad (6)$$

The inference of $\theta_d$ and $(z, \psi, \tau)$ can be alternately performed in a given document. It can be proved that this coordinate ascent method converges to a local maximum of the data likelihood function in $d$, because the forward-backward algorithm gives us the exact posterior of $(z, \psi, \tau)$ (refer to the EM algorithm proof [5]).
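The blockwise structure of Eq (5) is what makes the customized forward pass linear rather than quadratic: for fixed $(\tau_i, \psi_i)$, the probability of moving into $z_i$ depends on $z_{i-1}$ only through the aspect and sentiment constraints. A minimal sketch of this transition term, with illustrative names and topics indexed as in the previous sketches:

```python
def transition_prob(z, z_prev, tau, psi, theta, topics):
    """p(z_i | z_{i-1}, theta, psi_i, tau_i) as specified in Eq (5).

    z, z_prev: integer topic indices; topics[k] = (aspect, sentiment)
    """
    a, s = topics[z]
    a_prev, s_prev = topics[z_prev]
    if tau == 0 and psi == 0:
        return 1.0 if z == z_prev else 0.0     # deterministic inheritance
    if tau == 0 and psi == 1:                  # same sentiment, new aspect
        return theta[z] if (a != a_prev and s == s_prev) else 0.0
    if tau == 1 and psi == 1:                  # new sentiment, new aspect
        return theta[z] if (a != a_prev and s != s_prev) else 0.0
    return 0.0                                 # (tau = 1, psi = 0) is disallowed
```

Because every nonzero entry is either an indicator or a masked copy of $\theta_d$, the forward messages can be updated with a few cumulative sums per aspect and sentiment instead of a full $K \times K$ matrix product.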

3.4 Parameter Estimation

Motivated by the insights gained from the annotated example shown in Figure 1, in HTSM we leverage content features directly observable in documents to define the probabilities of aspect change and sentiment switch. In order to differentiate aspect-driven transitions from sentiment-driven transitions, two sets of transition features are constructed. The aspect transition features $f_a(d, i)$ include: 1) content-based cosine similarity between $t_i$ and $t_{i-1}$; 2) sentence length ratio between $t_i$ and $t_{i-1}$; 3) relative position of $t_i$ in $d$, i.e., $i/m_d$; and 4) an indicator of whether $t_i$ is more similar to $t_{i-1}$ or $t_{i+1}$. The sentiment transition features $f_s(d, i)$ include: 1) content-based cosine similarity between $t_i$ and $t_{i-1}$; 2) SentiWordNet [1] score difference between $t_i$ and $t_{i-1}$; 3) sentiment word count difference between $t_i$ and $t_{i-1}$; 4) Jaccard coefficient between POS tags in $t_i$ and $t_{i-1}$; and 5) adversative conjunction count in $t_i$. We also add bias terms in $f_a(d, i)$ and $f_s(d, i)$ to capture unconditioned aspect and sentiment transitions in documents. Detailed descriptions of these transition features can be found in Table 3.

The feature weights $\epsilon$ and $\sigma$ in the transition functions defined in Eq (1) and Eq (2) can be efficiently estimated together with the other model parameters in HTSM by an EM algorithm. In this work, we treat $\alpha$ and $\eta$ as hyper-parameters of the model and manually tune their settings, given that they have considerably less influence on model fitting [24] compared to the other parameters, i.e., $(\beta, \epsilon, \sigma)$. We should note that optimizing $\alpha$ and $\eta$ with respect to the data likelihood [3] is also feasible in HTSM.

The EM algorithm iterates between an E-step (for posterior inference) and an M-step (for expectation maximization). In the E-step at iteration $T$, the approximate inference procedure developed in Section 3.3 is executed on each document with the current model parameters $(\beta^T, \epsilon^T, \sigma^T)$. The following sufficient statistics are collected over documents after inference:

$$E[c(z, w, d)] = \sum_{i=1}^{m_d} \sum_{n=1}^{N_i} \delta(w_n = w)\, p(z_i = z \mid d) \quad (7)$$

$$E[\psi_i] = p(\psi_i = 1 \mid d), \ \text{s.t.}\ i > 1 \quad (8)$$

$$E[\tau_i] = p(\tau_i = 1 \mid d), \ \text{s.t.}\ i > 1 \quad (9)$$

In the M-step, the maximum likelihood estimator is used to compute $(\beta^{T+1}, \epsilon^{T+1}, \sigma^{T+1})$ as follows:

$$\beta^{T+1}_{z,w} \propto \sum_{d \in D} E[c(z, w, d)] + \eta_w - 1 \quad (10)$$

$$\epsilon^{T+1} = \arg\max_\epsilon \sum_{d \in D} \sum_{i=1}^{m_d} E[\psi_i] \log p(\psi_i = 1 \mid d, \epsilon) \quad (11)$$

$$\sigma^{T+1} = \arg\max_\sigma \sum_{d \in D} \sum_{i=1}^{m_d} E[\tau_i] \log p(\tau_i = 1 \mid d, \sigma) \quad (12)$$

where the optimization of $\epsilon$ and $\sigma$ can be effectively solved via a gradient-based optimizer. The E-step and M-step are alternately executed until the data likelihood over the whole collection $D$ converges.

In some review data sets, external signals about sentiment polarities are directly available. For example, some reviewers explicitly organize their reviews into pros and cons sections (e.g., Amazon review R12HYQYZX5TNT9); and on NewEgg (www.newegg.com), reviewers are required to do so. Such signals can be easily incorporated into HTSM to refine model estimation. In documents with identified pros/cons sections, sentences in the pros section are considered as having sentiment label $s = 1$, and sentences in the cons section have $s = 0$. During posterior inference, the sentiment switch indicator $\tau$ can then be directly computed from the sentiment labels in such documents, while all the other inference steps stay the same. Hence, model parameter estimation in the M-step is affected by such direct observations. As a result, HTSM effectively exploits such side information in document content and estimates the model parameters in a semi-supervised manner. In our quantitative evaluation, such semi-supervised model training greatly improves HTSM's sentiment classification performance.
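As a concrete illustration of the M-step, the sketch below fits the switch weights of Eq (12) by gradient-based optimization and normalizes the expected counts of Eq (10). It assumes the usual expected Bernoulli log-likelihood as the objective and uses scipy's L-BFGS-B as a stand-in for the unspecified gradient-based optimizer; all names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def m_step_switch_weights(F, E_switch, w0):
    """Fit the sentiment-switch weights sigma (Eq 12); the aspect-change weights
    epsilon (Eq 11) are fit identically from f_a features and E[psi].

    F:        (n, p) matrix, one row per sentence with i > 1, holding f_s(d, i)
    E_switch: (n,) posterior switch probabilities E[tau_i] from the E-step
    """
    def neg_expected_ll(w):
        logits = F @ w
        # expected Bernoulli log-likelihood under the E-step posteriors:
        # log p(tau=1) = -log(1 + exp(-logit)), log p(tau=0) = -log(1 + exp(logit))
        ll = -E_switch * np.logaddexp(0.0, -logits) \
             - (1 - E_switch) * np.logaddexp(0.0, logits)
        return -ll.sum()

    return minimize(neg_expected_ll, w0, method="L-BFGS-B").x

def m_step_beta(expected_counts, eta):
    """Eq (10): smoothed normalization of expected word counts per topic.
    expected_counts: (K, V) matrix of E[c(z, w, d)] summed over documents;
    assumes eta >= 1 so the MAP estimate stays non-negative."""
    num = expected_counts + eta - 1.0
    return num / num.sum(axis=1, keepdims=True)
```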
4. EXPERIMENT

In this section, we perform experimental evaluations of the proposed HTSM model from both quantitative and qualitative perspectives. We compare HTSM with several state-of-the-art topic models for sentiment analysis on four different collections of product reviews from both Amazon and NewEgg.

4.1 Data Sets & Preprocessing

We collected four categories of product reviews, i.e., i) camera, ii) tablet, iii) tv and iv) phone, from Amazon (http://www.amazon.com) and NewEgg (http://www.newegg.com). The reviews from NewEgg are segmented into pros and cons sections by their original authors, since this is required by the website. The complete data set can be found at www.cs.virginia.edu/~hw5x/dataset.html.

Standard pre-processing is performed before the subsequent experiments. First, punctuation, numbers and other non-alphabetic characters are removed. Stopwords are also removed based on a standard stopword list [14]. Second, all words are converted to lower case and stemming is performed on the remaining words in a document using Porter's stemmer [30]. Finally, all reviews with fewer than five words are removed. Besides, since we are modeling topic transitions between successive sentences, reviews containing fewer than two sentences are also removed. Table 1 summarizes the resulting review data sets.

Table 1: Statistics of evaluation data sets.

  Data set | Amazon | NewEgg | Vocabulary size | Positive ratio
  camera   |   …    |   …    |        …        |       …
  tv       |   …    |   …    |        …        |       …
  tablet   |   …    |   …    |        …        |       …
  phone    |   …    |   …    |        …        |       …

For comparison purposes, we include Latent Dirichlet Allocation (LDA) [3], the Hidden Topic Markov Model (HTMM) [8], the Aspect and Sentiment Unification Model (ASUM) [12], and the Joint Sentiment/Topic model (JST) [15] as baselines. Among these baseline models, ASUM and JST are specialized for sentiment analysis, and HTMM and ASUM explicitly model sentences in a document. As unsupervised topic models, both ASUM and JST require sentiment seed words as input. Following the settings in their original papers, two sets of sentiment seed words are used in our experiments. The first, from Turney's PARADIGM [26], contains seven positive words and seven negative words; the second, PARADIGM+, contains all of Turney's paradigm words plus other sentiment words. To conduct a fair comparison, we also include those sentiment seed words in our HTSM model, i.e., adding positive seed words to topics with sentiment label $s = 1$ and negative words to topics with sentiment label $s = 0$ as priors.

We should note that, unless otherwise specified, we used 26 topics for camera and phone, 30 topics for tablet and 16 topics for tv for all the models. In addition, we fixed the hyper-parameters $\alpha$ and $\eta$ of the Dirichlet priors to 1.01 and … respectively for all the topic models.

[Figure 3: Perplexity with increasing training size on four different review document sets. Panels: camera, tablet, phone and tv; curves: LDA, HTMM, ASUM, JST and HTSM.]

4.2 Topic Modeling Evaluation

We first compare the quality of the topics learned by all the topic models. Perplexity and word intrusion experiments are performed to quantitatively evaluate this aspect, and we also demonstrate the topic transition diagram learned by HTSM.

4.2.1 Perplexity comparisons

Perplexity, used by convention in language modeling, is monotonically decreasing with respect to the likelihood of test data, and is algebraically equivalent to the inverse of the geometric mean of the per-word likelihood. A lower perplexity indicates better generalization performance. More specifically, the perplexity of a test document set $D_{test}$ can be computed as:

$$\text{perplexity}(D_{test}) = \exp\left\{ -\frac{\sum_{d=1}^{M} \log p(w_d)}{\sum_{d=1}^{M} N_d} \right\} \quad (13)$$

where $M$ is the total number of documents in the test corpus and $N_d$ is the total number of words in a test document $d \in D_{test}$.

We trained all the topic models (HTSM, HTMM, LDA, JST and ASUM) on the described corpora to compare their generalization performance in modeling text documents, measured by perplexity on a held-out test set. Since our goal is to evaluate density estimation quality, all documents in the corpora are treated as unlabeled (e.g., ignoring the pros/cons segmentation in NewEgg reviews). The detailed experiment setup for the perplexity comparison is as follows: we start with a training set containing only the reviews from NewEgg, referred to as the origin in the plots of Figure 3, and gradually add more training reviews from Amazon (training size 1000, 2000, etc.). This experiment setting makes the results aligned with the later sentiment classification experiments. Figure 3 demonstrates the average perplexity from five-fold cross validation (test sets are selected from both Amazon and NewEgg reviews accordingly).

It is clear from Figure 3 that HTSM outperformed all the other topic models on all four data sets, except HTMM. There are two possible explanations. First, HTMM models topic transitions loosely as a Bernoulli distribution: keep the same assignment as the previous sentence, or draw a new topic with a certain probability. HTSM instead models this topical transition with a more complicated logistic function, and this parametric model might cause some overfitting. Second, HTMM does not consider sentiment in a document, i.e., it places fewer constraints on modeling a document. But in HTSM, once the sentiment label switches, a different topic has to be sampled for the subsequent sentence. As a result, HTMM has more freedom in allocating words under one topic, which results in a lower perplexity on unseen documents. We should note that the perplexity metric only measures the quality of the estimated word distributions on unseen documents. It cannot assess sentiment prediction quality, which HTMM is unable to provide. In later experiments we found that the increased complexity of HTSM benefits sentiment classification greatly. Finally, we find that the simple sentiment-topic mixture assumptions in both JST and ASUM fail to capture the topic-word distribution in the test set and lead to much worse perplexity than HTSM.
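Eq (13) translates directly into code. A minimal sketch, assuming the per-document log-likelihoods $\log p(w_d)$ have already been produced by a model's inference routine:

```python
import numpy as np

def perplexity(log_likelihoods, doc_lengths):
    """Eq (13): exponentiated negative per-word log-likelihood on the test set.

    log_likelihoods: array of log p(w_d) for each test document d
    doc_lengths:     array of N_d, the word count of each test document
    """
    return np.exp(-np.sum(log_likelihoods) / np.sum(doc_lengths))
```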
It is also important to investigate how a topic model's generalization capability varies with the number of topics. In particular, we test the models' perplexity at the last testing point in Figure 3, i.e., five-fold cross validation on 5000 Amazon reviews with NewEgg reviews for training. Due to space limits, we only demonstrate the perplexity results of our HTSM model on all four categories of reviews in Figure 4; the baseline models exhibit similar patterns.

[Figure 4: Perplexity of HTSM under different numbers of topics across all four categories of reviews.]

From the results, it is clear that within a reasonable range of topic sizes, the perplexity of HTSM increases only moderately. With more than 40 topics, the perplexity increases dramatically on all data sets, i.e., an indication of overfitting. These results justify our setting of the number of topics in HTSM and all baseline topic models, and we fix this setting in all the following experiments.

4.2.2 Word intrusion comparisons

Perplexity only measures the quality of topic modeling from a density estimation perspective; it is also necessary to evaluate whether the topics identified by those statistical models are human-interpretable.

More specifically, we prefer a model that generates more semantically coherent and meaningful topics. In this experiment, we employ the word intrusion protocol discussed in [4] to evaluate four different topic models, namely LDA, HTMM, ASUM and HTSM (because ASUM and JST are quite similar in their model assumptions, we do not include JST in this experiment).

In the first phase of the evaluation, the setup is as follows: we first select the top five words from each topic $z_k$ under every model as topical words. Then we select two intruding words. The first, referred to as the intra-topic intrusion word, has a very low probability in topic $z_k$ of the corresponding model. The second, referred to as the inter-topic intrusion word, is selected from a different topic $z_l \neq z_k$, and has a high probability in topic $z_l$ but a very low probability in topic $z_k$. To select a word considered as having a very low generation probability, we rank all the words under a topic in descending order of $p(w \mid z)$ and then randomly select a word ranked between 90 and 100 (given that our vocabulary size on all collections is around 1400). Hence, in total we have seven words for each topic $z_k$ from every topic model: five regular topical words, one intra-topic intruding word and one inter-topic intrusion word.

In the second phase of the evaluation, we randomly shuffle the topical words with the intruding words under each topic from every model and present the shuffled words to three annotators. The annotators have no knowledge of which topics or words were generated by which model; they are only informed of the category of the product. The task of the annotators is to identify at least one and at most two intruding words under each topic presented to them. In order to reduce annotation bias, we evenly separate the learned topics from each model into two parts and present them to different annotators. We ensure that each topic is annotated by three different annotators. Since we have four different categories and four different topic models, we take feedback from twenty-four annotators in total. The agreement among annotators was calculated by pairwise Kappa statistics [27], and these kappa values were averaged across all pairs of annotators. For example, on the tablet data set, the average kappa value for the original topical words is 0.885, which indicates that annotators agree with each other most of the time. However, for the intra-topic and inter-topic intrusion words the average kappa values are … and … respectively, which implies that annotators might have different ways of interpreting the inferred topics.

To quantitatively measure the quality of the inferred topics from these four models, we define a metric named model word-intrusion recall (MR) as follows:

$$MR_m = \frac{\sum_{k=1}^{K} \sum_{s=1}^{S} \mathbb{1}(i^m_{z_k,s} = w^m_{z_k})}{K \cdot S} \quad (14)$$

where $w^m_{z_k}$ is the vocabulary index of the intruding word among the words generated from the $z_k$-th topic inferred by topic model $m$, and $i^m_{z_k,s}$ is the corresponding index of the intruding word selected by annotator $s$; $S$ denotes the number of annotators, and $K$ denotes the total number of topics.
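Eq (14) simply counts how often annotators catch the planted intruder. A minimal sketch, where `picked` and `intruder` are illustrative containers (annotators may flag up to two words, so a membership test is used):

```python
def model_word_intrusion_recall(picked, intruder):
    """Eq (14): fraction of (topic, annotator) pairs in which the planted
    intruding word was among the words the annotator flagged.

    picked:   picked[k][s] is the set of word indices annotator s flagged for topic k
    intruder: intruder[k] is the planted intruding word's index for topic k
    """
    K = len(intruder)
    S = len(picked[0])
    hits = sum(intruder[k] in picked[k][s] for k in range(K) for s in range(S))
    return hits / (K * S)
```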
Table 2: Word intrusion measurement across different topic models on four categories of product reviews.

  Inter-topic MR
  Category | LDA | HTMM | ASUM | HTSM
  camera   |  …  |  …   |  …   |  …
  tablet   |  …  |  …   |  …   |  …
  phone    |  …  |  …   |  …   |  …
  tv       |  …  |  …   |  …   |  …

  Intra-topic MR
  Category | LDA | HTMM | ASUM | HTSM
  camera   |  …  |  …   |  …   |  …
  tablet   |  …  |  …   |  …   |  …
  phone    |  …  |  …   |  …   |  …
  tv       |  …  |  …   |  …   |  …

From Table 2, it is evident that annotators can interpret the topics inferred by HTSM more effectively than those from the other models in terms of the inter-topic intrusion words. For example, out of 90 actual inter-topic intrusion words in the tablet category, 35 were picked out by annotators from HTSM's topics. This empirical evidence implies that our HTSM model infers more human-interpretable topics than the other topic models. However, in terms of intra-topic intrusion, the performance of HTSM is not as competitive as the other models. The procedure for selecting low-probability intra-topic intrusion words and the concentration of the word distributions learned under HTSM's topics might be contributing factors to this relatively inferior performance.

4.2.3 Topic transitions

Given that HTSM explicitly models topic transitions in an opinionated review document, we visualize the learned transitions in a transition diagram to qualitatively demonstrate the topical coherence obtained by HTSM. Due to space limits, we only report results extracted from the tablet data set. First, we train an HTSM with 30 topics on all the reviews in the tablet category. To automatically differentiate domain-specific sentiment polarity, we train HTSM in a semi-supervised mode: the pros/cons sections of NewEgg reviews are used to specify sentiment labels on sentences, while Amazon reviews are used in fully unsupervised training. Then, for each sentence $t_i$ in a review document from the training set, we infer its most probable topic $z_k$ from HTSM via the Viterbi algorithm. As a result, for two consecutive sentences $t_{i-1}$ and $t_i$, we obtain the corresponding pairwise topic transition $z_j \to z_k$. We accumulate the transition counts over all consecutive sentences in the training corpus, and normalize the resulting transition matrix to construct the diagram; a short sketch of this procedure follows.
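This sketch assumes a `viterbi_topics(doc)` decoder is available from the trained model; the names and the pruning threshold mirror the description above but are otherwise illustrative.

```python
import numpy as np

def transition_diagram(docs, viterbi_topics, K, min_prob=0.01):
    """Accumulate z_{i-1} -> z_i counts over consecutive sentences, then
    row-normalize; transitions below min_prob are dropped, as in Figure 5."""
    counts = np.zeros((K, K))
    for doc in docs:
        z = viterbi_topics(doc)                # most probable topic per sentence
        for prev, cur in zip(z[:-1], z[1:]):
            counts[prev, cur] += 1
    probs = counts / counts.sum(axis=1, keepdims=True).clip(min=1)
    probs[probs < min_prob] = 0.0              # prune rare transitions for readability
    return probs
```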

Figure 5 illustrates the learned topic transition diagram for the tablet category. Note that in order to obtain a more perceivable view, we ignored transitions with probability less than 0.01 and removed less popular topics. In this figure, each topic is denoted as an Aspect_Sentiment pair; for example, screen_P represents positive sentiment about the screen aspect. The diagram also contains a special node named start, a dummy topic which generates the initial topic for the first sentence of every document. Besides, we also highlight the top six words under some selected topics (the selection of annotated topics is purely based on space constraints).

[Figure 5: Estimated topic transitions and top words under selected topics on the tablet data set. The highlighted top words include: "screen hd read display color resolut"; "buy money price bought purchas wast"; "batteri life hour charg time long"; "charg batteri drain power time die"; "product amazon ship return box day"; "camera pictur good great front video"; "app android download problem play slow".]

From Figure 5, we can clearly identify some interesting topical transitions in tablet reviews. For example, when reviewers hold positive feelings about the tablets they purchased, they usually start with positive sentiment about price, followed by positive sentiment about battery life, service and so on. However, if a reviewer plans to criticize a tablet, he or she usually starts with negative sentiment about price and then transits to negative sentiment about battery life, screen, apps, etc. These learned transitions are of particular importance in opinion summarization: they help organize the generated sentences in a coherent order.

4.3 Sentiment classification

In this section, we evaluate HTSM in terms of sentiment classification. We use the already segmented NewEgg reviews as ground-truth sentence-level sentiment annotations: we treat all sentences in the pros section as positive and all sentences in the cons section as negative. We should note that such annotations are different from the overall ratings of reviews. The overall ratings are of low resolution for sentiment annotation: a review with a high overall rating might still contain some negative sentences, and vice versa. In contrast, the self-annotated pros/cons sections provide finer-grained sentiment annotations. Therefore, in this experiment we did not use the overall ratings in model training or testing.

During the training phase of HTSM, we use a mixture of review data obtained from NewEgg and Amazon. Since we have sentiment labels on sentences from the NewEgg data set, the sentiment transition indicator $\tau$ can be directly inferred there; hence we train our HTSM model in a semi-supervised manner. Specifically, during training, if the input document is from NewEgg, $\tau$ is fixed based on the sentiment labels of its sentences; otherwise, HTSM has to infer $\tau$ according to Eq (1). To make a fair comparison across all the models, ASUM and JST were also modified to utilize the annotated pros/cons sections in the NewEgg data set during the training phase. In addition, we include EM-NaiveBayes [20], a semi-supervised algorithm, as a baseline in this experiment; it also exploits the sentiment annotations in the NewEgg data during training. We use only the NewEgg data to construct the test set, since we do not have such fine-grained annotations for the Amazon data (so we refer to the Amazon data as unlabeled data). Besides, we start with a training set containing only the reviews from NewEgg (training size 0 in Figure 6) and then keep adding more unlabeled data from Amazon (training size 1000, 2000, etc.) into the training set, i.e., the exact setting used in the perplexity evaluation in Section 4.2.1. We report the average F1 score from five-fold cross-validation as the performance metric in this experiment.

[Figure 6: Sentiment classification performance with increasing training size on four different review document sets. Panels: camera, tablet, phone and tv; curves: ASUM, JST, EM-Naive Bayes and HTSM.]

Figure 6 illustrates the sentiment classification performance of HTSM on all four categories against the ASUM, JST and EM-NaiveBayes baselines. We can clearly notice that, with the same amount of training data, HTSM outperformed all the other models, which treat sentences as independent within a document.
Sentiment consistency enforced by HTSM helps to better capture the dependence between consecutive sentences and therefore predicts their sentiment polarities more accurately. The only exception is the tv category, where the performance of HTSM degenerated beyond training size 3000 and became worse than EM-NaiveBayes. This degenerate result is caused by the divergent products reviewed in the Amazon and NewEgg data sets. We manually checked the products in the tv category from these two data sets and found fewer common products than in the other categories. As a result, adding more Amazon reviews increases the discrepancy between the learned model and the test set, which comes only from NewEgg reviews.

The improved classification performance of HTSM results from its unique capability of modeling sentiment consistency inside a review document, i.e., when sentiment switches, topic assignments have to change in successive sentences. The transitions are controlled by the parameterized logistic functions over the observable linguistic features described in Section 3.4. Table 3 shows the learned feature weights for topic switch $\epsilon$ and sentiment switch $\sigma$ on the camera data set (we obtained very similar results on the other three categories as well, but due to space limits we cannot list them in the table). For example, the bias term controlling sentiment switch is more negative than that for topic transition. This implies that the sentiment of two consecutive sentences is less likely to change than their topics. The learned weights for the content-based cosine similarity are negative for both transitions.

Table 3: Learned feature weights in HTSM for sentiment and topic transition on the camera data set.

  Sentiment transition feature                                       | Weight
  bias term of f_s(d, i)                                             |   …
  content-based cosine similarity between t_i and t_{i-1}            |   …
  SentiWordNet [1] score difference between t_i and t_{i-1}          |   …
  sentiment word count difference between t_i and t_{i-1}            |   …
  indicator of whether t_i is more similar to t_{i-1} or t_{i+1}     |   …
  Jaccard coefficient between POS tags in t_i and t_{i-1}            |   …
  negation word count in t_i                                         |   …

  Topic transition feature                                           | Weight
  bias term of f_a(d, i)                                             |   …
  content-based cosine similarity between t_i and t_{i-1}            |   …
  length ratio of the two consecutive sentences t_i and t_{i-1}      |   …
  relative position of t_i in d, i.e., i/m_d                         |   …
  indicator of whether t_i is more similar to t_{i-1} or t_{i+1}     |   …

It follows our expectation that the more similar two consecutive sentences are, the less likely we are to observe a sentiment or topic switch. These observations support our decision to use observable linguistic features to guide topic transition modeling, which ultimately helps HTSM achieve improved topic coherence and sentiment consistency in modeling opinionated documents.

To provide a thorough evaluation of sentiment classification, we also tested all the topic models with varying numbers of topics. Following the same settings as in Figure 4, we report the F1 measure of HTSM on all four categories of reviews. Due to space limits, we do not include the results of the baselines in Figure 7. A similar conclusion as in the perplexity evaluation can be reached: with a moderate number of topics, HTSM's classification performance is satisfactory and stable; but with an increased number of topics, the classification results vary and even degenerate on some data sets (e.g., the tablet data set).

[Figure 7: Sentiment classification performance of HTSM under different numbers of topics across all four categories of reviews.]

4.4 Aspect-Based Contrastive Summarization

In order to evaluate the utility of the aspects and sentiments identified by our model, we study aspect-based review summarization, which aims at finding the most representative sentences for each topic (a combination of aspect and sentiment) from a collection of reviews. In Table 4, we demonstrate a sample aspect-based contrastive summarization result for two comparable tablet products. We selected the Samsung Galaxy Note 10.1 and the Amazon Kindle Fire HDX based on their popularity in the Amazon tablet data set. The practical value of this type of contrastive review summary is to help customers easily digest a vast amount of opinionated data and make informed decisions.

Table 4 shows the side-by-side comparison on six aspects ("+" indicates positive aspects and "-" indicates negative aspects) of these two tablets identified by HTSM. Imagine a user making a choice between these two tablets. If the user cares about the battery aspect the most, he or she can easily find out from the summary that the Samsung Galaxy Note 10.1 is a better choice than the Amazon Kindle Fire HDX by consulting this aspect-based contrastive review summarization. This saves the user a considerable amount of time in reading the detailed reviews.

We performed user studies to understand whether these kinds of summaries are meaningful for actual users. In this experiment,


More information

OCR for Arabic using SIFT Descriptors With Online Failure Prediction

OCR for Arabic using SIFT Descriptors With Online Failure Prediction OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,

More information

Learning From the Past with Experiment Databases

Learning From the Past with Experiment Databases Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University

More information

Toward Probabilistic Natural Logic for Syllogistic Reasoning

Toward Probabilistic Natural Logic for Syllogistic Reasoning Toward Probabilistic Natural Logic for Syllogistic Reasoning Fangzhou Zhai, Jakub Szymanik and Ivan Titov Institute for Logic, Language and Computation, University of Amsterdam Abstract Natural language

More information

Discriminative Learning of Beam-Search Heuristics for Planning

Discriminative Learning of Beam-Search Heuristics for Planning Discriminative Learning of Beam-Search Heuristics for Planning Yuehua Xu School of EECS Oregon State University Corvallis,OR 97331 xuyu@eecs.oregonstate.edu Alan Fern School of EECS Oregon State University

More information

An Online Handwriting Recognition System For Turkish

An Online Handwriting Recognition System For Turkish An Online Handwriting Recognition System For Turkish Esra Vural, Hakan Erdogan, Kemal Oflazer, Berrin Yanikoglu Sabanci University, Tuzla, Istanbul, Turkey 34956 ABSTRACT Despite recent developments in

More information

Calibration of Confidence Measures in Speech Recognition

Calibration of Confidence Measures in Speech Recognition Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 1 Calibration of Confidence Measures in Speech Recognition Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

CSL465/603 - Machine Learning

CSL465/603 - Machine Learning CSL465/603 - Machine Learning Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Introduction CSL465/603 - Machine Learning 1 Administrative Trivia Course Structure 3-0-2 Lecture Timings Monday 9.55-10.45am

More information

Modeling function word errors in DNN-HMM based LVCSR systems

Modeling function word errors in DNN-HMM based LVCSR systems Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford

More information

Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models

Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models Jianfeng Gao Microsoft Research One Microsoft Way Redmond, WA 98052 USA jfgao@microsoft.com Xiaodong He Microsoft

More information

Word Segmentation of Off-line Handwritten Documents

Word Segmentation of Off-line Handwritten Documents Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department

More information

Speech Recognition at ICSI: Broadcast News and beyond

Speech Recognition at ICSI: Broadcast News and beyond Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI

More information

On the Combined Behavior of Autonomous Resource Management Agents

On the Combined Behavior of Autonomous Resource Management Agents On the Combined Behavior of Autonomous Resource Management Agents Siri Fagernes 1 and Alva L. Couch 2 1 Faculty of Engineering Oslo University College Oslo, Norway siri.fagernes@iu.hio.no 2 Computer Science

More information

Quantitative analysis with statistics (and ponies) (Some slides, pony-based examples from Blase Ur)

Quantitative analysis with statistics (and ponies) (Some slides, pony-based examples from Blase Ur) Quantitative analysis with statistics (and ponies) (Some slides, pony-based examples from Blase Ur) 1 Interviews, diary studies Start stats Thursday: Ethics/IRB Tuesday: More stats New homework is available

More information

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

More information

NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches

NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches Yu-Chun Wang Chun-Kai Wu Richard Tzong-Han Tsai Department of Computer Science

More information

On-the-Fly Customization of Automated Essay Scoring

On-the-Fly Customization of Automated Essay Scoring Research Report On-the-Fly Customization of Automated Essay Scoring Yigal Attali Research & Development December 2007 RR-07-42 On-the-Fly Customization of Automated Essay Scoring Yigal Attali ETS, Princeton,

More information

FUZZY EXPERT. Dr. Kasim M. Al-Aubidy. Philadelphia University. Computer Eng. Dept February 2002 University of Damascus-Syria

FUZZY EXPERT. Dr. Kasim M. Al-Aubidy. Philadelphia University. Computer Eng. Dept February 2002 University of Damascus-Syria FUZZY EXPERT SYSTEMS 16-18 18 February 2002 University of Damascus-Syria Dr. Kasim M. Al-Aubidy Computer Eng. Dept. Philadelphia University What is Expert Systems? ES are computer programs that emulate

More information

Netpix: A Method of Feature Selection Leading. to Accurate Sentiment-Based Classification Models

Netpix: A Method of Feature Selection Leading. to Accurate Sentiment-Based Classification Models Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models 1 Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models James B.

More information

Designing a Rubric to Assess the Modelling Phase of Student Design Projects in Upper Year Engineering Courses

Designing a Rubric to Assess the Modelling Phase of Student Design Projects in Upper Year Engineering Courses Designing a Rubric to Assess the Modelling Phase of Student Design Projects in Upper Year Engineering Courses Thomas F.C. Woodhall Masters Candidate in Civil Engineering Queen s University at Kingston,

More information

(Sub)Gradient Descent

(Sub)Gradient Descent (Sub)Gradient Descent CMSC 422 MARINE CARPUAT marine@cs.umd.edu Figures credit: Piyush Rai Logistics Midterm is on Thursday 3/24 during class time closed book/internet/etc, one page of notes. will include

More information

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion

More information

Artificial Neural Networks written examination

Artificial Neural Networks written examination 1 (8) Institutionen för informationsteknologi Olle Gällmo Universitetsadjunkt Adress: Lägerhyddsvägen 2 Box 337 751 05 Uppsala Artificial Neural Networks written examination Monday, May 15, 2006 9 00-14

More information

Semi-Supervised Face Detection

Semi-Supervised Face Detection Semi-Supervised Face Detection Nicu Sebe, Ira Cohen 2, Thomas S. Huang 3, Theo Gevers Faculty of Science, University of Amsterdam, The Netherlands 2 HP Research Labs, USA 3 Beckman Institute, University

More information

Detecting English-French Cognates Using Orthographic Edit Distance

Detecting English-French Cognates Using Orthographic Edit Distance Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National

More information

AP Statistics Summer Assignment 17-18

AP Statistics Summer Assignment 17-18 AP Statistics Summer Assignment 17-18 Welcome to AP Statistics. This course will be unlike any other math class you have ever taken before! Before taking this course you will need to be competent in basic

More information

Introduction to Causal Inference. Problem Set 1. Required Problems

Introduction to Causal Inference. Problem Set 1. Required Problems Introduction to Causal Inference Problem Set 1 Professor: Teppei Yamamoto Due Friday, July 15 (at beginning of class) Only the required problems are due on the above date. The optional problems will not

More information

Summarizing Contrastive Themes via Hierarchical Non-Parametric Processes

Summarizing Contrastive Themes via Hierarchical Non-Parametric Processes Summarizing Contrastive Themes via Hierarchical Non-Parametric Processes Zhaochun Ren z.ren@uva.nl Maarten de Rijke derijke@uva.nl University of Amsterdam, Amsterdam, The Netherlands ABSTRACT Given a topic

More information

As a high-quality international conference in the field

As a high-quality international conference in the field The New Automated IEEE INFOCOM Review Assignment System Baochun Li and Y. Thomas Hou Abstract In academic conferences, the structure of the review process has always been considered a critical aspect of

More information

Software Maintenance

Software Maintenance 1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories

More information

Online Updating of Word Representations for Part-of-Speech Tagging

Online Updating of Word Representations for Part-of-Speech Tagging Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org

More information

Stacks Teacher notes. Activity description. Suitability. Time. AMP resources. Equipment. Key mathematical language. Key processes

Stacks Teacher notes. Activity description. Suitability. Time. AMP resources. Equipment. Key mathematical language. Key processes Stacks Teacher notes Activity description (Interactive not shown on this sheet.) Pupils start by exploring the patterns generated by moving counters between two stacks according to a fixed rule, doubling

More information

Experts Retrieval with Multiword-Enhanced Author Topic Model

Experts Retrieval with Multiword-Enhanced Author Topic Model NAACL 10 Workshop on Semantic Search Experts Retrieval with Multiword-Enhanced Author Topic Model Nikhil Johri Dan Roth Yuancheng Tu Dept. of Computer Science Dept. of Linguistics University of Illinois

More information

Language Acquisition Fall 2010/Winter Lexical Categories. Afra Alishahi, Heiner Drenhaus

Language Acquisition Fall 2010/Winter Lexical Categories. Afra Alishahi, Heiner Drenhaus Language Acquisition Fall 2010/Winter 2011 Lexical Categories Afra Alishahi, Heiner Drenhaus Computational Linguistics and Phonetics Saarland University Children s Sensitivity to Lexical Categories Look,

More information

Beyond the Pipeline: Discrete Optimization in NLP

Beyond the Pipeline: Discrete Optimization in NLP Beyond the Pipeline: Discrete Optimization in NLP Tomasz Marciniak and Michael Strube EML Research ggmbh Schloss-Wolfsbrunnenweg 33 69118 Heidelberg, Germany http://www.eml-research.de/nlp Abstract We

More information

A Bayesian Learning Approach to Concept-Based Document Classification

A Bayesian Learning Approach to Concept-Based Document Classification Databases and Information Systems Group (AG5) Max-Planck-Institute for Computer Science Saarbrücken, Germany A Bayesian Learning Approach to Concept-Based Document Classification by Georgiana Ifrim Supervisors

More information

arxiv: v1 [math.at] 10 Jan 2016

arxiv: v1 [math.at] 10 Jan 2016 THE ALGEBRAIC ATIYAH-HIRZEBRUCH SPECTRAL SEQUENCE OF REAL PROJECTIVE SPECTRA arxiv:1601.02185v1 [math.at] 10 Jan 2016 GUOZHEN WANG AND ZHOULI XU Abstract. In this note, we use Curtis s algorithm and the

More information

LEGO MINDSTORMS Education EV3 Coding Activities

LEGO MINDSTORMS Education EV3 Coding Activities LEGO MINDSTORMS Education EV3 Coding Activities s t e e h s k r o W t n e d Stu LEGOeducation.com/MINDSTORMS Contents ACTIVITY 1 Performing a Three Point Turn 3-6 ACTIVITY 2 Written Instructions for a

More information

Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines

Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines Amit Juneja and Carol Espy-Wilson Department of Electrical and Computer Engineering University of Maryland,

More information

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM s Imperial College of Engineering &

More information

A Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique

A Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique A Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique Hiromi Ishizaki 1, Susan C. Herring 2, Yasuhiro Takishima 1 1 KDDI R&D Laboratories, Inc. 2 Indiana University

More information

A Survey on Unsupervised Machine Learning Algorithms for Automation, Classification and Maintenance

A Survey on Unsupervised Machine Learning Algorithms for Automation, Classification and Maintenance A Survey on Unsupervised Machine Learning Algorithms for Automation, Classification and Maintenance a Assistant Professor a epartment of Computer Science Memoona Khanum a Tahira Mahboob b b Assistant Professor

More information

A study of speaker adaptation for DNN-based speech synthesis

A study of speaker adaptation for DNN-based speech synthesis A study of speaker adaptation for DNN-based speech synthesis Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, Simon King The Centre for Speech Technology Research (CSTR) University of Edinburgh,

More information

Term Weighting based on Document Revision History

Term Weighting based on Document Revision History Term Weighting based on Document Revision History Sérgio Nunes, Cristina Ribeiro, and Gabriel David INESC Porto, DEI, Faculdade de Engenharia, Universidade do Porto. Rua Dr. Roberto Frias, s/n. 4200-465

More information

Statewide Framework Document for:

Statewide Framework Document for: Statewide Framework Document for: 270301 Standards may be added to this document prior to submission, but may not be removed from the framework to meet state credit equivalency requirements. Performance

More information

Using Web Searches on Important Words to Create Background Sets for LSI Classification

Using Web Searches on Important Words to Create Background Sets for LSI Classification Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract

More information

Activities, Exercises, Assignments Copyright 2009 Cem Kaner 1

Activities, Exercises, Assignments Copyright 2009 Cem Kaner 1 Patterns of activities, iti exercises and assignments Workshop on Teaching Software Testing January 31, 2009 Cem Kaner, J.D., Ph.D. kaner@kaner.com Professor of Software Engineering Florida Institute of

More information

Honors Mathematics. Introduction and Definition of Honors Mathematics

Honors Mathematics. Introduction and Definition of Honors Mathematics Honors Mathematics Introduction and Definition of Honors Mathematics Honors Mathematics courses are intended to be more challenging than standard courses and provide multiple opportunities for students

More information

Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming

Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de

More information

POS tagging of Chinese Buddhist texts using Recurrent Neural Networks

POS tagging of Chinese Buddhist texts using Recurrent Neural Networks POS tagging of Chinese Buddhist texts using Recurrent Neural Networks Longlu Qin Department of East Asian Languages and Cultures longlu@stanford.edu Abstract Chinese POS tagging, as one of the most important

More information

Generating Test Cases From Use Cases

Generating Test Cases From Use Cases 1 of 13 1/10/2007 10:41 AM Generating Test Cases From Use Cases by Jim Heumann Requirements Management Evangelist Rational Software pdf (155 K) In many organizations, software testing accounts for 30 to

More information

12- A whirlwind tour of statistics

12- A whirlwind tour of statistics CyLab HT 05-436 / 05-836 / 08-534 / 08-734 / 19-534 / 19-734 Usable Privacy and Security TP :// C DU February 22, 2016 y & Secu rivac rity P le ratory bo La Lujo Bauer, Nicolas Christin, and Abby Marsh

More information

University of Groningen. Systemen, planning, netwerken Bosman, Aart

University of Groningen. Systemen, planning, netwerken Bosman, Aart University of Groningen Systemen, planning, netwerken Bosman, Aart IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information

The Strong Minimalist Thesis and Bounded Optimality

The Strong Minimalist Thesis and Bounded Optimality The Strong Minimalist Thesis and Bounded Optimality DRAFT-IN-PROGRESS; SEND COMMENTS TO RICKL@UMICH.EDU Richard L. Lewis Department of Psychology University of Michigan 27 March 2010 1 Purpose of this

More information

Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification

Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Tomi Kinnunen and Ismo Kärkkäinen University of Joensuu, Department of Computer Science, P.O. Box 111, 80101 JOENSUU,

More information

Australian Journal of Basic and Applied Sciences

Australian Journal of Basic and Applied Sciences AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean

More information

The Karlsruhe Institute of Technology Translation Systems for the WMT 2011

The Karlsruhe Institute of Technology Translation Systems for the WMT 2011 The Karlsruhe Institute of Technology Translation Systems for the WMT 2011 Teresa Herrmann, Mohammed Mediani, Jan Niehues and Alex Waibel Karlsruhe Institute of Technology Karlsruhe, Germany firstname.lastname@kit.edu

More information

On document relevance and lexical cohesion between query terms

On document relevance and lexical cohesion between query terms Information Processing and Management 42 (2006) 1230 1247 www.elsevier.com/locate/infoproman On document relevance and lexical cohesion between query terms Olga Vechtomova a, *, Murat Karamuftuoglu b,

More information

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com

More information

CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2

CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2 1 CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2 Peter A. Chew, Brett W. Bader, Ahmed Abdelali Proceedings of the 13 th SIGKDD, 2007 Tiago Luís Outline 2 Cross-Language IR (CLIR) Latent Semantic Analysis

More information

Attributed Social Network Embedding

Attributed Social Network Embedding JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, MAY 2017 1 Attributed Social Network Embedding arxiv:1705.04969v1 [cs.si] 14 May 2017 Lizi Liao, Xiangnan He, Hanwang Zhang, and Tat-Seng Chua Abstract Embedding

More information

Speech Emotion Recognition Using Support Vector Machine

Speech Emotion Recognition Using Support Vector Machine Speech Emotion Recognition Using Support Vector Machine Yixiong Pan, Peipei Shen and Liping Shen Department of Computer Technology Shanghai JiaoTong University, Shanghai, China panyixiong@sjtu.edu.cn,

More information

How to Judge the Quality of an Objective Classroom Test

How to Judge the Quality of an Objective Classroom Test How to Judge the Quality of an Objective Classroom Test Technical Bulletin #6 Evaluation and Examination Service The University of Iowa (319) 335-0356 HOW TO JUDGE THE QUALITY OF AN OBJECTIVE CLASSROOM

More information

The stages of event extraction

The stages of event extraction The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks

More information

AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS

AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS 1 CALIFORNIA CONTENT STANDARDS: Chapter 1 ALGEBRA AND WHOLE NUMBERS Algebra and Functions 1.4 Students use algebraic

More information

South Carolina English Language Arts

South Carolina English Language Arts South Carolina English Language Arts A S O F J U N E 2 0, 2 0 1 0, T H I S S TAT E H A D A D O P T E D T H E CO M M O N CO R E S TAT E S TA N DA R D S. DOCUMENTS REVIEWED South Carolina Academic Content

More information

Syntactic Patterns versus Word Alignment: Extracting Opinion Targets from Online Reviews

Syntactic Patterns versus Word Alignment: Extracting Opinion Targets from Online Reviews Syntactic Patterns versus Word Alignment: Extracting Opinion Targets from Online Reviews Kang Liu, Liheng Xu and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy

More information

Julia Smith. Effective Classroom Approaches to.

Julia Smith. Effective Classroom Approaches to. Julia Smith @tessmaths Effective Classroom Approaches to GCSE Maths resits julia.smith@writtle.ac.uk Agenda The context of GCSE resit in a post-16 setting An overview of the new GCSE Key features of a

More information

Feature-oriented vs. Needs-oriented Product Access for Non-Expert Online Shoppers

Feature-oriented vs. Needs-oriented Product Access for Non-Expert Online Shoppers Feature-oriented vs. Needs-oriented Product Access for Non-Expert Online Shoppers Daniel Felix 1, Christoph Niederberger 1, Patrick Steiger 2 & Markus Stolze 3 1 ETH Zurich, Technoparkstrasse 1, CH-8005

More information

Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models

Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Richard Johansson and Alessandro Moschitti DISI, University of Trento Via Sommarive 14, 38123 Trento (TN),

More information