A Novel Two-stage Framework for Extracting Opinionated Sentences from News Articles


Pujari Rajkumar 1, Swara Desai 2, Niloy Ganguly 1 and Pawan Goyal 1
1 Dept. of Computer Science and Engineering, Indian Institute of Technology Kharagpur, India 721302
2 Yahoo! India
1 rajkumarsaikorian@gmail.com, {niloy,pawang}@cse.iitkgp.ernet.in
2 swara@yahoo-inc.com

Abstract

This paper presents a novel two-stage framework to extract opinionated sentences from a given news article. In the first stage, a Naïve Bayes classifier, using local features, assigns a score to each sentence; the score signifies the probability of the sentence being opinionated. In the second stage, we use this prior within the HITS (Hyperlink-Induced Topic Search) schema to exploit the global structure of the article and the relations between the sentences. In the HITS schema, the opinionated sentences are treated as Hubs and the facts around these opinions are treated as the Authorities. The algorithm is implemented and evaluated against a set of manually marked data. We show that using HITS significantly improves the precision over the baseline Naïve Bayes classifier. We also argue that the proposed method actually discovers the underlying structure of the article, thus extracting various opinions, grouped with supporting facts as well as other supporting opinions, from the article.

1 Introduction

With advertising-based revenues becoming the main source of revenue, finding novel ways to increase focused user engagement has become an important research topic. A typical problem faced by web publishing houses like Yahoo! is understanding the nature of the comments posted by readers of the 10^5 articles posted on its website at any moment. A lot of users engage in discussions in the comments section of the articles. Each user has a different perspective and comments accordingly; many a time this results in a situation where the discussions in the comments section wander far away from the article's topic.
In order to assist users to discuss relevant points in the comments section, a possible methodology is to generate questions from the article's content that seek users' opinions about the various opinions conveyed in the article (Rokhlenko and Szpektor, 2013). It would also direct the users into thinking about a spectrum of points that the article covers, and encourage them to share their unique, personal, daily-life experience of events relevant to the article. This would provide a broader viewpoint for readers, and the perspective questions thus created would supply article pages with rich user-generated content, which in turn can increase user engagement. Generating such questions manually for a huge volume of articles is very difficult. However, if one could identify the main opinionated sentences within the article, it would be much easier for an editor to generate questions around these. Alternatively, the sentences themselves may serve as the points for discussion by the users. Hence, in this paper we discuss a two-stage algorithm which picks opinionated sentences from articles. The algorithm assumes an underlying structure for an article: each opinionated sentence is supported by a few factual statements that justify the opinion. We use the HITS schema to exploit this underlying structure and pick opinionated sentences from the article.

The main contributions of this paper are as follows. First, we present a novel two-stage framework for extracting opinionated sentences from a news article. Secondly, we propose a new evaluation metric that takes into account the fact that the amount of polarity (and thus, the number of opinionated sentences) within documents can vary a lot; we should therefore stress the ratio of opinionated sentences among the top sentences, relative to the ratio of opinionated sentences in the article.
Finally, discussions on how the proposed algorithm captures the underlying structure of the opinions and surrounding facts in a news article reveal that the algorithm does much more than just extract opinionated sentences.

This paper is organised as follows. Section 2 discusses related work in this field. In Section 3, we discuss our two-stage model in further detail. Section 4 discusses the experimental framework and the results. Further discussion of the underlying assumption behind using HITS, along with error analysis, is carried out in Section 5. Conclusions and future work are detailed in Section 6.

Proceedings of TextGraphs-9: the workshop on Graph-based Methods for Natural Language Processing, pages 25-33, October 29, 2014, Doha, Qatar. © 2014 Association for Computational Linguistics

2 Related Work

Opinion mining has drawn a lot of attention in recent years. Research works have focused on mining

opinions from various information sources such as blogs (Conrad and Schilder, 2007; Harb et al., 2008), product reviews (Hu and Liu, 2004; Qadir, 2009; Dave et al., 2003), news articles (Kim and Hovy, 2006; Hu and Liu, 2006), etc. Various aspects of opinion mining have been explored over the years (Ku et al., 2006). One important dimension is to identify the opinion holders as well as the opinion targets. (Lu, 2010) used a dependency parser to identify the opinion holders and targets in Chinese news text. (Choi et al., 2005) used Conditional Random Fields to identify the sources of opinions in sentences. (Kobayashi et al., 2005) proposed a learning-based anaphora resolution technique to extract the opinion tuple < Subject, Attribute, Value >. Opinion summarization has been another important aspect (Kim et al., 2013).

A lot of research work has been done on opinion mining from product reviews, where most of the text is opinion-rich. Opinion mining from news articles, however, poses its own challenges: in contrast with product reviews, not all parts of news articles present opinions (Balahur et al., 2013), and thus finding opinionated sentences itself remains a major obstacle. Our work mainly focuses on classifying a sentence in a news article as opinionated or factual. There has been work on sentiment classification (Wiebe and Riloff, 2005), but the task of finding opinionated sentences is different from finding sentiments, because sentiments mainly convey emotions and not opinions. There has been research on finding opinionated sentences from various information sources. Some of these works utilize a dictionary-based (Fei et al., 2012) or regular-pattern-based (Brun, 2012) approach to identify aspects in the sentences. (Kim and Hovy, 2006) utilize the presence of a single strong valence word, as well as the total valence score of all words in a sentence, to identify opinion-bearing sentences. (Zhai et al., 2011) work on finding evaluative sentences in online discussions.
They exploit the inter-relationships of aspects, evaluation words and emotion words to reinforce each other. Thus, while ours is not the first attempt at opinion extraction from news articles, to the best of our knowledge, none of the previous works has exploited the global structure of a news article to classify a sentence as opinionated/factual. Though summarization algorithms (Erkan and Radev, 2004; Goyal et al., 2013) utilize the similarity between sentences in an article to find the important sentences, our formulation is different in that we conceptualize two different kinds of nodes in a document, as opposed to the summarization algorithms, which treat all sentences equally. In the next section, we describe the proposed two-stage algorithm in detail.

3 Our Approach

Figure 1 gives a flowchart of the proposed two-stage method for extracting opinionated sentences from news articles. First, each news article is pre-processed to obtain the dependency parse as well as the TF-IDF vector corresponding to each of the sentences present in the article. Then, various features are extracted from these sentences and used as input to the Naïve Bayes classifier, as will be described in Section 3.1. The Naïve Bayes classifier, which corresponds to the first stage of our method, assigns a probability score to each sentence as being an opinionated sentence. In the second stage, the entire article is viewed as a complete and directed graph with edges from every sentence to all other sentences, each edge having a suitably computed weight. The iterative HITS algorithm is applied to the sentence graph, with opinionated sentences conceptualized as hubs and factual sentences conceptualized as authorities. The two stages of our approach are detailed below.

3.1 Naïve Bayes Classifier

The Naïve Bayes classifier assigns the probability of each sentence being opinionated. The classifier is trained on 70 news articles from the politics domain, the sentences of which were marked by a group of annotators as being opinionated or factual.
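As an illustrative sketch of this classification stage, the snippet below trains a Naive Bayes model on the four features listed in Table 1 below. The tiny polar-word lists, the toy training sentences, and the use of scikit-learn's GaussianNB are assumptions made here for the sake of a runnable example; the paper itself uses a wordnet-expanded polarity lexicon, the Stanford dependency parser, and Weka's Naive Bayes implementation.

```python
# Hedged sketch of the first stage: the four features of Table 1 fed to a
# Naive Bayes classifier. Lexicons, toy data, and scikit-learn are
# illustrative assumptions, not the authors' implementation.
from sklearn.naive_bayes import GaussianNB

POSITIVE = {"good", "great", "praise"}   # stand-in polar-word lists
NEGATIVE = {"bad", "slammed", "ridiculous"}

def features(tokens, root_verb, deps):
    """tokens: lowercased words; deps: dependency labels of the sentence."""
    return [
        sum(t in POSITIVE for t in tokens),                 # 1. positive polar words
        sum(t in NEGATIVE for t in tokens),                 # 2. negative polar words
        (root_verb in POSITIVE) - (root_verb in NEGATIVE),  # 3. root-verb polarity
        int(any(d in {"acomp", "xcomp", "advmod"} for d in deps)),  # 4. key deps
    ]

# Toy training data: feature vectors with opinionated(1)/factual(0) labels.
X = [features("the plan is good".split(), "is", ["acomp"]),
     features("he slammed the bad deal".split(), "slammed", ["advmod"]),
     features("the vote was held on monday".split(), "held", ["nsubjpass"]),
     features("turnout was 60 percent".split(), "was", ["nsubj"])]
y = [1, 1, 0, 0]

clf = GaussianNB().fit(X, y)
# P_i(Opinion): probability that a new sentence is opinionated.
p_opinion = clf.predict_proba([features("a great budget".split(), "is", ["acomp"])])[0][1]
```

This probability is exactly the prior that the second stage consumes.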
Each sentence was marked by two annotators. The inter-annotator agreement using Cohen's kappa coefficient was found to be 0.71. The features utilized for the classifier are detailed in Table 1. These features were adapted from those reported in (Qadir, 2009; Yu and Hatzivassiloglou, 2003). A list of positive and negative polar words, further expanded using wordnet synsets, was taken from (Kim and Hovy, 2005). The Stanford dependency parser (De Marneffe et al., 2006) was utilized to compute the dependencies for each sentence within the news article. After the features are extracted from the sentences, we use the Weka implementation of Naïve Bayes to train the classifier [1].

[1] http://www.cs.waikato.ac.nz/ml/weka/

Table 1: Features List for the Naïve Bayes Classifier
1. Count of positive polar words
2. Count of negative polar words
3. Polarity of the root verb of the sentence
4. Presence of acomp, xcomp and advmod dependencies in the sentence

3.2 HITS

The Naïve Bayes classifier, as discussed in Section 3.1, utilizes only the local features within a sentence. Thus, the probability that a sentence is opinionated remains

Figure 1: Flow Chart of Various Stages in Our Approach

independent of its context as well as the document structure. The main motivation behind formulating this problem in the HITS schema is to utilize the hidden link structures among sentences. HITS stands for "Hyperlink-Induced Topic Search". Originally, this algorithm was developed to rank Web pages, with the particular insight that some web pages (Hubs) serve as catalogs of information that can lead users directly to other pages which actually contain the information (Authorities).

The intuition behind applying HITS to the task of opinion extraction came from the following assumption about the underlying structure of an article. A news article pertains to a specific theme, and with that theme in mind, the author presents certain opinions. These opinions are justified with the facts present in the article itself. We conceptualize the opinionated sentences as Hubs and the associated facts for an opinionated sentence as Authorities for this Hub.

To describe the formulation of the HITS parameters, let us fix the notation. We denote a document D using a set of sentences {S_1, S_2, ..., S_i, ..., S_n}, where n corresponds to the number of sentences in the document D. We construct the sentence graph, where nodes correspond to the sentences in the document. Let H_i and A_i denote the hub and authority scores for sentence S_i. In HITS, the edges always flow from a Hub to an Authority. In the original HITS algorithm, each edge is given the same weight. However, it has been reported that using weights in the HITS update improves the performance significantly (Li et al., 2002). In our formulation, since each node has a non-zero probability of acting as a hub as well as an authority, we have outgoing as well as incoming edges for every node. Therefore, the weights are assigned keeping in mind the proximity between sentences as well as the probability (of being opinionated/factual) assigned by the classifier.
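To make the second stage concrete, here is a minimal sketch of weighted HITS over the sentence graph, with hub and authority scores initialized from the classifier's probabilities as described below. The weight matrix W is taken as given (the criteria for choosing it follow); the per-iteration normalization and the `max_iter` cap are assumptions added to keep the sketch well-behaved.

```python
# Sketch of the second stage: weighted HITS over the sentence graph.
# W[i][j] is the weight of the edge S_i -> S_j; p_opinion[i] is the
# classifier's probability that S_i is opinionated.

def weighted_hits(W, p_opinion, eps=0.01, max_iter=100):
    n = len(W)
    h = list(p_opinion)                 # H_i(0) = P_i(Opinion)
    a = [1.0 - p for p in p_opinion]    # A_i(0) = 1 - P_i(Opinion)
    for _ in range(max_iter):
        new_h = [sum(W[i][j] * a[j] for j in range(n)) for i in range(n)]
        new_a = [sum(W[j][i] * h[j] for j in range(n)) for i in range(n)]
        # Normalize so scores stay bounded (an assumption, standard in HITS).
        hs, as_ = sum(new_h) or 1.0, sum(new_a) or 1.0
        new_h = [x / hs for x in new_h]
        new_a = [x / as_ for x in new_a]
        # Stop once the mean squared change in hub scores drops below eps.
        mse = sum((x - y) ** 2 for x, y in zip(new_h, h)) / n
        h, a = new_h, new_a
        if mse < eps:
            break
    return h, a

# Example: rank the sentences of a toy 3-sentence "article".
W = [[0.0, 0.9, 0.1],
     [0.2, 0.0, 0.1],
     [0.1, 0.1, 0.0]]
hub, auth = weighted_hits(W, [0.9, 0.3, 0.2])
top = sorted(range(3), key=lambda i: -hub[i])  # most opinionated first
```

The system then returns the sentences with the highest hub scores as the opinionated ones.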
The following criteria were used for deciding the weight function:

- An edge in the HITS graph goes from a hub (source node) to an authority (target node). So, the edge weight from a source node to a target node should be higher if the source node has a high hub score.
- A fact corresponding to an opinionated sentence should discuss the same topic. So, the edge weight should be higher if the sentences are more similar.
- It is more probable that the facts around an opinion appear close to that opinionated sentence in the article. So, the edge weight from a source to a target node decreases as the distance between the two sentences increases.

Let W be the weight matrix such that W_ij denotes the weight for the edge from the sentence S_i to the sentence S_j. Based on the criteria outlined above, we require the weight W_ij to be such that

W_ij ∝ H_i
W_ij ∝ Sim_ij
W_ij ∝ 1 / dist_ij

where we use the cosine similarity between the sentence vectors to compute Sim_ij, and dist_ij is simply the number

of sentences separating the source and target nodes. Various combinations of these factors were tried, and will be discussed in Section 4. While factors like sentence similarity and distance are symmetric, having the weight function depend on the hub score makes it asymmetric, consistent with the basic idea of HITS. Thus, an edge from the sentence S_i to S_j is given a high weight if S_i has a high probability score of being opinionated (i.e., acting as a hub), as obtained from the classifier.

Now, for applying the HITS algorithm iteratively, the Hub and Authority scores for each sentence are initialized using the probability scores assigned by the classifier. That is, if P_i(Opinion) denotes the probability that S_i is an opinionated sentence as per the Naïve Bayes classifier, H_i(0) is initialized to P_i(Opinion) and A_i(0) is initialized to 1 - P_i(Opinion). The iterative HITS is then applied as follows:

H_i(k) = Σ_j W_ij A_j(k-1)   (1)
A_i(k) = Σ_j W_ji H_j(k-1)   (2)

where H_i(k) denotes the hub score for the i-th sentence during the k-th iteration of HITS. The iteration is stopped once the mean square error between the Hub and Authority values at two consecutive iterations is less than a threshold ε. After the HITS iteration is over, the five sentences having the highest Hub scores are returned by the system.

4 Experimental Framework and Results

The experiment was conducted with 90 news articles in the politics domain from the Yahoo! website. The sentences in the articles were marked as opinionated or factual by a group of annotators. In the training set, 1393 out of 3142 sentences were found to be opinionated. In the test set, 347 out of 830 sentences were marked as opinionated. Out of these 90 articles, 70 articles were used for training the Naïve Bayes classifier as well as for tuning various parameters. The remaining 20 articles were used for testing. The evaluation was done in an Information Retrieval setting.
That is, the system returns the sentences in decreasing order of their score (or probability, in the case of Naïve Bayes) of being opinionated. We then utilize the human judgements (provided by the annotators) to compute precision at various points. Let op(.) be a binary function for a given rank such that op(r) = 1 if the sentence returned at rank r is opinionated as per the human judgements. The P@k precision is calculated as follows:

P@k = ( Σ_{r=1..k} op(r) ) / k   (3)

While the precision at various points indicates how reliable the results returned by the system are, it does not take into account the fact that some documents are opinion-rich and some are not. For the opinion-rich documents, a high P@k value might be similar to picking sentences randomly, whereas for documents with very few opinions, even a lower P@k value might be useful. We therefore devise another evaluation metric, M@k, that indicates the ratio of opinionated sentences at any point, normalized with respect to the ratio of opinionated sentences in the article. Correspondingly, the M@k value is calculated as

M@k = P@k / Ratio_op   (4)

where Ratio_op denotes the fraction of opinionated sentences in the whole article. Thus

Ratio_op = Number of opinionated sentences / Number of sentences   (5)

The parameters that we needed to fix for the HITS algorithm were the weight function W_ij and the threshold ε at which we stop the iteration. We varied ε from 0.0001 to 0.1, multiplying it by 10 in each step. The results were not sensitive to the value of ε, and we used ε = 0.01. For fixing the weight function, we tried out various combinations using the criteria outlined in Section 3.2. Various weight functions and the corresponding P@5 and M@5 scores are shown in Table 2. Firstly, we varied k in Sim_ij^k and found that the square of the similarity function gives better results. Then, keeping it constant, we varied l in H_i^l and found the best results for l = 3. Then, keeping both of these constant, we varied α in (α + 1/dist_ij).
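The two metrics above follow directly from equations (3)-(5); a small sketch, with illustrative variable names:

```python
# Sketch of the evaluation metrics. `ranked_labels[r]` is a hypothetical
# list: 1 if the sentence the system returned at rank r+1 is opinionated
# per the annotators, else 0.

def precision_at_k(ranked_labels, k):
    # P@k = (1/k) * sum_{r=1..k} op(r)                       (equation 3)
    return sum(ranked_labels[:k]) / k

def m_at_k(ranked_labels, k, n_opinionated, n_sentences):
    # M@k = P@k / Ratio_op, with Ratio_op the fraction of
    # opinionated sentences in the whole article.       (equations 4 and 5)
    ratio_op = n_opinionated / n_sentences
    return precision_at_k(ranked_labels, k) / ratio_op

labels = [1, 1, 0, 1, 0]           # top-5 ranks, annotator judgements
p5 = precision_at_k(labels, 5)     # 3/5 = 0.6
m5 = m_at_k(labels, 5, n_opinionated=10, n_sentences=25)  # 0.6 / 0.4 = 1.5
```

An M@k above 1 means the top-k list is denser in opinions than the article as a whole.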
We found the best results for α = 1.0. With this α, we tried to vary l again, but that only reduced the final score. Therefore, we fixed the weight function to be

W_ij = H_i^3(0) Sim_ij^2 (1 + 1/dist_ij)   (6)

Note that H_i(0) in Equation 6 corresponds to the probability assigned by the classifier that the sentence S_i is opinionated. We use the classifier results as the baseline for the comparisons. The second-stage HITS algorithm is then applied, and we compare its performance with respect to the classifier. Table 3 shows the comparison of various precision scores for the classifier and the HITS algorithm. In a practical situation, an editor requires quick identification of 3-5 opinionated sentences from the article, which she can then use to formulate questions. We thus report P@k and M@k values for k = 3 and k = 5. From the results shown in Table 3, it is clear that applying the second-stage HITS over the Naïve Bayes classifier improves the performance by a large degree, both in terms of P@k and M@k. For instance, the first-stage NB classifier gives a P@5 of 0.52 and a P@3 of 0.53. Using the classifier outputs during the second-stage HITS algorithm improves the

Table 2: Average P@5 and M@5 scores: performance comparison between various functions for W_ij

Function                              P@5    M@5
Sim_ij                                0.48   0.94
Sim_ij^2                              0.57   1.16
Sim_ij^3                              0.53   1.11
Sim_ij^2 H_i                          0.60   1.22
Sim_ij^2 H_i^2                        0.61   1.27
Sim_ij^2 H_i^3                        0.61   1.27
Sim_ij^2 H_i^4                        0.58   1.21
Sim_ij^2 H_i^3 (1/dist_ij)            0.56   1.20
Sim_ij^2 H_i^3 (0.2 + 1/dist_ij)      0.60   1.25
Sim_ij^2 H_i^3 (0.4 + 1/dist_ij)      0.61   1.27
Sim_ij^2 H_i^3 (0.6 + 1/dist_ij)      0.62   1.31
Sim_ij^2 H_i^3 (0.8 + 1/dist_ij)      0.62   1.31
Sim_ij^2 H_i^3 (1 + 1/dist_ij)        0.63   1.33
Sim_ij^2 H_i^3 (1.2 + 1/dist_ij)      0.61   1.28
Sim_ij^2 H_i^2 (1 + 1/dist_ij)        0.60   1.23

Table 3: Average P@5, M@5, P@3 and M@3 scores: performance comparison between the NB classifier and HITS

System         P@5    M@5    P@3    M@3
NB Classifier  0.52   1.13   0.53   1.17
HITS           0.63   1.33   0.72   1.53
Imp. (%)       +21.2  +17.7  +35.8  +30.8

performance by 21.2%, to 0.63, in the case of P@5. For P@3, the improvements were much more significant, and a 35.8% improvement was obtained over the NB classifier. The M@5 and M@3 scores also improved, by 17.7% and 30.8% respectively. Strikingly, while the classifier gave nearly the same scores for P@k and M@k at k = 3 and k = 5, HITS gave much better results for k = 3 than for k = 5. In particular, the P@3 and M@3 scores obtained by HITS were very encouraging, indicating that the proposed approach helps push the opinionated sentences to the top. This clearly shows the advantage of using the global structure of the document, in contrast with features extracted from the sentence itself, ignoring the context.

Figures 2 and 3 show the P@5, M@5, P@3 and M@3 scores for the individual documents, numbered from 1 to 20 on the X-axis. The articles are sorted by the ratio of the P@5 (and M@5) scores obtained using HITS to those of the NB classifier. The Y-axis shows the corresponding scores. Two different lines are used to represent the results: a dashed line denotes the scores obtained by HITS, while a continuous line denotes the scores obtained by the NB classifier.
A detailed analysis of these figures leads to the following conclusions:

- For 40% of the articles (numbered 13 to 20), HITS improves over the baseline NB classifier.
- For 40% of the articles (numbered 5 to 12), the results provided by HITS were the same as those of the baseline.
- For 20% of the articles (numbered 1 to 4), HITS gives a performance lower than that of the baseline.

Thus, for 80% of the documents, the second stage performs at least as well as the first stage. This indicates that the second-stage HITS is quite robust. The M@5 results are much more robust for HITS, with 75% of the documents having an M@5 score > 1. An M@k score > 1 indicates that the ratio of opinionated sentences in the top k sentences picked by the algorithm is higher than the overall ratio in the article. For 45% of the articles (numbered 6, 9, 11 and 15-20), HITS was able to achieve P@3 = 1.0. Thus, for these 9 articles, the top 3 sentences picked by the algorithm were all marked as opinionated.

The graphs also indicate a high correlation between the results obtained by the NB classifier and HITS. We use Pearson's correlation to find the correlation strength. For the P@5 values, the correlation was found to be 0.6021, and for the M@5 values, the correlation was 0.5954. In the next section, we will first attempt to further analyze the basic assumption behind using HITS by looking at some actual Hub-Authority structures captured by the algorithm. We will also take some cases of failure and perform error analysis.

5 Discussion

The first point that we wanted to verify was whether HITS really captures the underlying structure of the document; that is, whether the sentences identified as authorities for a given hub really correspond to the facts supporting the particular opinion expressed by the hub sentence. Figure 4 gives two examples of the Hub-Authority structure, as captured by the HITS algorithm, for two different articles.
For each of these examples, we show the sentence identified as the Hub in the center, along with the top four sentences identified as Authorities for that hub. We also give the annotations as to whether the sentences were marked as opinionated or factual by the annotators. In both of these examples, the hubs were actually marked as opinionated by the annotators. Additionally, we find that all four sentences identified as authorities to the hub are very relevant to the opinion expressed by the hub. In the first example, the top 3 authority sentences are marked as factual by the annotator. Although the fourth sentence is marked as opinionated, it can be seen that this sentence presents a supporting opinion for the hub sentence. While studying the second example, we found that while the first authority does not present an important fact, the fourth authority surely does. Both of these

Figure 2: Comparison results for the 20 test articles between the Classifier and HITS: (a) P@5 and (b) M@5 values
Figure 3: Comparison results for the 20 test articles between the Classifier and HITS: (a) P@3 and (b) M@3 values
Figure 4: Examples from two different test articles capturing the Hub-Authority structure

were marked as factual by the annotators. In this particular example, although the second and third authority sentences were annotated as opinionated, they can be seen as supporting the opinion expressed by the hub sentence. This example also gives us an interesting idea for improving diversification in the final results: once an opinionated sentence is identified by the algorithm, the hub scores of all its authorities can be reduced in proportion to the edge weight. This would reduce the chances of the supporting opinions being returned by the system at a later stage as a main opinion.

We then attempted to test our tool on a recently published article, "What's Wrong with a Meritocracy Rug?" [2].

[2] http://news.yahoo.com/whats-wrong-meritocracy-rug-070000354.html

The tool could pick up a very

important opinion in the article, "Most people tend to think that the most qualified person is someone who looks just like them, only younger.", which was ranked 2nd by the system. The supporting facts and opinions for this sentence, as discovered by the algorithm, were also quite relevant. For instance, the top two authorities corresponding to this sentence hub were:

1. "And that appreciation, we learned painfully, can easily be tinged with all kinds of gendered elements without the person who is making the decisions even realizing it."
2. "And many of the traits we value, and how we value them, also end up being laden with gender overtones."

5.1 Error Analysis

We then tried to analyze certain cases of failure. Firstly, we wanted to understand why HITS was not performing as well as the classifier for 3 articles (Figures 2 and 3). The analysis revealed that the supporting sentences for the opinionated sentences extracted by the classifier were not very similar at the textual level. Thus, a low cosine similarity score resulted in lower edge weights, and thereby a lower hub score after applying HITS. For one of the articles, the sentence picked by HITS was wrongly annotated as a factual sentence.

Then, we looked at one case of failure due to the error introduced by the classifier prior probabilities. For instance, the sentence "The civil war between establishment and tea party Republicans intensified this week when House Speaker John Boehner slammed outside conservative groups for ridiculous pushback against the bipartisan budget agreement which cleared his chamber Thursday." was classified as an opinionated sentence, whereas it is a factual sentence. Looking closely, we found that the sentence contains three polar words (marked in bold), as well as an advmod dependency between the pair (slammed, when). Thus the sentence got a high initial prior from the classifier. As a result, the outgoing edges from this node got a higher H_i^3 factor.
Some of the authorities identified for this sentence were:

1. "For Democrats, the tea party is the gift that keeps on giving."
2. "Tea party sympathetic organizations, Boehner later said, are pushing our members in places where they don't want to be."

These had words similar to those in the original sentence, and thus a higher Sim_ij factor as well. We also found that these sentences were very close to each other within the article. Thus, a high hub prior along with high outgoing edge weights gave this sentence a high hub score after the HITS iterations.

5.2 Online Interface

To facilitate easy usage and understanding of the system by others, a web interface has been built for the system³. The webpage allows users either to input a new article as text and obtain its top opinionated sentences, or to view the system's output over the manually annotated test data consisting of 20 articles. Words in green are positive polar words, and red indicates negative polar words. Words marked in violet are the root verbs of the sentences. In the colored graph, the top-ranked opinionated sentences appear in yellow boxes, along with the top supporting factual sentences for each opinionated sentence in purple boxes. Snapshots from the online interface are provided in Figures 5 and 6.

6 Conclusions and Future Work

In this paper, we presented a novel two-stage framework for extracting the opinionated sentences in news articles. The problem of identifying the top opinionated sentences in a news article is very challenging, especially because opinions are not as explicit in a news article as in a discussion forum. This was also evident from the inter-annotator agreement; the kappa coefficient was found to be 0.71. The experiments conducted over 90 news articles (70 for training and 20 for testing) clearly indicate that the proposed two-stage method almost always improves the performance of the baseline classifier-based approach.
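For reference, P@k here presumably denotes the fraction of the top k returned sentences that the annotators marked as opinionated, averaged over the test articles (the definition of M@k is not given in this excerpt). A minimal sketch under that assumption, with hypothetical names:

```python
def precision_at_k(ranked, gold_opinionated, k=3):
    """Fraction of the top-k ranked sentence ids that annotators marked
    as opinionated. 'ranked' is the system's ordering (best first);
    'gold_opinionated' is the set of annotated opinionated sentence ids.
    This is the standard P@k definition, assumed to match the paper's."""
    top = ranked[:k]
    return sum(1 for s in top if s in gold_opinionated) / len(top)
```

Averaging this quantity over the 20 test articles would yield per-system scores of the kind reported below.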
Specifically, the improvements are much higher for the P@3 and M@3 scores (35.8% and 30.8% over the NB classifier). An M@3 score of 1.5 and a P@3 score of 0.72 indicate that the proposed method was able to push the opinionated sentences to the top: on average, 2 out of the top 3 sentences returned by the system were actually opinionated. This is highly desirable in a practical scenario, where an editor requires quick identification of 3-5 opinionated sentences, which she can then use to formulate questions.

The examples discussed in Section 5 bring out another important aspect of the proposed algorithm. In addition to the main objective of extracting the opinionated sentences within an article, the proposed method actually discovers the underlying structure of the article, and would certainly be useful for presenting the various opinions in the article, grouped with their supporting facts as well as supporting opinions.

While the initial results are encouraging, there is scope for improvement. We saw that the results obtained via HITS were highly correlated with the results of the Naïve Bayes classifier, which were used in assigning weights to the document graph. One direction for future work would be to experiment with other features to improve the precision of the classifier. Additionally, the current evaluation does not measure the degree of diversity of the opinions returned by the system. The Hub-Authority structure of the second example gives us an interesting idea to improve diversification, and we would like to implement that in the future.

³ Available at http://cse.iitkgp.ac.in/resgrp/cnerg/temp2/final.php

Figure 5: Screenshot from the Web Interface
Figure 6: Hub-Authority Structure as output on the Web Interface

In the future, we would also like to apply this work to track an event over time, based on the opinionated sentences present in the articles. When an event occurs, articles start out with more factual sentences. Over time, opinions start surfacing on the event, and as the event matures, opinions predominate over the facts in the articles. For example, a set of articles on a plane crash would start out factual, and would offer expert opinions over time. This work can be used to plot the maturity of the media coverage by keeping track of facts vs. opinions on any event, and organizations can use this to provide a timeline for the event. We would also like to experiment with this model on a different medium such as microblogs.
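The diversification idea proposed earlier (damping the hub scores of a selected sentence's authorities) could be realized as a greedy re-ranking step on top of the HITS output. The sketch below is purely illustrative of that future-work idea; the multiplicative damping form and all names are assumptions, not part of the implemented system:

```python
def diversify(hub, edges, top_n=5, damping=0.5):
    """Greedy selection of opinionated sentences with diversification.
    After a hub (opinionated sentence) is selected, the hub scores of its
    authorities are reduced in proportion to the connecting edge weight,
    so supporting opinions are less likely to resurface later as main
    opinions. 'hub' maps sentence id -> hub score; 'edges' maps a
    sentence id to a list of (authority_id, edge_weight) pairs.
    The damping factor and its multiplicative form are assumptions."""
    scores = dict(hub)
    selected = []
    while scores and len(selected) < top_n:
        best = max(scores, key=scores.get)
        selected.append(best)
        del scores[best]
        for auth, w in edges.get(best, []):
            if auth in scores:
                scores[auth] *= (1.0 - damping * w)  # proportional reduction
    return selected
```

For example, if sentence 1 is a strong authority of the top hub 0, its damped score can fall below an otherwise weaker, unrelated sentence, pushing a genuinely distinct opinion up the final ranking.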