COMPONENT BASED SUMMARIZATION USING AUTOMATIC IDENTIFICATION OF CROSS-DOCUMENT STRUCTURAL RELATIONSHIP


IADIS International Conference Applied Computing 2012

Yogan Jaya Kumar 1, Naomie Salim 2 and Albaraa Abuobieda 3
1 Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Durian Tunggal, Melaka, Malaysia
2 Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, Skudai, Johor, Malaysia
3 Faculty of Computer Studies, International University of Africa, Khartoum, Sudan

ABSTRACT
The world today is witnessing a fast-moving information age, driven by the ever-increasing amount of information available online. People are exposed to vast numbers of online documents retrieved from various sources, and automatic document summarization systems have become necessary to alleviate this information overload. In the context of online news documents, different news sources reporting on a particular event tend to contain common components that make up the main story of the news. Based on this observation, we propose component based summarization, i.e. taking into account the generic components of a news story to produce quality summaries. We focus particularly on news stories related to natural disaster events. We also investigate the automatic identification of cross-document structural relationships (CST) between sentences using a case based reasoning (CBR) approach. The identified CST relations are used to extract highly relevant sentences for inclusion in the summary. The performance of the proposed approach was evaluated using ROUGE, a standard evaluation metric in text summarization.

KEYWORDS
Multi document summarization, Cross-document structural relationship, Case based reasoning, Machine learning.

1. INTRODUCTION
There have been many research works concerning text document summarization in academia (Gupta and Lehal, 2010).
Research on text summarization ranges from single document summarization to multi document summarization. In this work we concentrate on the problem of producing multi document summaries for news articles, particularly news stories related to natural disaster events. Imagine a user looking for news about the earthquake which occurred in Sendai, Japan. The user will probably receive dozens of articles, many of them related. It is therefore desirable to have a system which can generate a summary containing the most important information in those articles. Such systems are designed to take a cluster of related documents and produce a shorter, concise version of the original documents.

Much of the work thus far has been in the context of generic summarization (Nenkova and McKeown, 2011). In generic summarization, the importance of information is determined with respect to the input alone, without relating it to the goal of generating the summary. Such an approach is imprecise and uses little knowledge. Furthermore, it views all documents as homogeneous texts regardless of genre or domain, i.e. generic summaries make no assumptions about the domain of their source information. This shortcoming has led to the development of summarization systems centered upon various domains of interest, for example the summarization of business articles and biomedical documents (Wu and Liu, 2003; Verma et al, 2007).

As this study involves multiple documents, we also consider work on multi document analysis. Text analysis has become prominent, especially when it involves multiple documents such as news articles. The idea of cross-document structural relationships is to investigate the existence of inter-document relationships. These relations are based on the CST model (Cross-document Structure Theory) (Radev, 2000). Documents related to the same topic usually contain semantically related textual units. A number of researchers have addressed the benefits of CST for the summarization task (Zhang et al, 2002; Jorge and Pardo, 2010). However, the major limitation of their work is that the CST relationships need to be manually annotated by human experts, which is a drawback for an automatic summarization system.

In this study, our proposed approach takes into account the generic components of a news story and performs sentence selection based on automatic identification of CST relationships. We believe that providing comprehensive contextual information coverage is ideal for news summary creation, as it is close to the way humans prepare a news summary.

The rest of the paper is organized as follows. Section 2 presents an overview of our approach, covering the proposed framework together with the CST relationship identification process. Section 3 gives the evaluation results for both the CST relationship identification and the generated summaries. Finally, we conclude in Section 4.

2. OVERVIEW OF APPROACH
Looking back at previous approaches to text summarization, we observe that they are mainly based on low-knowledge representations without any attempt to understand the text. Moreover, most text summarization models still use only a bag of words as the text representation and do not include much contextual information, for example coverage of the locations, people and events in a news story. We believe that providing comprehensive contextual information coverage is ideal for summary creation. As far as news documents are concerned, different news sources reporting on a particular event tend to contain common components that make up the main story of the news.
The most common components of a news article are WHO, WHEN, WHERE, WHAT and HOW. In the process of news story production, these are the core components which a journalist must collect, interpret, organize and transmit (Neal and Brown, 1982). Furthermore, these components are very close to how humans perceive the information content of news.

Figure 1. General components of news articles related to natural disaster events

In the context of natural disaster events, the news components can be mapped to these events. They include the description of the disaster (HOW), information about affected locations (WHERE), persons involved (WHO), the damage to humans and property (WHAT), the relief efforts (WHAT) and the organizations involved (WHO) (see Figure 1). Sentences carrying these components, together with the information they describe, are what readers usually look for when reading news stories related to natural disaster events.

2.1 Framework Overview
Figure 2 shows the general design of our proposed component based summarization (CBS) framework. First, it receives a cluster of news articles that need to be summarized. Using the GATE tool, all component sentences are extracted from these documents. The extracted component sentences are then preprocessed and passed to the CBR model, which identifies the CST relationships they hold. Based on the type and number of CST relationships, each component sentence is scored and ranked within its predefined component cluster. Redundant sentences are removed using a word overlap check. To generate the summary, high ranking sentences are selected in proportion to the size of each component cluster until the desired summary length is met. Finally, the selected summary sentences are sorted according to their original position in the document.

Figure 2. Component based summarization (CBS) framework

2.2 Component Sentence Extraction
To extract the component sentences from the news articles, we looked at existing information extraction (IE) techniques. Over the years, a number of information extraction techniques have been developed. A comprehensive review and analysis of these techniques can be found in (Moens, 2003). We have employed the technique which uses gazetteer lists. Despite its simplicity, many IE systems have shown that this technique works well in various applications (Wimalasuriya and Dou, 2009). As opposed to IE techniques which use linguistic rules, this technique recognizes individual words or phrases instead of patterns. The approach is similar to the one used for the named entity recognition (NER) task. First, the words or phrases to be identified for a particular category are stated in a list, known as a gazetteer list. In our work, these categories refer to the components in the news documents (refer to Figure 1).
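
As a rough illustration of gazetteer-based lookup, the following minimal sketch tags sentences with the components whose gazetteer entries they contain. It is plain Python and does not use GATE or JAPE; the gazetteer entries, the function names and the naive sentence splitter are all assumptions made for illustration and are not the lists or rules used in this work.

```python
import re

# Hypothetical gazetteer lists: the entries below are illustrative only and
# are not the lists used in the paper.
GAZETTEERS = {
    "WHERE": {"sendai", "japan", "coast"},
    "WHO": {"red cross", "residents", "government"},
    "WHAT": {"killed", "injured", "destroyed", "aid", "rescue"},
    "HOW": {"earthquake", "tsunami", "magnitude"},
}


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (a stand-in for a real tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_components(sentence):
    """Return the sorted component labels whose gazetteer entries occur in the sentence."""
    lowered = sentence.lower()
    labels = set()
    for component, entries in GAZETTEERS.items():
        for entry in entries:
            # Whole-word (or whole-phrase) match on the lowercased sentence.
            if re.search(r"\b" + re.escape(entry) + r"\b", lowered):
                labels.add(component)
                break
    return sorted(labels)


def extract_component_sentences(text):
    """Group sentences by component; a sentence may belong to several clusters."""
    clusters = {component: [] for component in GAZETTEERS}
    for sentence in split_sentences(text):
        for component in tag_components(sentence):
            clusters[component].append(sentence)
    return clusters
```

For example, `extract_component_sentences("An earthquake struck Sendai. The Red Cross sent aid.")` places the first sentence in the HOW and WHERE clusters and the second in the WHO and WHAT clusters. A production system would rely on curated lists and grammar rules rather than this simple matching.
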
Once the text documents are annotated with the entities of each component, the component sentences are recognized and extracted using Java Annotation Patterns Engine (JAPE) grammar. In this work, we used the General Architecture for Text Engineering (GATE) tool (Cunningham et al, 2002), a widely used NLP framework that provides the platform to employ this technique.

2.3 Identification of CST Relationships
In this section, we discuss the steps for automatically identifying the CST relationships between sentence pairs. Cross-document relationships between sentences can indicate which sentences have high relevance in a particular document cluster. In our work, sentence relevance is evaluated with respect to the sentence's component cluster. We have considered four types of CST relationship, namely Identity, Subsumption, Description and Overlap. As most of the other CST relations are covered by these four relations, we consider them sufficient for our summarization task. Table 1 lists the details of the four CST relations that need to be identified.

Table 1. CST relations used in this work

Relationship                     Description
Identity                         The same text appears in more than one location
Subsumption                      S1 contains all information in S2, plus additional information not in S2
Description                      S1 describes an entity mentioned in S2
Overlap (partial equivalence)    S1 provides facts X and Y while S2 provides facts X and Z; X, Y and Z should all be non-trivial

First, all the extracted component sentences are preprocessed using stop word removal and word stemming. Then feature extraction is performed. We represent each sentence pair using lexical and semantic features. Below we describe the features that were computed for each sentence pair:

Cosine similarity: cosine similarity is used to measure the similarity of two sentences (S). Here the sentences are represented as word vectors with tf-idf as the value of each element i:

\[ \cos(S_1, S_2) = \frac{\sum_i S_{1,i} \, S_{2,i}}{\sqrt{\sum_i (S_{1,i})^2} \, \sqrt{\sum_i (S_{2,i})^2}} \tag{1} \]

Word overlap: this feature is a measure based on the number of overlapping words in the two sentences. The measure is not sensitive to the word order in the sentences (Zahri and Fukumoto, 2011):

\[ overlap(S_1, S_2) = \frac{\#commonwords(S_1, S_2)}{\#words(S_1) + \#words(S_2)} \tag{2} \]

Length type of S1: this feature gives the length type of the first sentence when the lengths of the two sentences are compared:

\[ lengtype(S_1) = \begin{cases} 1 & \text{if } length(S_1) > length(S_2) \\ -1 & \text{if } length(S_1) < length(S_2) \\ 0 & \text{if } length(S_1) = length(S_2) \end{cases} \tag{3} \]

NP similarity: this feature represents the noun phrase (NP) similarity between two sentences. The similarity between the NPs is calculated according to the Jaccard coefficient, as defined in the following equation:

\[ NP(S_1, S_2) = \frac{|NP(S_1) \cap NP(S_2)|}{|NP(S_1) \cup NP(S_2)|} \tag{4} \]

VP similarity: this feature represents the verb phrase (VP) similarity between two sentences. The similarity between the VPs is calculated according to the Jaccard coefficient, as defined in the following equation:

\[ VP(S_1, S_2) = \frac{|VP(S_1) \cap VP(S_2)|}{|VP(S_1) \cup VP(S_2)|} \tag{5} \]

We propose a case based reasoning (CBR) approach to identify the CST relationships in this work. CBR is a supervised learning approach with four major phases, i.e. Retrieve, Reuse, Revise and Adapt. Once we have extracted all the features from each sentence pair, we represent them as feature vectors (inputs). These inputs, together with their respective outputs (CST relationship types), represent the cases in the casebase. To identify the relationship type of a new case (sentence pair), we compare the input feature vector of the new case with the existing cases in the casebase. We use the cosine measure to retrieve similar cases. If the similarity value is above the predefined threshold, the model reuses the solution, so the solution (relationship type) of the new case is the output of the most similar case retrieved from the casebase. If the similarity value is below the threshold, the model revises the new case as the No relation type and adapts the revised new case into the casebase. This process is depicted in Figure 3.
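
The feature computation and the retrieve, reuse, revise and adapt steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: raw term counts stand in for tf-idf weights, the denominator of the word overlap in Eq. (2) is taken to be the sum of the two sentence lengths, the noun and verb phrases are assumed to have been extracted beforehand by some chunker, and the threshold value 0.9 and all function names are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {key: weight} dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def word_overlap(words1, words2):
    """Eq. (2): shared distinct words over the summed sentence lengths (assumed '+')."""
    total = len(words1) + len(words2)
    return len(set(words1) & set(words2)) / total if total else 0.0


def length_type(words1, words2):
    """Eq. (3): 1 if S1 is longer, -1 if shorter, 0 if equal."""
    return (len(words1) > len(words2)) - (len(words1) < len(words2))


def jaccard(a, b):
    """Eqs. (4)-(5): Jaccard coefficient of two phrase sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def features(s1, s2):
    """Feature vector for a sentence pair. Each sentence is a dict with
    'words' (stemmed tokens), 'nps' and 'vps' (pre-extracted phrase lists)."""
    # Assumption: raw term counts stand in for the tf-idf weights of Eq. (1).
    return [
        cosine(Counter(s1["words"]), Counter(s2["words"])),
        word_overlap(s1["words"], s2["words"]),
        length_type(s1["words"], s2["words"]),
        jaccard(s1["nps"], s2["nps"]),
        jaccard(s1["vps"], s2["vps"]),
    ]


def classify(casebase, vector, threshold=0.9):
    """Retrieve the most similar case; reuse its label above the threshold,
    otherwise revise to 'No relation' and store the new case in the casebase."""
    query = dict(enumerate(vector))
    best_label, best_sim = None, -1.0
    for case_vector, label in casebase:
        sim = cosine(query, dict(enumerate(case_vector)))
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim > threshold:
        return best_label
    casebase.append((vector, "No relation"))  # hypothetical threshold not met
    return "No relation"
```

Here `casebase` is a list of `(feature_vector, label)` pairs built from annotated sentence pairs. Because the casebase is only extended when a case is revised, no retraining step is needed when new cases arrive.
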

Figure 3. CBR process for CST relationship identification

2.4 Sentence Scoring
In text summarization, one of the key phases prior to sentence selection is sentence scoring. This phase assigns a score to each sentence based on some scoring metrics and then ranks the sentences from the highest score down. The high ranking sentences then become the candidates for inclusion in the summary. In our work, sentence scoring is based on the type and number of CST relationships, i.e. the final score is obtained by summing all the CST relationship scores:

\[ Score(S) = \sum_{k=1}^{4} Score(R_k) \tag{6} \]

where:
a) Score(S) = the score of the sentence S
b) Score(R_k) = the score of the CST relation k, i.e. the total number of relations of type k held by sentence S

2.5 Sentence Selection
After sentence scoring and ranking are complete, the final phase is sentence selection. The standard method builds the summary by adding the highest ranking sentences until the summary length is met. In our proposed component based summarization approach, however, we choose the sentences based on the size of the component clusters. The size ratio of each component cluster is first computed. Then the highly ranked sentences in each component cluster are picked according to the cluster's size ratio. This process ends when the desired summary length is reached. Finally, the selected summary sentences are ordered according to their original position in the document.

3. EVALUATION
As our proposed component based summarization approach uses automatically identified CST relations for summary generation, it is not sufficient to evaluate the final summary alone. The performance of the CBR classification should also be evaluated, because it has a direct effect on the final results of the system.
Thus, in this section, we present the evaluation results for both the CBR classification model for CST relationship identification and the overall component based summarization (CBS) model for generating the summaries.
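
For reference when reading the summary results, the scoring of Eq. (6) and the size-ratio based sentence selection described earlier can be sketched as below. This is a minimal sketch under stated assumptions: the summary length is counted in sentences rather than words, each sentence carries a single integer position, every non-empty cluster receives at least one slot, and the data layout and function names are hypothetical rather than taken from this work.

```python
CST_TYPES = ("Identity", "Subsumption", "Description", "Overlap")


def score(relation_counts):
    """Eq. (6): sum the number of relations of each CST type held by a sentence."""
    return sum(relation_counts.get(t, 0) for t in CST_TYPES)


def select_summary(clusters, summary_size):
    """Pick top-scoring sentences per component cluster in proportion to cluster size.

    clusters maps a component name to a list of (position, text, relation_counts)
    tuples; summary_size is the number of sentences wanted (a simplification of
    a word-based length limit).
    """
    total = sum(len(sentences) for sentences in clusters.values())
    chosen = {}
    for sentences in clusters.values():
        # Quota follows the cluster's size ratio (rounded, at least one sentence).
        quota = max(1, round(summary_size * len(sentences) / total)) if sentences else 0
        ranked = sorted(sentences, key=lambda s: score(s[2]), reverse=True)
        for position, text, counts in ranked[:quota]:
            # A sentence may sit in two clusters; keep it only once.
            chosen.setdefault(position, (text, score(counts)))
    # Keep the best-scoring picks if the quotas overshoot the requested size.
    picked = sorted(chosen.items(), key=lambda item: item[1][1], reverse=True)[:summary_size]
    # Restore the original document order.
    return [text for _, (text, _) in sorted(picked)]
```

The rounding rule and the minimum of one sentence per cluster are design choices of this sketch; other ways of turning size ratios into quotas would fit the description equally well.
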

3.1 CBR Classification Performance
It is important to compare different techniques on the same dataset to see whether the proposed technique performs comparably to or better than the others. We have compared the performance of our proposed CBR model with Neural Network (NN) and Support Vector Machine (SVM), two popular machine learning techniques used for classification tasks. In the experiment, we used a dataset taken from CSTBank, a corpus consisting of clusters of English news articles annotated with CST relationships (Radev and Otterbacher, 2003). Our training and testing sets consist of sentence pairs with their corresponding CST type labels. We selected 476 sentence pairs for training and 206 sentence pairs for testing.

Figure 4 shows the comparison of F-measures between the three techniques. It can be observed that CBR performs better than SVM and NN in classifying three out of five relations. Overall, our CBR model also achieved the highest classification accuracy, 80.58%. The ability of CBR to perform well in CST relationship type identification could be closely related to the nature of its learning method, i.e. lazy learning. As opposed to eager learning methods, which need to generalize from the training data to classify new cases, lazy learning performs classification based on the similarity of a new problem to already known problems. In our problem domain, where text data have high variability, the key advantage of lazy learning is that instead of estimating the target function once for the entire instance space, it can estimate it locally for each new instance to be classified. Besides that, CBR fits our CST relationship identification problem well since it can adapt new cases into its casebase. This does not require retraining, as opposed to SVM and NN.

Figure 4. Comparison of F-measures between SVM, NN and CBR

3.2 Summary Evaluation
Our system was evaluated using 44 news articles (related to natural disaster events) from different document sets of the test data obtained from the Document Understanding Conference (DUC) 2002. DUC 2002 is a standard corpus used in text summarization studies. To evaluate the generated summaries, we used ROUGE, a common tool for this purpose. ROUGE (Recall-Oriented Understudy for Gisting Evaluation), proposed by Lin (2004), is a package for the automatic evaluation of summaries. It measures the quality of a system generated summary by comparing it to a human created summary (gold standard). There are many variants of the ROUGE evaluation measure; however, ROUGE-1, ROUGE-2, ROUGE-S and ROUGE-SU were found to work well in multi document summarization tasks (Lin, 2004). Thus we employ these four measures in this work.

We selected two model (human) summaries from DUC 2002, namely H1 and H2. H1 is used as the reference summary to measure the quality of the generated summaries for each method (the proposed CBS method, the baseline and H2-H1). H2 is used as the human against human benchmark (H2-H1). We also use the baseline (without components) as our comparison model.
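
To make the ROUGE-1 and ROUGE-2 measures concrete, the sketch below computes n-gram recall, precision and F-measure between one system summary and one reference summary. It is a simplified illustration and not the official ROUGE package: it uses whitespace tokenization, no stemming or stop word handling, a single reference, and it does not cover the skip-bigram variants ROUGE-S and ROUGE-SU. The function names are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (recall, precision, f_measure) from clipped n-gram matches."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    matches = sum((cand & ref).values())  # Counter '&' keeps the minimum count
    recall = matches / sum(ref.values()) if ref else 0.0
    precision = matches / sum(cand.values()) if cand else 0.0
    f = 2 * recall * precision / (recall + precision) if matches else 0.0
    return recall, precision, f
```

For instance, with the candidate "the quake hit sendai" and the reference "the quake struck sendai on friday", three of the six reference unigrams are matched, giving a ROUGE-1 recall of 0.5 and a precision of 0.75.
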

Table 2. Summarization results comparison between CBS and baseline (without using component) using ROUGE-1, ROUGE-2, ROUGE-S and ROUGE-SU

              Using Component            Without Component
Measure       AVG-R   AVG-P   AVG-F      AVG-R   AVG-P   AVG-F
ROUGE-1
ROUGE-2
ROUGE-S
ROUGE-SU

The evaluation was based on the average recall, precision and F-measure of the ROUGE metrics. Table 2 shows the comparison between the proposed CBS method and the baseline method. The baseline method also uses CST relations to select the most relevant sentences for its summary generation. However, it treats all sentences as homogeneous, without integrating the components associated with them. The experimental results show that adopting news components improves the quality of the summary. Also, as can be seen in Figure 5, the graphs indicate that the CBS method is close to the human benchmark in terms of performance. We can say that incorporating the structure or components of a news document does influence automatic summarization. Utilizing the content knowledge of news components in this way benefits the summarization process, as it gives an intuitive indication of the kind of information that is essential to include in the summary.

Figure 5. The CBS, baseline and H2-H1 comparison: average precision, recall and F-measure using (a) ROUGE-1, (b) ROUGE-2, (c) ROUGE-S and (d) ROUGE-SU

4. CONCLUSION
In this paper, we have introduced a method based on the generic components of news for the multi document summarization problem, with a focus on generating summaries for news articles related to natural disaster events. Our proposed component based summarization (CBS) model is integrated with cross-document structural relationships (CST) to identify highly relevant sentences to be included in the summary. As opposed to previous works, which highlighted the benefit of CST relations using manually annotated text, in this work we have attempted to identify the CST relations automatically. To achieve this, we designed a case based reasoning (CBR) model and obtained good classification results. The overall performance of our proposed CBS model was assessed on the dataset from DUC 2002 using ROUGE measures. The results showed that the proposed method is effective and came close to the human benchmark scores. This supports our hypothesis, i.e. that providing comprehensive contextual information coverage using the generic components of news is ideal for news summary creation, as it is close to the way humans prepare a news summary. Although we focus on natural disaster news stories, the concepts and techniques are applicable to other domains as well. As future work, we plan to improve the CST relationship identification by using feature weighting, and to consider a suitable learning algorithm for the sentence scoring phase so that important sentences are ranked better.

ACKNOWLEDGEMENT
This research is supported by the Ministry of Higher Education (MOHE), Universiti Teknikal Malaysia Melaka (UTeM) and Universiti Teknologi Malaysia (UTM).

REFERENCES
Gupta, V. and Lehal, G.S., 2010. A Survey of Text Summarization Extractive Techniques. J. Emerging Technologies in Web Intelligence, Vol. 2, No. 3.
Nenkova, A. and McKeown, K., 2011. Automatic Summarization. Foundations and Trends in Information Retrieval, Vol. 5.
Wu, C. and Liu, C., 2003. Ontology-based Text Summarization for Business News Articles. Proceeding of the Computers and Their Applications.
Verma, R., Chen, P. and Lu, W., 2007. A Semantic free-text summarization system using ontology knowledge. Proceedings of the Document Understanding Conference.
Radev, D.R., 2000. A Common Theory of Information Fusion from Multiple Text Sources Step One: Cross-Document Structure. Proceeding SIGDIAL, Vol. 10.
Zhang, Z., Blair-Goldensohn, S. and Radev, D.R., 2002. Towards CST-Enhanced Summarization. In Proceedings of AAAI/IAAI.
Jorge, M.L.C. and Pardo, T.S., 2010. Experiments with CST-based Multidocument Summarization. Workshop on Graph-based Methods for Natural Language Processing, ACL. Uppsala, Sweden.
Neal, J.M. and Brown, S.S., 1982. Newswriting and Reporting. Surjeet Publications, Delhi.
Moens, M., 2003. Information Extraction: Algorithms and Prospects in a Retrieval Context. The Information Retrieval Series, Springer-Verlag, Secaucus, NJ.
Wimalasuriya, D.C. and Dou, D., 2009. Ontology-Based Information Extraction: An Introduction and a Survey of Current Approaches. Journal of Information Science.
Cunningham, H., Bontcheva, K., Tablan, V. and Maynard, D., 2002. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL'02). Philadelphia.
Zahri, N.A.H.B. and Fukumoto, F., 2011. Multi-document Summarization Using Link Analysis Based on Rhetorical Relations between Sentences. In Proceedings of CICLing, Vol. 2.
Lin, C.Y., 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Proceedings of Workshop on Text Summarization of ACL, Spain.
Radev, D.R. and Otterbacher, J., 2003. CSTBank Phase I.

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,

More information

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

More information

A Case Study: News Classification Based on Term Frequency

A Case Study: News Classification Based on Term Frequency A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center

More information

Variations of the Similarity Function of TextRank for Automated Summarization

Variations of the Similarity Function of TextRank for Automated Summarization Variations of the Similarity Function of TextRank for Automated Summarization Federico Barrios 1, Federico López 1, Luis Argerich 1, Rosita Wachenchauzer 12 1 Facultad de Ingeniería, Universidad de Buenos

More information

Linking Task: Identifying authors and book titles in verbose queries

Linking Task: Identifying authors and book titles in verbose queries Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,

More information

Columbia University at DUC 2004

Columbia University at DUC 2004 Columbia University at DUC 2004 Sasha Blair-Goldensohn, David Evans, Vasileios Hatzivassiloglou, Kathleen McKeown, Ani Nenkova, Rebecca Passonneau, Barry Schiffman, Andrew Schlaikjer, Advaith Siddharthan,

More information

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM s Imperial College of Engineering &

More information

Rule Learning With Negation: Issues Regarding Effectiveness

Rule Learning With Negation: Issues Regarding Effectiveness Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United

More information

Word Segmentation of Off-line Handwritten Documents

Word Segmentation of Off-line Handwritten Documents Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department

More information

LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization

LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization Annemarie Friedrich, Marina Valeeva and Alexis Palmer COMPUTATIONAL LINGUISTICS & PHONETICS SAARLAND UNIVERSITY, GERMANY

More information

Speech Emotion Recognition Using Support Vector Machine

Speech Emotion Recognition Using Support Vector Machine Speech Emotion Recognition Using Support Vector Machine Yixiong Pan, Peipei Shen and Liping Shen Department of Computer Technology Shanghai JiaoTong University, Shanghai, China panyixiong@sjtu.edu.cn,

More information

Cross Language Information Retrieval

Cross Language Information Retrieval Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................

More information

Multilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities

Multilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities Multilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities Soto Montalvo GAVAB Group URJC Raquel Martínez NLP&IR Group UNED Arantza Casillas Dpt. EE UPV-EHU Víctor Fresno GAVAB

More information

Python Machine Learning

Python Machine Learning Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled

More information

arxiv: v1 [cs.cl] 2 Apr 2017

arxiv: v1 [cs.cl] 2 Apr 2017 Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview

More information

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

Automating the E-learning Personalization

Automating the E-learning Personalization Automating the E-learning Personalization Fathi Essalmi 1, Leila Jemni Ben Ayed 1, Mohamed Jemni 1, Kinshuk 2, and Sabine Graf 2 1 The Research Laboratory of Technologies of Information and Communication

More information

OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS

OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS Václav Kocian, Eva Volná, Michal Janošek, Martin Kotyrba University of Ostrava Department of Informatics and Computers Dvořákova 7,

More information
