Review on Abstractive Text Summarization Techniques for Biomedical Domain

Volume-03 Issue-05 May-2018 ISSN: 2455-3085 (Online) www.rrjournals.com [UGC Listed Journal]

Krutika Patel*1 & Urmi Desai2
1 Computer Engineering Department, Sarvajanik College of Engineering and Technology, Athwalines, Surat, Gujarat (India)
2 Computer Engineering Department, Sarvajanik College of Engineering and Technology, Athwalines, Surat, Gujarat (India)

ARTICLE DETAILS
Article History: Published Online: 04 May 2018
Keywords: Abstractive Summary, Text Summarization, Rich Semantic Graph
* Corresponding Author Email: kruti2443[at]gmail.com

ABSTRACT
In this internet era, the amount of biomedical literature and data is increasing exponentially. To keep up to date with knowledge in this and related fields, and to interpret the outcome of experiments in light of all available literature, researchers increasingly turn to automated literature mining. The biomedical (biological) domain is concerned with the study of life, and a tremendous amount of biomedical text is produced and collected all over the world every day. Analyzing this volume of biomedical data, and the associations within it, is very difficult. Text summarization is used to analyze biomedical text efficiently: automatic text summarization addresses the problem by generating summaries automatically. Text summarization techniques are classified as extractive or abstractive. Existing extractive techniques select important sentences from the original text and build the summary without modifying them, so they may not present conflicting information properly. Abstractive text summarization can solve this problem by re-expressing the extracted content in another, more understandable semantic form.
This paper discusses abstractive text summarization techniques and presents a parametric evaluation of them.

1. Introduction
Text summarization is the process of condensing a large body of text into a shorter version. In the biomedical field, a tremendous amount of information is available to clinicians and researchers from a variety of sources, for example scientific literature databases, Electronic Health Record (EHR) systems, web archives, patient reports and interactive media records. Scientific literature sources such as MEDLINE, PubMed, and the IEEE and ACM digital libraries are major sources of data for researchers, and clinical trials and scientific publications frequently report new research or technology that advances the biomedical field. Summarization helps clinicians and researchers find important information and saves the time they spend searching for it.

Several reasons have been identified for producing summaries of full-text documents even when those documents already provide abstracts, since there are variants of an ideal summary besides the abstract: 1) some content of the full text may be missing from the abstract, 2) customized summaries are useful in question answering systems, 3) automatic summaries allow abstracting services to scale up the number of documents they can evaluate, and 4) assessing the quality of sentence selection methods can help in the development of multi-document summarization systems.

Automatic text summarization is an effective technique that uses computers to process and compress texts in order to produce concise and refined content, and it offers a good means of acquiring information quickly through compression and refinement. However, although existing automatic text summarization methods perform well on short sequences, they face challenges of low efficiency and accuracy when dealing with long texts.
In an era of enormous amounts of information and rapid information overload, automatic text summarization has become an important and timely tool that helps users quickly understand large volumes of information. Automatic summarization is a core and subtle task in natural language processing [1][3]. It is used in many areas, for example news article outlines, email summaries, short news messages on mobile devices, information summaries for business people, online search engines and the biomedical domain [8][9][11]. In extractive text summarization, the extracted sentences can be longer than average [2][3], so portions that are not important for the summary also get included. Moreover, conflicting information may not be presented properly [2]. Abstractive text summarization can solve this problem by representing the extracted sentences in another, more understandable semantic form [2].

This paper surveys existing abstractive text summarization techniques and presents a parametric evaluation of them. The paper is organized as follows: section 2 describes the categories of text summarization, section 3 describes various abstractive text summarization techniques, section 4 presents the parametric evaluation of these techniques, and section 5 concludes with a discussion of future research directions in this area.

RRIJM 2015, All Rights Reserved

2. Related Work
This section describes the main categories of abstractive text summarization techniques. Depending on the input and other parameters, summarization is categorized into two groups: extractive and abstractive summarization.

Fig. 1.0: Text Summarization Basic Process

Abstractive summarization is further classified into two categories: structure-based and semantic-based. In semantic-based methods, a semantic representation of the document(s) is fed into a natural language generation (NLG) system. These methods focus on identifying noun phrases and verb phrases by processing linguistic data. The techniques that use this approach are discussed here [10][14][16]. The multimodal semantic model captures the concepts in the source information and the relationships among them; the important concepts are rated using some measure, and the selected concepts are finally expressed as sentences to form the summary. In the information item based method, the content of the summary is generated from an abstract representation of the source documents, called an information item: the smallest element of coherent information in a text. In the semantic graph based method, the summary of a document is formed by creating a rich semantic graph (RSG) of the original document, reducing the generated semantic graph, and then generating the final abstractive summary. The semantic text representation method analyzes the input text using the semantics of words rather than the syntactic structure of the text.

3. Various Techniques for Text Summarization
In this section, we discuss different approaches and some representative works on abstractive text summarization.

A. Graph Reduction Approach
This approach [1] summarizes an input document by creating a semantic graph called a rich semantic graph (RSG). The semantic graph is then reduced, and the final abstractive summary is generated from the reduced graph. The system takes a single English document as input and outputs a reduced summary report.

Fig. 2.0: Architecture of Graph Reduction Approach (Source Document → Rich Semantic Graph → Rich Semantic Graph Reduction → Summarized Text Generation)

This approach comprises three tasks. The first task is RSG creation, whose main aim is to represent the input document semantically.
In RSG creation, the verbs and nouns of the input document are represented as nodes, and the edges represent the semantic relations between them. A rich semantic sub-graph is built for each sentence, and these sub-graphs are then interconnected and merged to represent the whole document semantically. The second phase is RSG reduction, in which a set of rules is applied to the RSG to reduce it by merging and deleting nodes. The third phase generates the abstractive summary from the reduced RSG. This approach succeeds in reducing the source document to about half of its original size. Its limitation is that it cannot take multiple documents as input to generate an abstractive summary.

B. Graph based Approach
This approach uses a word graph to represent the source document and includes two phases [2]: sentence reduction and sentence combination. The sentence reduction phase uses discourse rules to remove redundant clauses at the beginning of a sentence, and syntactic constraints to complete the end of the reduced sentence. The word graph is used for sentence combination and to represent word relations between texts [12]; new sentences are generated from several input sentences by using the word graph. In the word graph, nodes store information about words and their part-of-speech tags, and edges represent adjacency relations between word pairs. This approach generates syntactically correct sentences but does not take word meaning into account.

C. Sentiment Infusion Approach
This approach [3] uses a graph-based technique that generates summaries of redundant opinions and uses sentiment analysis to combine the statements. It also uses a word graph to compress and merge information, and summaries are then generated from the resulting sentences. The graph captures redundancy in the documents: words that occur more than once in the texts are mapped to the same node. Moreover, graph creation does not require any domain knowledge. At generation time, the approach ensures the correctness of the sentences.

Fig. 3.0: Architecture of Sentiment Infusion Approach [3] (Building the Word Graph → Ensuring the Sentence Correctness → Getting Abstractive Summaries)

To obtain the abstractive summary, a score is assigned to every path and the sentences are fused. The sentences are then ranked in descending order of score, and duplicate sentences are removed from the summary using the Jaccard index as the similarity measure. Finally, the top-ranked remaining sentences are chosen for the summary.

D. Genetic Graph based Approach
This approach [4] is a genetic semantic graph based approach for multi-document text summarization. It constructs a semantic graph from the documents in such a way that the nodes represent Predicate Argument Structures (PASs) and the edges carry a semantic similarity weight, determined from the PAS-to-PAS semantic similarity and the PAS-to-document-set relationship. The PASs are constructed using semantic role labeling. To reduce redundancy, maximal marginal relevance (MMR) is used to re-rank the PASs, and language generation is used to produce summary sentences from the top-ranked PASs [13]. The approach automatically merges similar information across the documents to reduce overlapping information in the summary.

Fig. 4.0: Proposed Genetic Graph Based Approach [4] (Creation of Graph → Modified Graph Based Ranking Algorithm → MMR for Reducing Redundancy → Abstractive Summary Generation)

E. Clustered Genetic Graph based Approach
This approach [5] is a clustered genetic semantic graph based approach for multi-document abstractive text summarization. It is similar to the genetic semantic graph based approach, but it uses a clustering algorithm to eliminate redundancy: the representative PAS with the highest similarity score is chosen from each cluster and fed to language generation to produce summary sentences. The clusters are built with the Hierarchical Agglomerative Clustering (HAC) algorithm [15], which accepts the semantic similarity matrix as input. The algorithm repeatedly merges the two most similar clusters and updates the semantic similarity matrix to represent the pairwise similarity between the newly merged cluster and the remaining clusters.

Fig. 5.0: Proposed Clustered Genetic Graph Based Approach [5] (Creation of Graph → Augmented Graph with PAS-to-Document Set Relationship → Modified Graph Based Ranking Algorithm → Clustering for Eliminating Redundancy → Abstractive Summary Generation)
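The word graph that approaches B and C rely on can be illustrated with a minimal sketch. It assumes the sentences are already tokenized and POS-tagged (the tags below are written by hand), merges identical (word, POS) pairs into a single node, and enumerates start-to-end paths as candidate fused sentences. The sentinel nodes, the length limits and the scoring rule (mean edge count along a path) are illustrative assumptions rather than the exact method of [2] or [3], and `build_word_graph` and `candidate_sentences` are hypothetical names.

```python
from collections import defaultdict

# Assumption: hypothetical sentinel nodes marking sentence start and end.
START, END = ("<s>", "START"), ("</s>", "END")


def build_word_graph(tagged_sentences):
    """Build a word graph whose nodes are (lowercased word, POS tag) pairs.

    The same (word, POS) pair in different sentences maps to one node, so
    redundant words are shared; edge counts record how often one node
    directly follows another.
    """
    edge_counts = defaultdict(int)
    for sentence in tagged_sentences:
        nodes = [START] + [(word.lower(), pos) for word, pos in sentence] + [END]
        for left, right in zip(nodes, nodes[1:]):
            edge_counts[(left, right)] += 1
    successors = defaultdict(list)
    for left, right in edge_counts:
        successors[left].append(right)
    return successors, edge_counts


def candidate_sentences(successors, edge_counts, min_words=3, max_words=12):
    """Enumerate acyclic START -> END paths as candidate fused sentences.

    Assumption: a path is scored by its mean edge count, so paths through
    words shared by several input sentences score higher.
    """
    results = []

    def walk(path):
        node = path[-1]
        if node == END:
            words = path[1:-1]  # strip the two sentinels
            if min_words <= len(words) <= max_words:
                total = sum(edge_counts[edge] for edge in zip(path, path[1:]))
                results.append((" ".join(w for w, _ in words), total / (len(path) - 1)))
            return
        if len(path) - 1 > max_words:  # already longer than allowed
            return
        for nxt in successors[node]:
            if nxt not in path:  # keep the path acyclic
                walk(path + [nxt])

    walk([START])
    # Highest-scoring candidates first (ties keep discovery order).
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    tagged = [
        [("The", "DT"), ("drug", "NN"), ("reduced", "VBD"), ("blood", "NN"), ("pressure", "NN")],
        [("The", "DT"), ("drug", "NN"), ("lowered", "VBD"), ("blood", "NN"),
         ("pressure", "NN"), ("significantly", "RB")],
    ]
    succ, counts = build_word_graph(tagged)
    for text, score in candidate_sentences(succ, counts):
        print(f"{score:.2f}  {text}")
```

On this toy input the sketch prints four candidates; because "the drug" and "blood pressure" are shared nodes, "the drug lowered blood pressure" and "the drug reduced blood pressure significantly" appear even though neither is an input sentence, which is the sentence-combination effect described for approach B.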
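The final selection step of the Sentiment Infusion approach (rank the fused sentences by score, drop duplicates using the Jaccard index, keep the top sentences) can be sketched as follows. The Jaccard index of two word sets A and B is |A ∩ B| / |A ∪ B|. The whitespace tokenization, the 0.5 similarity threshold and the summary length k are assumptions chosen for illustration; [3] may tokenize and threshold differently, and `select_summary` is a hypothetical name.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard index |A & B| / |A | B| of two sentences' lowercase word sets."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty sentences are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def select_summary(scored_sentences, k=3, threshold=0.5):
    """Rank (sentence, score) pairs by score, drop near-duplicates, keep top k.

    Assumption: a sentence counts as a duplicate when its Jaccard index with
    any already-selected sentence is at least `threshold`.
    """
    ranked = sorted(scored_sentences, key=lambda item: item[1], reverse=True)
    summary = []
    for sentence, _score in ranked:
        if all(jaccard(sentence, kept) < threshold for kept in summary):
            summary.append(sentence)
        if len(summary) == k:
            break
    return summary


if __name__ == "__main__":
    scored = [
        ("the drug reduced blood pressure", 1.50),
        ("the drug lowered blood pressure", 1.50),
        ("the drug reduced blood pressure significantly", 1.43),
        ("patients reported mild headaches", 1.20),
    ]
    print(select_summary(scored))
```

Here the two near-paraphrases of the top sentence share most of their words (Jaccard 4/6 and 5/6), so only "the drug reduced blood pressure" and "patients reported mild headaches" survive.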
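MMR, which the Genetic Graph based Approach uses to re-rank PASs, greedily picks the item that best balances importance against similarity to what has already been selected: score(i) = λ·rel(i) − (1 − λ)·max over selected j of sim(i, j). The sketch below assumes each PAS already has a relevance score (for example from the graph-based ranking step) and that pairwise semantic similarities are given as a matrix; the weight `lam = 0.7` is an arbitrary illustrative value, not one reported in [4], and `mmr_select` is a hypothetical name.

```python
def mmr_select(relevance, similarity, k, lam=0.7):
    """Greedy maximal marginal relevance (MMR) selection.

    relevance[i]: importance of item i (e.g. its graph-ranking score).
    similarity[i][j]: semantic similarity between items i and j.
    lam: trade-off, 1.0 = relevance only, 0.0 = novelty only (assumed value).
    """
    selected = []
    remaining = list(range(len(relevance)))

    def marginal(i):
        # Penalize item i by its similarity to the closest already-selected item.
        redundancy = max((similarity[i][j] for j in selected), default=0.0)
        return lam * relevance[i] - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    relevance = [0.9, 0.85, 0.5]
    similarity = [
        [1.0, 0.95, 0.1],
        [0.95, 1.0, 0.2],
        [0.1, 0.2, 1.0],
    ]
    print(mmr_select(relevance, similarity, k=2))
```

PAS 1 is almost as relevant as PAS 0 but nearly a duplicate of it, so with `lam = 0.7` the second pick is PAS 2 and the result is `[0, 2]`; setting `lam = 1.0` ignores redundancy and returns `[0, 1]`.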
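The clustering step of the Clustered Genetic Graph based Approach can be sketched as a plain agglomerative loop over a PAS similarity matrix. This is an assumption-laden illustration: it uses average linkage as the cluster-to-cluster similarity (recomputed from the original matrix, which is equivalent to updating the matrix after each merge under that linkage), it reads the 20% compression rate as "number of clusters = 20% of the PASs", and it keeps the highest-scoring PAS of each cluster. [5] may use a different linkage or stopping rule, and `hac` and `pick_representatives` are hypothetical names.

```python
import math


def average_linkage(similarity, a, b):
    """Mean pairwise similarity between the members of clusters a and b."""
    return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))


def hac(similarity, n_clusters):
    """Agglomerative clustering: repeatedly merge the two most similar clusters."""
    clusters = [[i] for i in range(len(similarity))]
    while len(clusters) > n_clusters:
        pairs = ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters)))
        p, q = max(pairs, key=lambda pq: average_linkage(similarity, clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters


def pick_representatives(similarity, scores, rate=0.2):
    """Assumption: keep ceil(rate * n) clusters and one top-scoring PAS per cluster."""
    n_clusters = max(1, math.ceil(rate * len(scores)))
    clusters = hac(similarity, n_clusters)
    return sorted(max(cluster, key=lambda i: scores[i]) for cluster in clusters)


if __name__ == "__main__":
    # Toy matrix: PASs 0-2 describe one fact, PASs 3-4 another.
    similarity = [
        [1.0, 0.9, 0.8, 0.1, 0.2],
        [0.9, 1.0, 0.85, 0.15, 0.1],
        [0.8, 0.85, 1.0, 0.2, 0.1],
        [0.1, 0.15, 0.2, 1.0, 0.88],
        [0.2, 0.1, 0.1, 0.88, 1.0],
    ]
    scores = [0.3, 0.7, 0.5, 0.4, 0.6]
    print(hac(similarity, 2))
    print(pick_representatives(similarity, scores, rate=0.4))
```

With five PASs a 20% rate would leave a single cluster, so the usage above passes `rate=0.4` to keep two: the loop recovers the groups `[0, 1, 2]` and `[3, 4]`, and the representatives are PAS 1 and PAS 4, the highest-scoring member of each.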

In the clustered approach [5], this merging process repeats until the compression rate of the summary reaches 20%. Once the clusters are obtained, the top-scored PASs are turned into summary sentences using simple natural language generation (SimpleNLG) and simple heuristic rules.

4. Comparison of Abstractive Text Summarization Techniques
This section compares the previously discussed abstractive text summarization techniques that are useful for the biomedical domain [12][14]. Table 1 presents a comparative study of these techniques based on the following parameters. The type of text summarization parameter indicates whether the abstractive summary is generated from a single source document or from multiple documents. The original text representation parameter describes the form in which the original text is represented. The content selection parameter describes which techniques or algorithms are used to extract important information. The summary generation parameter describes how the final abstractive summary is generated. The semantically correct summarization and syntactically correct representation parameters indicate whether the generated summary is semantically and syntactically correct.

All of these techniques are monolingual. Other settings also exist, namely multilingual and cross-lingual summarization. In a monolingual technique, the input and output languages are the same. In a multilingual technique, the input may be in more than one language and the output is in the user's desired language, while in a cross-lingual technique the input and output languages differ from each other.

Table 1. Parametric Evaluation of Abstractive Text Summarization Techniques

Graph Reduction Approach [1]: type of text summarization: single document; original text representation: rich semantic graph; content selection: RSG reduction rules; summary generation: reduced semantic graph.

Graph based Approach [2]: original text representation: word graph; content selection: relation among words, clauses; semantically correct summarization: no; syntactically correct representation: yes.

Sentiment Infusion Approach [3]: original text representation: word graph; content selection: sentiment analysis; summary generation: path scoring, sentence fusion; semantically correct summarization: yes; syntactically correct representation: yes.

Genetic Graph based Approach [4]: type of text summarization: multi-document; original text representation: semantic graph of PASs; content selection: semantic role labeling and similarity score; summary generation: SimpleNLG and simple rules; technique used for eliminating redundancy: Maximal Marginal Relevance (MMR) algorithm.

Clustered Genetic Graph based Approach [5]: type of text summarization: multi-document; original text representation: semantic graph of PASs; content selection: semantic role labeling and similarity score; summary generation: SimpleNLG and simple rules; technique used for eliminating redundancy: clustering algorithm.

5. Conclusion
In this paper, we studied different abstractive text summarization techniques based on natural language processing, data mining and semantic similarity approaches. All of these techniques generate a summary automatically from the source document(s), and all of them are monolingual. The semantic graph based reduction approach produces concise, coherent and less redundant sentences. The sentiment infusion approach generates summaries that are semantically and syntactically correct and in reduced form. Among all the techniques studied, the clustered genetic semantic graph based approach significantly reduces overlapping semantic redundancy. Future work may include developing a more efficient technique with a multilingual or cross-lingual structure. One could also try to generate more concise and less redundant summaries by designing a new approach, or by combining available techniques, to provide accurate summaries for the biomedical domain.

References
1. I. F. Moawad, M. Aref, Semantic Graph Reduction Approach for Abstractive Text Summarization, Computer Engineering & Systems (ICCES), IEEE, 2012, pp. 132-138.
2. H. T. Le, T. M. Le, An Approach to Abstractive Text Summarization, Soft Computing and Pattern Recognition (SoCPaR), IEEE, 2013.

3. R. Bhargava, Y. Sharma, G. Sharma, ATSSI: Abstractive Text Summarization using Sentiment Infusion, Procedia Computer Science, Elsevier, 2016.
4. A. Khan, N. Salim and Y. J. Kumar, Genetic Semantic Graph Approach for Multi-Document Abstractive Summarization, Digital Information Processing and Communications (ICDIPC), IEEE, 2015.
5. A. Khan, N. Salim and H. Farman, Clustered Genetic Semantic Graph Approach for Multi-Document Abstractive Summarization, Intelligent Systems Engineering (ICISE), IEEE, 2016.
6. A. R. Pal, D. Saha, An Approach to Automatic Text Summarization using WordNet, Advance Computing Conference (IACC), IEEE, 2014.
7. K. S. Thakkar, R. V. Dharaskar, Graph-Based Algorithms for Text Summarization, Emerging Trends in Engineering and Technology (ICETET), IEEE, 2010.
8. J. Zhan, H. T. Loh, Y. Liu, Gather Customer Concerns from Online Product Reviews: A Text Summarization Approach, Expert Systems with Applications, Elsevier, 2009.
9. T. Workman, M. Fiszman and J. Hurdle, Text Summarization as a Decision Support Aid, BMC Medical Informatics and Decision Making, vol. 12, no. 1, 2012.
10. H. Thanh, T. Manh, An Approach to Abstractive Text Summarization, IEEE, 2013.
11. R. Mishra, J. Bian and M. Fiszman, Text Summarization in the Biomedical Domain: A Systematic Review of Recent Research, Journal of Biomedical Informatics, 2014.
12. T. Workman, M. Fiszman and J. Hurdle, Text Summarization as a Decision Support Aid, BMC Medical Informatics and Decision Making, vol. 12, no. 1, 2012.
13. H. Menendez, L. Plaza and D. Camacho, Combining Graph Connectivity and Genetic Clustering to Improve Biomedical Summarization, IEEE Congress on Evolutionary Computation, Beijing, China, 2016.
14. H. Thanh, T. Manh, An Approach to Abstractive Text Summarization, IEEE, 2013.
15. I. Yoo, X. Hu and I. Song, A Coherent Graph-Based Semantic Clustering and Summarization Approach for Biomedical Literature and a New Summarization Evaluation Method, BMC Bioinformatics, vol. 8, no. 9, p. S4, 2007.
16. H. Reeve, H. Han and D. Brooks, The Use of Domain-Specific Concepts in Biomedical Text Summarization, Elsevier, 17 July 2006.