Effective Classroom Presentation Generation Using Text Summarization

Tulasi Prasad Sariki #1, Dr. Bharadwaja Kumar *2, Ramesh Ragala #1
Assistant Professor #1, Associate Professor *2, SCSE, VIT University, Chennai, India
tulasiprasadsarik@gmail.com #1, bharadwaja.kumar@vit.ac.in *2, ramesh.ragala@vit.ac.in #1

Abstract

Internet content is growing day by day, and extracting the information one needs from it has become a very difficult task. This is where Information Extraction (IE) comes into play. Automatic text summarization, a subset of Information Extraction, is used to reduce a text to a desired size. It addresses the problem of information overload by summarizing the content from which information is to be extracted. A tool that implements automatic text summarization helps the user grasp the overall view of a document. Important sentences in the document are identified and returned as the summary of the input. Terms are selected based on frequency, and sentences are selected based on their scores. Sentences are ranked according to the user's requirements, and the summary is obtained by taking a given number of sentences from the top of the ranked list. The size of the summary can be specified by the user when invoking the tool. The generated summary can also be visualized as a Power Point presentation (PPT), making it easy for the user to create an effective classroom presentation.

Index Terms - Information Extraction, Power Point presentation, Text Summarization

1. Introduction

The rapid growth of the WWW has made a large volume of documents and information available to users. To use these documents effectively, it is necessary to be able to get the gist of them. It is not possible for humans to write a summary of all the available information by hand. Automatic text summarization provides a solution to this information overload problem. It can also be used to create an effective classroom presentation.

Text summarization is the process of compressing a given document into a shortened version by extracting the most important information from it. Approaches to text summarization can be classified into two major categories: extraction and abstraction [1]. The extraction-based approach creates the summary by extracting the important sentences from the original document, whereas the abstraction-based approach constructs the summary by paraphrasing concepts of the original document.

There are two techniques for extracting sentences during summary generation: statistical and linguistic [4]. Statistical techniques use term frequencies to determine term importance, and sentences containing important terms are given high priority. The linguistic approach, on the other hand, identifies term relationships in the document through POS tagging, thesaurus usage and grammar analysis.

A Power Point presentation displays text across a series of slides in a form that is easily understandable. Text summarization systems help in creating a PPT.

2. Related Work

Approaches to automatic summarization fall into the following groups:
(i) Statistical approaches, which use no linguistic analysis [3].
(ii) Approaches using lexical acknowledgement and classification methods, that is, sentence-to-sentence relations.
(iii) Approaches based on linguistic analysis and processing of the documents.

The proposed system is implemented for single-document input only. In the proposed system, the user is given the extra option of supplying keywords while generating the summary. Based on these user-specified keywords, the quality of the summary can be improved.

3. System Architecture

One more distinction in the summarization process is between single-document summaries and multiple-document summaries. Existing commercial summarizing systems make use of the first approach. The summary is created by selecting statistically frequent terms in the document. Another method selects sentences based on their position in the text document: in most such systems, the title and the first line of each paragraph are the leading candidates for summarizing the whole document.

The proposed system is based on the word frequencies of the text document after eliminating stop words, which carry no importance of their own but are useful for forming sentences, for example connectives (and, or). Once the word frequencies are found, sentences are scored from the frequency counts, and the scores decide which sentences are considered for the summary. The generated summary is then fed into a module that converts it into a Power Point presentation, which teachers can use for classroom demonstration.

4. Proposed System

Pre-processing is the initial step of loading the given text into the proposed system and decomposing it into its constituent sentences. Normalization converts the text into a normalized form by performing processes such as case-folding, tokenization, stop word removal, stemming and lemmatization. The pre-processing steps are:

4.1 Case-Folding

Case-folding converts the given text into lower case so that the same word in different cases is not counted as several different words. This lets the system treat differently cased forms as the same term and improves its accuracy.

4.2 Tokenization

Tokenization splits the text into sentences and each sentence into words. For sentence segmentation the dot is taken as the separator, and for words the space is used.

4.3 Stop word removal

Stop word removal discards words that carry little semantic information. Words that are very common and occur in a large majority of documents, but do not carry much semantic information, are termed stop words. For example, the, by, a, an, etc. are stop words. Categorization is based only on feature terms, not on punctuation such as full stops, commas, colons and semicolons, so punctuation marks are also removed from the document and are not stored in the signature file for further processing.

4.4 Stemming

Stemming mechanically changes or removes the suffixes of some verbs or nouns in order to identify the root of a word in a document. In general, a text document contains repetitions of the same word with grammatical variations, such as different tense forms or gerunds ("-ing" suffixed words). Stemming can be of two types:

Derivational stemming: creates new words from existing words, e.g. Finalize-Final, Useful-Use, Musical-Music, etc.
Inflectional stemming: confines normalized words to grammatical variants such as past or present tense, or singular or plural form, e.g. Management-Manag, Classification-Classific, Payment-Pay, etc.

5. Sentence Scoring

Scoring is the process of assigning a score to each sentence to determine its importance in the summary.
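Scoring operates on the normalized terms produced by the pre-processing steps of Section 4. As a rough illustration only, those steps might be sketched in Python as below. The stop word list, the suffix list and the function names are assumptions made for this sketch; the paper does not give its own lists or stemming rules, and the naive suffix stripper here stands in for whatever stemmer the system actually uses.

```python
import re

# Illustrative stop word list (an assumption; the paper does not give its list).
STOP_WORDS = {"the", "by", "a", "an", "and", "or", "is", "are", "of", "to", "in"}

# Illustrative suffixes for a naive stemmer (an assumption, not the paper's rules).
SUFFIXES = ("ing", "ment", "ed", "s")


def split_sentences(text):
    """Split text into sentences, using the dot as the separator (Section 4.2)."""
    return [s.strip() for s in text.split(".") if s.strip()]


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    """Case-fold, split on whitespace, drop punctuation and stop words, stem."""
    tokens = []
    for raw in sentence.lower().split():
        word = re.sub(r"[^a-z0-9]", "", raw)  # remove commas, colons, etc.
        if word and word not in STOP_WORDS:
            tokens.append(naive_stem(word))
    return tokens
```

Splitting on every dot is faithful to Section 4.2 but would also break a sentence at a decimal such as "10.2", so a practical segmenter would likely need an extra rule for numbers and abbreviations.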
We use multiple methods to generate the sentence score.

5.1 Cue-Phrase Method: Some phrases signal greater significance, for example significant, impossible, hardly, etc.

5.2 Word frequencies (Key Method): Only the words with the highest scores are considered, depending on the threshold fixed by the user in terms of the compression ratio [3].

5.3 Title Method: Titles are important, and so are the words they contain; sentences containing title words play a major role in the summary.

5.4 Location Method: The first and last sentences of a paragraph, and sentences following titles, play a vital role in summary generation.
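The four methods above, and their weighted sum used in this section, can be illustrated with a short Python sketch. The feature definitions are assumptions made for illustration (a cue-word count, the mean term frequency, a title-word count, and a first-or-last-sentence flag); the paper does not specify how each feature is computed, and `CUE_WORDS`, `score_sentences` and `summarize` are hypothetical names.

```python
from collections import Counter

# Assumed cue words, taken from the examples in the text; a real list would be longer.
CUE_WORDS = {"significant", "impossible", "hardly"}


def score_sentences(sentences, title_terms, weights=(1.0, 1.0, 1.0, 1.0)):
    """Score tokenized sentences as b1*Cue + b2*Key + b3*Title + b4*Location.

    `sentences` is a list of token lists for one paragraph; `title_terms`
    is a set of title tokens. The feature definitions are assumptions.
    """
    b1, b2, b3, b4 = weights
    freq = Counter(tok for sent in sentences for tok in sent)
    last = len(sentences) - 1
    scores = []
    for i, sent in enumerate(sentences):
        n = max(len(sent), 1)
        cue = sum(tok in CUE_WORDS for tok in sent)      # cue-phrase method
        key = sum(freq[tok] for tok in sent) / n          # mean term frequency
        title = sum(tok in title_terms for tok in sent)   # title method
        location = 1.0 if i in (0, last) else 0.0         # first or last sentence
        scores.append(b1 * cue + b2 * key + b3 * title + b4 * location)
    return scores


def summarize(sentences, scores, compression=0.5):
    """Keep the top-scoring fraction of sentences, in their original order."""
    k = max(1, round(len(sentences) * compression))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Returning the selected sentences in document order, rather than in score order, keeps the summary readable; the `compression` parameter plays the role of the user-chosen compression ratio.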

The sentence importance is calculated as a linear combination of the different methods:

Score = β1·Cue + β2·Key + β3·Title + β4·Location

The coefficients are adjusted to control the significance of each method and to reflect user input. The number of sentences required for the summary is decided based on the compression ratio. A few sentences with their respective scores are shown in the following table.

Sentence Score Table

Score      Sentence
1.3870968  Despite some signs that the economy is on the mend, a lack of confidence from consumers and companies alike may hamper job growth during the next few months, economists say.
1.2        Unlike this point last year, there are some indicators for optimism about the U.S. economy.
0.8666667  The market seems to be on a rebound, with stock prices growing steadily since March.
0.9        It was the largest such growth since the summer of 2007.
2.6666667  However, the unemployment rate is staggering.
1.3157895  The national rate hit 10.2 percent last month, it has been increased in more than 15 years.
1.4        The jobless rate increased in 19 states and the District of Columbiana in November, according to a recent Labor Department survey.
2.6363637  Thirteen states reported an unemployment rate above the current national rate.
2.1        Polls suggest many Americans are not confident about the economy.
1.5        Of that number, 43 percent described the conditions as "very poor."
1.8571428  Track unemployment numbers by state and industry

6. Power Point Generation

Power Point generation visualizes the summary as slides, making it easily understandable. The summary text is divided into separate sentences, which are stored in an array. The title and credits can be specified by the user. Slides are created, and the sentences stored in the array are written into the selected slides using a file writer. The .txt/.doc summary file is converted into a .ppt file using the POI package in Java and is stored in the specified location. By default this module generates Power Point slides with three sentences per slide. The font, size and colour of the text can be left at their default values or specified by the user.

7. Results

A study was carried out comparing this summarization system with several other statistical text summarizers. First, a common text document was reduced to a summary manually by us. The same document was then given as input to each system, including this one, and the number of sentences that each automatic summary shared with the manual summary was counted. The efficiency of the summarizer compared with the other tools, without keywords, is as follows.
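The comparison described above, counting the sentences an automatic summary shares with the manual one, could be computed along the following lines. This is only a sketch: the paper does not state how sentences were matched, so exact matching after case-folding and whitespace normalization is an assumption, as are the function names.

```python
def _normalize(sentence):
    """Case-fold and collapse whitespace so trivially different copies match."""
    return " ".join(sentence.lower().split())


def matching_sentences(manual_summary, system_summary):
    """Count distinct sentences that appear in both summaries."""
    manual = {_normalize(s) for s in manual_summary}
    system = {_normalize(s) for s in system_summary}
    return len(manual & system)


def match_ratio(manual_summary, system_summary):
    """Fraction of manual-summary sentences also chosen by the system."""
    manual = {_normalize(s) for s in manual_summary}
    if not manual:
        return 0.0
    return matching_sentences(manual_summary, system_summary) / len(manual)
```

Because extractive summarizers copy sentences verbatim from the source, exact matching is usually enough here; comparing abstractive summaries would need a softer similarity measure.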

8. Conclusions

Compared with existing summarizing systems, the proposed system offers improved accuracy, flexibility and user interaction. It allows the user to increase the accuracy of the generated summary by specifying keywords and to adjust the length of the final summary. This interaction makes the system more flexible: different summaries can be created for the same input document using different compression slider values. Existing systems do not provide options such as keyword-based summary generation or saving the summary as PDF/PPT. The Generate PPT option lets the user automatically create Power Point slides of the summary, which can be used for any classroom presentation. In future, the system can be extended to multiple documents as well.

9. References

[1] M. Suneetha and Dr. S. Sameen Fatima, Corpus based Automatic Text Summarization System with HMM Tagger, International Journal of Soft Computing and Engineering (IJSCE), ISSN: 2231-2307, Volume-1, Issue-3, July 2011, pp. 118-123.
[2] Rafeeq Al-Hashemi, Text Summarization Extraction System (TSES) Using Extracted Keywords, International Arab Journal of e-Technology, Vol. 1, No. 4, June 2010, pp. 164-168.
[3] Munesh Chandra, Vikrant Gupta, and Santosh Kr. Paul, A Statistical approach for Automatic Text Summarization by Extraction, International Conference on Communication Systems and Network Technologies, 2011.
[4] Ghadeer Natshah, Yasmin Ta'amra, Bara Amar, and Manal Tamimi, Text Summarization: Using Combinational Statistical and Linguistic Methods.