Effective Classroom Presentation Generation Using Text Summarization

Tulasi Prasad Sariki #1, Dr. Bharadwaja Kumar *2, Ramesh Ragala #1
Assistant Professor #1, Associate Professor *2, SCSE, VIT University, Chennai, India
tulasiprasadsarik@gmail.com #1, bharadwaja.kumar@vit.ac.in *2, ramesh.ragala@vit.ac.in #1

Abstract

Internet content grows day by day, and extracting the information one needs has become a very difficult task. This is where Information Extraction (IE) comes into play. Automatic text summarization, a subset of Information Extraction, is used to reduce a text to a desired size. It addresses the problem of information overload by summarizing the content from which the user wants to extract something. An automatic text summarization tool helps the user understand the overall view of a document. Important sentences in the document are found and presented as a summary of the input content. Terms are selected based on frequency, and sentences are selected for the summary based on their scores. The summary is obtained by ranking the sentences and selecting a particular number of them from the top of the list, according to the requirement of the user. The size of the summary can be specified by the user when invoking the tool. The generated summary can also be visualized in the form of a Power Point presentation (PPT), making it easy for the user to create an effective classroom presentation.

Index Terms - Information Extraction, Power Point presentation, Text Summarization

1. Introduction

The rapid growth of the WWW has made a large volume of documents and information available to users. To utilize these documents effectively, it is necessary to be able to get the gist of them. It is not possible for humans to write summaries of all the available information by hand. Automatic text summarization provides a solution to this information overload problem.
It can also be used to create an effective classroom presentation. Text summarization is the process of compressing a given document into a shortened version by extracting the most important information from it. Approaches to text summarization can be classified into two major categories: extraction and abstraction [1]. The extraction based approach creates the summary by extracting the important sentences from the original document, whereas the abstraction based approach constructs the summary by paraphrasing concepts of the original document. There are two techniques for extracting sentences during summary generation: statistical and linguistic [4]. Statistical techniques use term frequencies to determine term importance, and sentences containing important terms are given high priority. Linguistic techniques, on the other hand, identify term relationships in the document through POS tagging, thesaurus usage and grammar analysis.
Power Point presentation is the method of displaying text in various slides in a form that is easily understandable. Text summarization systems help in creating a PPT.

2. Related Work

Different approaches to automatic summarization are as follows:
(i) Statistical approaches, which do not use any linguistic analysis [3].
(ii) Approaches using lexical acknowledgement and classification methods, that is, sentence-to-sentence relations.
(iii) Approaches based on linguistic analysis and processing of the documents.

3. System Architecture

One more distinction in the summarization process is between single-document and multiple-document summaries. Existing commercial summarizing systems make use of the first approach. The summary is created by selecting statistically frequent terms in the document. Another method is to select sentences based on their position in the text document. In most such text summarization systems, the first line of a paragraph and the title are the leading candidates for summarizing the whole document.

The proposed system is based on the word frequencies of the text document after eliminating the stop words, which do not carry any importance but are useful in forming sentences, for example connectives (and, or). Once the word frequencies are found, sentences are scored based on the frequency counts, and we then decide which sentences will be considered for summary generation. The generated summary is fed into a module capable of converting it into a Power Point presentation, which teachers can use for classroom demonstration. The proposed system is implemented for single-document input only. In the proposed system, we give the user an extra option to supply keywords while generating the summary. Based on the user-specified keywords, the quality of the summary can be improved.

4. Proposed System

Pre-processing is the initial step of loading the given text into the proposed system and decomposing it into its constituent sentences. Normalization is the method of converting the text into normalized form by performing processes such as case-folding, tokenization, stop word removal, stemming and lemmatization. The pre-processing steps are:

4.1 Case-Folding

It is the process of converting the given text into lower case in order to avoid counting the same word separately in different cases. This helps the system treat case variants of a term as one term and improves its accuracy.

4.2 Tokenization

It is the process of splitting text into sentences and each sentence into words. For sentence segmentation the dot (full stop) is taken as the separator, and for words the space is used.

4.3 Stop word removal

It is the process of removing the stop words, i.e. words that are very common and occur in a large majority of documents but carry little semantic information. For example, "the", "by", "a", "an", etc. are stop words. Categorization is based only on feature terms and not on full stops, commas, colons, semicolons, etc., so these are also removed from the document and are not stored in the signature file for further processing.

4.4 Stemming

It is the process of mechanically changing or removing the suffixes of some verbs or nouns. It is done to identify the root of any word in a document. In general, a text document contains repetitions of the same word with grammatical variations, such as different tense forms or gerunds ("-ing" suffixed words). Stemming can be of two types:

Derivational Stemming
Inflectional Stemming

Derivational stemming creates new words from existing words, e.g.: Finalize-Final, Useful-Use, Musical-Music, etc. Inflectional stemming confines normalized words to grammatical variants such as past tense, present tense, singular or plural forms, e.g.: Management-Manag, Classification-Classific, Payment-Pay, etc.

5. Sentence Scoring

Scoring is the process of assigning a score to each sentence to determine its importance in the summary.
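Before any scoring takes place, the text passes through the pre-processing steps of Section 4. The following is a minimal Python sketch of those steps, not the authors' implementation: the stop-word list, the suffix list and the function names `split_sentences`, `naive_stem` and `preprocess` are all assumptions made for illustration, and the suffix stripping is a crude stand-in for a real stemming algorithm.

```python
import re

# Assumption: a tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "by", "a", "an", "and", "or", "is", "of", "to", "in"}

# Assumption: a crude suffix list standing in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def split_sentences(text):
    """Sentence segmentation: the full stop is taken as the separator."""
    return [part.strip() for part in text.split(".") if part.strip()]


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    """Case-fold, tokenize on whitespace, drop punctuation and stop words, stem."""
    tokens = [re.sub(r"[^\w]", "", tok) for tok in sentence.lower().split()]
    return [naive_stem(tok) for tok in tokens if tok and tok not in STOP_WORDS]
```

Calling `preprocess` on each element returned by `split_sentences` yields one list of normalized terms per sentence. Note that splitting on every full stop also breaks abbreviations and decimal numbers, so a practical system would need a more careful sentence splitter.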
We have used multiple methods for generating the sentence score.

5.1 Cue-Phrase Method: Some phrases imply more significance, for example "significant", "impossible", "hardly", etc.

5.2 Word Frequencies (Key Method): Only the words with the highest scores are considered, depending on the threshold fixed by the user in terms of the compression ratio [3].

5.3 Title Method: Titles are important, and so are the words they contain; sentences containing these words play a major role in the summary.

5.4 Location Method: The first and last sentences of a paragraph, and sentences following titles, play a vital role in summary generation.
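The four methods above can be turned into numeric features per sentence and combined with one weight per method. The Python sketch below is only one possible reading, under stated assumptions: the cue-word set reuses the examples from the text, the weights are all set to 1.0, the Key feature is taken as the average term frequency of a sentence, the compression value is treated as the fraction of sentences to keep, and stop-word removal is omitted for brevity. The names `words`, `score_sentences` and `summarize` are hypothetical.

```python
from collections import Counter

# Assumptions: cue words taken from the examples in the text; equal weights.
CUE_WORDS = {"significant", "impossible", "hardly"}
WEIGHTS = {"cue": 1.0, "key": 1.0, "title": 1.0, "location": 1.0}
PUNCTUATION = ".,;:!?\"'"


def words(text):
    """Lower-case whitespace tokens with surrounding punctuation stripped."""
    stripped = (tok.strip(PUNCTUATION).lower() for tok in text.split())
    return [tok for tok in stripped if tok]


def score_sentences(title, paragraphs, weights=WEIGHTS):
    """paragraphs is a list of lists of sentences; returns (score, sentence) pairs in order."""
    freq = Counter(w for para in paragraphs for sent in para for w in words(sent))
    title_words = set(words(title))
    scored = []
    for para in paragraphs:
        for index, sent in enumerate(para):
            toks = words(sent)
            if not toks:
                continue
            cue = sum(1 for t in toks if t in CUE_WORDS)
            key = sum(freq[t] for t in toks) / len(toks)  # average term frequency
            title_hits = sum(1 for t in toks if t in title_words)
            location = 1.0 if index in (0, len(para) - 1) else 0.0
            score = (weights["cue"] * cue + weights["key"] * key
                     + weights["title"] * title_hits + weights["location"] * location)
            scored.append((score, sent))
    return scored


def summarize(title, paragraphs, compression=0.5):
    """Keep the top-scoring fraction of sentences, returned in document order."""
    scored = score_sentences(title, paragraphs)
    keep = max(1, round(len(scored) * compression))
    ranked = sorted(range(len(scored)), key=lambda i: scored[i][0], reverse=True)
    return [scored[i][1] for i in sorted(ranked[:keep])]
```

Returning the selected sentences in document order, rather than in score order, keeps the summary readable; whether the original system does the same is not stated in the paper.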
The sentence importance is calculated as a linear combination of the different methods:

$$\text{Score} = \beta_1 \cdot \text{Cue} + \beta_2 \cdot \text{Key} + \beta_3 \cdot \text{Title} + \beta_4 \cdot \text{Location}$$

We have adjusted the coefficients to control each method's significance and the user input. The number of sentences required for the summary is decided based on the compression ratio. A few sentences with their respective scores are shown in the following table.

Sentence Score Table

Score      Sentence
1.3870968  Despite some signs that the economy is on the mend, a lack of confidence from consumers and companies alike may hamper job growth during the next few months, economists say.
1.2        Unlike this point last year, there are some indicators for optimism about the U.S. economy.
0.8666667  The market seems to be on a rebound, with stock prices growing steadily since March.
0.9        It was the largest such growth since the summer of 2007.
2.6666667  However, the unemployment rate is staggering.
1.3157895  The national rate hit 10.2 percent last month, it has been increased in more than 15 years.
1.4        The jobless rate increased in 19 states and the District of Columbiana in November, according to a recent Labor Department survey.
2.6363637  Thirteen states reported an unemployment rate above the current national rate.
2.1        Polls suggest many Americans are not confident about the economy.
1.5        Of that number, 43 percent described the conditions as "very poor."
1.8571428  Track unemployment numbers by state and industry

6. Power Point Generation

It is the method of visualizing the summary in the form of slides, making it easily understandable. The summary text is divided into separate sentences, which are stored in an array. The title and credits can be specified by the user. Slides are created, and the sentences stored in the array are written into the selected slides using a file writer. The .txt/.doc summary file is converted into a .ppt file using the POI package in Java and is stored in the specified location.
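The grouping of summary sentences into slides can be sketched independently of the presentation library. The Python sketch below covers only that grouping step and assumes a fixed number of sentences per slide (three here); it does not write a .ppt file, which in the described system is handled by the POI package. The function name `build_slides` and the dictionary layout of a slide are hypothetical.

```python
def build_slides(summary, title, per_slide=3):
    """Split a summary into sentences and group them into slide dictionaries."""
    # Assumption: sentences end with a full stop, as in the tokenization step.
    sentences = [part.strip() + "." for part in summary.split(".") if part.strip()]
    slides = []
    for start in range(0, len(sentences), per_slide):
        slides.append({"title": title, "bullets": sentences[start:start + per_slide]})
    return slides
```

Each returned dictionary corresponds to one slide, so a writer module would iterate over the list and emit one slide per entry with the chosen font, size and colour.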
By default, this module generates Power Point slides with three sentences per slide. The font, size and colour of the text can be left at their default values or specified by the user.

7. Results

A study was carried out comparing several other statistical text summarizers with this summarization system. First, a common text document was taken and manually reduced to a summary by us. The same document was then given as input to these systems, including this project, and the number of sentences in each automatic summary that matched the manual summary was counted. The efficiency of the summarizer compared with the other tools, without keywords, is as follows.
8. Conclusions

Compared with the existing summarizing systems, the proposed system offers improved accuracy, flexibility and user interaction. It allows the user to increase the accuracy of the generated summary by specifying keywords and by adjusting the length of the final summary. This interaction with the user makes the system more flexible, so that different summaries can be created for the same input document using different compression slider values. The existing systems do not provide options such as keyword based summary generation or saving the summary as PDF/PPT. The Generate PPT option allows the user to automatically create Power Point slides of the summary, which can be used for any classroom presentation. In future, the system can be extended to multiple documents.

9. References

[1] M. Suneetha and Dr. S. Sameen Fatima, Corpus based Automatic Text Summarization System with HMM Tagger, International Journal of Soft Computing and Engineering (IJSCE), ISSN: 2231-2307, Volume-1, Issue-3, July 2011, pp. 118-123.
[2] Rafeeq Al-Hashemi, Text Summarization Extraction System (TSES) Using Extracted Keywords, International Arab Journal of e-Technology, Vol. 1, No. 4, June 2010, pp. 164-168.
[3] Munesh Chandra, Vikrant Gupta, and Santosh Kr. Paul, A Statistical approach for Automatic Text Summarization by Extraction, International Conference on Communication Systems and Network Technologies, 2011.
[4] Ghadeer Natshah, Yasmin Ta'amra, Bara Amar, and Manal Tamimi, Text Summarization: Using Combinational Statistical and Linguistic Methods.