Brent Fitzgerald. CS224N Final Project - June 1, 2000


IMPLEMENTATION OF AN AUTOMATED TEXT SEGMENTATION SYSTEM USING HEARST'S TEXTTILING ALGORITHM

Brent Fitzgerald
brentf@stanford.edu
CS224N Final Project - June 1, 2000

ABSTRACT

This paper describes the implementation of a text segmentation system based on Hearst's TextTiling algorithm. Hearst is a pioneer in the field of text segmentation, and her algorithm has already been shown to provide good results. The algorithm uses lexical frequency and distribution information to recognize the level of cohesion between blocks of text, and then uses these cohesion estimates to judge which sections are likely to be different topics.

INTRODUCTION

Most of the texts one comes across are composed of a number of topics, perhaps varying in their relevance to one another and their scope. A system that could automatically detect these subtopics would certainly be useful, allowing the reader to quickly skip to the topics most relevant to her purpose. The segmentation might also aid in tasks of information extraction and summarization, since it provides structural semantic information about the document. The ability to identify the various subtopics could let one quickly build outlines of the essential points.

More recently, the web's proliferation has led to an overwhelming increase in readily available information, but finding the information one needs can be a difficult task. Search engines and directories provide a means of classifying and organizing this information on a multi-document level, but there is still a need for a system that can provide organization within long, information rich documents. A good segmentation system, perhaps combined with summarization and information extraction technologies, could fill this niche quite nicely. Thus, any highly accurate segmentation system would certainly be useful in these times of overly abundant, undocumented data.
The system described in this paper is currently not up to this daunting task, but it is an interesting experiment in building a system that automatically locates topic boundaries. This paper will review the algorithm behind the system as well as some of the practical aspects of the implementation, and will conclude with a discussion of the results and some possible extensions of the current system.

ALGORITHM AND IMPLEMENTATION

Several different approaches have been presented in the literature. The approach used in this paper is based on Hearst's TextTiling algorithm, a moving window approach that uses lexical overlap as a means of detecting topic coherence. Another approach called dotplotting, presented by Reynar (1994) and furthered by Choi (2000), finds the similarity between every pair of sentences in the document and uses these results to identify chunks of cohesive sentences. A very different strategy called Lexical Chaining uses lexical semantic similarity information to create chains of related words. Generally, a document will have at least several of these chains, allowing one to segment the document based on the features of the chains, such as start and end points. Hearst's algorithm is used in this system because it is relatively straightforward and well documented.

Hearst defines three main components of the TextTiling algorithm. First, it divides the input text into sequences of relevant tokens and calculates the cohesion at each potential boundary point. It then uses these cohesion scores to produce depth scores for each potential boundary point that has a lower cohesion than the neighboring boundary points. Using these depth scores, the algorithm is able to select boundary points where the depth is high relative to the other depth scores, indicating that that gap represents a topic shift in the text. The output is the text file with boundaries inserted at the gaps with sufficiently high depth scores, delineating the various topics by breaking at the least cohesive points.

The first task of this system, then, is to calculate the gap scores. In order to do so, it is first necessary to break the document into appropriately sized sequences of text.
Gap cohesion is computed between a group of text sequences immediately prior to the gap and a group of text sequences immediately after. Hearst describes two strategies for breaking the text into sequences. One method is to use chunks of text that have some fixed number of valuable tokens. For this approach, Hearst recommends 20 tokens per sequence. The benefit of this approach is that each sequence carries the same amount of information as the other sequences. The other method is to assign each sentence in the document to its own sequence. One advantage of this approach is that the boundaries tested are sentence boundaries rather than mid-sentence boundaries, and thus better represent where a change in topic is most likely to occur. The other, more practical advantage is that if the system finds the gap scores at the sentence boundaries, then it is extremely straightforward to insert the segmentation break points. The fixed-token method, by contrast, requires deciding upon the nearest sentence boundary. This system uses a one sentence per sequence approach.

The system also takes a list of stop words, which are words that are uninformative regarding the topic of a particular passage, such as the, and, they, we, a, will, can, have, etc. Eliminating these stop words prevents the system from getting distracted by irrelevant data.

The gap cohesion score is found by creating a vector from the token counts found in some fixed number n of sentence sequences immediately prior to the gap, and another vector from the token counts found in the same number n of sequences immediately following the gap. Hearst suggests a number of sequences approximately equal to the average paragraph length in sentences. A vector similarity metric, such as the cosine similarity, is then applied to these two vectors to obtain an estimate of the cohesiveness between the two sections. The cosine similarity can be computed as

$$\mathrm{sim}(b_1, b_2) = \frac{\sum_t w_{t,b_1} \, w_{t,b_2}}{\sqrt{\sum_t w_{t,b_1}^2 \, \sum_t w_{t,b_2}^2}}$$

where $w_{t,b}$ is the count of token $t$ in block $b$. This number is called the gap score, and it is calculated at each potential boundary location, obtaining a distribution of gap scores with a visual representation of the form seen in Figure 1.

FIGURE 1: Gap score results from analysis of concatenation of 10 New York Times articles. Horizontal axis is the gap number, vertical axis is the gap score measured by cohesion of adjacent blocks. Greater vertical axis values indicate higher levels of cohesion. The breaks between the various articles tend to correlate to the low points in the graph.

The next step is the smoothing process. As we see in Figure 1, the initial computation of gap scores leaves one wanting clearer boundary markers, since many small local minima might lead to too many small segments in our output. The system lessens the effect of these small local extrema using an average smoothing technique with a flexible window size. Using this system, gap score $s_i$ is replaced by $(s_{i-k/2} + \dots + s_i + \dots + s_{i+k/2}) / (k + 1)$, for some optimally configured k.
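The gap scoring and smoothing steps described so far can be sketched in Python. This is a minimal illustration, not the code of the system itself: the function names, the abbreviated stop word list, the regular-expression tokenizer, and the symmetric way the smoothing window shrinks near the ends of the score list are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical, abbreviated stop word list for illustration only.
STOPWORDS = {"the", "and", "they", "we", "a", "will", "can", "have"}


def tokenize(sentence, stopwords=STOPWORDS):
    """Lowercase a sentence, split it into word tokens, and drop stop words."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [w for w in words if w not in stopwords]


def cosine(counts_a, counts_b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a if t in counts_b)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def gap_scores(sentences, n=2):
    """Score each gap between sentences using up to n sentences on each side."""
    tokens = [tokenize(s) for s in sentences]
    scores = []
    for gap in range(1, len(tokens)):
        before = Counter(t for sent in tokens[max(0, gap - n):gap] for t in sent)
        after = Counter(t for sent in tokens[gap:gap + n] for t in sent)
        scores.append(cosine(before, after))
    return scores


def smooth(scores, k=2):
    """Replace each score by the mean of up to k + 1 scores centred on it."""
    half = k // 2
    smoothed = []
    for i in range(len(scores)):
        # Assumption: shrink the window symmetrically near either end.
        h = min(half, i, len(scores) - 1 - i)
        window = scores[i - h:i + h + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

Calling `smooth(gap_scores(sentences, n), k)` on a list of sentence strings yields one smoothed cohesion score per sentence gap. Using `Counter` objects as sparse vectors keeps the similarity computation proportional to the number of distinct tokens in the two blocks.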
The size of k, of course, should depend on the type of document being segmented and the granularity of segmentation desired. A smaller k value will leave more of the original information intact, making it a good choice for shorter texts like newspaper articles, but it can lead to too much fragmentation by failing to sufficiently eliminate undesired noise. Larger values of k eliminate the subtleties in the data, and thus are useful if one is planning to segment a larger text. Note that in this implementation, if there are not enough gap scores to smooth using the k value chosen, then the window size collapses to a suitable value. This allows us to smooth the score distribution near the beginning of the text. See Figure 2 for a visual representation of the effects of smoothing on the gap scores.

FIGURE 2: SMOOTHING OF NEW YORK TIMES GAP SCORES. The following four figures show the effect of smoothing on the New York Times data with various window sizes. Panels: concatenated Times articles, no smoothing (k = 0); same data, smoothing with window size 10 (k = 10); same data, smoothing with window size 20 (k = 20).

Now that the correlation scores have been calculated and smoothed, the next step is to locate the high and low points in this set of data. A list of the peaks is obtained by culling the scores for local maxima, and then each pair of adjacent peaks is used to find the lowest gap score in the valley between. Using these local minima and their neighboring local maxima, it is fairly straightforward to calculate the depth score, which is the difference in height between the left peak and the low point, plus the difference between the right peak and the low point. The depth score is an indication of the lack of correlation at that gap relative to the correlation at the nearest maxima. Thus, if the depth score is high, then the correlation is particularly low relative to the nearby preceding and succeeding gaps. If the depth score of a gap is low, then the gap is most likely not a break, since its gap score does not differ from its neighbors' as much as the gap scores at the other candidate gaps do.

To find the boundary points, the system finds the depth scores that are sufficiently large relative to the other candidate depth scores in the document. This is accomplished by including only those depth scores that exceed $\mu - c\,\sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of the depth scores, for some optimally configured value c. Hearst recommends a value of 1/2 based on her experiments. Larger values of c increase the number of inserted boundary points.

EVALUATION AND RESULTS

Evaluation of the system's performance consists of running the system on a concatenation of newspaper articles.
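The depth scoring and boundary selection steps described at the end of the algorithm section can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the system's own code: the function names are hypothetical, a score counts as a peak only if it is strictly higher than its neighbors, the first and last scores may act as peaks, and the population standard deviation is used for the cutoff.

```python
import statistics


def depth_scores(scores):
    """Return {gap index: depth} for the lowest gap between adjacent peaks."""
    n = len(scores)
    # Assumption: strict local maxima only; the two ends may serve as peaks.
    peaks = [
        i for i in range(n)
        if (i == 0 or scores[i] > scores[i - 1])
        and (i == n - 1 or scores[i] > scores[i + 1])
    ]
    depths = {}
    for left, right in zip(peaks, peaks[1:]):
        if right - left < 2:
            continue  # no gap lies strictly between these two peaks
        low = min(range(left + 1, right), key=lambda i: scores[i])
        # Depth = drop from the left peak plus drop from the right peak.
        depths[low] = (scores[left] - scores[low]) + (scores[right] - scores[low])
    return depths


def select_boundaries(depths, c=0.5):
    """Keep the gaps whose depth exceeds mean - c * (standard deviation)."""
    if not depths:
        return []
    values = list(depths.values())
    # Assumption: population standard deviation of the candidate depths.
    cutoff = statistics.mean(values) - c * statistics.pstdev(values)
    return sorted(gap for gap, depth in depths.items() if depth > cutoff)
```

Because the cutoff is the mean minus c standard deviations, raising c lowers the cutoff and admits shallower valleys, which matches the observation that larger values of c insert more boundaries.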
Newspaper articles seem a decent choice of data because they are readily available and reasonably short, so they can be concatenated together to obtain longer documents where the topic structure is already known. One potential problem with the use of newspaper articles is that they don't necessarily contain only one major topic. An article might contain several subtopics that are no more closely related to one another than they are to the other articles in the data, which could lead to boundaries inserted mid-article.

Figure 2, final panel: same data, smoothing with window size 30 (k = 30).

Ideally, it would have been informative and worthwhile to test the system against the segmentation choices made by human judges, as Hearst did in the original evaluation of her system. Hearst's evaluations compared her implementation's performance to that of human judgement, and it fared relatively well, with an average precision score of 0.66 compared to the judges' 0.81, and an average recall of 0.61 compared to the judges' recall score. Indeed, when run on non-test data, the segmentation of this system seems quite reasonable.

The tests were run with a variety of parameter specifications. The default parameters of the system were determined by taking the parameters that yielded the highest combined level of precision and recall. In the initial tests, the smoothing window sizes 10 and 20 were found to be too large and significantly hurt both precision and recall. In the second round of tests, the parameters were kept much more moderate. The results of these tests are attached to this document. The best precision score was 0.77 when run on the New York Times texts, and it was accompanied by a recall score of 0.77 as well. While these scores may sound relatively impressive, it is important to note that they were only numerically evaluated on this one set of data, and so it is unlikely that those parameters would return such high scores in all circumstances.

FURTHER RESEARCH

This implementation makes no use of structural cues in the text, and it would be interesting and most assuredly beneficial to consider this structure.
This could be done by modifying the algorithm to assign the break only to the nearest paragraph boundary, rather than ignoring the white space as we have in this implementation. The choice was made to ignore white space information in order to allow for greater flexibility in the text data we wish to segment. However, if the system were operating within a narrower domain, it would be advantageous to tune the system to take advantage of available cues. For example, if the system were applied to HTML-tagged web page texts, then it would probably be useful to weight the segmentation scheme to break at <P> paragraph boundary tags or <BR> break tags.

Another avenue of research is key word and sentence extraction from the sections obtained using this segmenting system, producing a summary or outline of the topics covered in the document. This might be done using a key sentence extraction technique such as those used in summarization systems. It would be an interesting research topic to try to improve summarization systems by using a segmentation system to break the text into its subtopics, then find the key sentence summaries for each topic.

Other segmentation systems use a stemming routine in the preprocessing stage of the system. Hearst ignores stem values and uses the bare words, but it would certainly be worthwhile to see how using the stems in the similarity measure might affect the segments produced.

Finally, TextTiling is language independent, failing to use any semantic information in measuring cohesiveness. Rather than basing the similarity measure on the number of occurrences of words in the sequence, it might be beneficial to base the similarity measure on the occurrences of semantic classes of words. This might be done using the synonyms provided by WordNet, perhaps in combination with a sense disambiguator to determine the intended sense.

SUMMARY

This paper describes research in text segmentation, specifically Hearst's text segmentation algorithm TextTiling. The system presented in this paper uses the TextTiling algorithm to compute the cohesion between blocks of text and determine the most likely boundary locations. While this system fails to perform as well as many of the other segmentation systems that have recently been presented in the literature, it is certainly on the right path and can produce good results with the proper parameters.

REFERENCES

Choi, F. 2000. Advances in domain independent linear text segmentation. To appear in Proceedings of NAACL'00, Seattle, USA.

Hearst, M. 1993. TextTiling: A quantitative approach to discourse segmentation. Technical Report 93/24, U. of California, Berkeley.

Hearst, M. 1994. Multi-paragraph segmentation of expository text. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL-94), New Mexico, USA.

Ponte, J. M. and Croft, W. B. 1997. Text Segmentation by Topic. In Proceedings of the First European Conference on Research and Advanced Technology for Digital Libraries.

Reynar, J. C. 1994. An automatic method of finding topic boundaries. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL-94), New Mexico, USA.

Richmond, K., Smith, A. and Amitay, E. 1997. Detecting subject boundaries within text: A language independent statistical approach. In Proceedings of the Second Conference on Empirical Methods in Natural Language Processing (EMNLP-97), Providence, Rhode Island, August 1997.

ABOUT THE SOFTWARE

The programs included are everything one needs to get started segmenting text. Several properly formatted documents are already included, but it is straightforward to make new ones as well. To segment a text document using this segmentation system:

1. Run sentencesnipper on the text. Sentencesnipper is a quick and dirty sentence boundary detection system. It takes ASCII text as input along with an optional (but highly recommended) list of common abbreviations. The output of sentencesnipper is a printout of each sentence separated by two newline characters. An example is as follows:

    %> sentencesnipper/sentencesnipper ../raw_data/basketball abbreviations

    The players dispersed after a tense timeout, but a frantic Coach Jeff Van Gundy was still standing on the court.

    There were just 12.4 seconds on the clock, and all his team needed was one last defensive stop to leave the Miami Heat in pieces once again.

    ...

Note that sentencesnipper is not a full-fledged sentence boundary detection program. It sometimes has problems with some abbreviations (even with the abbreviations file included), it commonly inserts two spaces instead of one, and there may well be other yet to be discovered quirks. Generally, though, it does a good job splitting the sentences apart, and is quite appropriate for this particular task.

2. Run segment on the snipped text. Segment is the actual text segmentation program. It requires only one command argument, the document to be segmented. It also takes four optional arguments: a list of stopwords, the threshold coefficient, the comparison size, and the smoothing window size. For example:

    %> segment data/unmarked_data/nytimes.unmarked stopwords

This command runs segment on the nytimes.unmarked data file, with the stopwords file, a threshold coefficient of 1 (a higher number translates to an increased tendency to break at less salient gaps), a comparison size of 10 (10 sentences before the gap compared to 10 after), and a smoothing window size of 6 (the average of 6 surrounding gap scores plus the one to be replaced).

3. Evaluate using evaluation.pl. This is the third component of the package, and it is used to test the accuracy of the segment program's output against a marked version of the same text. The marked text file should be chopped into sentences using sentencesnipper, with each segment boundary marked with a <--BREAK--> statement with one newline character between the statement and both the preceding and next sentences. evaluation.pl takes the name of the data to be tested, the name of the previously marked data, and an integer indicating the leniency. Here is an example of how to run it:

    %> evaluation/evaluation.pl ../.../nytimes.results ../.../nytimes.marked 2

This compares the nytimes.results file with the nytimes.marked file, and counts a successful boundary identification even if the break is two or fewer sentences from the actual break. Here is an example of the output of the program:

    Actual System target!target selected 8 25!selected Precision = Recall =

If any of this doesn't work right or if you have questions, please email brentf@stanford.edu.
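The leniency-based scoring described in step 3 can be illustrated with a short Python sketch. It is not a port of evaluation.pl: the function name, the representation of breaks as sentence indices, and the rule that each actual break can be matched by at most one system break are assumptions made for the example.

```python
def evaluate(system_breaks, actual_breaks, leniency=0):
    """Return (precision, recall) for two lists of break positions.

    A system break counts as correct when it lies within `leniency`
    sentences of an actual break that has not already been matched.
    """
    unmatched = sorted(actual_breaks)
    hits = 0
    for b in sorted(system_breaks):
        near = [a for a in unmatched if abs(a - b) <= leniency]
        if near:
            # Assumption: pair each system break with its closest free target.
            closest = min(near, key=lambda a: abs(a - b))
            unmatched.remove(closest)
            hits += 1
    precision = hits / len(system_breaks) if system_breaks else 0.0
    recall = hits / len(actual_breaks) if actual_breaks else 0.0
    return precision, recall
```

For instance, with system breaks at sentences 10, 22, and 40 and actual breaks at 11, 20, 30, and 41, a leniency of 2 matches all three system breaks, so precision is 3/3 and recall is 3/4; with a leniency of 0 nothing matches.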

These are the results of the second set of tests. The left field is the name of the file, where the first number in the name is the threshold coefficient, the second is the comparison size, and the third is the smoothing window size. Notice that as we decrease our threshold, disallowing the less pronounced breaks, precision increases as recall decreases. Also, notice that for a smoothing window size of 4 we usually get better results than with the other window sizes, and we also seem to get better results with a comparison size of 7. According to this data, the magic numbers are a 0.5 threshold, a 7 sentence comparison size, and a smoothing window of size 4, since these figures yield the highest precision score of 0.77, and a decent recall score of 0.77 as well. However, to maintain some degree of generality and ensure that these good results are not specific only to this data, the default values of the actual system will have a weaker threshold of 0 rather than 0.5, ensuring that some segmentation will occur in most texts.

[Table: Output file, Precision, Recall. Output files are named nytimes_<threshold>_<comparison size>_<smoothing window size>, covering thresholds 0.5, 0.25, 0, -0.25, and -0.5 and comparison sizes 3, 5, and 7, with three smoothing window sizes for each combination. The smoothing window sizes and the precision and recall values are not recoverable.]


Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics (L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes

More information

SURVIVING ON MARS WITH GEOGEBRA

SURVIVING ON MARS WITH GEOGEBRA SURVIVING ON MARS WITH GEOGEBRA Lindsey States and Jenna Odom Miami University, OH Abstract: In this paper, the authors describe an interdisciplinary lesson focused on determining how long an astronaut

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

Rule Learning with Negation: Issues Regarding Effectiveness

Rule Learning with Negation: Issues Regarding Effectiveness Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX

More information

CSC200: Lecture 4. Allan Borodin

CSC200: Lecture 4. Allan Borodin CSC200: Lecture 4 Allan Borodin 1 / 22 Announcements My apologies for the tutorial room mixup on Wednesday. The room SS 1088 is only reserved for Fridays and I forgot that. My office hours: Tuesdays 2-4

More information

The College Board Redesigned SAT Grade 12

The College Board Redesigned SAT Grade 12 A Correlation of, 2017 To the Redesigned SAT Introduction This document demonstrates how myperspectives English Language Arts meets the Reading, Writing and Language and Essay Domains of Redesigned SAT.

More information

Visit us at:

Visit us at: White Paper Integrating Six Sigma and Software Testing Process for Removal of Wastage & Optimizing Resource Utilization 24 October 2013 With resources working for extended hours and in a pressurized environment,

More information

THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS

THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS ELIZABETH ANNE SOMERS Spring 2011 A thesis submitted in partial

More information

Aviation English Training: How long Does it Take?

Aviation English Training: How long Does it Take? Aviation English Training: How long Does it Take? Elizabeth Mathews 2008 I am often asked, How long does it take to achieve ICAO Operational Level 4? Unfortunately, there is no quick and easy answer to

More information

What is PDE? Research Report. Paul Nichols

What is PDE? Research Report. Paul Nichols What is PDE? Research Report Paul Nichols December 2013 WHAT IS PDE? 1 About Pearson Everything we do at Pearson grows out of a clear mission: to help people make progress in their lives through personalized

More information

Notes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1

Notes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1 Notes on The Sciences of the Artificial Adapted from a shorter document written for course 17-652 (Deciding What to Design) 1 Ali Almossawi December 29, 2005 1 Introduction The Sciences of the Artificial

More information

Getting Started with Deliberate Practice

Getting Started with Deliberate Practice Getting Started with Deliberate Practice Most of the implementation guides so far in Learning on Steroids have focused on conceptual skills. Things like being able to form mental images, remembering facts

More information

South Carolina English Language Arts

South Carolina English Language Arts South Carolina English Language Arts A S O F J U N E 2 0, 2 0 1 0, T H I S S TAT E H A D A D O P T E D T H E CO M M O N CO R E S TAT E S TA N DA R D S. DOCUMENTS REVIEWED South Carolina Academic Content

More information

The Smart/Empire TIPSTER IR System

The Smart/Empire TIPSTER IR System The Smart/Empire TIPSTER IR System Chris Buckley, Janet Walz Sabir Research, Gaithersburg, MD chrisb,walz@sabir.com Claire Cardie, Scott Mardis, Mandar Mitra, David Pierce, Kiri Wagstaff Department of

More information

Word Segmentation of Off-line Handwritten Documents

Word Segmentation of Off-line Handwritten Documents Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department

More information

Human Emotion Recognition From Speech

Human Emotion Recognition From Speech RESEARCH ARTICLE OPEN ACCESS Human Emotion Recognition From Speech Miss. Aparna P. Wanare*, Prof. Shankar N. Dandare *(Department of Electronics & Telecommunication Engineering, Sant Gadge Baba Amravati

More information

Lecture 1: Machine Learning Basics

Lecture 1: Machine Learning Basics 1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3

More information

METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS

METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS Ruslan Mitkov (R.Mitkov@wlv.ac.uk) University of Wolverhampton ViktorPekar (v.pekar@wlv.ac.uk) University of Wolverhampton Dimitar

More information

PowerTeacher Gradebook User Guide PowerSchool Student Information System

PowerTeacher Gradebook User Guide PowerSchool Student Information System PowerSchool Student Information System Document Properties Copyright Owner Copyright 2007 Pearson Education, Inc. or its affiliates. All rights reserved. This document is the property of Pearson Education,

More information

The Effect of Extensive Reading on Developing the Grammatical. Accuracy of the EFL Freshmen at Al Al-Bayt University

The Effect of Extensive Reading on Developing the Grammatical. Accuracy of the EFL Freshmen at Al Al-Bayt University The Effect of Extensive Reading on Developing the Grammatical Accuracy of the EFL Freshmen at Al Al-Bayt University Kifah Rakan Alqadi Al Al-Bayt University Faculty of Arts Department of English Language

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview

More information

Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language

Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language Nathaniel Hayes Department of Computer Science Simpson College 701 N. C. St. Indianola, IA, 50125 nate.hayes@my.simpson.edu

More information

What is beautiful is useful visual appeal and expected information quality

What is beautiful is useful visual appeal and expected information quality What is beautiful is useful visual appeal and expected information quality Thea van der Geest University of Twente T.m.vandergeest@utwente.nl Raymond van Dongelen Noordelijke Hogeschool Leeuwarden Dongelen@nhl.nl

More information

Strategic Practice: Career Practitioner Case Study

Strategic Practice: Career Practitioner Case Study Strategic Practice: Career Practitioner Case Study heidi Lund 1 Interpersonal conflict has one of the most negative impacts on today s workplaces. It reduces productivity, increases gossip, and I believe

More information

Loughton School s curriculum evening. 28 th February 2017

Loughton School s curriculum evening. 28 th February 2017 Loughton School s curriculum evening 28 th February 2017 Aims of this session Share our approach to teaching writing, reading, SPaG and maths. Share resources, ideas and strategies to support children's

More information

Trend Survey on Japanese Natural Language Processing Studies over the Last Decade

Trend Survey on Japanese Natural Language Processing Studies over the Last Decade Trend Survey on Japanese Natural Language Processing Studies over the Last Decade Masaki Murata, Koji Ichii, Qing Ma,, Tamotsu Shirado, Toshiyuki Kanamaru,, and Hitoshi Isahara National Institute of Information

More information

Physics 270: Experimental Physics

Physics 270: Experimental Physics 2017 edition Lab Manual Physics 270 3 Physics 270: Experimental Physics Lecture: Lab: Instructor: Office: Email: Tuesdays, 2 3:50 PM Thursdays, 2 4:50 PM Dr. Uttam Manna 313C Moulton Hall umanna@ilstu.edu

More information

A GENERIC SPLIT PROCESS MODEL FOR ASSET MANAGEMENT DECISION-MAKING

A GENERIC SPLIT PROCESS MODEL FOR ASSET MANAGEMENT DECISION-MAKING A GENERIC SPLIT PROCESS MODEL FOR ASSET MANAGEMENT DECISION-MAKING Yong Sun, a * Colin Fidge b and Lin Ma a a CRC for Integrated Engineering Asset Management, School of Engineering Systems, Queensland

More information

TU-E2090 Research Assignment in Operations Management and Services

TU-E2090 Research Assignment in Operations Management and Services Aalto University School of Science Operations and Service Management TU-E2090 Research Assignment in Operations Management and Services Version 2016-08-29 COURSE INSTRUCTOR: OFFICE HOURS: CONTACT: Saara

More information

LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization

LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization Annemarie Friedrich, Marina Valeeva and Alexis Palmer COMPUTATIONAL LINGUISTICS & PHONETICS SAARLAND UNIVERSITY, GERMANY

More information

Diagnostic Test. Middle School Mathematics

Diagnostic Test. Middle School Mathematics Diagnostic Test Middle School Mathematics Copyright 2010 XAMonline, Inc. All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by

More information

Unvoiced Landmark Detection for Segment-based Mandarin Continuous Speech Recognition

Unvoiced Landmark Detection for Segment-based Mandarin Continuous Speech Recognition Unvoiced Landmark Detection for Segment-based Mandarin Continuous Speech Recognition Hua Zhang, Yun Tang, Wenju Liu and Bo Xu National Laboratory of Pattern Recognition Institute of Automation, Chinese

More information

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl

More information

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl

More information

Learning From the Past with Experiment Databases

Learning From the Past with Experiment Databases Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University

More information

Edexcel GCSE. Statistics 1389 Paper 1H. June Mark Scheme. Statistics Edexcel GCSE

Edexcel GCSE. Statistics 1389 Paper 1H. June Mark Scheme. Statistics Edexcel GCSE Edexcel GCSE Statistics 1389 Paper 1H June 2007 Mark Scheme Edexcel GCSE Statistics 1389 NOTES ON MARKING PRINCIPLES 1 Types of mark M marks: method marks A marks: accuracy marks B marks: unconditional

More information

Master Program: Strategic Management. Master s Thesis a roadmap to success. Innsbruck University School of Management

Master Program: Strategic Management. Master s Thesis a roadmap to success. Innsbruck University School of Management Master Program: Strategic Management Department of Strategic Management, Marketing & Tourism Innsbruck University School of Management Master s Thesis a roadmap to success Index Objectives... 1 Topics...

More information

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering

More information

Linking the Common European Framework of Reference and the Michigan English Language Assessment Battery Technical Report

Linking the Common European Framework of Reference and the Michigan English Language Assessment Battery Technical Report Linking the Common European Framework of Reference and the Michigan English Language Assessment Battery Technical Report Contact Information All correspondence and mailings should be addressed to: CaMLA

More information

Scoring Guide for Candidates For retake candidates who began the Certification process in and earlier.

Scoring Guide for Candidates For retake candidates who began the Certification process in and earlier. Adolescence and Young Adulthood SOCIAL STUDIES HISTORY For retake candidates who began the Certification process in 2013-14 and earlier. Part 1 provides you with the tools to understand and interpret your

More information

Measurement. Time. Teaching for mastery in primary maths

Measurement. Time. Teaching for mastery in primary maths Measurement Time Teaching for mastery in primary maths Contents Introduction 3 01. Introduction to time 3 02. Telling the time 4 03. Analogue and digital time 4 04. Converting between units of time 5 05.

More information

HLTCOE at TREC 2013: Temporal Summarization

HLTCOE at TREC 2013: Temporal Summarization HLTCOE at TREC 2013: Temporal Summarization Tan Xu University of Maryland College Park Paul McNamee Johns Hopkins University HLTCOE Douglas W. Oard University of Maryland College Park Abstract Our team

More information

Evidence for Reliability, Validity and Learning Effectiveness

Evidence for Reliability, Validity and Learning Effectiveness PEARSON EDUCATION Evidence for Reliability, Validity and Learning Effectiveness Introduction Pearson Knowledge Technologies has conducted a large number and wide variety of reliability and validity studies

More information

The Role of String Similarity Metrics in Ontology Alignment

The Role of String Similarity Metrics in Ontology Alignment The Role of String Similarity Metrics in Ontology Alignment Michelle Cheatham and Pascal Hitzler August 9, 2013 1 Introduction Tim Berners-Lee originally envisioned a much different world wide web than

More information

Software Maintenance

Software Maintenance 1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories

More information

Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2

Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2 Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2 Ted Pedersen Department of Computer Science University of Minnesota Duluth, MN, 55812 USA tpederse@d.umn.edu

More information

Digital Fabrication and Aunt Sarah: Enabling Quadratic Explorations via Technology. Michael L. Connell University of Houston - Downtown

Digital Fabrication and Aunt Sarah: Enabling Quadratic Explorations via Technology. Michael L. Connell University of Houston - Downtown Digital Fabrication and Aunt Sarah: Enabling Quadratic Explorations via Technology Michael L. Connell University of Houston - Downtown Sergei Abramovich State University of New York at Potsdam Introduction

More information

A Graph Based Authorship Identification Approach

A Graph Based Authorship Identification Approach A Graph Based Authorship Identification Approach Notebook for PAN at CLEF 2015 Helena Gómez-Adorno 1, Grigori Sidorov 1, David Pinto 2, and Ilia Markov 1 1 Center for Computing Research, Instituto Politécnico

More information

Beyond the Pipeline: Discrete Optimization in NLP

Beyond the Pipeline: Discrete Optimization in NLP Beyond the Pipeline: Discrete Optimization in NLP Tomasz Marciniak and Michael Strube EML Research ggmbh Schloss-Wolfsbrunnenweg 33 69118 Heidelberg, Germany http://www.eml-research.de/nlp Abstract We

More information

Individual Component Checklist L I S T E N I N G. for use with ONE task ENGLISH VERSION

Individual Component Checklist L I S T E N I N G. for use with ONE task ENGLISH VERSION L I S T E N I N G Individual Component Checklist for use with ONE task ENGLISH VERSION INTRODUCTION This checklist has been designed for use as a practical tool for describing ONE TASK in a test of listening.

More information

TRAITS OF GOOD WRITING

TRAITS OF GOOD WRITING TRAITS OF GOOD WRITING Each paper was scored on a scale of - on the following traits of good writing: Ideas and Content: Organization: Voice: Word Choice: Sentence Fluency: Conventions: The ideas are clear,

More information

CHAPTER 4: REIMBURSEMENT STRATEGIES 24

CHAPTER 4: REIMBURSEMENT STRATEGIES 24 CHAPTER 4: REIMBURSEMENT STRATEGIES 24 INTRODUCTION Once state level policymakers have decided to implement and pay for CSR, one issue they face is simply how to calculate the reimbursements to districts

More information

Algebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview

Algebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview Algebra 1, Quarter 3, Unit 3.1 Line of Best Fit Overview Number of instructional days 6 (1 day assessment) (1 day = 45 minutes) Content to be learned Analyze scatter plots and construct the line of best

More information

Arizona s English Language Arts Standards th Grade ARIZONA DEPARTMENT OF EDUCATION HIGH ACADEMIC STANDARDS FOR STUDENTS

Arizona s English Language Arts Standards th Grade ARIZONA DEPARTMENT OF EDUCATION HIGH ACADEMIC STANDARDS FOR STUDENTS Arizona s English Language Arts Standards 11-12th Grade ARIZONA DEPARTMENT OF EDUCATION HIGH ACADEMIC STANDARDS FOR STUDENTS 11 th -12 th Grade Overview Arizona s English Language Arts Standards work together

More information

Activities, Exercises, Assignments Copyright 2009 Cem Kaner 1

Activities, Exercises, Assignments Copyright 2009 Cem Kaner 1 Patterns of activities, iti exercises and assignments Workshop on Teaching Software Testing January 31, 2009 Cem Kaner, J.D., Ph.D. kaner@kaner.com Professor of Software Engineering Florida Institute of

More information

NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches

NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches Yu-Chun Wang Chun-Kai Wu Richard Tzong-Han Tsai Department of Computer Science

More information

Speech Recognition at ICSI: Broadcast News and beyond

Speech Recognition at ICSI: Broadcast News and beyond Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI

More information

University of Groningen. Systemen, planning, netwerken Bosman, Aart

University of Groningen. Systemen, planning, netwerken Bosman, Aart University of Groningen Systemen, planning, netwerken Bosman, Aart IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information

Leader s Guide: Dream Big and Plan for Success

Leader s Guide: Dream Big and Plan for Success Leader s Guide: Dream Big and Plan for Success The goal of this lesson is to: Provide a process for Managers to reflect on their dream and put it in terms of business goals with a plan of action and weekly

More information

Session 2B From understanding perspectives to informing public policy the potential and challenges for Q findings to inform survey design

Session 2B From understanding perspectives to informing public policy the potential and challenges for Q findings to inform survey design Session 2B From understanding perspectives to informing public policy the potential and challenges for Q findings to inform survey design Paper #3 Five Q-to-survey approaches: did they work? Job van Exel

More information

Leveraging Sentiment to Compute Word Similarity

Leveraging Sentiment to Compute Word Similarity Leveraging Sentiment to Compute Word Similarity Balamurali A.R., Subhabrata Mukherjee, Akshat Malu and Pushpak Bhattacharyya Dept. of Computer Science and Engineering, IIT Bombay 6th International Global

More information

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za

More information