Using dialogue context to improve parsing performance in dialogue systems

Ivan Meza-Ruiz and Oliver Lemon
School of Informatics, Edinburgh University
2 Buccleuch Place, Edinburgh

Abstract

We explore how to incorporate information from dialogue context to improve the selection of logical forms in the parsing components of dialogue systems. We present a machine learning approach which allows us to identify the most informative elements of the dialogue context for this task, and to improve the performance of a parser in a dialogue system by using a classifier. Features for the classifier are extracted from a dialogue manager [Lemon and Gruenstein, 2004] which implements the Information State Update approach to dialogue management. Our best result to date is a 54.5% reduction in parse error rate, compared to the baseline system of Lemon and Gruenstein [2004].

1 Introduction

Dialogue managers are responsible for controlling the overall behaviour of dialogue systems. One of their main functions is to keep track of the information which constitutes the dialogue context. Another critical component of a dialogue system is its parser, which is responsible for producing semantic representations of user inputs for later integration into the dialogue context. A common problem is that there may be multiple outputs from the parser, each representing the semantics of a different reading of the user input, or representing parses of different speech recognition hypotheses. For example, see Figure 1, an extract from our corpus (footnote 1). This shows n different logical forms produced by the parsing component (Gemini [Dowding et al., 1993]) of a dialogue system (WITAS, Lemon and Gruenstein [2004]) for a single user utterance.

Footnote 1: This example corresponds to the user utterance "go to the tower". The utterance has a manual transcription, but it was also passed through the speech recogniser, which generated n hypotheses. Each hypothesis has a confidence score and a corresponding logical form.

In this work we aim to choose one logical form from the available hypotheses (or to reject the interaction) using information from the hypothesis and the dialogue context. In this example, the baseline system behaviour is to choose the first hypothesis, but the desired behaviour is to choose the second hypothesis, since it corresponds to the transcription of the user utterance. We also have the goal of identifying the informative elements of the dialogue context for this task. To accomplish this, we use a machine learning approach to build a classifier which is able to identify correct hypotheses from the parser.

Transcription : go to the tower
Logical form  : [command([go],[param_list([[pp_loc(to,arg([np([det([def],the), [n(tower,sg)]])]))]])])]

Hypothesis 1  : go to the towers
Confidence    : 70
Logical form  : [command([go],[param_list([[pp_loc(to,arg([np([det([def],the), [n(tower,pl)]])]))]])])]

Hypothesis 2  : go to the tower
Confidence    : 70
Logical form  : [command([go],[param_list([[pp_loc(to,arg([np([det([def],the), [n(tower,sg)]])]))]])])]
...
Hypothesis 5  : to the tower
Confidence    : 65
Logical form  : wh_answer([param_list([[pp_loc(to,arg([np([det([def],the), [n(tower,sg)]])]))]])])
...

Figure 1: Example corpus excerpt

The paper is structured as follows: in section 2, the main characteristics of the Information State Update approach and the dialogue manager are presented. In section 3, we present our chosen technique, the Maximum Entropy machine learner. Section 4 explains the experimental setup, and section 5 sets out our results.

2 The Information State Approach

The Information State (IS) approach is a theoretical framework for dialogue management. Information States are informally defined in Larsson and Traum [2000] as follows:

    "The term Information State of a dialogue represents the information necessary to distinguish it from other dialogues, representing the cumulative additions from previous actions in the dialogue, and motivating future action."

In the present work, the WITAS dialogue system [Lemon et al., 2001, 2002; Lemon and Gruenstein, 2004] was used to collect data. This is a multimodal command and control dialogue system that allows a human operator to interact with a simulated unmanned aerial vehicle: a small robotic helicopter. The WITAS system uses the multi-threaded dialogue manager presented in Lemon et al. [2002] and Lemon and Gruenstein [2004]. This dialogue manager implements the Information State Update approach to dialogue management (see e.g. Larsson and Traum [2000]). The main characteristic of this system's dialogue manager is that it keeps the history of dialogue moves in a tree structure, whereas other dialogue systems usually use a stack. Each of the branches of the tree represents a thread of the conversation. An attachment algorithm relates incoming moves to the branches of the tree.

Much of the information in the dialogue context is derived from the logical forms (LFs) which encode the meanings of utterances produced by the user or system. Various data structures store this information in a systematic way. For instance, Figure 2 presents a logical form which corresponds to the utterance "go to the tower". The head of the logical form represents the class of dialogue move (e.g. command, question, answer). The rest of the LF encodes the structure of the sentence and its meaning. In this case, the main verb is "go", which takes one destination as a parameter. In the example, the destination is marked by the preposition "to", which has a noun phrase as its argument. The noun phrase represents the destination, "the tower".
command([go],[param_lst([pp_loc(to,arg([np(det([def],the), [n(tower,sg)])]))])])

Figure 2: Logical form for "go to the tower"

The multi-threaded dialogue manager uses the following data structures: Dialogue Move Tree (DMT); Active Node List (ANL); Activity Tree (AT); System Agenda (SA); Pending List (PL); Salience List (SL); Salient Task (ST); Modality Buffer (MB). See Lemon and Gruenstein [2004] for full details. The DMT is used as a message board to keep a record of the history of the dialogue contributions (the moves made by both the user and the system). Each branch in the tree represents a thread in the conversation. The ANL marks the active nodes on the DMT; an active node indicates a conversational contribution that is relevant to the current discourse (e.g. an open question). The AT stores the current, past, and planned activities of the back-end system. The SA collects all the utterances that the system intends to produce. The SL is a list of the noun phrases (NPs) introduced in the current dialogue, ordered by recency. The ST is a list which contains the tasks that have recently been introduced in the dialogue. Finally, the MB buffers click events on the GUI.

3 Maximum Entropy Learning

Maximum Entropy (ME) is a machine learning technique which is popular in Natural Language Processing because it has been shown to perform well in a range of classification tasks (e.g. CONLL03 [Tjong Kim Sang and De Meulder]). Most of this success derives from the fact that the generated model is based on as few assumptions as possible, which allows a lot of flexibility during classification. The created model, which represents the learnt behaviour, relies on a set of features f_i and a set of weights λ_i which constrain these features. During experimentation, several sets of features are proposed to generate different models until a good one is found. A complete list of the features used to generate the different models in our work can be found in subsection 4.3. In particular, we use the implementation of ME introduced by Lee [2004], given our previous experience with the tool.

For each user utterance, our Maximum Entropy classifier takes as input the features of the current dialogue context (see section 4.3) and of the logical forms of the hypotheses to be classified, and returns a decision on whether to accept or reject each logical form.
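As an illustration of how a model built from features f_i and weights λ_i turns a context into an accept/reject decision, the following minimal sketch evaluates the standard conditional Maximum Entropy form p(y|x) = exp(Σ_i λ_i f_i(x, y)) / Z(x). The three feature functions, the weight values and the dictionary keys (pos, dm, conf) are hypothetical and chosen only for readability; they are not the trained model from the experiments, and the toolkit of Lee [2004] is not used here.

```python
import math

LABELS = ("accept", "reject")

# Hypothetical binary feature functions f_i(x, y): each one fires when a
# property of the context x co-occurs with the candidate label y.
def f_first_and_accept(x, y):
    return 1.0 if x["pos"] == 1 and y == "accept" else 0.0

def f_command_and_accept(x, y):
    return 1.0 if x["dm"] == "command" and y == "accept" else 0.0

def f_low_conf_and_reject(x, y):
    return 1.0 if x["conf"] <= 5 and y == "reject" else 0.0

FEATURES = [f_first_and_accept, f_command_and_accept, f_low_conf_and_reject]
WEIGHTS = [0.8, 0.5, 1.2]  # illustrative lambda_i values, not learnt ones

def score(x, y):
    """Weighted feature sum: sum_i lambda_i * f_i(x, y)."""
    return sum(w * f(x, y) for w, f in zip(WEIGHTS, FEATURES))

def predict_proba(x):
    """Conditional ME distribution: exp(score) normalised over the labels."""
    exp_scores = {y: math.exp(score(x, y)) for y in LABELS}
    z = sum(exp_scores.values())  # normalisation constant Z(x)
    return {y: s / z for y, s in exp_scores.items()}

if __name__ == "__main__":
    context = {"pos": 1, "dm": "command", "conf": 7}
    print(predict_proba(context))
```

In a trained model the weights would be estimated from the labelled corpus; only the scoring step is shown here.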
4 Experimental setup

We present our experiments as follows: in subsection 4.1, the corpus; in subsection 4.2, the baseline; in subsection 4.3, the extracted features.

4.1 The corpus

In our experiments we used the corpus developed in Lemon and Gruenstein [2004]; Lemon [2004]; Gabsdil and Lemon [2004]. The corpus consists of the interactions of six subjects from Edinburgh University (4 male, 2 female), each of whom performed five simple tasks with the WITAS system, resulting in 30 complete dialogues. The corpus is made up of the manual transcription of each utterance, the 10-best hypotheses from the speech recogniser (Nuance 8.0), the logical forms of the transcriptions and hypotheses from the parser (Gemini [Dowding et al., 1993]), and the Information States of the system [Lemon and Gruenstein, 2004].

The corpus consists of 339 utterances. Only 303 utterances are intelligible to the system; the remaining 36 were identified as noise. The corpus contains 188 distinct utterance types, because many utterances were used several times (e.g. "yes", "go here"). Table 1 shows the 10 most frequent utterances in the corpus.

Utterance              Occurrences    Utterance               Occurrences
fly to the church      17             now                     10
fly to the tower       16             fight the fire          9
fly to the warehouse   15             yes                     9
zoom in                12             show me the buildings   9
land                   10             take off                8

Table 1: Most frequent user utterances

4.2 The baseline system

Our baseline is the behaviour of the WITAS Dialogue System described in Lemon and Gruenstein [2004]; Lemon [2004]; Gabsdil and Lemon [2004]. Note that this system had good task completion rates (see Hockey et al. [2003]) and thus constitutes a sound baseline. The baseline performance was evaluated by analysing the logs from the user study. Each of the hypotheses was labelled accept or reject, depending on the reaction of the system. At this point, there are four cases:

1. One hypothesis is accepted, and it corresponds to the manual transcription. This is a true positive (tp) event.
2. One hypothesis is accepted, but it does not correspond to the manual transcription. This is a false positive (fp) event (e.g. the user said "Now" but the system accepted a logical form corresponding to "no").
3. None of the hypotheses was accepted, but there was a logical form for the manual transcription of this interaction. This means that the system failed to recognise an input that should have been accepted. This is a false negative (fn) event.
4. Finally, none of the hypotheses was accepted, and there was no logical form for the manual transcription. This means that the interaction should have been rejected by the system (e.g. user self-talk). This is a true negative (tn) event.

For the baseline case the precision, recall and Fβ=1 figures were calculated, and we obtained the results presented in Table 2. These results allow us to evaluate the performance of the baseline system. The baseline strategy produces very few false negatives (only 2 cases), which yields an impressive recall figure; however, its precision is poor (there are 97 false positives). From Table 2 we can draw our first conclusion: the strategy used by the baseline to choose the logical form is not adequate, since there are many false positives. An improvement in the behaviour of the parser would imply a reduction in the number of false positives.

A key challenge for the classifier we will construct is the case in which the logical form to be accepted is not the first hypothesis but appears later in the list. In total, there are 22 cases where the correct logical form is in the set of LF hypotheses but is not the first one.

Precision   60.08%
Recall      98.65%
Fβ=1             %

Table 2: The baseline system evaluation

4.3 The Context Features

We divide the features into three groups (speech processing, logical form, and information state), depending on the type of information they represent.

Speech processing features

In this group there are just two features: the confidence of the speech recogniser in the hypothesis, and the position of the hypothesis relative to the other hypotheses. The confidence is a score from 0 to 100; we created 10 groups, each representing 10 units of confidence.

Logical Form features

We extract information about the dialogue move class, the main verb, and the complements in order to collect features of the logical form. The following features are extracted in this group:

Dialogue move class (dm): this is encoded in the head of the logical form (e.g. command and wh_query).
Main verb (verb): the main verb involved in the logical form.
Complements (comps): the complements of the main verb. We extract the noun (noun), the number (num), and the preposition (prep) (if the noun phrase is headed by a preposition).

Information State Features

We extract IS features from the hypothesis, from current and past moves, and from the information state data structures. In particular, we extracted dialogue context features from the following data structures of the Information State:

Dialogue move tree (DMT): we extracted the dm and verb features from the n most active nodes (for 2 ≤ n ≤ 6).
Salience list: we extracted the noun and num features from the n most recent noun phrases (for 2 ≤ n ≤ 4).
Last n moves: the dm and verb of the last moves (for 1 ≤ n ≤ 2). This can be considered a unigram or bigram of the dialogue moves.

We also extracted the turn feature, which identifies the speaker who had the previous turn (user or system). No more information was used in this case. Table 3 summarises the context features.
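The speech processing and logical form features described above can be sketched as a small extraction function. This is only an illustration under stated assumptions: the function name, the regular expressions and the dictionary keys are hypothetical, the handling of a confidence of exactly 100 (folded into the top group) is a guess, and the Information State features are omitted because they require the dialogue manager's data structures.

```python
import re

def extract_features(logical_form, confidence, position):
    """Build a feature dictionary for one n-best hypothesis (illustrative)."""
    feats = {
        # 10 groups of 10 confidence units; 100 is folded into the top group
        # (an assumption, the paper does not say how the boundary is handled).
        "conf": min(int(confidence) // 10, 9),
        "pos": position,
    }
    # Head of the LF gives the dialogue move class; a bracketed atom directly
    # after it, as in command([go],...), is taken to be the main verb.
    head = re.match(r"\[?([a-z_]+)\((?:\[([a-z_]+)\])?", logical_form)
    if head:
        feats["dm"] = head.group(1)
        if head.group(2):
            feats["verb"] = head.group(2)
    # Preposition heading the complement, e.g. pp_loc(to,...)
    prep = re.search(r"pp_loc\(([a-z_]+),", logical_form)
    if prep:
        feats["prep"] = prep.group(1)
    # Noun and grammatical number, e.g. n(tower,sg)
    noun = re.search(r"\bn\(([a-z_]+),(sg|pl)\)", logical_form)
    if noun:
        feats["noun"], feats["num"] = noun.groups()
    return feats

if __name__ == "__main__":
    lf = ("[command([go],[param_list([[pp_loc(to,arg([np([det([def],the), "
          "[n(tower,sg)]])]))]])])]")
    print(extract_features(lf, confidence=70, position=2))
```

A regular-expression reading of the logical form is used only to keep the sketch short; a real system would take these values from the parser's term structure.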

Feature                              Source
Confidence (conf)                    speech recogniser
Position (pos)                       speech recogniser
Turn (turn)                          Information State
Dialogue Move (dm)                   hypothesis, current move, previous moves, DMT
Main verb (verb)                     hypothesis, current move, previous moves, DMT
Complements (comps)                  hypothesis, current move
Noun phrase (noun), number (num)     Salience List (SL)

Table 3: Context Features for the Classifier

4.4 Selection procedure

The last element of the experimental setup is the procedure for selecting a hypothesis. For each user utterance, the classifier receives a set of hypotheses. The classifier then labels them as accept or reject with a certain confidence score. To choose one hypothesis from this list, or else reject the utterance, we use the following procedure:

1. Scan the list of classified n-best logical forms top-down. Return the highest confidence hypothesis which was classified as accept, and pass it to the dialogue manager as the LF chosen for the current move.
2. If step 1 fails, classify the interaction as rejected. (This means that all the hypotheses were classified as reject.)

5 Results and Analysis

We used the standard metrics precision, recall and Fβ=1 to measure the performance of the system. One important limitation of the current study is the small size of the corpus. For this reason a leave-one-out cross-validation method was chosen to evaluate performance. Our current results, after experimenting with several combinations of feature groups, are shown in Table 4. Note that the raw figure for parse correctness of the system has risen by 17.8 percentage points, from 67.33% to 85.15%. The parse error rate therefore falls from 32.67% to 14.85%, which is a 54.5% reduction. We expect that better results could be achieved with further experimentation and parameter optimisation.

In qualitative terms, the strategy used by the classifier is also better than the baseline. First, there is not such a large difference between the precision and recall figures. Second, it was possible for the classifier to identify cases where the first hypothesis does not correspond to the correct one.
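The selection procedure of section 4.4, the four outcome cases of section 4.2 and the precision, recall and F figures reported here can be sketched together as follows. The class and function names are hypothetical, "highest confidence" is read as the classifier's confidence, and ties are broken in favour of the hypothesis that comes first in the n-best list; the paper does not specify these details.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Classified:
    logical_form: str
    label: str         # "accept" or "reject", as assigned by the classifier
    confidence: float  # classifier confidence in that label

def select_hypothesis(nbest: List[Classified]) -> Optional[Classified]:
    """Step 1: best accepted hypothesis; step 2: None means 'rejected'."""
    accepted = [h for h in nbest if h.label == "accept"]
    if not accepted:
        return None
    # max() keeps the first maximal element, so ties go to the earlier
    # position in the n-best list (an assumption about tie-breaking).
    return max(accepted, key=lambda h: h.confidence)

def outcome(selected: Optional[Classified], gold_lf: Optional[str]) -> str:
    """Map a decision to tp/fp/fn/tn; gold_lf is None when the manual
    transcription has no logical form (e.g. user self-talk)."""
    if selected is not None:
        return "tp" if selected.logical_form == gold_lf else "fp"
    return "fn" if gold_lf is not None else "tn"

def precision_recall_f1(counts: Dict[str, int]):
    """Standard precision, recall and balanced F from outcome counts."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    # Toy n-best list, not corpus data.
    nbest = [
        Classified("lf_towers", "reject", 0.9),
        Classified("lf_tower", "accept", 0.7),
        Classified("lf_wh_answer", "accept", 0.6),
    ]
    chosen = select_hypothesis(nbest)
    print(chosen.logical_form, outcome(chosen, gold_lf="lf_tower"))
```

Counting the outcomes over all utterances and passing the totals to precision_recall_f1 gives figures of the kind shown in the results table.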

                    Baseline    ME Classifier
True positives
False positives
True negatives
False negatives     2           23
Precision           60.08%      87.04%
Recall              98.65%      85.98%
Fβ=1                     %      86.50%
Parse correctness   67.33%      85.15%

Table 4: Results: Baseline versus Maximum Entropy Classifier

During experimentation with different feature groups, we found the following contextual features to be most informative for this task:

1. Speech processing (both confidence and position features)
2. Dialogue move, main verb, and complement of the hypothesis
3. Dialogue move, main verb, and complement of the previous move
4. The turn feature
5. The dialogue move of the four most active nodes on the DMT

5.1 Error Analysis

Even though the improvement of the system over the baseline is promising (11.75% on the Fβ=1 figure), there is still a lot of work to do. The first type of error corresponds to the false positives. There are three causes of these errors: plural and singular nouns, adverbs and adjectives, and speech processing. It was difficult for the system to distinguish between plural and singular nouns. In many cases the plural/singular and position features were the only difference between the available hypotheses. The classifier showed a preference for the singular cases, since they obtained a better position in the speech recogniser's n-best list. Finally, some of the errors were caused by the speech recogniser component introducing noise into the hypotheses as extra information. These cases were harder to classify because the hypotheses were too different from the actual utterance.

A second group of errors corresponds to the false negatives. In this case the only pattern we found is that the confidence score is relatively low compared with the standard cases (it is in the range of 50 to 65). This makes it impossible for the classifier to choose one of the hypotheses, so it rejects all of them, generating a false negative.

The third type of error relates to the cases where the correct hypothesis is not the first one. The system was able to identify 8 of the 22 cases. Most of the cases which were not identified are related to the use of plurals, singulars, adjectives and adverbs.

5.2 Comparison with previous work

The closest related work is that of Gabsdil and Lemon [2004]. However, that research focuses on speech processing. It explores a combination of acoustic confidence and pragmatic features to predict the quality of incoming speech recognition hypotheses. Gabsdil and Lemon [2004] also used Machine Learning (TiMBL and RIPPER) with parameter optimisation to identify the most relevant features (the results are presented in Table 5). As can be seen from the table, these experiments include more classes: in addition to accept and reject, the clarify and ignore classes were used. This makes it difficult to compare the results. However, a key point is that in both cases information contained in the dialogue context was used, and in both cases its inclusion improved the performance of the components.

Num interactions   302
Classes            accept, ignore, clarify, ignore
Baseline           61.81% weighted f-score
Best Result        86.39% weighted f-score

Table 5: Characteristics and results of Gabsdil and Lemon [2004]

6 Further work

There are several directions for further work. One of the most important is to apply this approach to larger corpora. A corpus used in future work has to be larger in two respects: first, in the actual size of the corpus, and second, in the size of the dialogues involved. We plan to run similar experiments on the Communicator corpus [Walker et al., 2001], which is being annotated with Information States in the TALK Project.

Another important direction is to measure the actual impact of the improvement in parsing on the dialogue system overall (e.g. on user task completion metrics). Up to now, we have supposed that the improvement shown on the parsing task implies an improvement of the dialogue system. However, we do not currently have an estimate of the size of this improvement.

Since this work is related to Gabsdil and Lemon [2004], a useful direction to explore is the integration of both approaches. There are two ways to integrate them. First, we can put both filters together in a pipeline architecture. Second, we can combine the features of Gabsdil and Lemon [2004] with our features and use only one filter. In neither case do we expect the performance gain to be the sum of the gains of the two approaches.

Another direction to be explored is the use of more robust parsers (e.g. the TRIPS parser, Allen [1995]).

7 Conclusion

The results presented above confirm that information from dialogue context can be used to improve the performance of the parsing component of a dialogue system. Our best result, obtained with a Maximum Entropy learner, is a 54.5% reduction in parse error rate, compared to the baseline system of Lemon and Gruenstein [2004]. Further, we identified the most informative elements of the dialogue context for this task. We used techniques and features which we can expect to generalise across dialogue systems that implement the Information State Update approach to dialogue management.

Acknowledgements

This research was supported by a CONACYT / scholarship and by the TALK Project (Sixth Framework Programme of the European Community, contract no. IST ) (footnote 2).

Footnote 2: The authors are solely responsible for the content of this document. It does not represent the opinion of the European Community, and the Community is not responsible for any use that might be made of the information contained herein.

References

J. F. Allen. Natural Language Understanding. Benjamin Cummings, 1995.

J. Dowding, J. M. Gawron, D. E. Appelt, J. Bear, L. Cherny, R. Moore, and D. B. Moran. GEMINI: A natural language system for spoken-language understanding. In Meeting of the Association for Computational Linguistics, pages 54-61, 1993.

M. Gabsdil and O. Lemon. Combining acoustic and pragmatic features to predict recognition performance in spoken dialogue systems. In Proc. Association for Computational Linguistics (ACL-04), 2004.

B.-A. Hockey, O. Lemon, E. Campana, L. Hiatt, G. Aist, J. Hieronymus, A. Gruenstein, and J. Dowding. Targeted help for spoken dialogue systems: intelligent feedback improves naive users' performance. In Proceedings of European Association for Computational Linguistics (EACL 03), 2003.

S. Larsson and D. Traum. Information state and dialogue management in the TRINDI Dialogue Move Engine Toolkit. Natural Language Engineering, 6(3-4), 2000.

Z. Lee. Maximum Entropy Modeling Toolkit for Python and C++, 2004.

O. Lemon. Context-sensitive speech recognition in Information-State Update dialogue systems: results for the grammar switching approach. In Proc. 8th Workshop on the Semantics and Pragmatics of Dialogue, CATALOG 04, 2004.

O. Lemon, A. Bracy, A. Gruenstein, and S. Peters. The WITAS Multi-Modal Dialogue System I. In EuroSpeech, 2001.

O. Lemon and A. Gruenstein. Multithreaded context for robust conversational interfaces: context-sensitive speech recognition and interpretation of corrective fragments. ACM Transactions on Computer-Human Interaction (ACM TOCHI), 11(3), 2004.

O. Lemon, A. Gruenstein, A. Battle, and S. Peters. Collaborative activities and multi-tasking in dialogue systems. In Proceedings of SIGdial, 2002.

E. F. Tjong Kim Sang and F. De Meulder. Introduction to the CoNLL Shared Task: Language-Independent Named Entity Recognition. In Proceedings of CoNLL, 2003.

M. A. Walker, R. J. Passonneau, and J. E. Boland. Quantitative and Qualitative Evaluation of Darpa Communicator Spoken Dialogue Systems. In Meeting of the Association for Computational Linguistics, 2001.

BEETLE II: a system for tutoring and computational linguistics experimentation

BEETLE II: a system for tutoring and computational linguistics experimentation BEETLE II: a system for tutoring and computational linguistics experimentation Myroslava O. Dzikovska and Johanna D. Moore School of Informatics, University of Edinburgh, Edinburgh, United Kingdom {m.dzikovska,j.moore}@ed.ac.uk

More information

Linking Task: Identifying authors and book titles in verbose queries

Linking Task: Identifying authors and book titles in verbose queries Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,

More information

Parsing of part-of-speech tagged Assamese Texts

Parsing of part-of-speech tagged Assamese Texts IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal

More information

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

More information

Prediction of Maximal Projection for Semantic Role Labeling

Prediction of Maximal Projection for Semantic Role Labeling Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba

More information

The stages of event extraction

The stages of event extraction The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks

More information

A Case Study: News Classification Based on Term Frequency

A Case Study: News Classification Based on Term Frequency A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center

More information

Learning Computational Grammars

Learning Computational Grammars Learning Computational Grammars John Nerbonne, Anja Belz, Nicola Cancedda, Hervé Déjean, James Hammerton, Rob Koeling, Stasinos Konstantopoulos, Miles Osborne, Franck Thollard and Erik Tjong Kim Sang Abstract

More information

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence. NLP Lab Session Week 8 October 15, 2014 Noun Phrase Chunking and WordNet in NLTK Getting Started In this lab session, we will work together through a series of small examples using the IDLE window and

More information

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se

More information

Eye Movements in Speech Technologies: an overview of current research

Eye Movements in Speech Technologies: an overview of current research Eye Movements in Speech Technologies: an overview of current research Mattias Nilsson Department of linguistics and Philology, Uppsala University Box 635, SE-751 26 Uppsala, Sweden Graduate School of Language

More information

Dialog Act Classification Using N-Gram Algorithms

Dialog Act Classification Using N-Gram Algorithms Dialog Act Classification Using N-Gram Algorithms Max Louwerse and Scott Crossley Institute for Intelligent Systems University of Memphis {max, scrossley } @ mail.psyc.memphis.edu Abstract Speech act classification

More information

CS Machine Learning

CS Machine Learning CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing

More information

Memory-based grammatical error correction

Memory-based grammatical error correction Memory-based grammatical error correction Antal van den Bosch Peter Berck Radboud University Nijmegen Tilburg University P.O. Box 9103 P.O. Box 90153 NL-6500 HD Nijmegen, The Netherlands NL-5000 LE Tilburg,

More information

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation tatistical Parsing (Following slides are modified from Prof. Raymond Mooney s slides.) tatistical Parsing tatistical parsing uses a probabilistic model of syntax in order to assign probabilities to each

More information

THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING

THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING SISOM & ACOUSTICS 2015, Bucharest 21-22 May THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING MarilenaăLAZ R 1, Diana MILITARU 2 1 Military Equipment and Technologies Research Agency, Bucharest,

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview

More information

Speech Recognition at ICSI: Broadcast News and beyond
