Entropy-Guided Feature Induction for Structure Learning

Eraldo R. Fernandes (1) and Ruy L. Milidiú (2)

(1) Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Brazil. eraldo@facom.ufms.br
(2) Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro, Brazil. milidiu@inf.puc-rio.br

Abstract. In many natural language processing tasks, the output comprises several variables with complex interdependencies, such as sequences, trees, alignments and clusterings. Structure learning consists of learning a mapping from inputs to complex outputs by processing a dataset of input-output pairs. Feature generation is an important subtask of structure learning and is usually addressed by a domain expert, who builds complex feature templates by conjoining the available basic features. This is a limited, expensive approach and is recognized as a modeling bottleneck. Here, we extend the Structure Learning framework by incorporating an automatic feature induction method, which is guided by entropy. We experimentally evaluate our framework on five natural language processing tasks involving four languages, namely Portuguese, English, Chinese and Arabic. These experiments correspond to nine datasets, four of which are in Portuguese. On six datasets, our systems achieve the best known performances, and they are competitive with the state of the art on the others. Furthermore, our coreference resolution systems achieved first place in the CoNLL-2012 Shared Task, in which the competing systems were ranked by the mean score over three languages: Arabic, Chinese and English. We further assess our method by experimentally comparing it with two important alternative feature generation methods, namely manual template generation and polynomial kernels. The experimental findings indicate that our method is more attractive than both alternatives.

Keywords: Structure Learning, Natural Language Processing, Feature Induction, Entropy

The integral work can be obtained at

1 Introduction

Many important problems involve the prediction of a structure comprising many variables with complex interdependencies. Several structure learning (SL) problems arise from natural language processing (NLP). For instance, dependency parsing (DP) consists of identifying a tree underlying a sentence, whereas coreference resolution consists of clustering entity mentions in a document.

The structure learning framework [2, 3, 12] is a general approach to directly learn a mapping function from inputs to complex outputs. The core of the SL framework is the prediction problem, which is an arbitrary optimization problem whose objective function is linear in some joint input-output representation. Hence, this approach is a kind of linear discriminant model. Linear models are pervasive in ML, mainly because there are efficient training algorithms to estimate such models [1, 2, 16]. On the other hand, many SL problems require models that are highly non-linear in the available features. Therefore, when training a linear SL model, it is necessary to use some feature generation method in order to provide the required nonlinear features.

Feature generation is frequently solved by a domain expert who builds complex, discriminative templates by conjoining the available features. Manual template generation is a limited, expensive way to obtain feature templates and is recognized as a modeling bottleneck. Another popular alternative is to employ a kernel function, if the learning algorithm is kernelized. Besides the fact that kernelized training algorithms are computationally expensive, it is difficult to control their generalization performance.

Here, we extend the Structure Learning framework by incorporating an automatic feature induction method, which is called Entropy-Guided Feature Induction (EFI). The resulting framework is called the Entropy-Guided Structure Learning Framework (ESL). We experimentally assess ESL by evaluating its performance on nine datasets, which involve five NLP tasks and four languages, including four Portuguese datasets. ESL is competitive on all datasets and reduces the smallest known error in seven cases, including three Portuguese datasets. We also achieved first place in the prestigious CoNLL-2012 Shared Task using ESL-based systems. This shared task was to resolve coreference in three languages, namely Arabic, Chinese and English. We further assess EFI by experimentally comparing it to manual template generation and polynomial kernels, two important alternative feature generation methods. EFI outperforms both methods on the considered datasets. Additionally, EFI is much cheaper than manual templates and computationally faster than kernel methods.

This work has six main contributions: (i) the ESL framework; (ii) a comparison of EFI with manual templates and polynomial kernels; (iii) nine ESL-based systems for fundamental NLP tasks; (iv) state-of-the-art systems for three Portuguese tasks, namely POS tagging, text chunking and quotation extraction; (v) the first place in the CoNLL-2012 Shared Task on multilingual coreference resolution; and (vi) state-of-the-art systems for coreference resolution in Arabic, Chinese and English.

Partial results of this work have been published by the authors in two journal papers [6, 11], the most recent in Computational Linguistics. We have also presented some preliminary findings at eight international conferences, namely: WWW 2009 [7], STIL 2009 [9], PROPOR 2010 [10], the CoNLL-2010 Shared Task [4], CICLing 2010 [15], ECML-PKDD 2011 [3], the EMNLP-CoNLL 2012 Shared Task [5] and PROPOR 2012 [8].

2 Entropy-Guided Structure Learning Framework

ESL naturally extends the SL framework by automatically inducing feature templates, thereby addressing a relevant modeling bottleneck. In this section, we describe the SL framework and how EFI extends it to derive the ESL framework.

2.1 Prediction via Optimization

The core of the SL framework is the prediction problem, which, for a given input x, is formulated as the following w-parameterized optimization problem

    F(x; w) = arg max_{y ∈ Y(x)} ⟨w, Φ(x, y)⟩,

where Y(x) is the set of feasible output structures for the input x; ⟨·, ·⟩ is the scalar product operator; w is the vector of parameters, which comprises the learned model; and Φ(x, y) is a vector of arbitrary real-valued feature functions, or simply features, that jointly represent an input-output pair. Thus, the prediction problem is to find the output with the highest score, where the score is given by a discriminant function that is linear in some joint feature representation. The feature vector Φ(x, y) and the output domain Y(x) are arbitrary and task dependent. In dependency parsing, for instance, the output domain comprises all possible rooted trees whose nodes are sentence tokens, and the features are functions of a dependency arc. Hence, DP prediction is to solve a maximum branching problem.

2.2 Entropy-Guided Feature Induction

EFI is based on the same strategy as Entropy-Guided Transformation Learning [14]. When modeling a Machine Learning problem, a set of features is usually available. These features encode basic information about the examples. In Dependency Parsing, for instance, datasets include features that are either naturally present in sentences, such as words, or automatically generated by external systems, such as POS tags and lemmas. We call this available information basic features. EFI automatically derives a set of basic feature conjunctions, which we call feature templates. These templates are later used to generate the derived features, which comprise the feature vectors Φ(x, y) used in the Structure Learning modeling.

In SL problems, the basic features are functions of the local decision variables that compose the output structure. In DP, features are functions of a dependency arc, because arcs are the local decision variables that compose a dependency tree. The first step in EFI is to build the basic dataset, which contains an example for each local decision variable. The basic dataset features are the basic features in the original dataset.

Many decision tree (DT) learning algorithms use the gain ratio to iteratively select the most informative feature. The gain ratio is a normalized version of the information gain measure, which is based on entropy. Hence, these algorithms provide a quick way to obtain entropy-guided feature selection. EFI uses Quinlan's well-known C4.5 algorithm to train a DT on the basic dataset. This DT predicts the local decision variable given the basic features that most reduce its entropy. In Figure 1 (a), we present a sample DT for Dependency Parsing.

EFI uses a very simple scheme to extract feature templates from a DT: it considers the paths from the root node to all DT nodes. In Figure 1 (b), we present the induced templates. For each path, a template is created by conjoining the features of its nodes. Because we aim to generate feature templates, that is, conjunctions of basic features that do not include feature values, we ignore the feature values (arc labels) and the decision variable values (leaves) in the DT, considering only the features in the internal nodes.

Fig. 1. Feature template induction from a decision tree: (a) a sample decision tree; (b) the induced feature templates: dist; dist + mod-pos; dist + mod-pos + head-pos; dist + mod-pos + side; dist + side.
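To make the path-to-template scheme of Figure 1 concrete, below is a minimal sketch of the induction step. It is only an illustration, not the published implementation: it uses scikit-learn's entropy-criterion CART as a stand-in for C4.5, it assumes the basic features are already numerically encoded, and the function and argument names (induce_templates, feature_names) are hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier

def induce_templates(X, y, feature_names, **tree_args):
    """EFI-style template induction (Figure 1): train a decision tree on the basic
    dataset, then turn every root-to-internal-node path into a template by conjoining
    the features tested along the path. Feature values and leaf labels are ignored."""
    dt = DecisionTreeClassifier(criterion="entropy", **tree_args).fit(X, y)
    tree = dt.tree_
    templates = []

    def walk(node, path):
        if tree.children_left[node] == -1:           # leaf node: no feature test, stop
            return
        template = path + [feature_names[tree.feature[node]]]
        templates.append(tuple(template))            # one template per internal node
        walk(tree.children_left[node], template)
        walk(tree.children_right[node], template)

    walk(0, [])                                      # start from the root (node 0)
    return list(dict.fromkeys(templates))            # drop duplicate conjunctions
```

Each returned tuple, such as ("dist", "mod-pos"), is a feature template; it is later instantiated with the basic feature values of each local decision variable to produce the derived features that compose Φ(x, y).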

2.3 Learning Algorithm

Several algorithms are frequently used to train SL models. Due mainly to its simplicity and extensibility, our learning algorithm [3] is a Large Margin extension of Collins' Structured Perceptron [1]. A simplified version of this algorithm is presented in Figure 2. The first step of the algorithm is to employ EFI in order to derive nonlinear feature templates from the basic dataset D. Then, these templates are used to generate the derived features Φ(x, y) in the SL dataset Φ(D). Like the binary perceptron, the ESL training algorithm is an iterative procedure. On each iteration, the algorithm draws an example (x, y, Φ(x, y)) from the training dataset Φ(D), makes a prediction using the current model w, and updates the model parameters according to the difference between the predicted output ŷ and the correct output y.

    Φ(D) ← EFI(D)
    w ← 0
    while no convergence
        draw (x, y, Φ(x, y)) from Φ(D)
        ŷ ← arg max_{y' ∈ Y(x)} ⟨w, Φ(x, y')⟩
        w ← w + Φ(x, y) − Φ(x, ŷ)
    return w

Fig. 2. ESL training algorithm.
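For concreteness, here is a minimal sketch of the simplified training loop of Figure 2 as a plain structured perceptron. It omits the Large Margin extension used in the actual ESL systems, replaces the convergence test with a fixed number of epochs, and enumerates Y(x) explicitly instead of calling a task-specific solver for the prediction problem of Section 2.1; the names esl_train, candidates and phi are illustrative placeholders, not part of the published system.

```python
import numpy as np

def predict(x, w, candidates, phi):
    """Prediction problem of Section 2.1: arg max over Y(x) of <w, Phi(x, y)>.
    Candidates are enumerated explicitly here; real tasks plug in a combinatorial
    solver (e.g., maximum branching for dependency parsing)."""
    return max(candidates(x), key=lambda y: float(np.dot(w, phi(x, y))))

def esl_train(dataset, candidates, phi, n_features, epochs=10):
    """Simplified ESL training loop (Figure 2): a structured perceptron.

    dataset:    list of (x, y) pairs with gold structured outputs
    candidates: callable returning the feasible output set Y(x) for an input x
    phi:        joint feature map Phi(x, y) -> numpy array of length n_features,
                built from the EFI-induced templates
    """
    w = np.zeros(n_features)                        # w <- 0
    for _ in range(epochs):                         # stands in for "while no convergence"
        for x, y in dataset:                        # draw (x, y, Phi(x, y)) from Phi(D)
            y_hat = predict(x, w, candidates, phi)  # prediction with the current model
            if y_hat != y:                          # the update is zero when y_hat == y
                w += phi(x, y) - phi(x, y_hat)      # w <- w + Phi(x, y) - Phi(x, y_hat)
    return w
```

A large-margin or averaged variant changes the update and the returned weight vector, but the loop structure remains that of Figure 2.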

3 Experimental Evaluation

We apply ESL to nine datasets involving five NLP tasks, namely: POS tagging, text chunking, dependency parsing, quotation extraction and coreference resolution. Four languages are involved in these tasks: Arabic, Chinese, English and Portuguese. Next, we briefly describe our main experimental findings.

3.1 Portuguese Language

We apply ESL to four Portuguese datasets, namely: (i) Mac-Morpho for POS Tagging, (ii) Bosque for Text Chunking, (iii) CoNLL-2006 for Dependency Parsing, and (iv) GloboQuotes for Quotation Extraction. In Table 1, we present the performances achieved by the ESL systems along with the best known performance for each dataset. Our systems improve the best known performance on three datasets. Additionally, it is important to notice that the best performing system for Portuguese DP uses high-order feature functions that could be incorporated into the ESL modeling, which would substantially improve ESL performance. When comparing ESL with the best available system that uses the same kind of feature functions, ESL performance is superior, as reported in the integral work.

    Task                     State of the Art    ESL    Error Reduction (%)
    POS Tagging
    Text Chunking
    Dependency Parsing
    Quotation Extraction

Table 1. State-of-the-art performance comparison (accuracy) for the Portuguese datasets.

3.2 Other Languages

We apply ESL to five other datasets, namely: the Brown Corpus for English POS Tagging, CoNLL-2000 for English Text Chunking, and CoNLL-2012 for Arabic, Chinese, English and Multilingual Coreference Resolution. In Table 2, we present the performances achieved by the ESL systems along with the best known performance for each dataset. ESL-based systems improve the best known performance in four cases. It is important to notice that the best performing system for English POS Tagging is a committee of models. We could have combined several ESL models to improve our final performance on this task; moreover, when comparing ESL performance to a single model, ESL is better. The multilingual coreference resolution result corresponds to the average performance on the Arabic, Chinese and English datasets, as reported for the CoNLL-2012 Shared Task.

The ESL system obtained the best performance in this competition. However, on that occasion, the ESL performance for Chinese coreference resolution was not the best known result. After the competition, we solved a minor issue with the ESL modeling for this particular dataset and achieved the performance reported in Table 2, which is currently the best known result.

    Task                     Language        State of the Art    ESL    Error Reduction (%)
    POS Tagging              English
    Text Chunking            English
    Coreference Resolution   Arabic
    Coreference Resolution   Chinese
    Coreference Resolution   English
    Coreference Resolution   Multilingual

Table 2. State-of-the-art performance comparison for the non-Portuguese datasets.

3.3 EFI Assessment

We further assess EFI by experimentally comparing it to manual template generation and a quadratic kernel on three datasets. In Table 3, we summarize these results. Observe that EFI outperforms both alternative methods on the evaluated datasets. Additionally, EFI is much cheaper than manual templates and computationally faster than kernel methods.

    Task                  Alternative Method    Alternative System F1    EFI F1    Error Reduction (%)
    Portuguese DP         Manual Templates
    English Chunking      Quadratic Kernel
    Portuguese Chunking   Quadratic Kernel

Table 3. Comparison of EFI with other feature generation methods.

We report on another experiment, in which we apply ESL to the CoNLL-2012 Coreference Resolution datasets using only the 70 basic features of our modeling, that is, without EFI templates. It is important to notice that these 70 basic features include several complex features. Some of them are conjunctions of simpler basic features, and others provide complex task-dependent information, such as head words and agreement in number and gender. These 70 basic features were manually generated by domain experts and encode valuable coreference information. In Table 4, we report the performance of the systems trained with the 70 basic features alone and the performance of the systems trained with EFI feature templates.

We observe that the CoNLL score is improved by an impressive 8.54 points when using EFI-based features. Moreover, EFI consistently outperforms the baseline on all languages.

    EFI    Arabic    Chinese    English    CoNLL Score
    No
    Yes

Table 4. Impact of EFI on the CoNLL-2012 development sets.

4 Conclusions

We propose the entropy-guided structure learning framework, which extends the general SL framework by integrating EFI, an automatic feature generation approach. Our empirical findings indicate that EFI is faster than kernel methods and avoids overfitting. Compared to manual feature templates, the fact that EFI bypasses domain experts is highly valuable.

We evaluated ESL on nine datasets involving five natural language processing tasks and four different languages. ESL presents performance comparable to the state of the art on all evaluated datasets. Moreover, it outperforms the previous best performing systems on six datasets, namely the Mac-Morpho dataset for Portuguese POS tagging, the Bosque dataset for Portuguese text chunking, the GloboQuotes dataset for Portuguese quotation extraction, and the CoNLL-2012 Shared Task datasets for Arabic, Chinese, and multilingual coreference resolution. Additionally, the ESL coreference systems achieved first place in the CoNLL-2012 Shared Task on multilingual coreference resolution.

As future work, we plan to introduce second- and third-order features in our models for dependency parsing and coreference resolution. For coreference resolution, some authors [13] report that features based on partial clusters bring substantial improvements in performance. We also plan to extend our latent modeling in order to include this type of feature. Text chunking and named entity recognition have been recurrently recast as sequence labeling problems. Nevertheless, these tasks require sentence segmentation and, additionally, segment classification. We plan to model such tasks directly as a sequence segmentation and classification problem; in fact, we have already done so successfully for quotation extraction. With this modeling, we will be able to use more meaningful features for these tasks and, hopefully, improve performance.

References

1. Collins, M.: Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, pp. 1–8 (2002)
2. Crammer, K., Singer, Y.: Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research 3 (2003)
3. Fernandes, E.R., Brefeld, U.: Learning from partially annotated sequences. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Athens, Greece (2011)
4. Fernandes, E.R., Crestana, C.E.M., Milidiú, R.L.: Hedge detection using the RelHunter approach. In: Proceedings of the Conference on Computational Natural Language Learning Shared Task (2010)
5. Fernandes, E.R., dos Santos, C.N., Milidiú, R.L.: Latent structure perceptron with feature induction for unrestricted coreference resolution. In: Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning Shared Task. Jeju Island, Korea (2012)
6. Fernandes, E.R., Milidiú, R.L., Rentería, R.P.: RelHunter: a machine learning method for relation extraction from text. Journal of the Brazilian Computer Society 16 (2010)
7. Fernandes, E.R., Milidiú, R.L., dos Santos, C.N.: Portuguese language processing service. In: Proceedings of the International World Wide Web Conference, WWW in Ibero-America Alternate Track (2009)
8. Fernandes, E.R., Milidiú, R.L.: Entropy-guided feature generation for structured learning of Portuguese dependency parsing. In: Proceedings of the Conference on Computational Processing of the Portuguese Language (2012)
9. Fernandes, E.R., Pires, B.A., dos Santos, C.N., Milidiú, R.L.: Clause identification using entropy guided transformation learning. In: Proceedings of the Brazilian Symposium in Information and Human Language Technology (2009)
10. Fernandes, E.R., dos Santos, C.N., Milidiú, R.L.: A machine learning approach to Portuguese clause identification. In: Proceedings of the Computational Processing of the Portuguese Language (2010)
11. Fernandes, E.R., dos Santos, C.N., Milidiú, R.L.: Latent trees for coreference resolution. Computational Linguistics 40(4) (2014), to appear
12. Joachims, T.: Learning to align sequences: a maximum-margin approach. Tech. rep., Cornell University (2003)
13. Lee, H., Peirsman, Y., Chang, A., Chambers, N., Surdeanu, M., Jurafsky, D.: Stanford's multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. In: Proceedings of the Conference on Computational Natural Language Learning Shared Task (2011)
14. dos Santos, C.N., Milidiú, R.L.: Entropy Guided Transformation Learning. In: Foundations of Computational Intelligence, Volume 1: Learning and Approximation, Studies in Computational Intelligence, vol. 201. Springer (2009)
15. dos Santos, C.N., Milidiú, R.L., Crestana, C.E.M., Fernandes, E.R.: ETL ensembles for chunking, NER and SRL. In: Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing (2010)
16. Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research 6 (2005)
