Corpus-Based Terminology Extraction

Alexandre Patry and Philippe Langlais

Terminology management is a key component of many natural language processing activities such as machine translation (Langlais and Carl, 2004), text summarization and text indexation. With the rapid development of science and technology continuously increasing the number of technical terms, terminology management is certain to become of the utmost importance in more and more content-based applications.

While the automatic identification of terms from texts has been the focus of past studies (Jacquemin, 2001) (Castellví et al, 2001), the current trend in Terminology Management (TM) has shifted to the issue of term networking (Kageura et al, 2004). A possible explanation of this shift may lie in the fact that Terminology Extraction (TE), although a noisy activity, encompasses well-established techniques that seem difficult to improve upon significantly. Despite this shift, we do believe that better extraction of terms could carry over to the subsequent steps of TM.

A traditional TE system usually involves a subtle mixture of linguistic rules and statistical metrics in order to identify a list of candidate terms in which, it is hoped, terms are ranked first. We distinguish our approach to TE from traditional ones in two ways. First, we give back to the user an active role in the extraction process. That is, instead of encoding a static definition of what might or might not be a term, we let the user specify his own. We do so by asking him to set up a training corpus (a corpus where the terms have been identified by a human) from which our extractor will learn how to define a term. Second, our approach is completely automatic and is readily adapted to the tools (part-of-speech tagger, lemmatizer) and metrics of the user. One might object that requiring a training corpus is asking the user to do a part of the job the machine is supposed to do, but we see it in a different way. We consider that a little help from the user could pay back in flexibility.

The structure of our paper follows the three steps involved in our approach. In the following section, we describe our algorithm to identify candidate terms. In the third section, we introduce the different metrics we compute to score them. The fourth section explains how we applied AdaBoost (Freund and Schapire, 1999), a machine learning algorithm, to rank and identify a list of terms. We then evaluate our approach on a corpus which was set up by the Office québécois de la langue française to evaluate commercially available term extractors. We show that our classifier outperforms the individual metrics used in this study. Finally, we discuss some limitations of our approach and propose future work.

Extraction of candidate terms

It is a common practice to extract candidate terms using a part-of-speech (POS) tagger and an automaton (a program extracting word sequences corresponding to predefined POS patterns). Usually, those patterns are manually handcrafted and target noun phrases, since most of the terms of interest are noun phrases (Justeson and Katz, 1995). Typical examples of such patterns can be found in (Jacquemin, 2001). As pointed out in (Justeson and Katz, 1995), relying on a POS tagger and legitimate pattern recognition is error prone, since taggers are not perfect. This might be especially true for very domain-specific texts, where a tagger is likely to be more erratic.

To overcome this problem without giving up the use of POS patterns (since they are easy to design and to use), we propose a way to use a training corpus in order to automate the creation of an automaton. There are many potential advantages with this approach.
First, the POS tagger and its tagging errors, to the extent that they are consistent, will be automatically assimilated by the automaton. Second, this gives the user the opportunity to specify the terms that are of interest to him. If many terms involving verbs are found in the training corpus, the automaton will reflect that interest as well. We also observed in informal experiments that widespread patterns often fail to extract many terms found in our training corpus.

Several approaches can be applied when generating an automaton from sequences of POS encountered in a training corpus. A straightforward approach is to memorize all the sequences seen in the training corpus. A sequence of words is thus a candidate term only if its sequence of POS tags has been seen before. This approach is simple but naive. It cannot generate new patterns that are slight variations of the ones seen at training time, and an isolated tagging error can lead to a bad pattern.

To avoid those problems, we propose to generate the patterns using a language model trained on the POS tags of the terms found in the training corpus. A language model is a function computing the probability that a sequence of words has been generated by a certain language. In our case, the words are POS tags and the language is the one recognizing the sequences of tags corresponding to terms. Our language model can be described as follows:

$$P(w_1^n) = \prod_{i=1}^{n} P(w_i \mid H_i)$$

where $w_1^n$ is a sequence of POS tags and $H_i$ is called the history, which summarizes the information of the $i-1$ previous tags. To build an automaton, we only have to set a threshold and generate all the patterns whose probability is higher than it. An excerpt of such an automaton is given in Figure 1.

Figure 1 Excerpt of an automatically generated automaton (columns: Probability, Pattern). Tag sequence surviving from the excerpt, without its probabilities or row boundaries: NomC AdjQ NomC Prep NomC NomC Dete-dart-ddef NomC NomC Verb-ParPas NomC Prep Dete-dart-ddef NomC

Another advantage of such an automaton is that all patterns are associated with a probability, giving more information than a binary value (legitimate or not). Indeed, the POS pattern probability is one of the numerous metrics that we feed our classifier with.

Scoring the candidate terms

In the previous section, we showed a way to generate an automaton that extracts a set of candidate terms that we now want to rank and/or filter. Following many other works on term extraction, we score each candidate using various metrics. Many different ones have been identified in (Daille, 1994) and (Castellví et al, 2001). We do not believe that a single metric is sufficient, but instead think that it is more fruitful to use several of them and train a classifier to learn how to take advantage of each of them. Because we think they are interesting for the task, we retained the following metrics: the frequency, the length, the log-likelihood, the entropy, tf·idf and the POS pattern probabilities discussed in the previous section. Recall however that our approach is not restricted to these metrics, but can instead benefit from any other one that can be computed automatically.

Alone, the frequency is not a robust metric to assess the terminological property of a candidate, but it does carry useful information, as does the length of terms.

In (Dunning, 1993), Dunning advocates the use of log-likelihood to measure whether two events that occur together do so by coincidence or not. In our case, we want to measure the cohesion of a complex candidate term (a candidate term composed of two words or more) by verifying whether its words occur together by coincidence or not. The log-likelihood ratio of two adjacent words ($u$ and $v$) can be computed with the following formula (Daille, 1994):

$$\mathrm{loglike}_{uv} = a \log a + b \log b + c \log c + d \log d + N \log N - (a + c) \log(a + c) - (a + b) \log(a + b) - (c + d) \log(c + d) - (d + b) \log(d + b)$$

where $a$ is the number of times $uv$ appears in the document, $b$ the number of times $u$ appears not followed by $v$, $c$ the number of times $v$ appears not preceded by $u$, $N$ the corpus size and $d$ the number of candidate terms that do not involve $u$ or $v$. Following (Russell, 1998), to compute log-likelihood on candidate terms involving more than two words, we keep the minimum value among the log-likelihoods of each possible split in the candidate term.

With the intuition that terms are coherent units that can appear surrounded by various different words, we also use the entropy to rate a candidate term.
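To make the log-likelihood formula given above concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`xlogx`, `loglike`, `loglike_multiword`) are hypothetical, the convention 0 log 0 = 0 is added to handle empty cells, and N is assumed to be the sum of the four counts, whereas the text describes N as the corpus size.

```python
import math


def xlogx(x: float) -> float:
    """Return x * log(x), with the usual convention 0 * log(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def loglike(a: float, b: float, c: float, d: float) -> float:
    """Log-likelihood score of an adjacent pair uv from its four counts."""
    n = a + b + c + d  # assumption: N is the sum of the four cells
    return (xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d) + xlogx(n)
            - xlogx(a + c) - xlogx(a + b) - xlogx(c + d) - xlogx(d + b))


def loglike_multiword(split_counts):
    """Score a longer candidate as the minimum over its possible splits.

    `split_counts` holds one (a, b, c, d) tuple per split of the candidate.
    """
    return min(loglike(a, b, c, d) for a, b, c, d in split_counts)
```

With counts that are exactly what independence predicts (for instance a = b = c = d), the score is zero up to rounding, and it grows as u and v co-occur more exclusively. Taking the minimum over splits means a long candidate is only as cohesive as its weakest junction.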
The entropy of a candidate is computed by averaging its left and right entropy:

$$e(w_1^n) = \frac{e_{left}(w_1^n) + e_{right}(w_1^n)}{2}$$

$$e_{left}(s) = \sum_{u:\, us \in C} h\left(\frac{|us|}{|s|}\right)$$

$$e_{right}(s) = \sum_{u:\, su \in C} h\left(\frac{|su|}{|s|}\right)$$

$$h(x) = x \log x$$

where $w_1^n$ is the candidate term, $C$ is the corpus from which we are extracting the terms, and $|x|$ is taken here to denote the number of occurrences of the sequence $x$ in $C$.

Finally, to weight the salience of a candidate term, we also use tf·idf. This metric is based on the idea that terms describing a document should appear often in it but should not appear in many other documents. It is computed by dividing the frequency of a candidate term by the number of documents in an out-of-domain corpus that contain it. Because tf·idf is usually computed on one word, when we evaluated complex candidate terms, we computed tf·idf on each of their words and kept five values: the first, the last, the minimum, the maximum and the average. In our experiments, the out-of-domain corpus was composed of texts taken from the French Canadian parliamentary debates (the so-called Hansard), totalling 1.4 million sentences.

Identifying terms among candidates

Once each candidate term is scored, we must decide which ones should finally be elected as terms. To accomplish this task, we train a binary classifier (a function which qualifies a candidate as a term or not) on the basis of the scores we computed for a candidate. We use the AdaBoost learning algorithm (Freund and Schapire, 1999) to build this classifier. AdaBoost is a simple but efficient learning technique that combines many weak classifiers (a weak classifier must be right more than half of the time) into a stronger one. To achieve this, it trains them successively, each time focusing on examples that have been hard to classify correctly by the previous weak classifiers. In our experiments, the weak classifiers were binary stumps (binary classifiers that compare one of the scores to a given threshold to classify a candidate term) and we limited their number to 50. An example of such a classifier is presented in Figure 2.

Experiments

Our community lacks a common benchmark on which we could compare our results with others. In this work, we applied our approach to a corpus called EAU.
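As an aside on the classifier described in the preceding section, the following Python sketch shows one common way to boost binary stumps with AdaBoost. It is an illustration under assumptions, not the authors' code: the names `train_stump`, `adaboost`, `score` and `classify` are hypothetical, labels are encoded as +1 (term) and -1 (not-term), and the exhaustive threshold search is only practical for small data.

```python
import math


def train_stump(X, y, w):
    """Return (error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                preds = [pol if row[f] > thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def adaboost(X, y, rounds=50):
    """Train `rounds` weighted stumps on score vectors X with labels y in {+1, -1}."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    model = []
    for _ in range(rounds):
        err, f, thr, pol = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, pol))
        # Raise the weight of misclassified examples, then renormalise.
        w = [wi * math.exp(-alpha * yi * (pol if row[f] > thr else -pol))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, x):
    """Weighted vote of all stumps; usable directly for ranking candidates."""
    return sum(alpha * (pol if x[f] > thr else -pol)
               for alpha, f, thr, pol in model)


def classify(model, x):
    """Accept a candidate when the weighted vote is positive."""
    return "term" if score(model, x) > 0 else "not-term"
```

Here each row of `X` would hold the metrics of one candidate (for example entropy and length), and `score` plays the role of the accumulated value that the stumps add to or subtract from before the final sign test.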
The EAU corpus is composed of six texts dealing with water supply. Its complex terms have been listed by some members of the Office québécois de la langue française for a project called ATTRAIT (Atelier de Travail Informatisé du Terminologue) whose main objective was to evaluate existing software solutions for the terminologist (see footnote 1).

Input: A scored candidate term c
  α = 0
  if entropy(c) > 1.6 then α = α […] else α = α […]
  if length(c) > 1.6 then α = α […] else α = α […]
  if α > 0 then return term else return not-term

Figure 2 An excerpt from a classifier generated by the AdaBoost learning algorithm.

In our experiments, we kept the preprocessing stage as simple as possible. The corpus and the list of terms were automatically tokenized, lemmatized and POS-tagged with an in-house package (Foster, 1991). Once preprocessed, the EAU corpus is composed of […] words and 208 terms. Of these 208 terms, 186 appear without syntactic variation (as they were listed), a total of 400 times.

Since the terms of our evaluation corpus are already identified, it is straightforward to compute the precision and the recall of our system. Precision (resp. recall) is the ratio of terms correctly identified by the system over the total number of terms identified as such (resp. over the total number of terms manually identified in the list).

We evaluated our system using five-fold cross-validation. This means that the corpus was partitioned into five subsets and that five experiments were run, each time testing with a different subset and training the automaton and the classifier with the four others. Each training set (resp. testing set) was composed of about […] (resp. 3000) words containing an average of about 150 (resp. 50) terms. Because only complex terms are listed and because we do not consider term variations, our results only consider complex terms that appear without variation. Also, after informal experiments, we set the minimum probability of a pattern to be accepted by our automaton to […]. The performance of our system, averaged over the five folds of the cross-validation, can be found in Table 1.
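The evaluation protocol above can be sketched in a few lines of Python. This is only an illustration: the helper names `precision_recall` and `five_fold_splits` are hypothetical, terms are compared as exact strings, and the interleaved assignment of items to folds is an assumption, since the text does not say how the corpus was partitioned.

```python
def precision_recall(predicted, reference):
    """Precision and recall of extracted terms against a reference term list."""
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


def five_fold_splits(items, k=5):
    """Yield (train, test) pairs; each of the k subsets is the test set once."""
    folds = [items[i::k] for i in range(k)]  # assumption: interleaved folds
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]
```

In such a setup, the automaton and the classifier would be rebuilt from `train` in each round, precision and recall would be measured on `test`, and the five pairs of values would then be averaged.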
From the results, we can see that the automaton has a high recall but a low precision, which was to be expected. Indeed, the automaton is only a rough filter that eliminates word sequences that are easy to eliminate, but keeps as many terms as possible. On the other hand, the selection did not perform as well as we expected. Its low recall and precision could be explained by the metrics not being as expressive as we thought and by the fact that 75% of the terms in our test corpora appear only once. When a term appears only once, its frequency and entropy become useless. The results presented in Table 2 seem to confirm our hypothesis.

Footnote 1. See […] for more details on this project.

Table 1 Mean (µ) and standard deviation (σ) of the precision and recall of the different parts of our system. Columns: Part, µ, σ. Rows: Extraction (precision, recall), Identification (precision, recall), Overall system (precision, recall). Values: […]

Because we wanted to compare our system with the individual metrics that it uses, we had to modify it so that it ranks the candidate terms instead of simply accepting or rejecting them. To do so, we made our system return α instead of term or not-term (see Figure 2). We then sorted the candidate terms in decreasing order of their α value.

A common practice when comparing ranking algorithms is to build their ROC (receiver operating characteristic) curve, which shows the ratio of good identifications (y axis) against the ratio of bad identifications (x axis) for all the acceptance thresholds. The best curve rises faster in y than in x, and so has a greater area under it. We can see in Figure 3 that our system performs better than entropy or log-likelihood alone. This leads us to believe that different scores carry different information and that combining them, as we did, is fruitful.

Discussion and future works

In this paper, we presented an approach to automatically generate an end-to-end term extractor from a training corpus. We also proposed a way to combine many statistical scores in order to extract terms more efficiently than when each score is used in isolation.

Because of the nature of the training algorithm, we can easily extend the set of metrics we considered here. Even a priori knowledge could be integrated by specifying keywords before the extraction and setting a score to one when a candidate term contains a keyword and to zero otherwise. The same flexibility is achieved when the automaton is created. By generating it directly from the output of the POS tagger, our solution does not depend on a particular tagger and is tolerant to consistent tagging errors.

Table 2 Comparison of the performance of the term identification part for candidates appearing with different frequencies. Columns: Criteria, µ, σ. Rows: Candidates appearing one time (precision, recall), Candidates appearing at least two times (precision, recall). Values: […]

Figure 3 The ROC of our system (AdaBoost) against two other scores when we trained our system on one half of our corpus and tested on the other. A greater area under the curve is better.

A shortcoming of this work is that we did not treat term variations. Terminology variation is a well-known phenomenon, whose amount is estimated according to (Kageura et al., 2004) at 15% to 35%. We think that the best way to deal with them in our framework would be to introduce a preprocessing stage where variations are normalized to a canonical form. Term variations have been extensively studied in (Jacquemin, 2001) and (Daille, 2003).

In our experiments, we focused on complex terms. Because some scores do not apply to simple terms (e.g. log-likelihood and length), we think that the best way to extract simple terms would be to train a dedicated classifier.

Acknowledgements

We would like to thank Hugo Larochelle, who found the corpus we used in our experiments, and Elliott Macklovitch, who made some useful comments on the first draft of this document. This work has been subsidized by NSERC and FQRNT.

References

Castellví, M. Teresa Cabré; Bagot, Rosa Estopà; Palastresi, Jordi Vivaldi; Automatic Term Detection: A Review of Current Systems in Recent Advances in Computational Terminology. John Benjamins, 2001.

Daille, Béatrice; Study and Implementation of Combined Techniques for Automatic Extraction of Terminology in The Balancing Act: Combining Symbolic and Statistical Approaches to Language. New Mexico State University, Las Cruces, 1994.

Daille, Béatrice; Conceptual structuring through term variations in Proceedings of the ACL Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, 2003.

Dunning, Ted; Accurate Methods for the Statistics of Surprise and Coincidence, 1993.

Foster, George; Statistical lexical disambiguation, Master Thesis. McGill University, Montreal, 1991.

Freund, Y.; Schapire, R.E.; A Short Introduction to Boosting in Journal of Japanese Society for Artificial Intelligence, 1999.

Jacquemin, Christian; Spotting and Discovering Terms through Natural Language Processing. MIT Press, 2001.

Justeson, John S.; Katz, Slava M.; Technical Terminology: Some Linguistic Properties and an Algorithm for Identification in Text in Natural Language Engineering, 1995.

Kageura, Kyo; Daille, Béatrice; Nakagawa, Hiroshi; Chien, Lee-Feng; Recent Trends in Computational Terminology in Terminology. John Benjamins, 2004.

Langlais, Philippe; Carl, Michael; General-purpose statistical translation engine and domain specific texts: Would it work? in Terminology. John Benjamins, 2004.

Natural language processing implementation on Romanian ChatBot

Natural language processing implementation on Romanian ChatBot Proceedigs of the 9th WSEAS Iteratioal Coferece o SIMULATION, MODELLING AND OPTIMIZATION Natural laguage processig implemetatio o Romaia ChatBot RALF FABIAN, MARCU ALEXANDRU-NICOLAE Departmet for Iformatics

More information

arxiv: v1 [cs.dl] 22 Dec 2016

arxiv: v1 [cs.dl] 22 Dec 2016 ScieceWISE: Topic Modelig over Scietific Literature Networks arxiv:1612.07636v1 [cs.dl] 22 Dec 2016 A. Magalich, V. Gemmetto, D. Garlaschelli, A. Boyarsky Uiversity of Leide, The Netherlads {magalich,

More information

'Norwegian University of Science and Technology, Department of Computer and Information Science

'Norwegian University of Science and Technology, Department of Computer and Information Science The helpful Patiet Record System: Problem Orieted Ad Kowledge Based Elisabeth Bayega, MS' ad Samso Tu, MS2 'Norwegia Uiversity of Sciece ad Techology, Departmet of Computer ad Iformatio Sciece ad Departmet

More information

E-LEARNING USABILITY: A LEARNER-ADAPTED APPROACH BASED ON THE EVALUATION OF LEANER S PREFERENCES. Valentina Terzieva, Yuri Pavlov, Rumen Andreev

E-LEARNING USABILITY: A LEARNER-ADAPTED APPROACH BASED ON THE EVALUATION OF LEANER S PREFERENCES. Valentina Terzieva, Yuri Pavlov, Rumen Andreev Titre du documet / Documet title E-learig usability : A learer-adapted approach based o the evaluatio of leaer's prefereces Auteur(s) / Author(s) TERZIEVA Valetia ; PAVLOV Yuri (1) ; ANDREEV Rume (2) ;

More information

Management Science Letters

Management Science Letters Maagemet Sciece Letters 4 (24) 2 26 Cotets lists available at GrowigSciece Maagemet Sciece Letters homepage: www.growigsciece.com/msl A applicatio of data evelopmet aalysis for measurig the relative efficiecy

More information

Consortium: North Carolina Community Colleges

Consortium: North Carolina Community Colleges Associatio of Research Libraries / Texas A&M Uiversity www.libqual.org Cotributors Collee Cook Texas A&M Uiversity Fred Heath Uiversity of Texas BruceThompso Texas A&M Uiversity Martha Kyrillidou Associatio

More information

Fuzzy Reference Gain-Scheduling Approach as Intelligent Agents: FRGS Agent

Fuzzy Reference Gain-Scheduling Approach as Intelligent Agents: FRGS Agent Fuzzy Referece Gai-Schedulig Approach as Itelliget Agets: FRGS Aget J. E. ARAUJO * eresto@lit.ipe.br K. H. KIENITZ # kieitz@ita.br S. A. SANDRI sadra@lac.ipe.br J. D. S. da SILVA demisio@lac.ipe.br * Itegratio

More information

CONSTITUENT VOICE TECHNICAL NOTE 1 INTRODUCING Version 1.1, September 2014

CONSTITUENT VOICE TECHNICAL NOTE 1 INTRODUCING  Version 1.1, September 2014 preview begis oct 2014 lauches ja 2015 INTRODUCING WWW.FEEDBACKCOMMONS.ORG A serviced cloud platform to share ad compare feedback data ad collaboratively develop feedback ad learig practice CONSTITUENT

More information

Application for Admission

Application for Admission Applicatio for Admissio Admissio Office PO Box 2900 Illiois Wesleya Uiversity Bloomig, Illiois 61702-2900 Apply o-lie at: www.iwu.edu Applicatio Iformatio I am applyig: Early Actio Regular Decisio Early

More information

HANDBOOK. Career Center Handbook. Tools & Tips for Career Search Success CALIFORNIA STATE UNIVERSITY, SACR AMENTO

HANDBOOK. Career Center Handbook. Tools & Tips for Career Search Success CALIFORNIA STATE UNIVERSITY, SACR AMENTO HANDBOOK Career Ceter Hadbook CALIFORNIA STATE UNIVERSITY, SACR AMENTO Tools & Tips for Career Search Success Academic Advisig ad Career Ceter 6000 J Street Lasse Hall 1013 Sacrameto, CA 95819-6064 916-278-6231

More information

part2 Participatory Processes

part2 Participatory Processes part part2 Participatory Processes Participatory Learig Approaches Whose Learig? Participatory learig is based o the priciple of ope expressio where all sectios of the commuity ad exteral stakeholders

More information

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,

More information

VISION, MISSION, VALUES, AND GOALS

VISION, MISSION, VALUES, AND GOALS 6 VISION, MISSION, VALUES, AND GOALS 2010-2015 VISION STATEMENT Ohloe College will be kow throughout Califoria for our iclusiveess, iovatio, ad superior rates of studet success. MISSION STATEMENT The Missio

More information

also inside Continuing Education Alumni Authors College Events

also inside Continuing Education Alumni Authors College Events SUMMER 2016 JAMESTOWN COMMUNITY COLLEGE ALUMNI MAGAZINE create a etrepreeur creatig a busiess a artist creatig beauty a citize creatig the future also iside Cotiuig Educatio Alumi Authors College Evets

More information

A Case Study: News Classification Based on Term Frequency

A Case Study: News Classification Based on Term Frequency A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center

More information

2014 Gold Award Winner SpecialParent

2014 Gold Award Winner SpecialParent Award Wier SpecialParet Dedicated to all families of childre with special eeds 6 th Editio/Fall/Witer 2014 Desig ad Editorial Awards Competitio MISSION Our goal is to provide parets of childre with special

More information

On March 15, 2016, Governor Rick Snyder. Continuing Medical Education Becomes Mandatory in Michigan. in this issue... 3 Great Lakes Veterinary

On March 15, 2016, Governor Rick Snyder. Continuing Medical Education Becomes Mandatory in Michigan. in this issue... 3 Great Lakes Veterinary michiga veteriary medical associatio i this issue... 3 Great Lakes Veteriary Coferece 4 What You Need to Kow Whe Issuig a Iterstate Certificate of Ispectio 6 Low Pathogeic Avia Iflueza H5 Virus Detectios

More information

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,

More information

Memory-based grammatical error correction

Memory-based grammatical error correction Memory-based grammatical error correction Antal van den Bosch Peter Berck Radboud University Nijmegen Tilburg University P.O. Box 9103 P.O. Box 90153 NL-6500 HD Nijmegen, The Netherlands NL-5000 LE Tilburg,

More information

Cross-Lingual Text Categorization

Cross-Lingual Text Categorization Cross-Lingual Text Categorization Nuria Bel 1, Cornelis H.A. Koster 2, and Marta Villegas 1 1 Grup d Investigació en Lingüística Computacional Universitat de Barcelona, 028 - Barcelona, Spain. {nuria,tona}@gilc.ub.es

More information

Universiteit Leiden ICT in Business

Universiteit Leiden ICT in Business Universiteit Leiden ICT in Business Ranking of Multi-Word Terms Name: Ricardo R.M. Blikman Student-no: s1184164 Internal report number: 2012-11 Date: 07/03/2013 1st supervisor: Prof. Dr. J.N. Kok 2nd supervisor:

More information

Reducing Features to Improve Bug Prediction

Reducing Features to Improve Bug Prediction Reducing Features to Improve Bug Prediction Shivkumar Shivaji, E. James Whitehead, Jr., Ram Akella University of California Santa Cruz {shiv,ejw,ram}@soe.ucsc.edu Sunghun Kim Hong Kong University of Science

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com

More information

Online Updating of Word Representations for Part-of-Speech Tagging

Online Updating of Word Representations for Part-of-Speech Tagging Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org

More information

Linking Task: Identifying authors and book titles in verbose queries

Linking Task: Identifying authors and book titles in verbose queries Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,

More information

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se

More information

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM s Imperial College of Engineering &

More information

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

Netpix: A Method of Feature Selection Leading. to Accurate Sentiment-Based Classification Models

Netpix: A Method of Feature Selection Leading. to Accurate Sentiment-Based Classification Models Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models 1 Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models James B.

More information

Chinese Language Parsing with Maximum-Entropy-Inspired Parser

Chinese Language Parsing with Maximum-Entropy-Inspired Parser Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art

More information

DERMATOLOGY. Sponsored by the NYU Post-Graduate Medical School. 129 Years of Continuing Medical Education

DERMATOLOGY. Sponsored by the NYU Post-Graduate Medical School. 129 Years of Continuing Medical Education Advaces i DERMATOLOGY THURSDAY - FRIDAY JUNE 7-8, 2012 New York, NY Sposored by the NYU Post-Graduate Medical School 129 Years of Cotiuig Medical Educatio THE RONALD O. PERELMAN DEPARTMENT OF DERMATOLOGY

More information

THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING

THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING SISOM & ACOUSTICS 2015, Bucharest 21-22 May THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING MarilenaăLAZ R 1, Diana MILITARU 2 1 Military Equipment and Technologies Research Agency, Bucharest,

More information

Notes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1

Notes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1 Notes on The Sciences of the Artificial Adapted from a shorter document written for course 17-652 (Deciding What to Design) 1 Ali Almossawi December 29, 2005 1 Introduction The Sciences of the Artificial

More information

Cross Language Information Retrieval

Cross Language Information Retrieval Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................

More information

OCR for Arabic using SIFT Descriptors With Online Failure Prediction

OCR for Arabic using SIFT Descriptors With Online Failure Prediction OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,

More information
