1 Pereira,F.; Singer, Y.; Tishby, N. Beyond Word N-Grams. 2 enabled by Geva Patz
|
|
- Rosalind Warren
- 5 years ago
- Views:
Transcription
1 Information Explication from Computer-Transcribed Conversations: The Weighted Markov Classifier Nathan Eagle MIT Media Lab, 20 Ames St., Cambridge, MA Abstract In this research, I attempt to extract information from a computer-transcribed conversation. Leveraging the SDK of a sophisticated, commercial speech recognition system, I have created a classifier that weights words based on the output confidence factor from the recognition engine, and maximizes the probability of the word frequency using both 0th and first order Markov models. I demonstrate the success of this algorithm when applied to location classification, even when used on different speakers. While location certainty cannot be achieved from this data alone, given enough labeled training sets, a type of 'situational awareness' can be developed. Applications for such awareness could potentially include topic spotting, conversation mediation, and virtual memory augmentation. Finally I will discuss other potential impacts of this technology within the realm of ubiquitous computing. Background The accuracy of commercial voice recognition systems has reached a point where almost half of all words spoken by a single individual can be correctly recognized during most conversations. Extracting information from gigabytes of transcribed conversation data is a job for an appropriate machine learning algorithm. However, data analysis on a sequence appears to be more difficult when the sequence can be drawn from an unbounded set. In Problem Set 4 we looked at character frequency and took data from 27 possible values. Words, however, are in principle unbounded because there is always a non-zero probability of finding a word never previously encountered. 1 Despite this fact, this project is able to take advantage of limitations in the vocabulary both of the speaker and of the speech recognition engine to condense a working vocabulary into a set of approximately 5000 words. Data Gathering I collected data for one month, between November and December Both at home and at lab, I installed a wireless network to support streaming audio directly to my computer. I wore a wireless microphone during most of the day and, after informing people of the purpose of the apparatus, began to record many of my daily conversations. I captured data in two ways. For the first fifteen days, the audio data was sent directly into a recognition engine and was transcribed in real-time. During the latter portion of the data gathering process, I captured my conversations as audio files for later processing. Saving the raw audio data enabled me to gather more information about the data such as word confidence factor and time stamp. This richer set of data was later used as my test set. After each conversation I saved the audio file with a "label name" including place, people involved, and subject matter (ie: lab.vish.vikram.mla.wav or home.liz.dinner.wav). During that month I recorded over 30 hours of my daily conversations and transcribed approximately 30,000 words. Enabling Software and Hardware and Additional Data TCL interface to the ViaVoice Recognition Engine Leveraging the sophisticated algorithms behind a commercial speech recognition system, a TCL script was written 2 to interface with the ViaVoice SDK. Recorded conversations in a 22kHz, 16 bit.wav file format were placed in a select directory where they were input into the script. The script's output corresponds to the transcribed words from the ViaVoice recognition engine. Along with each word is a time stamp marking the beginning of the utterance and a recognition confidence factor ranging from to 100. The three data types in each output data file (transcribed word, timestamp, and confidence factor) are then input into a Matlab program that parses the data into a 3xn cell array. Wireless Microphone Transmitters and Receivers AudioTechnica wireless microphones and receivers were used for all of the data collection. The settings on each microphone were adjusted manually and the AF gain was tuned for optimal sound quality. The two receivers were set up in my office and at home. They input raw sound into either the recognition engine or a sound editor that converted the file to the correct data type for later processing. (22kHz, 16 bit, mono) Third Person Audio Data For additional training data, I used 10 hours of labeled audio data gathered in the similar environments with the help of Brian Clarkson. 1 Pereira,F.; Singer, Y.; Tishby, N. Beyond Word N-Grams 2 enabled by Geva Patz
2 Discussion of Possible Classification Models: Prediction Suffix Trees (PSTs) Modeling longer-range regularities is quite computationally intensive due to a size explosion caused by model order. Indeed, each bi-gram used in this paper from a simple bounded 5000-word vocabulary takes well over 150 MB of memory. An uncompressed tri-gram would need almost 2 GB of RAM. These current computation constraints prohibit traditional methods of classification to capture even relatively local dependencies that exceed model order. However, a prediction suffix tree (PST) has nodes that store suffixes of previous inputs and can provide a probability distribution for possible subsequent words in a computationally efficient manner. This ability makes PSTs a good model choice for many text classification problems. Unfortunately, even with advanced features such as 'wildcards' that allow a particular word position to be ignored in a prediction, PSTs are not the best candidate to model data output from today's voice recognition engines. The output word from the engine has an accuracy of approximately 50%, and when the engine is incorrect, it often models one word as several, or vice versa. Longer streams of data are not likely to be repeatable by the audio engine. In addition, the grammar model discovered by the PST will likely be more highly correlated with that of the speech recognition engine rather than the actual grammar spoken. Hence the utility of the PST model is no longer appropriate to this application. Linear Regression Linear regression could be used on this type of data; using each word in the vocabulary as a "feature" and weighting each feature according to how instrumental it is in the classification would probably provide a satisfactory answer given an unlimited number of training sets. However, 5000 discrete features would have to be trained, which signifies that an enormous amount of training sets would be necessary to use this method for text classification. 0th and First Order Markov Models Using word frequency to explicate location is an obvious choice for this classification problem. Intuitively, it is clear there should be many words in our vocabularies that are extremely 'location-dependent'. These words are expected to have a frequency that correlates to social setting, and indirectly to location. The vocabulary that people use in different locations can be clustered into sets that, although they may have significant overlap, have features that will remain relatively distinct. The zeroth order Markov model simply creates a normalized 5000-by-1 histogram of word frequency for each of the training data (home vs. lab), and finds the log-likelihoods of a test stream of data. The classification thus disregards all contextual information. The first order Markov model in contrast, considers pairs of neighboring words and places them into a similar 5000-by histogram. The Classifiers Four classifiers were chosen for this data: a 0th and 1st order Markov model, and a modified version of the two models using word confidence factor outputs. Both of the unmodified the Markov models are straightforward classifiers whose outputs depend on a log probability to determine the appropriate classification. Both methods also add pseudovalues to the count histograms in order to prevent log(0) problems. Finally, the methods are normalized such that each value corresponds to an actual probability of the word or sequence. Each word in a test set corresponds to one of the 5000 words in the model's working vocabulary. The test set of n words is then turned into a stream of n numbers using a 'key' which correlates every individual word with a number from Eq 1.) 0th Order Markov: n j = 1 log( count1( stream( j))) Eq 2.) 1st Order Markov: n 1 j = 1 log( count2( stream( j), stream( j + 1))) The other two classifiers incorporate additional information from the TCL script related to the confidence factor of each word generated from the speech recognition engine. The confidence factors are distributed in a Gaussian manner with values ranging from zero to one. The pseudovalues are again added to the counts and then the total is normalized. However, now the confidence factors are placed within the probability summation. By multiplying the confidence factor with the initial count probabilities, the classifier becomes biased towards the words that have a greater probability of being correct. Thus, the classifier attempts to overcome the fact that every other word in the data set is incorrect by shifting the weights of the features. Changing the exponential q on the confidence factor (conf(i)) determines how heavily to weight the confidence data. Eq 3.) 0th Order Weighted Markov: n j = 1 n 1 j = 1 q log( count1( stream( j)) * conf ( j) ) Eq 4.) 1st Order Weighted Markov: q log( count2( stream( j), stream( j + 1))* conf ( j) * conf ( j + 1) Evaluation of Results Metrics for Comparison After training on approximately 20 hours of data, the classifiers were given 10 hours of test data. Two methods were used to assess the success of each classifier: certainty and speed. Certainty was defined by the two log-likelihoods q
3 associated with the 'home' and 'lab' classes, where LL1> LL2: LL1 Eq 5.) cert = 1 LL2 Eq 6.) cert2 = LL1 LL2, Speed was defined as the number of words required before the classifier converges on what will be a final answer. Initial Over-fitting of the Confidence Factor Weights The weight of the confidence factor (signified by q in equation 3 and 4) was initially high due to prior thinking that the poor word accuracy from the recognition engine would dramatically degenerate the underlying classification performance. However, it was soon learned that setting the value of q above one would begin to have the opposite effect on classification certainty. Although the optimal number varied between test sets, q was finally set to one for the following tests. Certainty of the four Classifiers When the classifiers are applied to the transcribed conversations, it was interesting to note how insignificant the confidence weights were in the overall log probability. It can be seen from the example in Figure 3 and 4 that there is almost no difference in final log probabilities. When the 0th order approach was compared with the 1st order Markov model it was surprising to see that the 0th order model output a higher certainty than the Markov model in most of the data sets. 'learning' could begin. 3 The current training data incorporated a total of 30,000 words, which may not have been enough to model the 5000-by-5000 matrix adequately. The other explanation relates to subtleties within the speech recognition output. When ViaVoice misses a word two times, it seldom substitutes the same 'wrong' word both times. Since there is so much discrepancy (half of all words are incorrect), any type of contextual assumptions related to one word following the next could be invalid. Although much more data collecting needs to be done, this initial analysis supports the theory that vocabulary is indeed 'location-centric' and that some locations elicit similar vocabulary across multiple individuals. However, to make this theory more substantiated it will be necessary to increase the class size (more locations) and gather conversation data from many individuals. While meaning cannot be generated from this data alone, given enough labeled training sets, a type of 'situational awareness' can be developed. 4 Applications for such awareness could potentially include topic spotting, conversation mediation, and virtual memory augmentation. With our future soon to be inundated with similar data sets, applications that employ the classification techniques discussed in this paper shall play a large role in the lives of many. Convergence Comparison Due to the lack of influence the confidence factors had in determining the overall log-probability, it was expected that they did not influence the speed of the convergence. However, it is important to note that again, the 0th order model converges faster than the first order Markov model in the majority of the test sets (see Figure 1 and 2). Speaker Independence Initial data supports the theory that similar location elicits the common vocabulary from multiple people. Over 90% of the data from an individual uninvolved in generating the training data was classified correctly. Conclusions and Further Work It has been demonstrated that higher order Markov models generally tend to perform worse than 0th order Markov models when trained with a limited set of computertranscribed data. There are two possible explanations for this result related to either training data size or speech engine characteristics. For a first order Markov model, even a vocabulary limited to 500 of the most common words would still require over 25,000 training words before any 3 Discussion with Tommi Jaakkola (12/5/01) 4 Jebara,T; Ivanov, Y; Rahimi, A; Pentland, A Tracking Conversational Context.
4 Appendix I. - Data Gathered Training Data: Home liz.dinner.8pm liz.afterdinner.930pm adlar.bora.dinner liz.gamenight family.phone home.alldaysat liz.dinner.5pm Total Training Home Word Count: 13,000 words Lab geva.school marco.computers mla.meeting pentlandians rao.vik.vish.mla vikram.phc steve.markov Total Training Lab Word Count: 22,000 words Test Data: Home Lab liz.dinner - CORRECT liz.cleaning - CORRECT liz.phonecalls - INCORRECT (Both 0th and 1st Order) homeallday - CORRECT dipesh.phone - CORRECT dipesh2.phone - CORRECT dipesh.marco.india - CORRECT niloy.rec7 - CORRECT niloy2.rec7 - CORRECT tanzeem.grad - CORRECT vik.vish.mla - CORRECT tar.steve.modems - CORRECT 0th Order, INCORRECT 1st Order
5 Appendix II. - Figures Fig Typical data set (niloy.rec7.dat) classified with a 0th order Markov Fig Typical data set (niloy.rec7.dat) classified with a 1st order Markov model Fig Certainty of the 0th and 1st order Markov Models
6 Fig Certainty of the two Classifiers modified with Confidence Values
Lecture 1: Machine Learning Basics
1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3
More informationCS Machine Learning
CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationOPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS
OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS Václav Kocian, Eva Volná, Michal Janošek, Martin Kotyrba University of Ostrava Department of Informatics and Computers Dvořákova 7,
More informationCalibration of Confidence Measures in Speech Recognition
Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 1 Calibration of Confidence Measures in Speech Recognition Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE
More informationLearning Methods in Multilingual Speech Recognition
Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex
More informationSpeech Recognition at ICSI: Broadcast News and beyond
Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI
More informationPurdue Data Summit Communication of Big Data Analytics. New SAT Predictive Validity Case Study
Purdue Data Summit 2017 Communication of Big Data Analytics New SAT Predictive Validity Case Study Paul M. Johnson, Ed.D. Associate Vice President for Enrollment Management, Research & Enrollment Information
More informationModeling function word errors in DNN-HMM based LVCSR systems
Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford
More informationCorrective Feedback and Persistent Learning for Information Extraction
Corrective Feedback and Persistent Learning for Information Extraction Aron Culotta a, Trausti Kristjansson b, Andrew McCallum a, Paul Viola c a Dept. of Computer Science, University of Massachusetts,
More informationOCR for Arabic using SIFT Descriptors With Online Failure Prediction
OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,
More informationPython Machine Learning
Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled
More information(Sub)Gradient Descent
(Sub)Gradient Descent CMSC 422 MARINE CARPUAT marine@cs.umd.edu Figures credit: Piyush Rai Logistics Midterm is on Thursday 3/24 during class time closed book/internet/etc, one page of notes. will include
More informationCircuit Simulators: A Revolutionary E-Learning Platform
Circuit Simulators: A Revolutionary E-Learning Platform Mahi Itagi Padre Conceicao College of Engineering, Verna, Goa, India. itagimahi@gmail.com Akhil Deshpande Gogte Institute of Technology, Udyambag,
More informationModeling function word errors in DNN-HMM based LVCSR systems
Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford
More informationIntra-talker Variation: Audience Design Factors Affecting Lexical Selections
Tyler Perrachione LING 451-0 Proseminar in Sound Structure Prof. A. Bradlow 17 March 2006 Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Abstract Although the acoustic and
More informationHuman Emotion Recognition From Speech
RESEARCH ARTICLE OPEN ACCESS Human Emotion Recognition From Speech Miss. Aparna P. Wanare*, Prof. Shankar N. Dandare *(Department of Electronics & Telecommunication Engineering, Sant Gadge Baba Amravati
More informationDesign Of An Automatic Speaker Recognition System Using MFCC, Vector Quantization And LBG Algorithm
Design Of An Automatic Speaker Recognition System Using MFCC, Vector Quantization And LBG Algorithm Prof. Ch.Srinivasa Kumar Prof. and Head of department. Electronics and communication Nalanda Institute
More informationThe Good Judgment Project: A large scale test of different methods of combining expert predictions
The Good Judgment Project: A large scale test of different methods of combining expert predictions Lyle Ungar, Barb Mellors, Jon Baron, Phil Tetlock, Jaime Ramos, Sam Swift The University of Pennsylvania
More informationModule 12. Machine Learning. Version 2 CSE IIT, Kharagpur
Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should
More informationArtificial Neural Networks written examination
1 (8) Institutionen för informationsteknologi Olle Gällmo Universitetsadjunkt Adress: Lägerhyddsvägen 2 Box 337 751 05 Uppsala Artificial Neural Networks written examination Monday, May 15, 2006 9 00-14
More informationSoftware Maintenance
1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories
More informationSwitchboard Language Model Improvement with Conversational Data from Gigaword
Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword
More informationLearning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for
Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Email Marilyn A. Walker Jeanne C. Fromer Shrikanth Narayanan walker@research.att.com jeannie@ai.mit.edu shri@research.att.com
More informationChinese Language Parsing with Maximum-Entropy-Inspired Parser
Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art
More informationMath-U-See Correlation with the Common Core State Standards for Mathematical Content for Third Grade
Math-U-See Correlation with the Common Core State Standards for Mathematical Content for Third Grade The third grade standards primarily address multiplication and division, which are covered in Math-U-See
More informationSemi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration
INTERSPEECH 2013 Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration Yan Huang, Dong Yu, Yifan Gong, and Chaojun Liu Microsoft Corporation, One
More informationLearning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models
Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za
More informationThe Strong Minimalist Thesis and Bounded Optimality
The Strong Minimalist Thesis and Bounded Optimality DRAFT-IN-PROGRESS; SEND COMMENTS TO RICKL@UMICH.EDU Richard L. Lewis Department of Psychology University of Michigan 27 March 2010 1 Purpose of this
More informationUsing dialogue context to improve parsing performance in dialogue systems
Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,
More informationLip reading: Japanese vowel recognition by tracking temporal changes of lip shape
Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape Koshi Odagiri 1, and Yoichi Muraoka 1 1 Graduate School of Fundamental/Computer Science and Engineering, Waseda University,
More informationIntroduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition
Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition Todd Holloway Two Lecture Series for B551 November 20 & 27, 2007 Indiana University Outline Introduction Bias and
More informationhave to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,
A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994
More informationQuickStroke: An Incremental On-line Chinese Handwriting Recognition System
QuickStroke: An Incremental On-line Chinese Handwriting Recognition System Nada P. Matić John C. Platt Λ Tony Wang y Synaptics, Inc. 2381 Bering Drive San Jose, CA 95131, USA Abstract This paper presents
More informationRule Learning With Negation: Issues Regarding Effectiveness
Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United
More informationA Case Study: News Classification Based on Term Frequency
A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center
More informationAbstractions and the Brain
Abstractions and the Brain Brian D. Josephson Department of Physics, University of Cambridge Cavendish Lab. Madingley Road Cambridge, UK. CB3 OHE bdj10@cam.ac.uk http://www.tcm.phy.cam.ac.uk/~bdj10 ABSTRACT
More informationCSC200: Lecture 4. Allan Borodin
CSC200: Lecture 4 Allan Borodin 1 / 22 Announcements My apologies for the tutorial room mixup on Wednesday. The room SS 1088 is only reserved for Fridays and I forgot that. My office hours: Tuesdays 2-4
More informationThe Internet as a Normative Corpus: Grammar Checking with a Search Engine
The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a
More informationMulti-Lingual Text Leveling
Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency
More informationSpeech Emotion Recognition Using Support Vector Machine
Speech Emotion Recognition Using Support Vector Machine Yixiong Pan, Peipei Shen and Liping Shen Department of Computer Technology Shanghai JiaoTong University, Shanghai, China panyixiong@sjtu.edu.cn,
More informationLearning From the Past with Experiment Databases
Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University
More informationAssignment 1: Predicting Amazon Review Ratings
Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for
More informationPhysics 270: Experimental Physics
2017 edition Lab Manual Physics 270 3 Physics 270: Experimental Physics Lecture: Lab: Instructor: Office: Email: Tuesdays, 2 3:50 PM Thursdays, 2 4:50 PM Dr. Uttam Manna 313C Moulton Hall umanna@ilstu.edu
More informationENME 605 Advanced Control Systems, Fall 2015 Department of Mechanical Engineering
ENME 605 Advanced Control Systems, Fall 2015 Department of Mechanical Engineering Lecture Details Instructor Course Objectives Tuesday and Thursday, 4:00 pm to 5:15 pm Information Technology and Engineering
More informationRule discovery in Web-based educational systems using Grammar-Based Genetic Programming
Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de
More informationHow to Judge the Quality of an Objective Classroom Test
How to Judge the Quality of an Objective Classroom Test Technical Bulletin #6 Evaluation and Examination Service The University of Iowa (319) 335-0356 HOW TO JUDGE THE QUALITY OF AN OBJECTIVE CLASSROOM
More informationLinking Task: Identifying authors and book titles in verbose queries
Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,
More informationPredicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks
Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com
More informationCEFR Overall Illustrative English Proficiency Scales
CEFR Overall Illustrative English Proficiency s CEFR CEFR OVERALL ORAL PRODUCTION Has a good command of idiomatic expressions and colloquialisms with awareness of connotative levels of meaning. Can convey
More informationAnalysis of Emotion Recognition System through Speech Signal Using KNN & GMM Classifier
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 10, Issue 2, Ver.1 (Mar - Apr.2015), PP 55-61 www.iosrjournals.org Analysis of Emotion
More informationMYCIN. The MYCIN Task
MYCIN Developed at Stanford University in 1972 Regarded as the first true expert system Assists physicians in the treatment of blood infections Many revisions and extensions over the years The MYCIN Task
More informationOn-Line Data Analytics
International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE III, SEPTEMBER 2011] [ISSN: 2231-4946] On-Line Data Analytics Yugandhar Vemulapalli #, Devarapalli Raghu *, Raja Jacob
More informationLecture 10: Reinforcement Learning
Lecture 1: Reinforcement Learning Cognitive Systems II - Machine Learning SS 25 Part III: Learning Programs and Strategies Q Learning, Dynamic Programming Lecture 1: Reinforcement Learning p. Motivation
More informationMathematics process categories
Mathematics process categories All of the UK curricula define multiple categories of mathematical proficiency that require students to be able to use and apply mathematics, beyond simple recall of facts
More informationMining Association Rules in Student s Assessment Data
www.ijcsi.org 211 Mining Association Rules in Student s Assessment Data Dr. Varun Kumar 1, Anupama Chadha 2 1 Department of Computer Science and Engineering, MVN University Palwal, Haryana, India 2 Anupama
More informationFive Challenges for the Collaborative Classroom and How to Solve Them
An white paper sponsored by ELMO Five Challenges for the Collaborative Classroom and How to Solve Them CONTENTS 2 Why Create a Collaborative Classroom? 3 Key Challenges to Digital Collaboration 5 How Huddle
More informationReinforcement Learning by Comparing Immediate Reward
Reinforcement Learning by Comparing Immediate Reward Punit Pandey DeepshikhaPandey Dr. Shishir Kumar Abstract This paper introduces an approach to Reinforcement Learning Algorithm by comparing their immediate
More informationRule Learning with Negation: Issues Regarding Effectiveness
Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX
More informationTransfer Learning Action Models by Measuring the Similarity of Different Domains
Transfer Learning Action Models by Measuring the Similarity of Different Domains Hankui Zhuo 1, Qiang Yang 2, and Lei Li 1 1 Software Research Institute, Sun Yat-sen University, Guangzhou, China. zhuohank@gmail.com,lnslilei@mail.sysu.edu.cn
More informationA Context-Driven Use Case Creation Process for Specifying Automotive Driver Assistance Systems
A Context-Driven Use Case Creation Process for Specifying Automotive Driver Assistance Systems Hannes Omasreiter, Eduard Metzker DaimlerChrysler AG Research Information and Communication Postfach 23 60
More informationEvaluation of Usage Patterns for Web-based Educational Systems using Web Mining
Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl
More informationEvaluation of Usage Patterns for Web-based Educational Systems using Web Mining
Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl
More informationTruth Inference in Crowdsourcing: Is the Problem Solved?
Truth Inference in Crowdsourcing: Is the Problem Solved? Yudian Zheng, Guoliang Li #, Yuanbing Li #, Caihua Shan, Reynold Cheng # Department of Computer Science, Tsinghua University Department of Computer
More informationActivities, Exercises, Assignments Copyright 2009 Cem Kaner 1
Patterns of activities, iti exercises and assignments Workshop on Teaching Software Testing January 31, 2009 Cem Kaner, J.D., Ph.D. kaner@kaner.com Professor of Software Engineering Florida Institute of
More informationOn Human Computer Interaction, HCI. Dr. Saif al Zahir Electrical and Computer Engineering Department UBC
On Human Computer Interaction, HCI Dr. Saif al Zahir Electrical and Computer Engineering Department UBC Human Computer Interaction HCI HCI is the study of people, computer technology, and the ways these
More informationA Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many
Schmidt 1 Eric Schmidt Prof. Suzanne Flynn Linguistic Study of Bilingualism December 13, 2013 A Minimalist Approach to Code-Switching In the field of linguistics, the topic of bilingualism is a broad one.
More informationDiscriminative Learning of Beam-Search Heuristics for Planning
Discriminative Learning of Beam-Search Heuristics for Planning Yuehua Xu School of EECS Oregon State University Corvallis,OR 97331 xuyu@eecs.oregonstate.edu Alan Fern School of EECS Oregon State University
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview
More informationUsing Moodle in ESOL Writing Classes
The Electronic Journal for English as a Second Language September 2010 Volume 13, Number 2 Title Moodle version 1.9.7 Using Moodle in ESOL Writing Classes Publisher Author Contact Information Type of product
More informationAssessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2
Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2 Ted Pedersen Department of Computer Science University of Minnesota Duluth, MN, 55812 USA tpederse@d.umn.edu
More informationAustralian Journal of Basic and Applied Sciences
AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean
More informationConversation Starters: Using Spatial Context to Initiate Dialogue in First Person Perspective Games
Conversation Starters: Using Spatial Context to Initiate Dialogue in First Person Perspective Games David B. Christian, Mark O. Riedl and R. Michael Young Liquid Narrative Group Computer Science Department
More informationTwitter Sentiment Classification on Sanders Data using Hybrid Approach
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders
More informationOn the Combined Behavior of Autonomous Resource Management Agents
On the Combined Behavior of Autonomous Resource Management Agents Siri Fagernes 1 and Alva L. Couch 2 1 Faculty of Engineering Oslo University College Oslo, Norway siri.fagernes@iu.hio.no 2 Computer Science
More informationPROJECT MANAGEMENT AND COMMUNICATION SKILLS DEVELOPMENT STUDENTS PERCEPTION ON THEIR LEARNING
PROJECT MANAGEMENT AND COMMUNICATION SKILLS DEVELOPMENT STUDENTS PERCEPTION ON THEIR LEARNING Mirka Kans Department of Mechanical Engineering, Linnaeus University, Sweden ABSTRACT In this paper we investigate
More informationBeyond the Pipeline: Discrete Optimization in NLP
Beyond the Pipeline: Discrete Optimization in NLP Tomasz Marciniak and Michael Strube EML Research ggmbh Schloss-Wolfsbrunnenweg 33 69118 Heidelberg, Germany http://www.eml-research.de/nlp Abstract We
More informationDevice Independence and Extensibility in Gesture Recognition
Device Independence and Extensibility in Gesture Recognition Jacob Eisenstein, Shahram Ghandeharizadeh, Leana Golubchik, Cyrus Shahabi, Donghui Yan, Roger Zimmermann Department of Computer Science University
More informationUsing the Attribute Hierarchy Method to Make Diagnostic Inferences about Examinees Cognitive Skills in Algebra on the SAT
The Journal of Technology, Learning, and Assessment Volume 6, Number 6 February 2008 Using the Attribute Hierarchy Method to Make Diagnostic Inferences about Examinees Cognitive Skills in Algebra on the
More informationIntegrating simulation into the engineering curriculum: a case study
Integrating simulation into the engineering curriculum: a case study Baidurja Ray and Rajesh Bhaskaran Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, New York, USA E-mail:
More informationAn Introduction to the Minimalist Program
An Introduction to the Minimalist Program Luke Smith University of Arizona Summer 2016 Some findings of traditional syntax Human languages vary greatly, but digging deeper, they all have distinct commonalities:
More informationWHEN THERE IS A mismatch between the acoustic
808 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 Optimization of Temporal Filters for Constructing Robust Features in Speech Recognition Jeih-Weih Hung, Member,
More informationCarnegie Mellon University Department of Computer Science /615 - Database Applications C. Faloutsos & A. Pavlo, Spring 2014.
Carnegie Mellon University Department of Computer Science 15-415/615 - Database Applications C. Faloutsos & A. Pavlo, Spring 2014 Homework 2 IMPORTANT - what to hand in: Please submit your answers in hard
More informationSpoken Language Parsing Using Phrase-Level Grammars and Trainable Classifiers
Spoken Language Parsing Using Phrase-Level Grammars and Trainable Classifiers Chad Langley, Alon Lavie, Lori Levin, Dorcas Wallace, Donna Gates, and Kay Peterson Language Technologies Institute Carnegie
More informationA Version Space Approach to Learning Context-free Grammars
Machine Learning 2: 39~74, 1987 1987 Kluwer Academic Publishers, Boston - Manufactured in The Netherlands A Version Space Approach to Learning Context-free Grammars KURT VANLEHN (VANLEHN@A.PSY.CMU.EDU)
More informationA Pipelined Approach for Iterative Software Process Model
A Pipelined Approach for Iterative Software Process Model Ms.Prasanthi E R, Ms.Aparna Rathi, Ms.Vardhani J P, Mr.Vivek Krishna Electronics and Radar Development Establishment C V Raman Nagar, Bangalore-560093,
More informationWhy Did My Detector Do That?!
Why Did My Detector Do That?! Predicting Keystroke-Dynamics Error Rates Kevin Killourhy and Roy Maxion Dependable Systems Laboratory Computer Science Department Carnegie Mellon University 5000 Forbes Ave,
More informationReducing Features to Improve Bug Prediction
Reducing Features to Improve Bug Prediction Shivkumar Shivaji, E. James Whitehead, Jr., Ram Akella University of California Santa Cruz {shiv,ejw,ram}@soe.ucsc.edu Sunghun Kim Hong Kong University of Science
More informationSpecification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments
Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,
More informationMachine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler
Machine Learning and Data Mining Ensembles of Learners Prof. Alexander Ihler Ensemble methods Why learn one classifier when you can learn many? Ensemble: combine many predictors (Weighted) combina
More informationImproved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form
Orthographic Form 1 Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form The development and testing of word-retrieval treatments for aphasia has generally focused
More informationGCSE. Mathematics A. Mark Scheme for January General Certificate of Secondary Education Unit A503/01: Mathematics C (Foundation Tier)
GCSE Mathematics A General Certificate of Secondary Education Unit A503/0: Mathematics C (Foundation Tier) Mark Scheme for January 203 Oxford Cambridge and RSA Examinations OCR (Oxford Cambridge and RSA)
More informationGCSE Mathematics B (Linear) Mark Scheme for November Component J567/04: Mathematics Paper 4 (Higher) General Certificate of Secondary Education
GCSE Mathematics B (Linear) Component J567/04: Mathematics Paper 4 (Higher) General Certificate of Secondary Education Mark Scheme for November 2014 Oxford Cambridge and RSA Examinations OCR (Oxford Cambridge
More informationClass-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification
Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Tomi Kinnunen and Ismo Kärkkäinen University of Joensuu, Department of Computer Science, P.O. Box 111, 80101 JOENSUU,
More informationGrade 6: Correlated to AGS Basic Math Skills
Grade 6: Correlated to AGS Basic Math Skills Grade 6: Standard 1 Number Sense Students compare and order positive and negative integers, decimals, fractions, and mixed numbers. They find multiples and
More informationMathematics Scoring Guide for Sample Test 2005
Mathematics Scoring Guide for Sample Test 2005 Grade 4 Contents Strand and Performance Indicator Map with Answer Key...................... 2 Holistic Rubrics.......................................................
More informationWE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT
WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT PRACTICAL APPLICATIONS OF RANDOM SAMPLING IN ediscovery By Matthew Verga, J.D. INTRODUCTION Anyone who spends ample time working
More informationISFA2008U_120 A SCHEDULING REINFORCEMENT LEARNING ALGORITHM
Proceedings of 28 ISFA 28 International Symposium on Flexible Automation Atlanta, GA, USA June 23-26, 28 ISFA28U_12 A SCHEDULING REINFORCEMENT LEARNING ALGORITHM Amit Gil, Helman Stern, Yael Edan, and
More informationGenerating Test Cases From Use Cases
1 of 13 1/10/2007 10:41 AM Generating Test Cases From Use Cases by Jim Heumann Requirements Management Evangelist Rational Software pdf (155 K) In many organizations, software testing accounts for 30 to
More informationLearning Methods for Fuzzy Systems
Learning Methods for Fuzzy Systems Rudolf Kruse and Andreas Nürnberger Department of Computer Science, University of Magdeburg Universitätsplatz, D-396 Magdeburg, Germany Phone : +49.39.67.876, Fax : +49.39.67.8
More information