Native Language Identification
Ishan Somshekar (Stanford University, Symbolic Systems), Bogac Kerem Goksel (Stanford University, Symbolic Systems), Huyen Nguyen (Stanford University, Computer Science)

Abstract

Native Language Identification (NLI) is the task of identifying the first, or native, language of a speaker solely from their speech in another language, in this case primarily English. In this paper, we explore the use of Support Vector Machines and Neural Networks and compare their performance on an NLI classification task.

1 Introduction

Native Language Identification is a fascinating problem: the idea that simple linear classifiers or neural networks can learn differences in how speakers with different native languages speak English has potentially far-reaching implications, particularly in the science of language acquisition. If machines can easily learn the tendencies and mistakes of language learners, this would help improve education systems. Native language identification is a classification problem; in this particular dataset there are speakers with 11 different native languages. We approach this 11-way classification with two basic approaches: linear classifier models and neural networks. We explore a variety of features in both frameworks. Since the dataset can be separated into text (transcriptions of speech) and audio (i-vector representations of the speakers), we use a combination of techniques on both kinds of data.

2 Related Work

Malmasi and Dras extensively test different models on this task. The authors tested a variety of linear classifiers, concluding that ensemble classifiers are the state-of-the-art approach. They discuss the features used in great detail, some of which we discuss below.
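The ensemble idea can be illustrated with a simplified combiner. True stacked generalization trains a meta-classifier on the base-classifier outputs; a plain majority vote over base predictions is shown here only as a minimal sketch, with made-up language labels:

```python
from collections import Counter

def majority_vote(base_predictions):
    """Combine base-classifier labels for one sample by plurality vote.

    A stacked ensemble would instead train a second-level classifier on
    these outputs; voting is the simplest stand-in.
    """
    return Counter(base_predictions).most_common(1)[0][0]

# Three hypothetical base classifiers voting on one transcription.
print(majority_vote(["HIN", "TEL", "HIN"]))  # HIN
```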
Malmasi and Dras found that using function words and POS-tagged n-grams in addition to simple unigrams, bigrams, and character n-grams was very helpful. In addition, using the Stanford CoreNLP parsers to generate CFGs and dependencies for the sentences improved the score. The authors' best-performing models were ensemble classifiers: classifiers that take the outputs of first-level classifiers as input and then output their own classification. Stehwien and Padó also analyze the performance of SVMs on this task, and use their results to identify key features of the datasets. Typically, models will learn that speakers with different L1s (first languages) have different topic biases. In addition, particular misspellings or mispronunciations are characteristic of specific L1s. Certain L1s also overuse the same words. Linguistic style is a factor as well: some L1s, like Japanese, favor a much more formal style than, say, French. Malmasi et al. (2016) also examine the performance of neural networks compared to SVMs on this task with a similar dataset, and interestingly find that SVMs tend to outperform neural networks, and that character-level features generally outperform word-level features.

2.1 Data

The data for our project is a dataset recently released into the public domain by ETS. The dataset contains information collected from English speakers of 11 different native languages. It includes 13,200 essays written by the speakers, 13,200 transcriptions of the oral part of the test, and i-vectors that correspond to the pitch, tone, and speech of the speakers involved. The dataset is divided into
the train set (11,000 responses), the dev set (1,100 responses), and the test set (1,100 responses). We currently only have access to the train and dev sets, as the organization has not yet released the test set; all scores reported in this paper are therefore on the dev set released by ETS. The test set will be released at the end of June, and we are excited to see the performance of our models. In this project, we did not use the essays, focusing instead on the speech transcriptions and i-vectors. The speech transcriptions are provided as text files and are already tokenized into words or distinct sounds. We collected statistics on the lengths of the speech transcriptions and found that they range from 0 to 212 words. Further analysis showed that transcription length is well distributed around the median. Visualizations of the data appear below.

[Table: transcription length mean, median, standard deviation, max, and min for the train and dev sets]

Figure 1: Distribution of transcription lengths in the train set

Figure 2: Distribution of transcription lengths in the dev set

We can see that the train and dev sets come from very similar distributions. Furthermore, both sets are perfectly balanced across classes, so no bias correction is required in that respect. The dataset is also balanced across languages in terms of the prompts the speakers responded to, eliminating topic biases. Finally, the dataset is controlled for the rough English proficiency of the speakers (based on their test scores). All in all, the dataset provides a good, balanced variety of topics and proficiencies.

2.2 Models

We have implemented a variety of both linear classifier models and neural network models, and designed numerous features based on our research. Our simple baseline models are linear classifiers.
Since the two datasets we are using for this task are a collection of speech transcriptions and i-vectors, we first implemented a simple linear classifier that used unigram indicator features on the speech transcriptions, and then another that used the same unigram features along with the i-vector for each speaker. We planned to add features to these classifiers to improve performance while implementing the neural net models in parallel.

Features

We first added more advanced n-gram features: the classifiers use counts of both unigrams and bigrams. In keeping with the related work on this task, people with the same L1 tend to overuse the same words, so counting unigrams and bigrams should help identify the native language. Additionally, we stemmed the words and added another unigram feature on the stemmed words, the lemmas, to remove inaccuracies arising from varied word endings. We also used the spaCy library to tag each word with its part of speech, and implemented a feature that counts the parts of speech in the transcript. Our reasoning was that speakers with the same L1 would have similar distributions of POS usage. Interestingly, in contrast to other NLP tasks where stop words are often filtered out immediately, this task is stylistic rather than semantic classification, so we felt a unigram feature on stop words would actually be quite helpful. Our reasoning, again, was that speakers with the same L1 would use similar filler words when they paused to think during their speech recordings.
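The count-based features described above can be sketched with the standard library alone. Proper stemming and spaCy POS tagging are omitted; the stop-word list and token sequence are illustrative, not taken from the dataset:

```python
from collections import Counter

STOP_WORDS = {"uh", "um", "the", "a", "and", "so", "well", "i"}  # illustrative subset

def extract_features(tokens):
    """Unigram, bigram, and stop-word count features for one transcription.

    Feature keys are tagged tuples so the different feature families
    can share a single sparse count dictionary.
    """
    feats = Counter()
    for tok in tokens:
        feats[("uni", tok)] += 1
        if tok in STOP_WORDS:
            feats[("stop", tok)] += 1
    for a, b in zip(tokens, tokens[1:]):
        feats[("bi", a, b)] += 1
    return feats

feats = extract_features(["uh", "i", "think", "uh", "i", "agree"])
print(feats[("uni", "uh")])      # 2
print(feats[("bi", "uh", "i")])  # 2
```

In practice such sparse dictionaries would be vectorized (e.g. with a hashing or dictionary vectorizer) before being fed to the SVM.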
Finally, we added the i-vectors as features to the linear classifier. The i-vectors are high-dimensional representations of the sound of a speaker's speech. They encode tone, pitch, prosody, and other acoustic features, and should be excellent for classifying L1, since they capture a representation of the speaker's accent.

Linear Classifier

Our linear classifier is the linear SVM classifier from the sklearn toolkit, using the default parameters of squared hinge loss, l2 penalization, and a one-versus-rest approach for multi-class classification. While we considered more elaborate ensemble SVM and meta-classifier schemes, we wanted to use the linear classifier as a baseline to compare against the neural network models. As such, we evaluated the performance of a simple SVM against the neural network models when given different sets of features. In general, we evaluated the linear classifier with the following feature combinations (all features presented as unigram and bigram counts of the tokens of the given type):

1. Unstemmed words (unigrams only)
2. Unstemmed words (with bigrams)
3. Unstemmed words + stemmed words
4. Unstemmed words + stemmed words + part-of-speech tags
5. Unstemmed words + stemmed words + part-of-speech tags + stop words
6. Unstemmed words + stemmed words + part-of-speech tags + stop words + i-vectors

In the end, the biggest performance gains came from going from unigrams to bigrams and from adding the i-vector data. While the high gains from the i-vectors were expected, as they contain much more information on the acoustic speech patterns of the speakers, we were surprised that other features like part-of-speech tags and stop words did not lead to net increases in performance, despite the results of Malmasi and Dras. One reason could be that, as opposed to a stacked SVM architecture, we fed all the features into the same SVM, leading to overfitting due to too many features.

Neural Networks

Our first choice of neural network architecture was recurrent, due to the success of recurrent models on language-related tasks thanks to their retention of longer-term dependencies. Upon experimentation, we settled on GRU cells, as they achieved performance equal to LSTM cells with much lower training times. Our general architecture took a sequence of embeddings as input, fed it through one or two recurrent layers, passed the final hidden state to a fully connected tanh layer, and finally fed the outputs of the fully connected layer through a softmax layer to be normalized into a probability distribution. We also implemented a bidirectional LSTM model that reads the inputs in both the forward and backward directions; the concatenated hidden states from both directions were used to make the prediction.

Figure 3: The architecture of our Bidirectional LSTM Neural Network Model

We tried several approaches to embedding the inputs to our model. The following are the different inputs we tested:

1. GloVe vectors: We used the Wikipedia + Gigaword pretrained GloVe vectors. These vectors cover a vocabulary of 400,000 words and are trained over 6 billion tokens. Despite the size of the GloVe vocabulary, we found that over 5,000 of the 10,000 unique words seen in our training data were out of vocabulary for this vector set. One further improvement here could be using the Common Crawl GloVe set, which has a vocabulary of 1,900,000 words. However, even then, many of the out-of-vocabulary words in our dataset are pause words like "uh" and "umm", or words
cut in the middle due to hesitation and the timing of the prompts. In the end, we think most pretrained word vectors will fail to cover large parts of our vocabulary due to such phenomena, which are found only in speech data, as opposed to the written data that most pretrained word vectors are trained on.

2. Randomly initialized word vectors: Due to the aforementioned deficiencies of pretrained vectors, we also tried randomly initialized word vectors as embeddings, training them through backpropagation during model training. While this should perform better due to full vocabulary coverage on the train set, the disadvantage is that adding the word embeddings to backpropagation gives the model more parameters, requiring even more data to train well. It also increases the risk of overfitting when run over multiple epochs on the same training data. We found that this approach ended up worse than using GloVe vectors for this dataset and architecture.

3. Character-level: Another approach was to scan the entire training set to build an alphabet of all characters used, one-hot encode each character, and feed the one-hot encodings to the neural network. In this scheme, the sequences are character sequences. The advantage of a character-level model is that it has fewer parameters than a word-embedding model, as the dimensionality is smaller, and it does not have the vocabulary issues of pretrained models. However, it requires more training to learn even longer dependencies (given the longer sequence lengths). As such, given our dataset size limitation, this embedding type also underperformed GloVe vectors, and we decided not to use it.

4. Part-of-speech tags, one-hot encoding: Malmasi and Dras's state-of-the-art model for NLI on essays makes use of part-of-speech tags as a feature of their stacked generalization classifier.
We wanted to experiment with feeding the part-of-speech tags directly into the neural network model. While this is not common practice, we felt that POS tags would be especially helpful for this task, because language transfer affects stylistic and syntactic elements of language production more than semantic elements. While word embeddings are good at representing the semantic content of a word, the model may benefit from direct access to syntactic content like POS tags. More complex architectures trained on larger datasets might learn to infer the information carried in the POS tags on their own, but our dataset size meant we could not expect our model to do so. We therefore used the CoreNLP library and its Stanza Python client to tag the parts of speech of the entire dataset, and one-hot encoded the tags to create POS embeddings alongside the word embeddings. In the end, we concatenated the word embeddings and the POS embeddings before feeding them into the model. This provided a 3% increase in accuracy with both randomly initialized and GloVe word vectors.

5. Part-of-speech tags, vector embeddings: We also experimented with embedding the POS tags as randomly initialized trainable vectors. In this model, each POS tag appearing in the dataset was given a randomly initialized vector as its embedding, and these vectors were concatenated with randomly initialized word vectors. At train time, the entire embedding layer was trained as well. This model had the same disadvantages as the randomly-initialized-word-vectors-only model and, as such, underperformed the one-hot encoding of POS tags.

6. i-vectors: While our main focus was on what we could achieve without resorting to the speakers' i-vectors, we also experimented with feeding the i-vector data into the neural network for comparison purposes.
In our case, we appended the i-vector of the whole speech excerpt directly to the word embedding of each word in the sequence. This is a redundant way to feed in the information, since there is a single i-vector for the entire excerpt and every index in the sequence therefore carries the exact same i-vector. However, this approach let us keep our simple neural network structure, and it still gave us a large performance boost.
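Tiling a single utterance-level i-vector across every timestep, as described above, is a one-line transformation. A minimal sketch with toy dimensions (real word embeddings and i-vectors are far higher dimensional):

```python
def append_ivector(word_vecs, ivector):
    """Concatenate the utterance-level i-vector onto every word embedding.

    Redundant by construction: each timestep carries the same i-vector,
    but this keeps the recurrent model's input format unchanged.
    """
    return [wv + ivector for wv in word_vecs]

seq = append_ivector([[0.1, 0.2], [0.3, 0.4]], [0.9, 0.8, 0.7])
print(seq[0])                    # [0.1, 0.2, 0.9, 0.8, 0.7]
print(seq[0][2:] == seq[1][2:])  # True: every timestep carries the same i-vector
```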
Figure 4: The architecture of our Neural Network Model that used word embeddings concatenated with POS tags and i-vectors

3 Evaluation

3.1 Experimental Results

Table 1: Results.

Neural Network Model (Score):
- GRU with random initialization
- GRU with GloVe
- Bidirectional LSTM with random initialization
- Bidirectional LSTM with GloVe
- GRU with GloVe & POS tags
- GRU with GloVe & i-vectors

Linear Classifier Features (Score):
- Unstemmed words
- Stemmed words
- Stemmed words, POS tags, stop words
- Unstemmed words & i-vectors
- Stemmed words, POS tags, stop words, i-vectors
- Stemmed words & i-vectors

We had difficulty replicating the accuracy scores attained by the linear classifiers with our neural network models (Malmasi and Dras, 2017). We believe this is mainly because 11,000 training samples are far too few to train these models successfully. This can be seen specifically in the trainable randomly initialized word embeddings being outperformed by GloVe, even though over a third of the corpus did not exist in GloVe: 11,000 examples were not enough to train word vectors, leading to the drop in accuracy. Additionally, the increase in accuracy from the POS tags in the neural net is probably because there are not enough training examples for the neural net to learn the POS of the words inherently. This is illustrated in the figure below, which shows that train accuracy often jumps to close to 90% within a small number of epochs, resulting in overfitting. This problem persisted even with very high dropout probabilities, as the dataset was simply too small for the model not to overfit. Additionally, the dev accuracy plateaus quite early and does not increase further, showing that the model is not able to generalize from the training data.

Figure 5: Train and dev accuracy for different models

An important observation is that both our neural net models and our linear classifiers improved drastically once we either encoded the i-vectors into the word embeddings or added them as features for the classifier. Since all other features are built from the speech transcriptions, the models were limited to the content of the subject's speech; although some stylistic tendencies can be learned, adding a feature that encapsulates the actual aural characteristics of the subject's voice greatly enriches the information available to the model and makes classification much easier. This is intuitive: if we asked humans to identify native languages, they would probably make most of their judgments based on accents, information that is not in the speech transcriptions at all.

3.2 Error Analysis

Looking at the confusion matrix, we found the performance of our model across different L1s to be distributed similarly to that of models from previous literature. The biggest pain point for most models on the TOEFL NLI datasets is the Hindi-Telugu language pair, and our model
had the same trouble: Telugu was more often classified as Hindi than as itself.

Figure 6: A confusion matrix showcasing predicted and actual labels

Another group of languages confused with each other were Japanese, Korean, and Chinese, which are linguistically and historically close. Similarly, Turkish had a higher level of confusion with Arabic, due to a similar linguistic proximity. Hindi, Japanese, and German were the most accurately classified L1s for our model, whereas Telugu and Chinese were the worst. The main reasons were a strong tendency to misclassify Telugu as Hindi (but not the other way around) and a strong tendency to misclassify Chinese as Korean (but not the other way around). Italian, French, German, and Spanish were generally more likely to be confused with each other, again fitting with geographic proximity, but not to an extent that significantly affected performance on these languages. Another interesting linguistic observation was that Arabic was most likely to be misclassified as Spanish (and Spanish as Arabic), hinting at the historical connections between the two languages from the long Arab rule over the Iberian peninsula. Overall, our confusion matrix shows that even with its relatively weak accuracy, our model's performance can be explained by the linguistic relationships among the L1 languages: similar L1s are more likely to be confused. This suggests that our model is learning relevant features, and increases our confidence that with a bigger dataset it could generalize well.

4 Future Work

While our experiments confirmed the observations from previous papers that neural networks underperform linear classifiers on the NLI task, we still believe there is room to get neural network models to perform better on this task. The biggest potential improvement would come from increasing the dataset size.
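The error analysis above reduces to tallying (gold label, predicted label) pairs from the dev predictions. A minimal standard-library sketch, with toy labels standing in for the real dev-set predictions:

```python
from collections import Counter

def confusion_counts(gold, pred):
    """Tally (gold label, predicted label) pairs from parallel label lists."""
    return Counter(zip(gold, pred))

# Toy labels illustrating the asymmetric Telugu -> Hindi confusion.
gold = ["TEL", "TEL", "TEL", "HIN", "HIN"]
pred = ["HIN", "HIN", "TEL", "HIN", "HIN"]
cm = confusion_counts(gold, pred)
print(cm[("TEL", "HIN")])  # 2: Telugu misread as Hindi
print(cm[("HIN", "TEL")])  # 0: but not the other way around
```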
One way of achieving this is to take the previous NLI Shared Task data (the TOEFL11 dataset) and combine it with this dataset. Since both datasets come from the same context (TOEFL exams) and are topic balanced, this should lead to a direct improvement in model performance. There are also other English L2 datasets, such as the scientists corpus (Stehwien and Padó, 2015); however, due to the different topical biases of these datasets, they may not lead to direct gains over the TOEFL dataset. Beyond acquiring more data, different neural architectures may be more suitable for this task. While recurrent neural networks are good at semantic tasks, as they can synthesize semantic content and dependencies over longer sections of text, this may not be what matters here. Since language transfer from the L1 affects stylistic and syntactic features more than semantic ones, a convolutional model may be better at capturing repeating language constructs throughout the text; we therefore expect convolutional models to perform better on this dataset. Furthermore, the inputs to the model could be extended with more syntactic information. POS tags gave a solid performance boost, and other features used in linear classifiers, such as dependency parse information and CFG fragments, may also improve performance; further experimentation is required to see which combinations yield the best results. Finally, the confusion matrix shows that most of the model's classification errors are between specific language pairs. This may be one reason stacked classifiers have been so successful on this task in the past. For neural net models, one implication is that an attention mechanism that attends over the sequence based on the initial likelihoods could force the model to learn to discern certain language pairs.
Another approach would be to train specific two-way or three-way classifiers for oft-confused language groups, and feed inputs through the more specific classifier for their most likely language group. Either of these approaches may improve the performance of the system.
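The two-stage idea above (a general classifier followed by specialists for oft-confused groups) might look like the following sketch. All classifiers are stubbed out as plain functions, and the group definitions are hypothetical examples drawn from the confusion patterns discussed earlier:

```python
# Hypothetical confusable groups suggested by the confusion matrix.
GROUPS = {
    "HIN": {"HIN", "TEL"}, "TEL": {"HIN", "TEL"},
    "KOR": {"KOR", "JPN", "ZHO"}, "JPN": {"KOR", "JPN", "ZHO"},
    "ZHO": {"KOR", "JPN", "ZHO"},
}

def route(sample, general_clf, specialist_clfs):
    """Run the general classifier, then defer to a group specialist if one exists."""
    first = general_clf(sample)
    group = frozenset(GROUPS.get(first, {first}))
    specialist = specialist_clfs.get(group)
    return specialist(sample) if specialist else first

# Stub classifiers standing in for trained models.
general = lambda s: "HIN"
specialists = {frozenset({"HIN", "TEL"}): lambda s: "TEL"}
print(route("...transcript...", general, specialists))  # TEL
```

Labels with no specialist (e.g. a prediction outside any confusable group) simply pass through unchanged.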
References

Shervin Malmasi and Mark Dras. 2017. Native language identification using stacked generalization. CoRR, abs/

Sabrina Stehwien and Sebastian Padó. 2015. Generalization in native language identification: Learners versus scientists. CLiC-it, page 264.
Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI
More informationLearning Methods in Multilingual Speech Recognition
Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex
More informationarxiv: v1 [cs.cl] 20 Jul 2015
How to Generate a Good Word Embedding? Siwei Lai, Kang Liu, Liheng Xu, Jun Zhao National Laboratory of Pattern Recognition (NLPR) Institute of Automation, Chinese Academy of Sciences, China {swlai, kliu,
More informationNetpix: A Method of Feature Selection Leading. to Accurate Sentiment-Based Classification Models
Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models 1 Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models James B.
More informationPredicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks
Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com
More informationA deep architecture for non-projective dependency parsing
Universidade de São Paulo Biblioteca Digital da Produção Intelectual - BDPI Departamento de Ciências de Computação - ICMC/SCC Comunicações em Eventos - ICMC/SCC 2015-06 A deep architecture for non-projective
More informationTRANSFER LEARNING OF WEAKLY LABELLED AUDIO. Aleksandr Diment, Tuomas Virtanen
TRANSFER LEARNING OF WEAKLY LABELLED AUDIO Aleksandr Diment, Tuomas Virtanen Tampere University of Technology Laboratory of Signal Processing Korkeakoulunkatu 1, 33720, Tampere, Finland firstname.lastname@tut.fi
More informationarxiv: v1 [cs.lg] 15 Jun 2015
Dual Memory Architectures for Fast Deep Learning of Stream Data via an Online-Incremental-Transfer Strategy arxiv:1506.04477v1 [cs.lg] 15 Jun 2015 Sang-Woo Lee Min-Oh Heo School of Computer Science and
More informationRevisiting the role of prosody in early language acquisition. Megha Sundara UCLA Phonetics Lab
Revisiting the role of prosody in early language acquisition Megha Sundara UCLA Phonetics Lab Outline Part I: Intonation has a role in language discrimination Part II: Do English-learning infants have
More informationAUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION
JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 11/2007, ISSN 1642-6037 Marek WIŚNIEWSKI *, Wiesława KUNISZYK-JÓŹKOWIAK *, Elżbieta SMOŁKA *, Waldemar SUSZYŃSKI * HMM, recognition, speech, disorders
More informationOnline Updating of Word Representations for Part-of-Speech Tagging
Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org
More informationTimeline. Recommendations
Introduction Advanced Placement Course Credit Alignment Recommendations In 2007, the State of Ohio Legislature passed legislation mandating the Board of Regents to recommend and the Chancellor to adopt
More informationResidual Stacking of RNNs for Neural Machine Translation
Residual Stacking of RNNs for Neural Machine Translation Raphael Shu The University of Tokyo shu@nlab.ci.i.u-tokyo.ac.jp Akiva Miura Nara Institute of Science and Technology miura.akiba.lr9@is.naist.jp
More informationWeb as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics
(L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes
More informationSecond Exam: Natural Language Parsing with Neural Networks
Second Exam: Natural Language Parsing with Neural Networks James Cross May 21, 2015 Abstract With the advent of deep learning, there has been a recent resurgence of interest in the use of artificial neural
More informationarxiv: v1 [cs.cl] 2 Apr 2017
Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,
More informationQuickStroke: An Incremental On-line Chinese Handwriting Recognition System
QuickStroke: An Incremental On-line Chinese Handwriting Recognition System Nada P. Matić John C. Platt Λ Tony Wang y Synaptics, Inc. 2381 Bering Drive San Jose, CA 95131, USA Abstract This paper presents
More informationReview in ICAME Journal, Volume 38, 2014, DOI: /icame
Review in ICAME Journal, Volume 38, 2014, DOI: 10.2478/icame-2014-0012 Gaëtanelle Gilquin and Sylvie De Cock (eds.). Errors and disfluencies in spoken corpora. Amsterdam: John Benjamins. 2013. 172 pp.
More informationCROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2
1 CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2 Peter A. Chew, Brett W. Bader, Ahmed Abdelali Proceedings of the 13 th SIGKDD, 2007 Tiago Luís Outline 2 Cross-Language IR (CLIR) Latent Semantic Analysis
More informationГлубокие рекуррентные нейронные сети для аспектно-ориентированного анализа тональности отзывов пользователей на различных языках
Глубокие рекуррентные нейронные сети для аспектно-ориентированного анализа тональности отзывов пользователей на различных языках Тарасов Д. С. (dtarasov3@gmail.com) Интернет-портал reviewdot.ru, Казань,
More informationRole of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation
Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation Vivek Kumar Rangarajan Sridhar, John Chen, Srinivas Bangalore, Alistair Conkie AT&T abs - Research 180 Park Avenue, Florham Park,
More informationArabic Orthography vs. Arabic OCR
Arabic Orthography vs. Arabic OCR Rich Heritage Challenging A Much Needed Technology Mohamed Attia Having consistently been spoken since more than 2000 years and on, Arabic is doubtlessly the oldest among
More informationarxiv: v1 [cs.lg] 7 Apr 2015
Transferring Knowledge from a RNN to a DNN William Chan 1, Nan Rosemary Ke 1, Ian Lane 1,2 Carnegie Mellon University 1 Electrical and Computer Engineering, 2 Language Technologies Institute Equal contribution
More informationSemi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration
INTERSPEECH 2013 Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration Yan Huang, Dong Yu, Yifan Gong, and Chaojun Liu Microsoft Corporation, One
More informationChunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.
NLP Lab Session Week 8 October 15, 2014 Noun Phrase Chunking and WordNet in NLTK Getting Started In this lab session, we will work together through a series of small examples using the IDLE window and
More informationCollege Entrance Testing:
College Entrance Testing: SATs, ACTs, Subject Tests, and test-optional schools College & Career Day April 1, 2017 Today s Workshop Goal: Learn about different college entrance exams to develop a testing
More informationLanguage Independent Passage Retrieval for Question Answering
Language Independent Passage Retrieval for Question Answering José Manuel Gómez-Soriano 1, Manuel Montes-y-Gómez 2, Emilio Sanchis-Arnal 1, Luis Villaseñor-Pineda 2, Paolo Rosso 1 1 Polytechnic University
More informationSpecification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments
Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,
More informationSTUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH
STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH Don McAllaster, Larry Gillick, Francesco Scattone, Mike Newman Dragon Systems, Inc. 320 Nevada Street Newton, MA 02160
More informationWhat Can Neural Networks Teach us about Language? Graham Neubig a2-dlearn 11/18/2017
What Can Neural Networks Teach us about Language? Graham Neubig a2-dlearn 11/18/2017 Supervised Training of Neural Networks for Language Training Data Training Model this is an example the cat went to
More informationExperiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling
Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013 Andrés Alfonso Caurcel Díaz 1 and José María Gómez Hidalgo 2 1 Universidad
More informationMachine Learning from Garden Path Sentences: The Application of Computational Linguistics
Machine Learning from Garden Path Sentences: The Application of Computational Linguistics http://dx.doi.org/10.3991/ijet.v9i6.4109 J.L. Du 1, P.F. Yu 1 and M.L. Li 2 1 Guangdong University of Foreign Studies,
More informationA study of speaker adaptation for DNN-based speech synthesis
A study of speaker adaptation for DNN-based speech synthesis Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, Simon King The Centre for Speech Technology Research (CSTR) University of Edinburgh,
More informationThe Effect of Extensive Reading on Developing the Grammatical. Accuracy of the EFL Freshmen at Al Al-Bayt University
The Effect of Extensive Reading on Developing the Grammatical Accuracy of the EFL Freshmen at Al Al-Bayt University Kifah Rakan Alqadi Al Al-Bayt University Faculty of Arts Department of English Language
More informationContent Language Objectives (CLOs) August 2012, H. Butts & G. De Anda
Content Language Objectives (CLOs) Outcomes Identify the evolution of the CLO Identify the components of the CLO Understand how the CLO helps provide all students the opportunity to access the rigor of
More informationINVESTIGATION OF UNSUPERVISED ADAPTATION OF DNN ACOUSTIC MODELS WITH FILTER BANK INPUT
INVESTIGATION OF UNSUPERVISED ADAPTATION OF DNN ACOUSTIC MODELS WITH FILTER BANK INPUT Takuya Yoshioka,, Anton Ragni, Mark J. F. Gales Cambridge University Engineering Department, Cambridge, UK NTT Communication
More informationOPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS
OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS Václav Kocian, Eva Volná, Michal Janošek, Martin Kotyrba University of Ostrava Department of Informatics and Computers Dvořákova 7,
More informationA JOINT MANY-TASK MODEL: GROWING A NEURAL NETWORK FOR MULTIPLE NLP TASKS
A JOINT MANY-TASK MODEL: GROWING A NEURAL NETWORK FOR MULTIPLE NLP TASKS Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka & Richard Socher The University of Tokyo {hassy, tsuruoka}@logos.t.u-tokyo.ac.jp
More informationMULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY
MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract
More informationA Neural Network GUI Tested on Text-To-Phoneme Mapping
A Neural Network GUI Tested on Text-To-Phoneme Mapping MAARTEN TROMPPER Universiteit Utrecht m.f.a.trompper@students.uu.nl Abstract Text-to-phoneme (T2P) mapping is a necessary step in any speech synthesis
More informationA Study of the Effectiveness of Using PER-Based Reforms in a Summer Setting
A Study of the Effectiveness of Using PER-Based Reforms in a Summer Setting Turhan Carroll University of Colorado-Boulder REU Program Summer 2006 Introduction/Background Physics Education Research (PER)
More informationSARDNET: A Self-Organizing Feature Map for Sequences
SARDNET: A Self-Organizing Feature Map for Sequences Daniel L. James and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 dljames,risto~cs.utexas.edu
More informationarxiv: v4 [cs.cl] 28 Mar 2016
LSTM-BASED DEEP LEARNING MODELS FOR NON- FACTOID ANSWER SELECTION Ming Tan, Cicero dos Santos, Bing Xiang & Bowen Zhou IBM Watson Core Technologies Yorktown Heights, NY, USA {mingtan,cicerons,bingxia,zhou}@us.ibm.com
More informationUsing dialogue context to improve parsing performance in dialogue systems
Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,
More informationDesigning a Rubric to Assess the Modelling Phase of Student Design Projects in Upper Year Engineering Courses
Designing a Rubric to Assess the Modelling Phase of Student Design Projects in Upper Year Engineering Courses Thomas F.C. Woodhall Masters Candidate in Civil Engineering Queen s University at Kingston,
More informationSETTING STANDARDS FOR CRITERION- REFERENCED MEASUREMENT
SETTING STANDARDS FOR CRITERION- REFERENCED MEASUREMENT By: Dr. MAHMOUD M. GHANDOUR QATAR UNIVERSITY Improving human resources is the responsibility of the educational system in many societies. The outputs
More informationExams: Accommodations Guidelines. English Language Learners
PSSA Accommodations Guidelines for English Language Learners (ELLs) [Arlen: Please format this page like the cover page for the PSSA Accommodations Guidelines for Students PSSA with IEPs and Students with
More informationCurriculum and Assessment Policy
*Note: Much of policy heavily based on Assessment Policy of The International School Paris, an IB World School, with permission. Principles of assessment Why do we assess? How do we assess? Students not
More informationMaximizing Learning Through Course Alignment and Experience with Different Types of Knowledge
Innov High Educ (2009) 34:93 103 DOI 10.1007/s10755-009-9095-2 Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Phyllis Blumberg Published online: 3 February
More informationCultivating DNN Diversity for Large Scale Video Labelling
Cultivating DNN Diversity for Large Scale Video Labelling Mikel Bober-Irizar mikel@mxbi.net Sameed Husain sameed.husain@surrey.ac.uk Miroslaw Bober m.bober@surrey.ac.uk Eng-Jon Ong e.ong@surrey.ac.uk Abstract
More informationStacks Teacher notes. Activity description. Suitability. Time. AMP resources. Equipment. Key mathematical language. Key processes
Stacks Teacher notes Activity description (Interactive not shown on this sheet.) Pupils start by exploring the patterns generated by moving counters between two stacks according to a fixed rule, doubling
More informationEnhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities
Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion
More informationlearning collegiate assessment]
[ collegiate learning assessment] INSTITUTIONAL REPORT 2005 2006 Kalamazoo College council for aid to education 215 lexington avenue floor 21 new york new york 10016-6023 p 212.217.0700 f 212.661.9766
More informationNotes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1
Notes on The Sciences of the Artificial Adapted from a shorter document written for course 17-652 (Deciding What to Design) 1 Ali Almossawi December 29, 2005 1 Introduction The Sciences of the Artificial
More informationPREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES
PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES Po-Sen Huang, Kshitiz Kumar, Chaojun Liu, Yifan Gong, Li Deng Department of Electrical and Computer Engineering,
More informationLQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization
LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization Annemarie Friedrich, Marina Valeeva and Alexis Palmer COMPUTATIONAL LINGUISTICS & PHONETICS SAARLAND UNIVERSITY, GERMANY
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationSpoken Language Parsing Using Phrase-Level Grammars and Trainable Classifiers
Spoken Language Parsing Using Phrase-Level Grammars and Trainable Classifiers Chad Langley, Alon Lavie, Lori Levin, Dorcas Wallace, Donna Gates, and Kay Peterson Language Technologies Institute Carnegie
More informationMachine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler
Machine Learning and Data Mining Ensembles of Learners Prof. Alexander Ihler Ensemble methods Why learn one classifier when you can learn many? Ensemble: combine many predictors (Weighted) combina
More informationWHEN THERE IS A mismatch between the acoustic
808 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 Optimization of Temporal Filters for Constructing Robust Features in Speech Recognition Jeih-Weih Hung, Member,
More information