# DEEP STACKING NETWORKS FOR INFORMATION RETRIEVAL

Li Deng, Xiaodong He, and Jianfeng Gao


Given target vectors T = [t_1, ..., t_N], where each vector is t_n = [t_{1n}, ..., t_{Cn}]^T, the parameters U and W are learned to minimize the average of the total square error

E = Tr[(Y - T)(Y - T)^T]   (1)

where Y = [y_1, ..., y_N]. Note that once the lower-layer weights W are fixed (e.g., by random numbers), the hidden-layer values H = [h_1, ..., h_N] are also determined uniquely. Consequently, the upper-layer weights U can be determined by setting the gradient

∂E/∂U = 2H(Y - T)^T   (2)

to zero, leading to the closed-form solution

U = (H H^T)^{-1} H T^T   (3)

Fig. 1: An illustration of the DSN architecture.

3. DEEP STACKING NETWORK: LEARNING

In each module of the DSN, the output units are linear and the hidden units are sigmoidal. The linearity of the output units permits a highly efficient, parallelizable, closed-form estimate (the result of a convex optimization) of the output weight matrix given the hidden units' activities. Because of the closed-form constraint between the input and output weights, the input weight matrices can also be estimated in an efficient, parallelizable, batch-mode manner. In the following sections, the module indices on the network weight matrices are omitted for simplicity.

3.1 Basic learning algorithm

Denote the training vectors by X = [x_1, ..., x_N], in which each vector is x_n = [x_{1n}, ..., x_{Dn}]^T, where D is the dimension of the input vector (a function of the module) and N is the total number of training samples. Let L denote the number of hidden units and C the dimension of the output vector. Then the output of a DSN module is

y_n = U^T h_n

where h_n = σ(W^T x_n) is the hidden-layer output, U is an L × C weight matrix at the upper layer, W is a D × L weight matrix at the lower layer, and σ(·) is the sigmoid function. (Bias terms are implicitly represented in the above formulation if x_n and h_n are augmented with ones.)

3.2 Module-bound fine tuning

The weight matrices of the DSN in each module can be further learned using batch-mode gradient descent [18]. The computation of the error gradient makes use of Eq.
(3) and proceeds by

∂E/∂W = 2X [ H^T ∘ (1 - H)^T ∘ [ H†(H T^T)(T H†) - T^T(T H†) ] ]   (4)

where H† = H^T(H H^T)^{-1} is the pseudo-inverse of H and ∘ denotes element-wise multiplication. How to initialize W for gradient descent in the stacked modules can be found in [4].

3.3 Regularization in the DSN learning

During this study, we found that regularization in DSN learning is much more important for the IR task than for the speech and image classification tasks investigated earlier.
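As a concrete illustration of Eqs. (1)-(4), here is a minimal NumPy sketch of one DSN module: the hidden activities, the closed-form upper-layer weights of Eq. (3), and the lower-layer gradient of Eq. (4). This is a reconstruction for exposition, not the authors' implementation; the shapes follow Section 3.1 (X is D × N, T is C × N, W is D × L, U is L × C), and the `ridge` parameter is an assumed illustration of the regularization discussed in Section 3.3, whose exact form the text does not specify.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def upper_weights(H, T, ridge=0.0):
    """Eq. (3): closed-form U = (H H^T)^{-1} H T^T, optionally ridge-regularized."""
    L = H.shape[0]
    return np.linalg.solve(H @ H.T + ridge * np.eye(L), H @ T.T)

def module_error(X, T, W):
    """Eq. (1): total square error E = Tr[(Y - T)(Y - T)^T], with U from Eq. (3)."""
    H = sigmoid(W.T @ X)
    Y = upper_weights(H, T).T @ H
    return float(np.sum((Y - T) ** 2))

def grad_W(X, T, W):
    """Eq. (4): gradient of E w.r.t. the lower-layer weights W, with U
    eliminated through Eq. (3); '∘' in the text is element-wise product."""
    H = sigmoid(W.T @ X)                                # L x N hidden activities
    Hp = H.T @ np.linalg.inv(H @ H.T)                   # pseudo-inverse H† (N x L)
    inner = Hp @ (H @ T.T) @ (T @ Hp) - T.T @ (T @ Hp)  # N x L
    return 2.0 * X @ (H.T * (1.0 - H).T * inner)        # D x L, matches W's shape

# toy batch-mode gradient step on W (Section 3.2)
rng = np.random.default_rng(0)
D, L, C, N = 4, 3, 2, 50
X = rng.standard_normal((D, N))
T = rng.standard_normal((C, N))
W = 0.5 * rng.standard_normal((D, L))                   # random initialization
W_new = W - 1e-4 * grad_W(X, T, W)                      # one descent step
```

In the full DSN, the input to module m+1 is the original input concatenated with the outputs of the modules below it, and the step above is repeated module by module.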

indicates that the inconsistency between the training objective and the IR-quality measure becomes a critical issue in that region. In the future, it is desirable to train the model using techniques that optimize an objective closely related to the end-to-end IR quality, such as the discriminative training methods widely used in speech recognition [7][8] and the more recent end-to-end decision-feedback training approaches applied successfully to speech translation [19]. Finally, to analyze the learning behavior of the DSN, we plot in Fig. 3 the learning curves in terms of the three NDCG measures on the test set as a function of the training epoch. Each epoch is one sweep through all 189K training vectors used to learn U and W in a DSN module. Seven epochs are used in each module; the improvement in NDCG saturates at three to four modules. No overfitting occurs, owing to the careful regularization described in Section 3.3.

Fig. 2: Relationship between the classification error rates and the NDCG1 values on the test set.

Fig. 3: Learning curves: NDCG values as a function of the training epochs cumulated over DSN modules.

6. CONCLUSION

We present in this paper the first study, to the best of our knowledge, of the use of deep learning techniques, and the DSN architecture in particular, for the ad-related IR problem. We conclude from the experiments that the classification error rate, which is closely correlated with the MSE used as the DSN training objective, correlates well in general with NDCG as the IR quality measure, with the exception of the region of high IR quality. We also conclude that, despite this exception, the NDCG values obtained on the independent test set using MSE as the training criterion are significantly higher than those of the state-of-the-art baseline system.
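For reference, NDCG, the IR-quality measure used in the experiments and conclusions above, is computed as in [12]. The following is a standard textbook sketch for a single ranked list, not the paper's evaluation code; note that conventions for the gain (2^rel - 1 vs. rel) and the discount vary across implementations.

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions of a ranked list."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))  # log2(rank + 1), ranks from 1
    return float(np.sum((2.0 ** rel - 1.0) / discounts))

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

With the binary (two-level) relevance targets used in the experiments, a ranking that places all relevant items first scores NDCG = 1, and an irrelevant item at rank 1 gives NDCG1 = 0.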
The poorer correlation observed between the DSN training objective and the IR quality measure in the high-IR-quality region suggests room for further improvement of the DSN method. This will require future research directed at more suitable objective functions and new DSN learning methods. We also expect that, with more levels of IR targets than the two used in the current experiments, the DSN will become more effective than reported in this paper, since the stacking information passed from one module to the next will become richer.

7. REFERENCES

[1] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In Proc. ICML.
[2] C. Burges, R. Ragno, and Q. Le. Learning to rank with non-smooth cost functions. In Proc. NIPS.
[3] L. Deng and D. Yu. Deep Convex Network: A scalable architecture for deep learning. In Proc. Interspeech.
[4] L. Deng, D. Yu, and J. Platt. Scalable stacking and learning for building deep architectures. In Proc. ICASSP.
[5] J. Gao, W. Yuan, X. Li, K. Deng, and J.-Y. Nie. Smoothing clickthrough data for web search ranking. In Proc. SIGIR.
[6] J. Gao, X. He, and J.-Y. Nie. Clickthrough-based translation models for web search: from word models to phrase models. In Proc. CIKM.
[7] X. He, L. Deng, and W. Chou. Discriminative learning in sequential pattern recognition. IEEE Signal Processing Magazine, September issue.
[8] X. He, L. Deng, and W. Chou. A novel learning method for hidden Markov models in speech and audio processing. In Proc. IEEE Workshop on Multimedia Signal Processing.
[9] D. Hillard, E. Manavoglu, H. Raghavan, C. Leggetter, E. Cantú-Paz, and R. Iyer. The sum of its parts: Reducing sparsity in click estimation with query segments. Journal of Information Retrieval, vol. 14, no. 3.
[10] D. Hillard, S. Schroedl, E. Manavoglu, H. Raghavan, and C. Leggetter. Improving ad relevance in sponsored search. In Proc. WSDM.
[11] B. Jansen and M. Resnick. Examining searcher perceptions of and interactions with sponsored results. In Proc. Workshop on Sponsored Search Auctions.
[12] K. Järvelin and J. Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Proc. SIGIR.
[13] M. Richardson, E. Dominowska, and R. Ragno. Predicting clicks: estimating the click-through rate for new ads. In Proc. WWW.
[14] S. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In Proc. Third Text REtrieval Conference, Gaithersburg, USA.
[15] G. Salton and M. McGill. Introduction to Modern Information Retrieval. McGraw-Hill.
[16] G. Tur, L. Deng, D. Hakkani-Tür, and X. He. Towards deeper understanding: Deep convex networks for semantic utterance classification. In Proc. ICASSP.
[17] D. Wolpert. Stacked generalization. Neural Networks, vol. 5, no. 2.
[18] D. Yu and L. Deng. Accelerated parallelizable neural network learning algorithms for speech recognition. In Proc. Interspeech.
[19] Y. Zhang, L. Deng, X. He, and A. Acero. A novel decision function and the associated decision-feedback learning for speech translation. In Proc. ICASSP 2011.
