Advances in Music Information Retrieval using Deep Learning Techniques - Sid Pramod


Music Information Retrieval (MIR): the science of retrieving information from music. Tasks include query by example, query by humming, music recommendation, automatic playlist generation, genre classification, artist classification, instrument classification, chord recognition, and emotion recognition.

Solutions: commonly use machine learning (ML). The typical pipeline runs from raw data (audio signal), through feature extraction (representation), to a machine learning algorithm (e.g. SVM, k-NN).
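
The three-stage pipeline can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not taken from the slides: the synthetic sine "audio", the coarse band-energy features, the helper names (make_tone, extract_features, knn_predict), and the toy low-tone/high-tone classes.

```python
import numpy as np

def make_tone(freq_hz, sr=8000, dur_s=0.25, noise=0.01, seed=0):
    """Raw data: a noisy sine wave standing in for an audio signal."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(sr * dur_s)) / sr
    return np.sin(2 * np.pi * freq_hz * t) + noise * rng.standard_normal(t.size)

def extract_features(signal, n_bands=4):
    """Feature extraction: mean spectral magnitude in a few coarse bands."""
    spectrum = np.abs(np.fft.rfft(signal))
    feats = np.array([band.mean() for band in np.array_split(spectrum, n_bands)])
    return feats / (np.linalg.norm(feats) + 1e-12)  # unit-length feature vector

def knn_predict(train_x, train_y, query, k=3):
    """ML algorithm: majority vote among the k nearest training points."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

# Toy training set (hypothetical): class 0 = low tones, class 1 = high tones.
train_freqs = [200, 250, 300, 2000, 2500, 3000]
train_y = np.array([0, 0, 0, 1, 1, 1])
train_x = np.stack([extract_features(make_tone(f, seed=i))
                    for i, f in enumerate(train_freqs)])

# Classify two unseen tones by running them through the same pipeline.
pred_low = knn_predict(train_x, train_y, extract_features(make_tone(270, seed=10)))
pred_high = knn_predict(train_x, train_y, extract_features(make_tone(2700, seed=11)))
```

The classifier never sees the waveform, only the feature vector, which is why the quality of the feature extraction stage matters so much.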

Feature Design: the feature extraction stage usually involves hand-crafted feature design. ML algorithm performance depends critically on feature design. Features have to be robust to noise, translations, and other variations. Hand-crafted features are heuristic-based. The most common features are Mel-frequency cepstral coefficients (MFCCs) and their variants.
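
To make "hand-crafted" concrete, here is a simplified sketch of the usual MFCC recipe: framing, windowing, power spectrum, triangular mel filterbank, log, then DCT-II. It is an illustration under stated assumptions, not the configuration used in any cited work. The frame length, hop, filter count, and coefficient count are assumed values, and common refinements such as pre-emphasis and liftering are left out.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    bank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(signal, sr, frame_len=400, hop=160, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)        # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # power spectrum
    mel_energy = power @ mel_filterbank(n_filters, frame_len, sr).T
    log_mel = np.log(mel_energy + 1e-10)                # avoid log(0)
    # Unnormalized DCT-II over the filter axis; keep the first n_coeffs.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)
    basis = np.cos(np.pi * k[:, None] * (2 * n[None, :] + 1) / (2 * n_filters))
    return log_mel @ basis.T

# One second of a synthetic 440 Hz tone at 16 kHz (illustrative input only).
sr = 16000
signal = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
coeffs = mfcc(signal, sr)
```

Every constant in this recipe (mel formula, filter shapes, number of coefficients) is a design decision made by hand, which is the heuristic character the slide refers to.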

Learning Features/Representations: learning representations is far less tedious than engineering features. Context-dependent feature extraction becomes possible. Learned features are not necessarily task-specific: transfer learning allows reusing them across multiple tasks.

Learning Features/Representations: Vector Quantization. Use simple feature extraction techniques, then perform clustering on all data points. The transformed representation is a vector in which a single entry is non-zero; its index corresponds to the cluster id, e.g. [x_1, x_2, x_3, ..., x_n] → [0, 0, 1, 0].
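
A minimal sketch of this idea, assuming plain k-means as the clustering step (the slide does not name a specific algorithm) and synthetic 2-D points in place of real audio features. The helper names kmeans and vq_encode are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iters=20, seed=0):
    """Plain k-means: returns an array of k centroids (the codebook)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (skip empty clusters).
        for j in range(k):
            members = points[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def vq_encode(x, centroids):
    """One-hot code: the single non-zero entry marks the nearest cluster."""
    code = np.zeros(len(centroids))
    code[np.argmin(np.linalg.norm(centroids - x, axis=1))] = 1.0
    return code

# Synthetic data (illustrative): four blobs of 2-D "feature vectors".
rng = np.random.default_rng(1)
blob_centres = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
points = np.concatenate([c + 0.3 * rng.standard_normal((50, 2)) for c in blob_centres])

centroids = kmeans(points, k=4)
code = vq_encode(points[0], centroids)  # a length-4 vector with a single 1
```

Which position holds the 1 depends on the arbitrary order in which clusters were found, so the code identifies a cluster but carries no ordering information.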

Learning Features/Representations: Sparse Coding. A class of algorithms that learn to represent each data point as a linear combination of basis vectors (features). The set of basis vectors forms the dictionary. E.g. if [x_1, x_2, x_3, ..., x_n] = a_1·f_1 + 0·f_2 + 0·f_3 + a_4·f_4 + ..., then [x_1, x_2, x_3, ..., x_n] → [a_1, 0, 0, a_4, ...].
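
The encoding step can be sketched as an L1-regularized least-squares problem solved with ISTA (iterative shrinkage-thresholding). This is one common choice and is assumed here; the slide does not name a solver. The dictionary is also assumed to be fixed and orthonormal to keep the example small, whereas practical sparse-coding dictionaries are usually learned from data and overcomplete.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink every entry toward zero by t; entries smaller than t become 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(x, D, lam=0.1, n_iters=200):
    """Minimize 0.5 * ||x - D a||^2 + lam * ||a||_1 over a, using ISTA.

    D has shape (n_dims, n_atoms); its columns are the basis vectors f_j.
    """
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / (largest singular value)^2
    a = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (D @ a - x)             # gradient of the squared-error term
        a = soft_threshold(a - step * grad, step * lam)
    return a

# Illustrative dictionary: 6 orthonormal basis vectors in 6 dimensions.
rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((6, 6)))

# A data point built from only two basis vectors (columns 0 and 3).
x = 1.5 * D[:, 0] - 2.0 * D[:, 3]
a = sparse_code(x, D)
support = np.flatnonzero(a)                  # indices of the non-zero coefficients
```

With this orthonormal dictionary the recovered code is non-zero only at positions 0 and 3, and the L1 penalty shrinks each surviving coefficient toward zero by lam.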

Learning Features/Representations: Autoencoder. A feedforward neural network with one hidden layer and the same number of output nodes as input nodes. The task is to reconstruct the input. The hidden layer learns a sparse encoding of the data when constraints are placed on the hidden-layer activations during training. It may be combined with a supervised criterion.

[Figure: autoencoder diagram. Input layer x_1, x_2, ..., x_n plus a bias unit (+1); hidden layer / encoding h_1, h_2, h_3 plus a bias unit (+1); reconstruction layer x_1, x_2, ..., x_n.]
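
A minimal NumPy sketch of such an autoencoder, under several assumptions not stated in the slides: sigmoid hidden units, a linear output layer, squared reconstruction error, an L1 penalty on the hidden activations as the sparsity constraint, and full-batch gradient descent. The data is synthetic and the hyperparameters are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden=3, lr=0.1, sparsity=1e-3, n_epochs=500, seed=0):
    """Train a one-hidden-layer autoencoder; returns (params, loss history)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0.0, 0.1, (d, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0.0, 0.1, (n_hidden, d)), np.zeros(d)
    losses = []
    for _ in range(n_epochs):
        # Forward pass: encode, then reconstruct.
        H = sigmoid(X @ W1 + b1)
        X_hat = H @ W2 + b2
        err = X_hat - X
        # Reconstruction error plus L1 penalty on activations (H > 0, so |H| = H).
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1))
                      + sparsity * np.mean(np.sum(H, axis=1)))
        # Backward pass.
        d_out = err / n
        dH = d_out @ W2.T + sparsity / n
        dZ = dH * H * (1.0 - H)              # derivative of the sigmoid
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ dZ)
        b1 -= lr * dZ.sum(axis=0)
    return (W1, b1, W2, b2), losses

def encode(X, params):
    W1, b1, _, _ = params
    return sigmoid(X @ W1 + b1)

def reconstruct(X, params):
    _, _, W2, b2 = params
    return encode(X, params) @ W2 + b2

# Synthetic 6-D data that actually lives on a 2-D subspace (illustrative).
rng = np.random.default_rng(42)
X = rng.random((100, 2)) @ rng.random((2, 6))

params, losses = train_autoencoder(X)
codes = encode(X, params)        # learned representation, shape (100, 3)
recon = reconstruct(X, params)   # same shape as X
```

After training, the hidden activations returned by encode serve as the learned features, and the decoder half is usually discarded.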

Deep Architectures: a stack of shallow transformations, where the output of one stage serves as the input to the next. A complex transformation is thus modeled as a series of simpler transformations, each of which encodes some specific variance. The transformations can be learned, using stacked autoencoders and variants (unsupervised) or deep feedforward neural networks (supervised).
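
The stacking idea itself is just function composition, as the following sketch shows. The weights are random and untrained, the layer sizes and ReLU nonlinearity are assumptions, and the helper names make_layer and stack are hypothetical. The example shows only the structure, not a learning procedure.

```python
import numpy as np

def make_layer(n_in, n_out, rng):
    """One shallow transformation: affine map followed by a ReLU."""
    W = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out))
    b = np.zeros(n_out)
    def layer(x):
        return np.maximum(x @ W + b, 0.0)
    return layer

def stack(layers):
    """Compose layers so that each one's output feeds the next."""
    def deep(x):
        for layer in layers:
            x = layer(x)
        return x
    return deep

rng = np.random.default_rng(0)
sizes = [16, 12, 8, 4]   # input dimension, two hidden sizes, output dimension
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
deep_net = stack(layers)

x = rng.standard_normal((5, 16))                # a batch of 5 input vectors
out = deep_net(x)                               # shape (5, 4)
manual = layers[2](layers[1](layers[0](x)))     # identical, written out by hand
```

In a stacked autoencoder each layer would be trained in turn on the previous layer's output; in a supervised deep network all layers are trained jointly by backpropagation.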

Does it work? Humphrey et al. [2012] review initial research in the area, all of which achieves state-of-the-art performance. Tasks include genre recognition, instrument classification, chord recognition, ...

Does it work? Musical onset detection using CNNs - Schlüter and Böck [2014]

Does it work? Deep content-based music recommendation - van den Oord et al. [2013]

References
Humphrey, Eric J., Juan Pablo Bello, and Yann LeCun. "Moving Beyond Feature Design: Deep Architectures and Automatic Feature Learning in Music Informatics." ISMIR. 2012.
Lee, Honglak, et al. "Unsupervised Feature Learning for Audio Classification Using Convolutional Deep Belief Networks." Advances in Neural Information Processing Systems. 2009.
Schlüter, Jan, and Sebastian Böck. "Improved Musical Onset Detection with Convolutional Neural Networks." 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014.
Van den Oord, Aaron, Sander Dieleman, and Benjamin Schrauwen. "Deep Content-Based Music Recommendation." Advances in Neural Information Processing Systems. 2013.
Hamel, Philippe, and Douglas Eck. "Learning Features from Music Audio with Deep Belief Networks." ISMIR. 2010.
Schlüter, Jan, and Christian Osendorfer. "Music Similarity Estimation with the Mean-Covariance Restricted Boltzmann Machine." 2011 10th International Conference on Machine Learning and Applications and Workshops (ICMLA). Vol. 2. IEEE, 2011.