A Cartesian Ensemble of Feature Subspace Classifiers for Music Categorization


Thomas Lidy (1), Rudolf Mayer (1), Andreas Rauber (1), Pedro J. Ponce de León (2), Antonio Pertusa (2), Jose M. Iñesta (2)

(1) Information & Software Engineering Group (IFS), Department of Software Technology and Interactive Systems, Vienna University of Technology, Austria. http://www.ifs.tuwien.ac.at/mir
(2) Pattern Recognition and Artificial Intelligence Group, Department of Software and Computing Systems, University of Alicante, Spain. http://grfia.dlsi.ua.es/cm

ISMIR Conference, 2010

Motivation

[Figure: example feature sets per modality. Audio: Rhythm Patterns, Statistical Spectrum Descriptors, Temporal features, ... Score: Global features, Local features, ... Lyrics: Bag of words, ... Metadata: Metadata, ...]

- Given a tagged corpus, several feature sets from different modalities are available (e.g., audio, symbolic, lyrics, ...).
- Improve classification by combining feature sets and classification schemes.
- Release the user from explicitly choosing the best single feature set/classifier combination.

Motivation: project overview

Funding: Bilateral (Spain-Austria) R&D programme
Project: Music genre classification by combining audio and symbolic descriptors through an automatic transcription system.
Period: January 2008 - July 2010

[Figure: project overview. Audio file -> audio features. Audio file -> audio-to-MIDI transcription -> MIDI file -> symbolic features. Both feature sets feed a model (slide placeholder: "A fancy model goes here") that outputs the genre category.]

Early fusion Late fusion Cartesian Ensemble

Early fusion: audio and symbolic feature subspace concatenation

[Figure: Audio file -> audio features. Audio file -> audio-to-MIDI transcription -> MIDI file -> symbolic features. Audio features + symbolic features -> classifier -> genre category.]

(ISMIR 2007, MIREX 2007, MIREX 2008)

Late fusion: model outcomes combination

[Figure: audio features -> N classifiers; symbolic features -> M classifiers; all model outputs -> decision combination rule -> genre category. Label: ISMIR 2010.]

- Base models can come from different machine learning paradigms.
- Key factor: the more diverse and accurate the ensemble of classifiers, the more improvement is expected.
- Ensemble diversity: how varied the models' opinions are.
- A wide range of decision combination rules exists.

Late fusion: the Cartesian Ensemble

[Figure: Audio file -> audio descriptors. Audio file -> transcription -> MIDI file -> symbolic descriptors. MIDI file -> chord extraction -> chord sequence. Feature subspaces and classification schemes -> decision combination -> category label.]

- D feature subspaces and C classification schemes give D × C models to combine.
- Built on top of the Weka data mining toolkit [a].

[a] M. Hall et al. (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations, Volume 11, Issue 1. www.cs.waikato.ac.nz/ml/weka/
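The "D × C models" idea can be illustrated with a short Python sketch. This is not the authors' Weka-based implementation; the subspace and scheme labels below are illustrative placeholders, and the sketch only enumerates model specifications without training anything.

```python
from itertools import product

# Hypothetical labels for illustration only (D = 4 subspaces, C = 3 schemes).
feature_subspaces = ["RP", "RH", "SSD", "global"]
classification_schemes = ["NB", "1-NN", "SVM-lin"]

# One base model per (feature subspace, classification scheme) pair:
# the ensemble members are the Cartesian product of the two lists.
model_specs = list(product(feature_subspaces, classification_schemes))

print(len(model_specs))  # D * C = 12
```

Adding a feature subspace or a scheme grows the ensemble multiplicatively, which is why model selection and training time come up later in the deck.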

Input section

- Feature sets in Weka and SomLIB format are currently supported.
- Feature subspaces are aligned through a common ID attribute.
- Labeled samples are mandatory only in the first subspace.
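A minimal sketch of ID-based alignment, assuming each feature subspace is held as a dictionary from sample ID to feature vector. The function name `align_subspaces` and the choice to keep only IDs present in every subspace are assumptions made for illustration; the slide does not say how the framework treats missing IDs.

```python
def align_subspaces(subspaces):
    """Return the IDs shared by all subspaces and the rows in that ID order.

    `subspaces` is a list of dicts mapping sample ID -> feature vector.
    ASSUMPTION: samples missing from any subspace are dropped.
    """
    common_ids = set(subspaces[0])
    for subspace in subspaces[1:]:
        common_ids &= set(subspace)
    ordered_ids = sorted(common_ids)
    rows = [[subspace[i] for i in ordered_ids] for subspace in subspaces]
    return ordered_ids, rows


# Toy data: labels are attached to the first subspace only.
audio = {"a": [0.1, 0.2], "b": [0.3, 0.4], "c": [0.5, 0.6]}
symbolic = {"b": [7.0], "a": [5.0]}
labels = {"a": "jazz", "b": "rock", "c": "pop"}

ids, (audio_rows, symbolic_rows) = align_subspaces([audio, symbolic])
y = [labels[i] for i in ids]
```

After alignment, row k of every subspace refers to the same sample, so the label list taken from the first subspace applies to all of them.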

Model training

Model training (single model)
- Each model is built using a given classification scheme and feature subspace.
- All possible feature subspace/scheme models are built.

Model accuracy estimation
[Figure: the data is split into outer train and outer test; the outer training part is split again into inner train and inner test.]
- Model accuracy is estimated through inner cross-validation.
- This estimate is needed for model selection and for weighted decision combination rules.
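The nested outer/inner split can be sketched as index bookkeeping. The fold counts (10 outer, 3 inner) are the ones listed on the evaluation slide; the round-robin assignment without shuffling or stratification is a simplifying assumption, and `k_folds` and `nested_splits` are hypothetical helper names.

```python
def k_folds(indices, k):
    """Yield k (train, test) pairs using a simple round-robin assignment."""
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def nested_splits(n_samples, outer_k=10, inner_k=3):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    The inner splits partition only the outer training indices, so the
    outer test samples never influence a model's accuracy estimate.
    """
    for outer_train, outer_test in k_folds(list(range(n_samples)), outer_k):
        inner_splits = list(k_folds(outer_train, inner_k))
        yield outer_train, outer_test, inner_splits


splits = list(nested_splits(30))
```

Under this scheme, a model's estimated accuracy for one outer fold would be the mean of its accuracies on the inner test parts, and that estimate is what model selection and the weighted rules consume.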

Model selection

Pareto-optimal classifier selection

[Figure: κ-e diagram (pair error e against inter-rater agreement κ) showing model pairs <1,2>, <2,3>, <3,4>, with a non-dominated pair marked.]

- Remember: the more diverse and accurate the ensemble, the more improvement is expected.
- Selects pairs of models based on accuracy and diversity metrics.
- All pairs that are non-dominated with respect to all criteria are selected.
- Given a pair <i,j>, κ_ij is the inter-rater agreement and e_ij is the pair's average error rate:

\[
\kappa_{ij} = \frac{\sum_{k} m_{kk} - ABC}{1 - ABC},
\qquad
e_{ij} = 1 - \frac{\alpha_i + \alpha_j}{2},
\qquad
ABC = \sum_{r} \Big( \sum_{s} m_{r,s} \Big) \Big( \sum_{s} m_{s,r} \Big)
\]
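A sketch of the pair statistics and the non-dominated selection follows. Several readings here are assumptions, since the slide does not define its symbols: m is taken to be the coincidence matrix of the two models' predicted classes expressed as relative frequencies, ABC the agreement expected by chance, and α_i the estimated accuracy of model i. Both κ and e are treated as quantities to minimize (low agreement means high diversity).

```python
def kappa(preds_i, preds_j, classes):
    """Inter-rater agreement of two models on the same samples."""
    n = len(preds_i)
    index = {c: k for k, c in enumerate(classes)}
    size = len(classes)
    # m[r][s]: fraction of samples labeled r by model i and s by model j.
    m = [[0.0] * size for _ in range(size)]
    for a, b in zip(preds_i, preds_j):
        m[index[a]][index[b]] += 1.0 / n
    observed = sum(m[k][k] for k in range(size))
    # Agreement by chance: sum over classes of (row sum) * (column sum).
    abc = sum(sum(m[r]) * sum(m[s][r] for s in range(size)) for r in range(size))
    # Undefined when abc == 1 (both models always predict one class); not handled.
    return (observed - abc) / (1.0 - abc)


def pair_error(acc_i, acc_j):
    """Average error rate of a pair, from the two estimated accuracies."""
    return 1.0 - (acc_i + acc_j) / 2.0


def non_dominated(pairs):
    """Select the keys of `pairs` ({(i, j): (kappa, error)}) that no other pair
    beats or ties on both criteria while strictly beating it on one."""
    selected = []
    for key, (k, e) in pairs.items():
        dominated = any(
            k2 <= k and e2 <= e and (k2 < k or e2 < e)
            for other, (k2, e2) in pairs.items()
            if other != key
        )
        if not dominated:
            selected.append(key)
    return selected


# Made-up (kappa, error) values for three pairs.
example = {(1, 2): (0.2, 0.30), (2, 3): (0.5, 0.20), (3, 4): (0.6, 0.35)}
print(non_dominated(example))  # [(1, 2), (2, 3)]
```

In the toy data, pair (3, 4) is dropped because pair (1, 2) is both more diverse and more accurate; the other two trade diversity against accuracy, so both survive.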

Late fusion strategies: combining model outcomes

Unweighted combination
  MAJ    Majority vote rule
  AVG    Average of p.p.
  MAX    Maximum of p.p.
  MED    Median of p.p.
(p.p.: posterior probability)

Weighted majority vote rules
  SWV    Simple Weighted
  RSWV   Rescaled Simple Weighted
  BWWV   Best-Worst Weighted
  QBWWV  Quadratic Best-Worst Weighted
  WMV    Weighted Majority

Model weight: based on the model's estimated accuracy.

[Figure: three plots of model weight a_k (from 0 to 1) against model error e_k, one each for RSWV (error axis marked at 0 and at chance), BWWV and QBWWV (error axis marked at e_Best and e_Worst).]
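The rules above can be sketched in a few lines of Python, assuming each model outputs a dictionary of class posteriors. The unweighted rules follow directly from their names. The weight formulas are an assumption: the slide shows them only as plots, so the sketch uses a linear (BWWV) and squared (QBWWV) rescaling that gives weight 1 to the model with the lowest error and 0 to the one with the highest. All function names are hypothetical.

```python
from collections import Counter
from statistics import median


def majority_vote(posteriors):
    """MAJ: each model votes for its most probable class."""
    votes = Counter(max(p, key=p.get) for p in posteriors)
    return votes.most_common(1)[0][0]


def combine(posteriors, reducer):
    """AVG / MAX / MED: reduce each class's posteriors across models."""
    scores = {c: reducer([p[c] for p in posteriors]) for c in posteriors[0]}
    return max(scores, key=scores.get)


def mean(values):
    return sum(values) / len(values)


def best_worst_weights(errors, quadratic=False):
    """ASSUMED BWWV / QBWWV form: 1 for the best model, 0 for the worst."""
    best, worst = min(errors), max(errors)
    if worst == best:
        return [1.0] * len(errors)
    linear = [(worst - e) / (worst - best) for e in errors]
    return [w * w for w in linear] if quadratic else linear


def weighted_vote(posteriors, weights):
    """Each model adds its weight to the class it considers most probable."""
    totals = Counter()
    for p, w in zip(posteriors, weights):
        totals[max(p, key=p.get)] += w
    return totals.most_common(1)[0][0]


# Three models, two classes; the first model is confident, the others lean to rock.
posteriors = [
    {"jazz": 0.90, "rock": 0.10},
    {"jazz": 0.40, "rock": 0.60},
    {"jazz": 0.45, "rock": 0.55},
]
errors = [0.10, 0.30, 0.20]  # made-up estimated error rates per model

print(majority_vote(posteriors))        # rock
print(combine(posteriors, mean))        # jazz
print(combine(posteriors, max))         # jazz
print(combine(posteriors, median))      # rock
print(weighted_vote(posteriors, best_worst_weights(errors)))  # jazz
```

The toy example shows why the choice of rule matters: the same three models yield different decisions under MAJ and AVG, and weighting by estimated error lets the single most accurate model outvote the other two.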

Corpora Feature subspaces Evaluation parameters Results Conclusions and further work

Corpora

Dataset             Files   Genres   File length
9GDB                  856        9   full
GTZAN                1000       10   30 sec
ISMIRgenre           1458        6   full
ISMIRrhythm           698        8   30 sec
LatinMusic           3225       10   full
Africa-function      1024       27   full
Africa-instrument    1024       11   full
Africa-country       1024       11   full
Africa-ethnic        1024       40   full

Feature subspaces

Audio features
Feature subspace                          No. of features
Rhythm Pattern (RP)                       1440
Rhythm Histogram (RH)                       60
Statistical Spectrum Descriptor (SSD)      168
Modulation Variance Descriptor (MVD)       420
Temporal RH (TRH)                          420
Temporal SSD (TSSD)                       1176

Symbolic features
Feature subspace                          No. of features
Global features                             52
Chord Relative Frequency                     9

(Chord extraction algorithm: [Pardo & Birmingham, 2002])

Evaluation

Outer c.v.: 10 folds
Inner c.v.: 3 folds

Classification schemes (10)
Scheme                              Paradigm
Naïve Bayes (NB)                    Bayes rule
Nearest Neighbor (1-NN)             lazy learner
3-NN, Manhattan dist.               lazy learner
RIPPER                              rule learner
C4.5                                decision tree
REPTree                             decision tree
Random Forest (RF)                  decision tree ensemble
SVM, linear kernel (SVM-lin)        statistical learning theory
SVM, quadratic kernel (SVM-quad)    statistical learning theory
SVM, Puk kernel (SVM-Puk)           statistical learning theory

8 feature subspaces × 10 schemes = 80 models

Ensemble vs. single best model results

Ensemble vs. single best model accuracy (in %)

Corpus              Single best     Ensemble        Comb. rule
9GDB                78.15 (2.25)    81.66 (3.96)    AVG
GTZAN               72.60 (3.92)    77.50 (4.30)    QBWWV
ISMIRgenre          81.28 (3.13)    84.02 (1.50)    QBWWV
ISMIRrhythm         87.97 (4.28)    89.11 (4.62)    BWWV
LatinMusic          89.46 (1.62)    92.71 (0.99)    QBWWV
Africa-country      86.29 (2.30)    89.03 (1.63)    QBWWV
Africa-ethnic       81.10 (2.41)    82.97 (3.30)    WMV
Africa-function     51.06 (6.63)    54.84 (6.29)    QBWWV
Africa-instrument   69.90 (4.69)    73.00 (4.25)    WMV

Extending feature subspaces: segmenting the input

- Segment each audio file into 3 equal-sized segments.
- 6 × 3 = 18 audio subspaces.
- Symbolic features were not segmented.
- Results were inferior to those obtained with full-song features.

Ensemble cross-validation execution times

Corpus        Files   Train (sec.)   Test (sec.)
9GDB            856           6645           140
GTZAN          1000          10702           345
ISMIRgenre     1458          12510           275
ISMIRrhythm     698           5466           185

- Test times are averaged over decision combination methods.
- Roughly 10 sec. per sample on a Quad machine (e.g., 3 hours for GTZAN).

Conclusions

- A generic ensemble framework based on feature subspaces was devised.
- The ensemble improves classification accuracy over the best single model.
- The user is released from having to choose a particular feature subspace/classifier.
- Relying on the QBWWV decision combination rule seems feasible.

Further work

- Reduce training times by feature selection. Preliminary results were presented at MML 2010.
- Add other input modalities: lyric features, metadata, symbolic features by statistical language modeling techniques, ...

Thanks!