MusicMood. Machine Learning in Automatic Music Mood Prediction Based on Song Lyrics


MusicMood
Machine Learning in Automatic Music Mood Prediction Based on Song Lyrics
Sebastian Raschka
December 10, 2014

Music Mood Prediction
- We like to listen to music [1][2]
- Digital music libraries are growing
- Recommendation system for happy music (clinics, restaurants, ...) and genre selection
[1] Thomas Schaefer, Peter Sedlmeier, Christine Städtler, and David Huron. The psychological functions of music listening. Frontiers in Psychology, 4, 2013.
[2] Daniel Vaestfjaell. Emotion induction through music: A review of the musical mood induction procedure. Musicae Scientiae, 5(1 suppl):173-211, 2002.

Predictive Modeling
[Figure: overview of predictive modeling. Branches shown: reinforcement learning, unsupervised learning, supervised learning. Tasks shown: hidden Markov models, clustering, ranking, classification, regression. Example plots: DBSCAN on a toy dataset; linear classifier on Iris (after LDA).]

Supervised Learning In a Nutshell

Supervised Learning - A Quick Overview
[Figure: workflow diagram. Elements shown: raw data collection, missing data, feature extraction, sampling, pre-processing, split into training dataset and test dataset, feature selection, normalization, dimensionality reduction, learning algorithms, training, cross validation, hyperparameter optimization, prediction-error metrics, model selection, refinement, post-processing, final classification/regression model, new data, prediction.]
Sebastian Raschka 2014. This work is licensed under a Creative Commons Attribution 4.0 International License.

MusicMood - The Plan

The Dataset http://labrosa.ee.columbia.edu/millionsong/

Sampling
- 1000 songs for training
- 200 songs for validation
- Lyrics available? http://lyrics.wikia.com/lyrics_wiki
- Lyrics in English? Python NLTK

Mood Labels
- Downloading mood labels from Last.fm
- Manual labeling based on lyrics and listening
- sad if...
  - Dark topic (killing, war, complaints about politics, ...)
  - Artist in sorrow (lost love, ...)

Word Clouds
[Figure: word clouds for the happy and sad classes.]

A Short Introduction to Naive Bayes Classification

Naive Bayes - Why?
- With small sample sizes, it can outperform more powerful alternatives [1]
- "Eager learner" (on-line learning vs. batch learning)
- Fast for classification and re-training
- Success in spam filtering [2]
- High accuracy for predicting positive and negative classes in a sentiment analysis of Twitter data [3]
[1] Pedro Domingos and Michael Pazzani. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29(2-3):103-130, 1997.
[2] Mehran Sahami, Susan Dumais, David Heckerman, and Eric Horvitz. A Bayesian approach to filtering junk e-mail. In Learning for Text Categorization: Papers from the 1998 Workshop, volume 62, pages 98-105, 1998.
[3] Alec Go, Richa Bhayani, and Lei Huang. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, pages 1-12, 2009.

Bayes Classifiers: It's All About Posterior Probabilities
Objective function: maximize the posterior probability
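In standard notation, the posterior probability and the resulting decision rule can be written as follows. The symbols are assumptions chosen for illustration: $\mathbf{x}$ is the feature vector and $\omega_j$ is the j-th class (here, happy or sad).

```latex
P(\omega_j \mid \mathbf{x})
  = \frac{p(\mathbf{x} \mid \omega_j)\, P(\omega_j)}{p(\mathbf{x})},
\qquad
\hat{\omega} = \arg\max_{j}\; P(\omega_j \mid \mathbf{x})
```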

The Prior Probability Maximum Likelihood Estimate (MLE)
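A minimal sketch of the maximum likelihood estimate of the class priors, which is simply the fraction of training samples that carry each label. The function name and the toy labels are hypothetical and not taken from the MusicMood data.

```python
from collections import Counter


def class_priors(labels):
    """MLE of the prior: fraction of training samples in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


# Hypothetical toy labels, not the MusicMood dataset
priors = class_priors(["happy", "sad", "sad", "happy", "sad"])
```

With two happy and three sad samples, the estimated priors are 2/5 and 3/5.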

The Effect of Priors on the Decision Boundary

Class-Conditional Probability
Maximum Likelihood Estimate (MLE): "chance of observing a feature given that it belongs to a class"
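For count-based features, one common form of this estimate is a relative frequency, combined across features by the naive independence assumption. The notation is assumed for illustration: $N_{x_i,\omega_j}$ is the number of times feature $x_i$ occurs in training samples of class $\omega_j$, $N_{\omega_j}$ is the total count of all features in that class, and $d$ is the number of features.

```latex
\hat{P}(x_i \mid \omega_j) = \frac{N_{x_i,\omega_j}}{N_{\omega_j}},
\qquad
P(\mathbf{x} \mid \omega_j) = \prod_{i=1}^{d} P(x_i \mid \omega_j)
```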

Evidence just a normalization factor, can be omitted in decision rule:
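Written out with assumed symbols ($\mathbf{x}$ for the feature vector, $\omega_j$ for class j), the evidence is the same for every class, so dropping it does not change which class attains the maximum:

```latex
p(\mathbf{x}) = \sum_{j} p(\mathbf{x} \mid \omega_j)\, P(\omega_j),
\qquad
\hat{\omega} = \arg\max_{j}\; p(\mathbf{x} \mid \omega_j)\, P(\omega_j)
```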

Naive Bayes Models Gaussian Naive Bayes for continuous variables
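A standard form of the Gaussian class-conditional density, with assumed notation: $\mu_{ij}$ and $\sigma_{ij}^2$ are the mean and variance of feature $x_i$ estimated from the training samples of class $\omega_j$.

```latex
p(x_i \mid \omega_j)
  = \frac{1}{\sqrt{2\pi\sigma_{ij}^{2}}}
    \exp\!\left( -\frac{(x_i - \mu_{ij})^{2}}{2\sigma_{ij}^{2}} \right)
```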

Naive Bayes Models Multi-variate Bernoulli Naive Bayes for binary features
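A standard form of the multi-variate Bernoulli likelihood, with assumed notation: $b_i \in \{0, 1\}$ indicates whether term $i$ of a vocabulary of size $d$ occurs in the document, and $P(x_i \mid \omega_j)$ is the probability that term $i$ occurs in a document of class $\omega_j$. Absent terms contribute to the product as well.

```latex
P(\mathbf{x} \mid \omega_j)
  = \prod_{i=1}^{d} P(x_i \mid \omega_j)^{\,b_i}
    \bigl(1 - P(x_i \mid \omega_j)\bigr)^{\,1 - b_i}
```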

Naive Bayes Models Multinomial Naive Bayes
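A from-scratch sketch of a multinomial Naive Bayes classifier on tokenized documents, assuming additive smoothing with parameter `alpha` and log-probabilities to avoid underflow. The function names and the toy lyrics are hypothetical; this is an illustration of the technique, not the implementation used in the talk.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit log-priors and smoothed log-likelihoods from tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)

    log_prior = {c: math.log(n / len(labels)) for c, n in class_docs.items()}
    log_lik = {}
    for c in class_docs:
        # Additive smoothing: add alpha to every term count in the class
        denom = sum(term_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {
            t: math.log((term_counts[c][t] + alpha) / denom) for t in vocab
        }
    return log_prior, log_lik


def predict(doc, log_prior, log_lik):
    """Return the class with the highest log-posterior; unseen tokens are skipped."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_lik[c][t] for t in doc if t in log_lik[c]
        )
    return max(scores, key=scores.get)


# Hypothetical toy data, not the MusicMood dataset
docs = [
    ["sunny", "love", "dance"],
    ["love", "smile"],
    ["tears", "alone"],
    ["war", "tears", "lost"],
]
labels = ["happy", "happy", "sad", "sad"]
log_prior, log_lik = train_multinomial_nb(docs, labels)
pred_a = predict(["love", "dance", "dance"], log_prior, log_lik)
pred_b = predict(["tears", "war"], log_prior, log_lik)
```

Repeated tokens in a document add their log-likelihood once per occurrence, which is what distinguishes the multinomial model from the Bernoulli model.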

Naive Bayes and Text Classification

Feature Vectors The Bag of Words Model
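A minimal sketch of the bag-of-words model: build a vocabulary from all documents, then represent each document as a vector of term counts in vocabulary order. Whitespace tokenization, lowercasing, and the example sentences are simplifying assumptions.

```python
from collections import Counter


def bag_of_words(docs):
    """Return a sorted vocabulary and one count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for tokens in tokenized for tok in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


# Illustrative documents (assumed, not from the talk)
vocab, vectors = bag_of_words([
    "the sun is shining",
    "the weather is sweet",
    "the sun is shining and the weather is sweet",
])
```

Word order is discarded; only the counts per vocabulary term remain.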

Tokenization and N-grams a swimmer likes swimming thus he swims
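A short sketch of whitespace tokenization and n-gram construction applied to the example sentence from the slide. The helper name `ngrams` is hypothetical.

```python
def ngrams(tokens, n):
    """Return all contiguous sequences of n tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "a swimmer likes swimming thus he swims".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
```

The seven tokens yield seven unigrams and six bigrams, starting with "a swimmer" and ending with "he swims".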

Stemming and Lemmatization Porter Stemming Lemmatization

Stop Word Removal
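A minimal sketch of stop word removal. The stop list here is a tiny hypothetical set chosen for the example; in practice a curated list (for instance from an NLP library) would be used.

```python
# Hypothetical, deliberately tiny stop list for illustration only
STOP_WORDS = {"a", "and", "he", "is", "the", "thus"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop list (case-insensitive)."""
    return [tok for tok in tokens if tok.lower() not in stop_words]


filtered = remove_stop_words("a swimmer likes swimming thus he swims".split())
```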

Term and Frequency

Term Frequency - Inverse Document Frequency (Tf-idf)
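A sketch of one common tf-idf variant, assumed here for illustration: raw term count multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. Libraries often use smoothed or normalized variants, so the exact weighting in the talk may differ.

```python
import math


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    vocab = {tok for doc in docs for tok in doc}
    # Document frequency: number of documents that contain the term
    df = {t: sum(1 for doc in docs if t in doc) for t in vocab}
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    return [{t: doc.count(t) * idf[t] for t in set(doc)} for doc in docs]


# Hypothetical toy documents
docs = [["sun", "love", "sun"], ["love", "tears"], ["tears", "war"]]
weights = tf_idf(docs)
```

A term that occurs in every document gets weight zero under this variant, since log(N / N) = 0.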

Grid Search and 10-fold Cross Validation to Optimize F1
- TP = true positive (happy predicted as happy)
- FP = false positive (sad predicted as happy)
- FN = false negative (happy predicted as sad)
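The F1 score is the harmonic mean of precision, TP / (TP + FP), and recall, TP / (TP + FN). A small sketch with hypothetical counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, not results from the talk
p, r, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
```

Equivalently, F1 = 2 TP / (2 TP + FP + FN), which for these counts is 80 / 110.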

K-Fold Cross Validation
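A minimal sketch of how k-fold cross validation partitions the data: each sample is used for testing exactly once and for training in the other k - 1 folds. The function name is hypothetical, and shuffling or stratification is omitted for brevity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    # Distribute the remainder so fold sizes differ by at most one
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# 1000 training songs, 10 folds, as in the setup described in the talk
folds = list(k_fold_indices(n_samples=1000, k=10))
```

The model is trained and scored once per fold, and the k scores are averaged.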

10-Fold Cross Validation After Grid Search (final model)

10-fold Cross Validation (mean ROC) Multinomial vs Multi-variate Bernoulli Naive Bayes

10-fold Cross Validation (mean ROC) Multinomial Naive Bayes & Hyperparameter Alpha
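In multinomial Naive Bayes, alpha usually denotes the additive smoothing parameter. With assumed notation ($N_{x_i,\omega_j}$ the count of term $x_i$ in class $\omega_j$, $N_{\omega_j}$ the total term count in that class, $d$ the vocabulary size), the smoothed estimate is:

```latex
\hat{P}(x_i \mid \omega_j)
  = \frac{N_{x_i,\omega_j} + \alpha}{N_{\omega_j} + \alpha\, d}
```

Setting $\alpha = 1$ gives Laplace smoothing; $\alpha > 0$ prevents zero probabilities for terms that never occur in a class.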

10-fold Cross Validation (mean ROC) Multinomial Naive Bayes & Vocabulary Size

10-fold Cross Validation (mean ROC) Multinomial Naive Bayes & Document Frequency Cut-off
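A sketch of a document frequency cut-off, which restricts the vocabulary to terms that occur in neither too few nor too many documents. The parameter names `min_df` and `max_df_ratio` are hypothetical, and the slide does not state which kind of cut-off was tuned.

```python
def filter_by_document_frequency(docs, min_df=1, max_df_ratio=1.0):
    """Keep terms whose document frequency lies within the given bounds."""
    n_docs = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):  # count each term once per document
            df[term] = df.get(term, 0) + 1
    return sorted(
        term
        for term, n in df.items()
        if n >= min_df and n / n_docs <= max_df_ratio
    )


# Hypothetical toy documents
docs = [["love", "the"], ["the", "tears"], ["the", "love", "war"]]
kept = filter_by_document_frequency(docs, min_df=2, max_df_ratio=0.9)
```

Here "the" is dropped for appearing in every document, and "tears" and "war" are dropped for appearing in only one.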

10-fold Cross Validation (mean ROC) Multinomial Naive Bayes & N-gram Sequence Length

Contingency Tables of the Final Model
[Figure: contingency tables for the training and test sets.]
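A small sketch of how such a contingency table can be tallied from true and predicted labels, treating happy as the positive class. The function name and labels are hypothetical and do not reproduce the results of the final model.

```python
def contingency_table(y_true, y_pred, positive="happy"):
    """Count TP, FP, FN, and TN for a binary classification result."""
    table = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            table["TP" if truth == positive else "FP"] += 1
        else:
            table["FN" if truth == positive else "TN"] += 1
    return table


# Hypothetical labels for illustration
y_true = ["happy", "happy", "sad", "sad", "happy"]
y_pred = ["happy", "sad", "sad", "happy", "happy"]
table = contingency_table(y_true, y_pred)
```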

Live Demo
http://sebastianraschka.com/Webapps/musicmood.html

Future Plans
- Growing a list of mood labels (majority rule).
- Performance comparisons of different machine learning algorithms.
- Genre prediction and selection based on sound.

Thank you!