Density Ratio Estimation in Machine Learning

Machine learning is an interdisciplinary field of science and engineering that studies mathematical theories and practical applications of systems that learn. This book introduces theories, methods, and applications of density ratio estimation, which is a newly emerging paradigm in the machine learning community. Various machine learning problems such as non-stationarity adaptation, outlier detection, dimensionality reduction, independent component analysis, clustering, classification, and conditional density estimation can be systematically solved via the estimation of probability density ratios. The authors offer a comprehensive introduction to various density ratio estimators, including methods via density estimation, moment matching, probabilistic classification, density fitting, and density ratio fitting, and describe how these can be applied to machine learning. The book also provides mathematical theories for density ratio estimation, including parametric and non-parametric convergence analysis and numerical stability analysis, to complete the first and definitive treatment of the entire framework of density ratio estimation in machine learning.

Dr. Masashi Sugiyama is an Associate Professor in the Department of Computer Science at the Tokyo Institute of Technology.

Dr. Taiji Suzuki is an Assistant Professor in the Department of Mathematical Informatics at the University of Tokyo, Japan.

Dr. Takafumi Kanamori is an Associate Professor in the Department of Computer Science and Mathematical Informatics at Nagoya University, Japan.

Density Ratio Estimation in Machine Learning

MASASHI SUGIYAMA, Tokyo Institute of Technology
TAIJI SUZUKI, The University of Tokyo
TAKAFUMI KANAMORI, Nagoya University

CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Mexico City

Cambridge University Press
32 Avenue of the Americas, New York, NY 10013-2473, USA

Information on this title:/9780521190176

© Masashi Sugiyama, Taiji Suzuki, and Takafumi Kanamori 2012

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2012
Printed in the United States of America

A catalog record for this publication is available from the British Library.
Library of Congress Cataloging in Publication data is available.

ISBN 978-0-521-19017-6 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party Internet Web sites referred to in this publication and does not guarantee that any content on such Web sites is, or will remain, accurate or appropriate.

Contents

Foreword
Preface

Part I Density-Ratio Approach to Machine Learning

1 Introduction
  1.1 Machine Learning
  1.2 Density-Ratio Approach to Machine Learning
  1.3 Algorithms of Density-Ratio Estimation
  1.4 Theoretical Aspects of Density-Ratio Estimation
  1.5 Organization of this Book at a Glance

Part II Methods of Density-Ratio Estimation

2 Density Estimation
  2.1 Basic Framework
  2.2 Parametric Approach
  2.3 Non-Parametric Approach
  2.4 Numerical Examples
  2.5 Remarks
3 Moment Matching
  3.1 Basic Framework
  3.2 Finite-Order Approach
  3.3 Infinite-Order Approach: KMM
  3.4 Numerical Examples
  3.5 Remarks
4 Probabilistic Classification
  4.1 Basic Framework
  4.2 Logistic Regression
  4.3 Least-Squares Probabilistic Classifier
  4.4 Support Vector Machine
  4.5 Model Selection by Cross-Validation
  4.6 Numerical Examples
  4.7 Remarks
5 Density Fitting
  5.1 Basic Framework
  5.2 Implementations of KLIEP
  5.3 Model Selection by Cross-Validation
  5.4 Numerical Examples
  5.5 Remarks
6 Density-Ratio Fitting
  6.1 Basic Framework
  6.2 Implementation of LSIF
  6.3 Model Selection by Cross-Validation
  6.4 Numerical Examples
  6.5 Remarks
7 Unified Framework
  7.1 Basic Framework
  7.2 Existing Methods as Density-Ratio Fitting
  7.3 Interpretation of Density-Ratio Fitting
  7.4 Power Divergence for Robust Density-Ratio Estimation
  7.5 Remarks
8 Direct Density-Ratio Estimation with Dimensionality Reduction
  8.1 Discriminant Analysis Approach
  8.2 Divergence Maximization Approach
  8.3 Numerical Examples
  8.4 Remarks

Part III Applications of Density Ratios in Machine Learning

9 Importance Sampling
  9.1 Covariate Shift Adaptation
  9.2 Multi-Task Learning
10 Distribution Comparison
  10.1 Inlier-Based Outlier Detection
  10.2 Two-Sample Test
11 Mutual Information Estimation
  11.1 Density-Ratio Methods of Mutual Information Estimation
  11.2 Sufficient Dimension Reduction
  11.3 Independent Component Analysis
12 Conditional Probability Estimation
  12.1 Conditional Density Estimation
  12.2 Probabilistic Classification

Part IV Theoretical Analysis of Density-Ratio Estimation

13 Parametric Convergence Analysis
  13.1 Density-Ratio Fitting under Kullback–Leibler Divergence
  13.2 Density-Ratio Fitting under Squared Distance
  13.3 Optimality of Logistic Regression
  13.4 Accuracy Comparison
  13.5 Remarks
14 Non-Parametric Convergence Analysis
  14.1 Mathematical Preliminaries
  14.2 Non-Parametric Convergence Analysis of KLIEP
  14.3 Convergence Analysis of KuLSIF
  14.4 Remarks
15 Parametric Two-Sample Test
  15.1 Introduction
  15.2 Estimation of Density Ratios
  15.3 Estimation of ASC Divergence
  15.4 Optimal Estimator of ASC Divergence
  15.5 Two-Sample Test Based on ASC Divergence Estimation
  15.6 Numerical Studies
  15.7 Remarks
16 Non-Parametric Numerical Stability Analysis
  16.1 Preliminaries
  16.2 Relation between KuLSIF and KMM
  16.3 Condition Number Analysis
  16.4 Optimality of KuLSIF
  16.5 Numerical Examples
  16.6 Remarks

Part V Conclusions

17 Conclusions and Future Directions

List of Symbols and Abbreviations
References
Index

Foreword

Estimating probability distributions is widely viewed as a central question in machine learning. The whole enterprise of probabilistic modeling using probabilistic graphical models is generally addressed by learning marginal and conditional probability distributions. Classification and regression, starting with Fisher's fundamental contributions, are similarly viewed as problems of estimating conditional densities. The present book introduces an exciting alternative perspective: namely, that virtually all problems in machine learning can be formulated and solved as problems of estimating density ratios, that is, the ratios of two probability densities.

This book provides a comprehensive review of the elegant line of research undertaken by the authors and their collaborators over the last decade. It reviews existing work on density-ratio estimation and derives a variety of algorithms for directly estimating density ratios. It then shows how these novel algorithms can address not only standard machine learning problems such as classification, regression, and feature selection but also a variety of other important problems such as learning under a covariate shift, multi-task learning, outlier detection, sufficient dimensionality reduction, and independent component analysis. At each point this book carefully defines the problems at hand, reviews existing work, derives novel methods, and reports on numerical experiments that validate the effectiveness and superiority of the new methods. A particularly impressive aspect of the work is that implementations of most of the methods are available for download from the authors' web pages.

The last part of the book is devoted to mathematical analyses of the methods. This includes not only an analysis for the case where the assumptions underlying the algorithms hold, but also situations in which the models are misspecified. Careful study of these results will not only provide fundamental insights into the problems and algorithms but will also provide the reader with an introduction to many valuable analytic tools.

In summary, this is a definitive treatment of the topic of density-ratio estimation. It reflects the authors' careful thinking and sustained research efforts. Researchers and students alike will find it an important source of ideas and techniques. There is no doubt that this book will change the way people think about machine learning and stimulate many new directions for research.

Thomas G. Dietterich
School of Electrical Engineering
Oregon State University, Corvallis, OR, USA

Preface

Machine learning is aimed at developing systems that learn. The mathematical foundation of machine learning and its real-world applications have been extensively explored in the last decades. Various tasks of machine learning, such as regression and classification, typically can be solved by estimating probability distributions behind data. However, estimating probability distributions is one of the most difficult problems in statistical data analysis, and thus solving machine learning tasks without going through distribution estimation is a key challenge in modern machine learning.

So far, various algorithms have been developed that do not involve distribution estimation but solve target machine learning tasks directly. The support vector machine is a successful example that follows this line: it does not estimate data-generating distributions but directly obtains the class-decision boundary that is sufficient for classification. However, developing such an excellent algorithm for each of the machine learning tasks could be highly costly and difficult.

To overcome these limitations of current machine learning research, we introduce and develop a novel paradigm called density-ratio estimation: instead of probability distributions, the ratio of probability densities is estimated for statistical data processing. The density-ratio approach covers various machine learning tasks, for example, non-stationarity adaptation, multi-task learning, outlier detection, two-sample tests, feature selection, dimensionality reduction, independent component analysis, causal inference, conditional density estimation, and probabilistic classification. Thus, density-ratio estimation is a versatile tool for machine learning. This book is aimed at introducing the mathematical foundation, practical algorithms, and applications of density-ratio estimation.

Most of the contents of this book are based on the journal and conference papers we have published in the last couple of years. We acknowledge our collaborators for their fruitful discussions: Hirotaka Hachiya, Shohei Hido, Yasuyuki Ihara, Hisashi Kashima, Motoaki Kawanabe, Manabu Kimura, Masakazu Matsugu, Shin-ichi Nakajima, Klaus-Robert Müller, Jun Sese, Jaak Simm, Ichiro Takeuchi, Masafumi Takimoto, Yuta Tsuboi, Kazuya Ueki, Paul von Bünau, Gordon Wichern, and Makoto Yamada.

Finally, we thank the Ministry of Education, Culture, Sports, Science and Technology; the Alexander von Humboldt Foundation; the Okawa Foundation; Microsoft Institute for Japanese Academic Research Collaboration Collaborative Research Project; IBM Faculty Award; Mathematisches Forschungsinstitut Oberwolfach Research-in-Pairs Program; the Asian Office of Aerospace Research and Development; Support Center for Advanced Telecommunications Technology Research Foundation; and the Japan Science and Technology Agency for their financial support.

Masashi Sugiyama, Taiji Suzuki, and Takafumi Kanamori

[Photograph] Picture taken in Nagano, Japan, in the summer of 2009. From left to right: Taiji Suzuki, Masashi Sugiyama, and Takafumi Kanamori.