Relational Knowledge Discovery

What is knowledge and how is it represented? This book focuses on the idea of formalising knowledge as relations, interpreting knowledge represented in databases or logic programs as relational data, and discovering new knowledge by identifying hidden relations and defining new ones. After a brief introduction to representational issues, the author develops a relational language for abstract machine learning problems. He then uses this language to discuss traditional methods such as clustering and decision tree induction, before moving on to two previously underestimated topics that are again coming to the fore: rough set data analysis and inductive logic programming.

Its clear and precise presentation is ideal for undergraduate computer science students. The book will also interest those who study artificial intelligence or machine learning at the graduate level. Exercises are provided, and each concept is introduced using the same example domain, making it easier to compare the individual properties of different approaches.

M. E. MÜLLER is a Professor of Computer Science at the University of Applied Sciences, Bonn-Rhein-Sieg.
Relational Knowledge Discovery

M. E. MÜLLER
University of Applied Sciences, Bonn-Rhein-Sieg
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Mexico City

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

Information on this title: /9780521190213

© M. E. Müller 2012

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2012

Printed in the United Kingdom at the University Press, Cambridge

A catalogue record for this publication is available from the British Library

Library of Congress Cataloguing in Publication data
Müller, M. E. (Martin E.), 1970
Relational knowledge discovery / M.E. Müller.
p. cm.
ISBN 978-0-521-19021-3 (hardback)
1. Computational learning theory. 2. Machine learning. 3. Relational databases. I. Title.
Q325.7.M85 2012
006.3′1 dc23
2011049968

ISBN 978-0-521-19021-3 Hardback
ISBN 978-0-521-12204-7 Paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents

About this book
1 Introduction
  1.1 Motivation
  1.2 Related disciplines
2 Relational knowledge
  2.1 Objects and their attributes
  2.2 Knowledge structures
3 From data to hypotheses
  3.1 Representation
  3.2 Changing the representation
  3.3 Samples
  3.4 Evaluation of hypotheses
  3.5 Learning
  3.6 Bias
  3.7 Overfitting
  3.8 Summary
4 Clustering
  4.1 Concepts as sets of objects
  4.2 k-nearest neighbours
  4.3 k-means clustering
  4.4 Incremental concept formation
  4.5 Relational clustering
5 Information gain
  5.1 Entropy
  5.2 Information and information gain
  5.3 Induction of decision trees
  5.4 Gain again
  5.5 Pruning
  5.6 Conclusion
6 Rough set theory
  6.1 Knowledge and discernability
  6.2 Rough knowledge
  6.3 Rough knowledge structures
  6.4 Relative knowledge
  6.5 Knowledge discovery
  6.6 Conclusion
7 Inductive logic learning
  7.1 From information systems to logic programs
  7.2 Horn logic
  7.3 Heuristic rule induction
  7.4 Inducing Horn theories from data
  7.5 Summary
8 Learning and ensemble learning
  8.1 Learnability
  8.2 Decomposing the learning problem
  8.3 Improving by focusing on errors
  8.4 A relational view on ensemble learning
  8.5 Summary
9 The logic of knowledge
  9.1 Knowledge representation
  9.2 Learning
  9.3 Summary
Notation
References
Index