Undergraduate Topics in Computer Science
Undergraduate Topics in Computer Science (UTiCS) delivers high-quality instructional content for undergraduates studying in all areas of computing and information science. From core foundational and theoretical material to final-year topics and applications, UTiCS books take a fresh, concise, and modern approach and are ideal for self-study or for a one- or two-semester course. The texts are all authored by established experts in their fields, reviewed by an international advisory board, and contain numerous examples and problems. Many include fully worked solutions. For further volumes: http://www.springer.com/series/7592
Thomas B. Moeslund

Introduction to Video and Image Processing
Building Real Systems and Applications
Thomas B. Moeslund
Visual Analysis of People Laboratory
Department of Architecture, Design, and Media Technology
Aalborg University
Aalborg
Denmark

Series editor
Ian Mackie

Advisory board
Samson Abramsky, University of Oxford, Oxford, UK
Karin Breitman, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil
Chris Hankin, Imperial College London, London, UK
Dexter Kozen, Cornell University, Ithaca, USA
Andrew Pitts, University of Cambridge, Cambridge, UK
Hanne Riis Nielson, Technical University of Denmark, Kongens Lyngby, Denmark
Steven Skiena, Stony Brook University, Stony Brook, USA
Iain Stewart, University of Durham, Durham, UK

ISSN 1863-7310
Undergraduate Topics in Computer Science
ISBN 978-1-4471-2502-0
e-ISBN 978-1-4471-2503-7
DOI 10.1007/978-1-4471-2503-7
Springer London Dordrecht Heidelberg New York

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Control Number: 2012930996

© Springer-Verlag London Limited 2012

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licenses issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)
Preface

One of the first times I ever encountered video and image processing was in a semester project in my fourth year of study. The aim of the project was to design a system that automatically located the center and size of mushrooms in an image. Given this information, a robot should pick the mushrooms. I was intrigued by the notion of a seeing computer. Little did I know that this encounter would shape most parts (so far) of my professional life.

I decided to study video and image processing in depth and signed up for a master's program focusing on these topics. I soon realized that I had made a good choice, but was puzzled by the fact that the wonders of digital video and image processing were often presented in a strict mathematical manner. While this is fine for hardcore engineers (including me) and computer scientists, it makes video and image processing unnecessarily difficult for others. I really felt this was a pity and decided to do something about it. That was 15 years ago.

In this book the concepts and methods are described in a less mathematical manner and the language is in general casual. To assist the reader with the math that is used in the book, Appendix B is included. In this regard the textbook is self-contained. Some of the key algorithms are exemplified in C code. Please note that the code is neither optimal nor complete and merely serves as an additional input for comprehending the algorithms.

Another aspect that puzzled me as a student was that the textbooks were all about image processing, while we constructed systems that worked with video. Many of the methods described for image processing can obviously also be applied to video data. But video data add the temporal dimension, which is often the key to success in systems processing video. This book therefore aims at introducing not only image processing but also video processing. Moreover, the last two chapters of the book describe the process of designing and implementing real systems processing video data. On the website for the book you can find detailed descriptions of other practical systems processing video: http://www.vip.aau.dk.

I have tried to make the book as concise as possible. This has forced me to leave out details and topics that might be of interest to some readers. As a compromise, each chapter ends with a Further Information section in which pointers to additional concepts, methods and details are given.
For Instructors

Each chapter ends with a number of exercises. The first exercise after each chapter aims at assessing to what degree the students have understood the main concepts. If possible, it is recommended that these exercises are discussed within small groups. The following exercises have a more practical focus, where concrete problems need to be solved using the different methods and algorithms presented in the associated chapter. Lastly, one or more so-called additional exercises are present. These aim at topics not discussed directly in the chapters. The idea behind these exercises is that they can serve as self-studies where each student (or a small group of students) finds the solution by investigating other sources. They could then present their findings to other students.

Besides the exercises listed in the book, I strongly recommend combining them with examples and exercises where real images and videos are processed. Personally I start with ImageJ for image processing and EyesWeb for video processing. The main motivation for using these programs is that they are easy to learn, and hence the students can focus on the video and image processing, as opposed to a specific programming language, when solving the exercises. However, when it comes to building real systems I recommend using OpenCV or openFrameworks (EyesWeb or similar can of course also be used to build systems, but they do not generalize as well). To this end students of course need to have a course on procedural programming before or in parallel with the image processing course.

To make the switch from ImageJ/EyesWeb to a more low-level environment like OpenCV, I normally ask each student to do an assignment where they write a program that can capture an image, perform some image processing and display the result. When the students can do this, they have a framework for implementing all other image processing methods. The time allocated for this assignment of course depends on the programming experience of the students.

Acknowledgement

The book was written primarily at weekends and late nights, and I thank my family for being understanding and supportive during that time! I would also like to thank the following people: Hans Ebert and Volker Krüger for initial discussions on the book project. Moritz Störring for providing Fig. 2.3. Rasmus R. Paulsen for providing Figs. 2.22(a) and 4.5. Rikke Gade for providing Fig. 2.22(b). Tobias Thyrrestrup for providing Fig. 2.22(c). David Meredith, Rasmus R. Paulsen, Lars Reng and Kamal Nasrollahi for insightful editorial comments, and finally a special thanks to Lars Knudsen and Andreas Møgelmose, who provided valuable assistance by creating many of the illustrations used throughout the book.

Enjoy!

Viborg, Denmark
Thomas B. Moeslund
Contents

1 Introduction
  1.1 The Different Flavors of Video and Image Processing
  1.2 General Framework
  1.3 The Chapters in This Book
  1.4 Exercises

2 Image Acquisition
  2.1 Energy
    2.1.1 Illumination
  2.2 The Optical System
    2.2.1 The Lens
  2.3 The Image Sensor
  2.4 The Digital Image
    2.4.1 The Region of Interest (ROI)
  2.5 Further Information
  2.6 Exercises

3 Color Images
  3.1 What Is a Color?
  3.2 Representation of an RGB Color Image
    3.2.1 The RGB Color Space
    3.2.2 Converting from RGB to Gray-Scale
    3.2.3 The Normalized RGB Color Representation
  3.3 Other Color Representations
    3.3.1 The HSI Color Representation
    3.3.2 The HSV Color Representation
    3.3.3 The YUV and YCbCr Color Representations
  3.4 Further Information
  3.5 Exercises

4 Point Processing
  4.1 Gray-Level Mapping
  4.2 Non-linear Gray-Level Mapping
    4.2.1 Gamma Mapping
    4.2.2 Logarithmic Mapping
    4.2.3 Exponential Mapping
  4.3 The Image Histogram
    4.3.1 Histogram Stretching
    4.3.2 Histogram Equalization
  4.4 Thresholding
    4.4.1 Color Thresholding
    4.4.2 Thresholding in Video
  4.5 Logic Operations on Binary Images
  4.6 Image Arithmetic
  4.7 Programming Point Processing Operations
  4.8 Further Information
  4.9 Exercises

5 Neighborhood Processing
  5.1 The Median Filter
    5.1.1 Rank Filters
  5.2 Correlation
    5.2.1 Template Matching
    5.2.2 Edge Detection
    5.2.3 Image Sharpening
  5.3 Further Information
  5.4 Exercises

6 Morphology
  6.1 Level 1: Hit and Fit
    6.1.1 Hit
    6.1.2 Fit
  6.2 Level 2: Dilation and Erosion
    6.2.1 Dilation
    6.2.2 Erosion
  6.3 Level 3: Compound Operations
    6.3.1 Closing
    6.3.2 Opening
    6.3.3 Combining Opening and Closing
    6.3.4 Boundary Detection
  6.4 Further Information
  6.5 Exercises

7 BLOB Analysis
  7.1 BLOB Extraction
    7.1.1 The Recursive Grass-Fire Algorithm
    7.1.2 The Sequential Grass-Fire Algorithm
  7.2 BLOB Features
  7.3 BLOB Classification
  7.4 Further Information
  7.5 Exercises

8 Segmentation in Video Data
  8.1 Video Acquisition
  8.2 Detecting Changes in the Video
    8.2.1 The Algorithm
  8.3 Background Subtraction
    8.3.1 Defining the Threshold Value
  8.4 Image Differencing
  8.5 Further Information
  8.6 Exercises

9 Tracking
  9.1 Tracking-by-Detection
  9.2 Prediction
  9.3 Tracking Multiple Objects
    9.3.1 Good Features to Track
  9.4 Further Information
  9.5 Exercises

10 Geometric Transformations
  10.1 Affine Transformations
    10.1.1 Translation
    10.1.2 Scaling
    10.1.3 Rotation
    10.1.4 Shearing
    10.1.5 Combining the Transformations
  10.2 Making It Work in Practice
    10.2.1 Backward Mapping
    10.2.2 Interpolation
  10.3 Homography
  10.4 Further Information
  10.5 Exercises

11 Visual Effects
  11.1 Visual Effects Based on Pixel Manipulation
    11.1.1 Point Processing
    11.1.2 Neighborhood Processing
    11.1.3 Motion
    11.1.4 Reduced Colors
    11.1.5 Randomness
  11.2 Visual Effects Based on Geometric Transformations
    11.2.1 Polar Transformation
    11.2.2 Twirl Transformation
    11.2.3 Spherical Transformation
    11.2.4 Ripple Transformation
    11.2.5 Local Transformation
  11.3 Further Information
  11.4 Exercises

12 Application Example: Edutainment Game
  12.1 The Concept
  12.2 Setup
    12.2.1 Infrared Lighting
    12.2.2 Calibration
  12.3 Segmentation
  12.4 Representation
  12.5 Postscript

13 Application Example: Coin Sorting Using a Robot
  13.1 The Concept
  13.2 Image Acquisition
  13.3 Preprocessing
  13.4 Segmentation
  13.5 Representation and Classification
  13.6 Postscript

Appendix A Bits, Bytes and Binary Numbers
  A.1 Conversion from Decimal to Binary

Appendix B Mathematical Definitions
  B.1 Absolute Value
  B.2 min and max
  B.3 Converting a Rational Number to an Integer
  B.4 Summation
  B.5 Vector
  B.6 Matrix
  B.7 Applying Linear Algebra
  B.8 Right-Angled Triangle
  B.9 Similar Triangles

Appendix C Learning Parameters in Video and Image Processing Systems
  C.1 Training
  C.2 Initialization

Appendix D Conversion Between RGB and HSI
  D.1 Conversion from RGB to HSI
  D.2 Conversion from HSI to RGB

Appendix E Conversion Between RGB and HSV
  E.1 Conversion from RGB to HSV
    E.1.1 HSV: Saturation
    E.1.2 HSV: Hue
  E.2 Conversion from HSV to RGB

Appendix F Conversion Between RGB and YUV/YCbCr
  F.1 The Output of a Colorless Signal
  F.2 The Range of X1 and X2
  F.3 YUV
  F.4 YCbCr

References

Index