Machine Learning: Neural Networks
Junbeom Park (pjb385@gmail.com)
Radiation Imaging Laboratory, Pusan National University
Contents
1. Introduction
2. Machine Learning
   - Definition and Types
   - Supervised Learning
   - Regression
   - Gradient Descent Algorithm
   - Classification
3. Neural Networks
   - Structure
   - History and Theoretical Background
   - Application in Research
   - Signal Computation
4. Homework
Introduction
Machine Learning
"Field of study that gives computers the ability to learn without being explicitly programmed." - Arthur Samuel (1901-1990), 1959
Types of Machine Learning: Supervised, Unsupervised
Supervised Learning
1. Regression: predict a test score from hours spent studying.
   Hours | 2  | 4  | 8  | 12
   Score | 30 | 50 | 80 | 90
2. Classification: predict pass/fail from hours spent studying.
   Hours     | 2 | 4 | 8 | 12
   Pass/Fail | F | F | P | P
Regression
1. Hypothesis
   Learning a regression model amounts to formulating a hypothesis about a given data set; the task is to find a better hypothesis.
   $H(x) = Wx + b$
2. Cost function
   Introduced to fit the line to the given data set. A squared cost function is used because
   - the distance is expressed regardless of sign, and
   - points at a greater distance have a greater effect.
   $\mathrm{Cost}(W, b) = \frac{1}{m}\sum_{i=1}^{m}\left(H(x^{(i)}) - y^{(i)}\right)^2$
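As a minimal sketch of these two definitions in Python (my own illustration; only the hours/score table comes from the slide):

```python
import numpy as np

# Toy data from the regression table (hours studied -> test score)
x = np.array([2.0, 4.0, 8.0, 12.0])
y = np.array([30.0, 50.0, 80.0, 90.0])

def hypothesis(x, W, b):
    """Linear hypothesis H(x) = W*x + b."""
    return W * x + b

def cost(W, b, x, y):
    """Mean squared cost: (1/m) * sum((H(x_i) - y_i)^2)."""
    m = len(x)
    return np.sum((hypothesis(x, W, b) - y) ** 2) / m

print(cost(6.0, 20.0, x, y))  # cost of one candidate hypothesis
```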
Gradient Descent Algorithm
This algorithm is used in many minimization problems. Formal definition of the cost function to be minimized:
$\mathrm{Cost}(W) = \frac{1}{2m}\sum_{i=1}^{m}\left(Wx^{(i)} - y^{(i)}\right)^2$
1. Start with initial guesses.
2. Change the parameters a little bit to reduce the cost.
3. At each step, follow the gradient direction that reduces the cost the most.
4. Repeat until you converge to a local minimum.
Gradient Descent Algorithm
The gradient is obtained by differentiating the cost function:
$W \leftarrow W - \eta\,\frac{\partial}{\partial W}\mathrm{Cost}(W)$
$W \leftarrow W - \eta\,\frac{\partial}{\partial W}\,\frac{1}{2m}\sum_{i=1}^{m}\left(Wx^{(i)} - y^{(i)}\right)^2 = W - \eta\,\frac{1}{2m}\sum_{i=1}^{m} 2\left(Wx^{(i)} - y^{(i)}\right)x^{(i)}$
$W \leftarrow W - \eta\,\frac{1}{m}\sum_{i=1}^{m}\left(Wx^{(i)} - y^{(i)}\right)x^{(i)}$
In the case of the original cost function $\mathrm{Cost}(W, b) = \frac{1}{m}\sum_{i=1}^{m}\left(H(x^{(i)}) - y^{(i)}\right)^2$, both parameters are updated:
$W \leftarrow W - \eta\,\frac{\partial\,\mathrm{Cost}}{\partial W}, \qquad b \leftarrow b - \eta\,\frac{\partial\,\mathrm{Cost}}{\partial b}$
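A hedged sketch of this update rule applied to the toy data above; the learning rate and iteration count are my choices, not values from the slides:

```python
import numpy as np

# Same toy data as above (hours -> score)
x = np.array([2.0, 4.0, 8.0, 12.0])
y = np.array([30.0, 50.0, 80.0, 90.0])

# Gradient descent on Cost(W, b)
W, b, eta = 0.0, 0.0, 0.01
m = len(x)
for step in range(5000):
    err = (W * x + b) - y           # H(x_i) - y_i
    W -= eta * (err * x).sum() / m  # W <- W - eta * dCost/dW
    b -= eta * err.sum() / m        # b <- b - eta * dCost/db
print(W, b)  # converges near the least-squares line
```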
Extend to Multiple Variables
Regression using multiple inputs:
$H(x) = Wx + b \;\rightarrow\; H(x_1, x_2, \ldots, x_n) = W_1 x_1 + W_2 x_2 + \cdots + W_n x_n + b$
Using a matrix (implementation):
$H(X) = XW, \qquad \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}\begin{pmatrix} W_1 \\ \vdots \\ W_n \end{pmatrix} = x_1 W_1 + \cdots + x_n W_n$
Cost function:
$\mathrm{Cost}(W, b) = \frac{1}{m}\sum_{i=1}^{m}\left(H(x_1^{(i)}, x_2^{(i)}, \ldots, x_n^{(i)}) - y^{(i)}\right)^2, \qquad \mathrm{Cost}(W) = \frac{1}{m}\left\lVert XW - y \right\rVert^2$
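A sketch of the matrix form; the numbers in X and y are made up for illustration and the bias is omitted for brevity:

```python
import numpy as np

# Hypothetical multi-variable data: m = 4 samples, n = 3 features.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 0.0, 1.0],
              [3.0, 1.0, 0.0],
              [4.0, 2.0, 2.0]])
y = np.array([14.0, 7.0, 8.0, 18.0])

W = np.zeros(3)  # one weight per input variable

def hypothesis(X, W):
    """Matrix form H(X) = XW: one dot product per sample."""
    return X @ W

def cost(W, X, y):
    """Vectorized cost: (1/m) * ||XW - y||^2."""
    m = len(y)
    return np.sum((hypothesis(X, W) - y) ** 2) / m

print(cost(W, X, y))
```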
Classification
The linear hypothesis has several disadvantages for classification, so another, more appropriate type of hypothesis is needed. A hypothesis that jumps between discrete outputs is not differentiable, so the gradient descent algorithm cannot be applied to it.
Logistic Hypothesis
Linear hypothesis: $H(X) = XW$
Logistic hypothesis (sigmoid): $H(X) = \frac{1}{1 + e^{-XW}}$
The sigmoid function is differentiable: $H'(X) = H(X)\left(1 - H(X)\right)$
These regression and classification functions are used as the activation functions of neural networks.
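A minimal sketch of the sigmoid and its derivative (function names are mine):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) hypothesis: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid: H'(z) = H(z) * (1 - H(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# The derivative is largest at z = 0 and vanishes for large |z|.
print(sigmoid(0.0), sigmoid_prime(0.0))  # 0.5, 0.25
```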
Structure of Neural Networks
History of Neural Networks
1943 McCulloch and Pitts: logical computation model based on simple neural networks.
"A Logical Calculus of the Ideas Immanent in Nervous Activity"
1949 Hebb: presentation of learning laws based on synapses.
"The Organization of Behavior"
1957 Rosenblatt: development of the perceptron terminology and algorithm.
"The Perceptron: A Perceiving and Recognizing Automaton (Project Para)"
History of Neural Networks
1969 Minsky: the XOR problem ("Perceptrons") and a long recession.
A single-layer perceptron can represent OR and AND, but not XOR:
1. To solve the XOR problem, multi-layer perceptrons are needed.
2. At the time, no one on earth had found a viable way to train multi-layer perceptrons.
History of Neural Networks
1986 Rumelhart: development of the error back-propagation algorithm.
"Learning Representations by Back-Propagating Errors"
To train a network, a cost function is minimized via the gradient descent algorithm.
Application in Research
[Figure: an original radiograph and its processed version, shown as training and processing results]
Signal Computation: Feed-Forward
Input signal vector: $x$, of size $1 \times N$ (the dimensions on the following slides indicate it is augmented with a constant bias element to size $1 \times (N+1)$).
Signal Computation: Feed-Forward
Hidden-layer pre-activation: $d = xU$, with dimensions $(1 \times (N+1)) \cdot ((N+1) \times M) = (1 \times M)$.
Signal Computation: Feed-Forward
Hidden-layer activation (sigmoid): $a = g(d) = \frac{1}{1 + e^{-d}}$, of size $1 \times M$.
Signal Computation: Feed-Forward
Output-layer pre-activation: $z = aV$, with dimensions $(1 \times (M+1)) \cdot ((M+1) \times L) = (1 \times L)$ (the hidden activation is again augmented with a bias element).
Signal Computation: Feed-Forward
Output signal (linear activation): $y = h(z) = \alpha z + \beta$, of size $1 \times L$.
Signal Computation: Feed-Forward
Error against the target vector $t$: $e = \frac{1}{2}(y - t)^2$, of size $1 \times L$.
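Putting the feed-forward chain together, a minimal NumPy sketch; the sizes N, M, L, the bias-augmentation convention, and the choice alpha = 1, beta = 0 are assumptions consistent with the dimension bookkeeping above, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, L = 4, 3, 2                      # input, hidden, output sizes (my choice)

U = rng.normal(size=(N + 1, M))        # input -> hidden weights (incl. bias row)
V = rng.normal(size=(M + 1, L))        # hidden -> output weights (incl. bias row)
alpha, beta = 1.0, 0.0                 # linear output activation y = alpha*z + beta

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, U, V):
    """Feed-forward pass following the slides' dimension bookkeeping."""
    x_tilde = np.append(x, 1.0)        # 1 x (N+1): augment input with bias
    d = x_tilde @ U                    # 1 x M
    a = sigmoid(d)                     # 1 x M
    a_tilde = np.append(a, 1.0)        # 1 x (M+1): augment hidden with bias
    z = a_tilde @ V                    # 1 x L
    y = alpha * z + beta               # 1 x L
    return y

x = rng.normal(size=N)                 # one input sample
t = rng.normal(size=L)                 # target vector
y = forward(x, U, V)
e = 0.5 * (y - t) ** 2                 # per-output squared error
print(y, e)
```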
Signal Computation: Error Back-Propagation
Output-layer weight update, with $R$ the diagonal matrix of the derivatives of $h$ and $\eta$ the learning rate:
$V^{\mathsf T} \leftarrow V^{\mathsf T} - \eta\, R\,(y - t)^{\mathsf T}\,\tilde{a}$, with dimensions $(L \times L)(L \times 1)(1 \times (M+1)) = (L \times (M+1))$, where $\tilde{a}$ is the bias-augmented hidden activation.
Signal Computation: Error Back-Propagation
Hidden-layer weight update, with $Q$ the diagonal matrix of the derivatives of $g$ and $\eta$ the learning rate:
$U^{\mathsf T} \leftarrow U^{\mathsf T} - \eta\, Q\, \bar{V}\, R\,(y - t)^{\mathsf T}\,\tilde{x}$, with dimensions $(M \times M)(M \times L)(L \times L)(L \times 1)(1 \times (N+1)) = (M \times (N+1))$, where $\bar{V}$ denotes $V$ without its bias row and $\tilde{x}$ the bias-augmented input.
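Continuing the sketch above (reusing sigmoid, alpha, beta, x, t, U, V), one hedged back-propagation step; treating R and Q as diagonal derivative matrices and dropping the bias row of V for the hidden-layer delta is my reading of the dimension bookkeeping, not an explicit statement on the slides:

```python
def backprop_step(x, t, U, V, eta=0.1):
    """One feed-forward + error back-propagation step (squared error,
    sigmoid hidden layer g, linear output layer h with slope alpha)."""
    x_tilde = np.append(x, 1.0)            # 1 x (N+1)
    d = x_tilde @ U                        # 1 x M
    a = sigmoid(d)                         # 1 x M
    a_tilde = np.append(a, 1.0)            # 1 x (M+1)
    z = a_tilde @ V                        # 1 x L
    y = alpha * z + beta                   # 1 x L

    R = np.diag(np.full(len(y), alpha))    # diag of h'(z): h is linear, h' = alpha
    Q = np.diag(a * (1.0 - a))             # diag of g'(d): sigmoid derivative

    delta_out = (y - t) @ R                # 1 x L
    V_bar = V[:-1, :]                      # V without its bias row: M x L
    delta_hid = (delta_out @ V_bar.T) @ Q  # 1 x M

    V = V - eta * np.outer(a_tilde, delta_out)  # (M+1) x L update
    U = U - eta * np.outer(x_tilde, delta_hid)  # (N+1) x M update
    return U, V

U, V = backprop_step(x, t, U, V)
```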
Training: Epoch
Training: Overfitting
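Both of these slides are figures in the original. As a hedged sketch of what they illustrate, continuing the network sketch above: one epoch is one full pass over the training set, and overfitting appears when the validation (testing) error starts rising while the training error keeps falling. The data here is random and purely illustrative:

```python
# Hypothetical training loop: one "epoch" = one full pass over the
# training set; stop early once validation error rises (overfitting).
train_set = [(rng.normal(size=N), rng.normal(size=L)) for _ in range(20)]
valid_set = [(rng.normal(size=N), rng.normal(size=L)) for _ in range(5)]

def mean_error(data, U, V):
    return np.mean([0.5 * np.sum((forward(x, U, V) - t) ** 2) for x, t in data])

best_valid = np.inf
for epoch in range(100):
    for x_i, t_i in train_set:        # one pass over all samples = one epoch
        U, V = backprop_step(x_i, t_i, U, V)
    valid_e = mean_error(valid_set, U, V)
    if valid_e > best_valid:          # validation error rising:
        break                         # stop before overfitting worsens
    best_valid = valid_e
```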
Homework
[Figure: reference radiograph processing, with training and processing results]
Homework
Simulation of material composition/decomposition.
[Figure: training data set and testing data set]
Homework
[Figure: error plot, training results, and testing results compared against references] + REVIEW
Into Deep Learning
Vanishing gradient problem: the deeper the network, the harder it is to train! Remedy: the Rectified Linear Unit (ReLU); see the sketch below.
Preventing overfitting (a burning issue):
- Data augmentation
- Dropout
- Regularization
Deep learning algorithms:
- Convolutional Neural Networks (CNN): pattern recognition, classification
- Recurrent Neural Networks (RNN): sequence data processing, translation
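The slide names ReLU as the remedy for vanishing gradients; a minimal sketch of the contrast with the sigmoid derivative (function names are mine):

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: max(0, z)."""
    return np.maximum(0.0, z)

def relu_prime(z):
    """ReLU derivative: 1 for z > 0, else 0 -- it does not shrink
    toward 0 for large positive z, unlike the sigmoid derivative."""
    return (z > 0).astype(float)

z = np.array([-2.0, 0.5, 3.0, 10.0])
sig = 1.0 / (1.0 + np.exp(-z))
print(relu_prime(z))     # [0. 1. 1. 1.]
print(sig * (1 - sig))   # sigmoid derivative: ~4.5e-05 at z = 10
```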