Babu Madhav Institute of Information Technology, UTU : Machine Learning

Babu Madhav Institute of Information Technology, UTU 060010907 : Machine Learning 2017
SAPAN NAIK

Unit 1. Introduction
1. Define: Machine learning.
2. How is a machine learning algorithm applied in Facebook?
3. Which machine learning algorithm is used in the Xbox?
4. Which machine learning algorithm is used in a robot dog?
5. What do you mean by data exhaust?
6. Draw the chart representing year-wise data generated in the world.
7. "Shivi, a 1-year-old girl, wants to identify colors." Which kind of learning is used here?
8. "Shivi identified that Ram is a bad boy." Which kind of learning is used here?
9. "Kunal has concluded that Hiten's performance in the laboratory is the finest." Which kind of learning algorithm has Kunal used?
10. "Kunal has concluded that Piyush deserves the trophy out of 100 employees based on 6 parameters." Which kind of learning algorithm has Kunal used?
11. Define: Training data.
12. What is testing data?
13. List any two solutions for handling missing data.
14. Which Python library is used for taking care of missing data?
15. Which Python class is used for taking care of missing data?
16. What do you mean by categorical variables? Give two examples.
17. Which Python library is used for encoding categorical data?
18. Which Python class is used for encoding categorical data?
19. x = 5y + z. Identify the dependent and independent variables.
20. Define: Euclidean distance.
21. List any two feature scaling methods.
22. How is standardization calculated for feature scaling?
23. How is normalization calculated for feature scaling?
24. List any four applications of machine learning.
25. How are machine learning algorithms useful in the following scenarios? Amazon, keyboards, virtual reality headsets, maps.
26. How are machine learning algorithms useful in the following scenarios? Xbox, robot dog, Amazon/Netflix, space.
27. Why is machine learning the future?
28. Using a chart, discuss how machine learning is useful for data scientists.
29. Differentiate supervised and unsupervised learning.
30. Differentiate testing data and test data.
31. How should training and testing data be chosen?
32. Write a Python code snippet for replacing missing data with the mean of the column.
33. Write a Python code snippet for encoding categorical data.
34. What kind of problem occurs if categorical data is encoded as below? How can it be solved?
35. Write a Python code snippet for creating dummy variables for categorical data.
36. Write a Python code snippet for splitting a dataset into training and testing parts.
37. What is feature scaling? Why do we need it?
38. Do we need to apply feature scaling to dummy variables? Why?
39. What is a dummy variable? Why create one? Give one example.
40. Write a Python code snippet for feature scaling.
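As an aid for questions 22 and 23 of Unit 1, the two scaling formulas can be sketched in plain Python. This is a minimal illustration, not a model answer: the function names and the `ages` column are hypothetical, and the use of the population standard deviation is an assumption (some libraries and textbooks use the sample standard deviation instead).

```python
from statistics import mean, pstdev


def standardise(values):
    """Standardization: (x - mean) / standard deviation for each x."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [(x - mu) / sigma for x in values]


def normalise(values):
    """Normalization: (x - min) / (max - min), mapping values into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


ages = [20, 30, 40, 50, 60]  # hypothetical feature column
print(standardise(ages))  # centred on 0 with unit standard deviation
print(normalise(ages))    # smallest value maps to 0, largest to 1
```

Both functions divide by a spread measure, so a column whose values are all identical would raise `ZeroDivisionError`; a real preprocessing step would need to handle that case.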

Unit 2. Regression
1. What do you mean by random forest regression?
2. "Random forest regression is a non-continuous model." True or false? Justify.
3. Write a Python code snippet for fitting random forest regression to a dataset.
4. Write a short note on simple linear regression.
5. Write a short note on the backward elimination method for multiple linear regression.
6. Write a short note on the bidirectional elimination method for multiple linear regression.
7. In Y = b0 + b1x, what is the importance of b0 and b1? What do you mean by the best-fitting line? Discuss using a graph.
8. Explain ordinary least squares. Is feature scaling needed in simple linear regression? Why?
9. Write down the sample equation and assumptions for multiple linear regression. Give two examples where one can use multiple linear regression. Write a Python code snippet for visualising the regression model.
10. Below are the 10 records for the column State. How many dummy variables need to be added to the multiple linear regression equation, and why? What are the dummy variable trap and multicollinearity?
11. (State: Gujarat, Maharashtra, Bihar, Goa, Gujarat, Goa, Maharashtra, Gujarat, Goa, Bihar)
12. Can we add all dummy variables to the multiple linear regression equation? Why? What do you mean by stepwise regression?
13. Why does one need to remove some of the independent variables while creating a model? Write down 5 methods for building a model in multiple linear regression. When does one need to use the "All In" method for building a multiple linear regression model?
14. List the steps of the backward elimination and forward selection methods for building a multiple linear regression model.
15. List the steps of the bidirectional elimination method for building a multiple linear regression model. If we consider all possible multiple linear regression models, how many models are possible for a dataset having 10 columns?
16. Which is the fastest method for building a multiple linear regression model? Why do you need backward elimination? What is the importance of the below lines of Python script in backward elimination?
17. If the p-value of variable X is 0.06 and your significance level is 5%, will you keep X in the model? Why? When does one need to use polynomial regression? Why is it called linear?
18. What is CART? What is information entropy? In which situations is the decision tree regression model best suited, and why?
19. Using a chart, explain the creation of a decision tree regression model.
20. What do you mean by ensemble learning? Is it stable? Why? Is random forest a continuous or non-continuous regression model?
21. Discuss the random forest regression model with an example.
22. Prepare a regression template using a Python script which can be used for the decision tree and random forest regression models.
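For questions 7 and 8 of Unit 2, the following sketch shows how ordinary least squares picks b0 and b1 for the line Y = b0 + b1x by minimising the sum of squared vertical distances between the data and the line. The function name and the `years` and `salary` values are hypothetical, chosen so that the points lie exactly on one line.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) for y = b0 + b1 * x using the closed-form OLS solution."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # b1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    # The best-fitting line always passes through (x_mean, y_mean).
    b0 = y_mean - b1 * x_mean
    return b0, b1


years = [1, 2, 3, 4]   # hypothetical independent variable
salary = [3, 5, 7, 9]  # hypothetical dependent variable, exactly 1 + 2 * years
b0, b1 = fit_simple_linear_regression(years, salary)
print(b0, b1)  # intercept and slope
```

Here b0 is the value of Y when x is 0 and b1 is the change in Y per unit change in x. Rescaling x only rescales b1 by the inverse factor, which is one way to approach the feature scaling part of question 8.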

Unit 3. Classification
1. Prepare a classification template using a Python script which can be used for K-NN and logistic regression.
2. Write a code snippet for K-NN and logistic regression. Also write a Python script for creating a confusion matrix.
3. What is Euclidean distance? Write down and discuss the steps of the K-NN algorithm.
4. How does a support vector machine work? What are support vectors? Why is SVM different from other classifiers?
5. Write a short note on the K-NN classifier.
6. Write a short note on the SVM classifier.
7. Write a short note on the logistic regression classifier.
8. Write down the equations of the sigmoid function and logistic regression. Also draw a chart representing the difference between linear regression and logistic regression.
9. Write a code snippet for a logistic regression classifier and for visualizing the training/testing results. Is logistic regression a linear classifier? Why?
10. Discuss Bayes' theorem with examples.
11. Write Bayes' theorem. Explain it with an example.
12. Ram has 700 Kesar and 300 Rajapuri mangoes. Of all the mangoes, 600 are ripe and 400 are unripe. Of all the unripe mangoes, 20% are Kesar.
13. Find the probability that a selected Kesar mango is ripe using Bayes' theorem.
14. In Eru village, there are 40 Neem trees and 60 Peepal trees. Across the 100 trees, 20% of the leaves are green and the others are brown. Of all the brown leaves, 30% are from Peepal trees. Calculate the probability that a selected leaf is green and from a Neem tree.
15. Write a short note on the naive Bayes classification method.
16. What do you mean by prior probability, marginal likelihood, likelihood and posterior probability? Show the calculation for all of them using one example.
17. Why is the term naive used in the Bayes classification method? What is P(X), and what if more than two features are available in the naive Bayes classification method?
18. Write a code snippet for a naive Bayes classifier.
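The mango problem in questions 12 and 13 of Unit 3 can be worked through with exact fractions. This sketch assumes one reading of the question, namely that "20% of the unripe mangoes are Kesar" means P(Kesar | unripe) = 0.2. The variable names are illustrative only.

```python
from fractions import Fraction

# Quantities given in the question.
p_kesar = Fraction(700, 1000)             # prior: P(Kesar)
p_unripe = Fraction(400, 1000)            # P(unripe)
p_kesar_given_unripe = Fraction(20, 100)  # P(Kesar | unripe), assumed reading

# Bayes' theorem: P(unripe | Kesar) = P(Kesar | unripe) * P(unripe) / P(Kesar)
p_unripe_given_kesar = p_kesar_given_unripe * p_unripe / p_kesar

# A Kesar mango is either ripe or unripe, so the two probabilities sum to 1.
p_ripe_given_kesar = 1 - p_unripe_given_kesar
print(p_ripe_given_kesar)  # 31/35
```

As a check by counting: 20% of the 400 unripe mangoes gives 80 unripe Kesar mangoes, leaving 620 ripe ones out of 700 Kesar, which is the same 31/35 (about 0.886).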

Unit 4. Clustering
1. What is the use of K-means clustering?
2. List the steps performed during K-means clustering.
3. In K-means clustering, the centroids are selected from the given points. True or false? Why?
4. In the below figure, are the points on the green line nearer to the blue point or the red point?
5. What is 'K' in K-means clustering?
6. Apply K-means clustering to the above figure. Assume data and approximate wherever needed.
7. What would happen if we had a bad random initialisation in K-means clustering?
8. What is the random initialisation trap?
9. Can random initialisation affect your clustering? How?
10. What is the solution to the random initialisation trap?
11. How do you decide the number of clusters for the K-means algorithm?
12. What is the full form of WCSS? How do you find the optimal value for it?
13. Define WCSS. How is it calculated?
14. What is the use of WCSS? Write down its equation.
15. For the same scatter plot, show two different WCSS calculations.
16. What is the use of the elbow method?
17. What does hierarchical clustering (HC) do for the user?
18. Compare K-means and HC.
19. List the two types of HC. Give one difference between them.
20. Write down the steps of the agglomerative HC algorithm.
21. What do you mean by the closest cluster in agglomerative HC?
22. What do you mean by the distance between two clusters?
23. What are dendrograms?
24. What are the values on the x axis and y axis of a dendrogram?
25. Using one example, demonstrate dendrogram construction.
26. Define: Dissimilarity threshold.
27. What is the use of dendrograms? Give an example.
28. How can one decide the optimal number of clusters using dendrograms?
29. Find the number of clusters from the below dendrograms.
30. Give an example where the Apriori algorithm can be used.
31. "People who bought also bought". Which algorithm supports this statement?
32. What do you mean by support in the context of Apriori?
33. What do you mean by confidence in the context of Apriori?
34. What do you mean by lift in the context of Apriori?
35. Using one example, describe support, confidence and lift in the context of the Apriori algorithm.
36. List the steps of the Apriori algorithm.
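Questions 32 to 35 of Unit 4 concern support, confidence and lift. The sketch below computes all three for one rule on a tiny, made-up basket dataset. The transactions and function names are hypothetical, and the definitions used are the common ones: support as the fraction of transactions containing an itemset, confidence of A -> B as support(A and B) / support(A), and lift as confidence / support(B).

```python
from fractions import Fraction

# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= t)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence of the rule relative to how common the consequent is overall."""
    return confidence(antecedent, consequent) / support(consequent)


print(support({"bread", "milk"}))       # 3/5
print(confidence({"bread"}, {"milk"}))  # 3/4
print(lift({"bread"}, {"milk"}))        # 15/16
```

A lift below 1, as here, suggests that buying bread makes milk slightly less likely than its baseline rate in this toy data; a lift above 1 would indicate a positive association.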

Unit 5. Reinforcement and Deep Learning
1. Why was deep learning not appreciated initially? Describe read-write speed, data retention, power usage and data density for different storage media. Discuss processing capacity in the context of a timeline.
2. Who is Geoffrey Hinton? Discuss the reasons for the popularity of deep learning nowadays.
3. What is a neuron? Discuss in detail. Differentiate standardization and normalization.
4. Draw the structure of a single neuron. What is a weight in the context of a neural network?
5. What is an activation function? List any four of them and explain any two in detail.
6. If the dependent variable has the value 0 or 1, which activation function is more suitable? Why?
7. Which activation functions are commonly applied in the hidden layer and the output layer? Discuss them in detail.
8. How do neural networks work? Discuss with an example.
9. What is a perceptron? What is a cost function? Give one example of it. What is one epoch?
10. How do neural networks learn? Discuss with an example.
11. Write a short note on gradient descent.
12. What do you mean by the curse of dimensionality?
13. In which situation is stochastic gradient descent needed? Write two basic differences between normal gradient descent and stochastic gradient descent. How does the mini-batch gradient descent method work?
14. What do you mean by backpropagation? What are its advantages? List the steps for training an ANN.
15. Discuss the TensorFlow, Theano and Keras libraries of Python in detail.
16. Discuss data preprocessing for an ANN and its importance.
17. Write a Python code snippet for importing the Keras libraries and packages. Also discuss the classifier.add() method with all its arguments for an ANN as a classifier.
18. One needs to classify the output into more than 2 categories. What changes does one need to make in the Python script that generates the ANN's output layer? Discuss all parameters in detail.
19. Write a Python script for making an ANN.
20. What do you mean by compiling an ANN? Write and discuss a Python script for the same.
21. Write and discuss a Python script for predicting test results and making the confusion matrix in the context of an ANN.
22. Differentiate gray/black-and-white images from color images. List the steps of a convolutional NN.
23. Explain the convolution operation in detail with one 7x7 binary image and a 3x3 kernel/feature detector.
24. How is a feature map useful for image understanding? Also discuss any four filters/kernels/feature detectors.
25. Discuss the ReLU layer of a CNN in detail.
26. What is pooling and why does one need it? Discuss in detail.
27. Discuss pooling with the help of an example feature map.
28. Discuss the flattening and full connection steps of a CNN. How is the hidden layer of a CNN different from that of an ANN?
29. Draw the basic architecture of a CNN and explain it in detail.
30. What do you mean by softmax and cross-entropy? How are they useful in a CNN?
31. List the Keras libraries and packages needed for a CNN. Write the function to add a convolution layer with all its arguments and explain it.
32. Why do we need flattening? Why can't we apply flattening directly to the input image? Write and discuss a Python script for adding pooling and flattening layers.
33. What do you mean by compiling a CNN? Write and discuss a Python script for the same.
34. What is image augmentation? Why does one need to use it?
35. Write and discuss a Python script for fitting the CNN to the images.
36. How can one improve test result accuracy in a CNN? Write down a Python script for the same and discuss.
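Question 23 of Unit 5 asks for the convolution operation on a 7x7 binary image with a 3x3 feature detector. The sketch below slides the detector over the image with stride 1 and no padding, which yields a 5x5 feature map. The particular image and detector values are made up for illustration. As is usual in CNN material, the detector is not flipped, so strictly speaking this is a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over a square image (stride 1, no padding)."""
    k = len(kernel)
    out_size = len(image) - k + 1
    feature_map = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            # Multiply the k x k patch element-wise with the kernel and sum.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 7x7 binary image and 3x3 feature detector.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
kernel = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

Each entry of the feature map counts how many 1s of the detector line up with 1s in the image patch. The largest possible value here is 4, reached where the patch matches the detector's pattern exactly, which is why high values mark the locations of the detected feature.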

Unit 6. Dimensionality Reduction
1. List any two dimensionality reduction techniques and explain any one in detail.
2. What makes PCA an unsupervised model? What are the advantages of PCA? At which position does one need to apply PCA for a classification problem?
3. Write a Python script for implementing PCA and discuss it.
4. What makes LDA a supervised model? Differentiate PCA and LDA.
5. Write a Python script for implementing Linear Discriminant Analysis and discuss it.
6. Which is the better dimensionality reduction technique, and why? Write a Python script for it.
7. Which feature extraction technique for dimensionality reduction works on non-linear data? Explain it in detail.
8. Differentiate PCA and Kernel PCA.
9. Write a Python script for implementing Kernel PCA and discuss it.
10. Compare PCA, LDA and Kernel PCA.
11. Differentiate LDA and Kernel PCA.
12. How does Kernel PCA work? Why can't we use simple PCA in place of Kernel PCA?
13. Compare and explain the Python scripts for PCA and LDA.
14. Compare and explain the Python scripts for PCA and Kernel PCA.
15. Why does one need dimensionality reduction? List the linear and non-linear methods. Which one is better in which situation, and why?
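To illustrate the idea behind PCA for the questions in Unit 6, the sketch below handles only the special case of two features, where the eigenvalues of the 2x2 covariance matrix have a closed form. It is not a full PCA implementation and not the script the questions ask for: the function name and data are hypothetical, no feature scaling is applied, and only the explained variance ratios are returned. Note that no class labels are used anywhere, which is what makes the technique unsupervised.

```python
from math import sqrt


def explained_variance_ratio_2d(xs, ys):
    """Share of total variance captured by each principal component (2 features only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Entries of the sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    mid = (a + c) / 2
    spread = sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mid + spread, mid - spread
    total = lam1 + lam2
    return lam1 / total, lam2 / total


# Hypothetical data in which the second feature is exactly twice the first.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
first, second = explained_variance_ratio_2d(xs, ys)
print(first, second)
```

Because the two hypothetical features are perfectly correlated, the first component carries essentially all of the variance and the second can be dropped without loss, which is the motivation for using PCA to reduce dimensionality.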