# Lecture 1: Machine Learning Basics

Ali Harakeh, University of Waterloo, WAVE Lab. May 1, 2017.

## Overview

1. Learning Algorithms
2. Capacity, Overfitting, and Underfitting
3. Hyperparameters and Validation Sets
4. Estimators, Bias and Variance
5. ML and MAP Estimators
6. Gradient Based Optimization
7. Challenges That Motivate Deep Learning

## Section 1: Learning Algorithms

A machine learning algorithm is an algorithm that is able to learn from data. A machine is said to have learned from experience E with respect to some task T, as measured by a performance measure P, if its performance on T, as measured by P, improves with E.

The Task T. Example task: vehicle detection in Lidar data.

- Approach 1: Hard-code what a vehicle is in Lidar data, based on human experience.
- Approach 2: Learn what a vehicle is in Lidar data.

Machine learning allows us to tackle tasks that are too difficult to be hard-coded by humans.

The Task T. Machine learning algorithms are usually described in terms of how the algorithm should process an example $x \in \mathbb{R}^n$. Each entry $x_j$ of $x$ is called a feature. Example: the features of an image can be its pixel values.

Common Machine Learning Tasks.

- Classification: Find $f : \mathbb{R}^n \to \{1, \dots, k\}$ that maps each example $x$ to one of $k$ classes.
- Regression: Find $f : \mathbb{R}^n \to \mathbb{R}$ that maps each example to the real line.

The Performance Measure P. A quantitative measure of performance is required in order to evaluate a machine's ability to learn. P depends on the task T. For classification, P is usually the accuracy of the model. An equivalent measure is the error rate (also called the expected 0-1 loss), which equals one minus the accuracy.
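As a minimal sketch of the two classification measures named here, the following computes accuracy and the error rate (average 0-1 loss) on a small set of labels. The function names and the label lists are hypothetical and chosen only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def error_rate(y_true, y_pred):
    """Average 0-1 loss: a mistake costs 1, a correct prediction costs 0."""
    return 1.0 - accuracy(y_true, y_pred)


# Hypothetical labels for a 3-class problem (illustration only).
y_true = [0, 1, 2, 2, 1, 0, 1, 2]
y_pred = [0, 1, 2, 1, 1, 0, 0, 2]

print(accuracy(y_true, y_pred))    # 0.75 (6 of 8 predictions are correct)
print(error_rate(y_true, y_pred))  # 0.25
```

The two numbers always sum to one, which is why the measures carry the same information.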

The Experience E. Machine learning algorithms can be divided into two classes, supervised and unsupervised, based on what kind of experience they are allowed to have during the learning process. Machine learning algorithms are usually allowed to experience an entire dataset.

Categorizing Algorithms Based On E.

- Unsupervised learning algorithms experience a dataset containing many features, then learn useful properties of the structure of this dataset.
- Supervised learning algorithms experience a dataset containing features, but each example is also associated with a label or target.

Dataset Splits. We usually split our dataset into three subsets: train, val, and test. The experience E usually consists of the train and val sets. P is usually evaluated on the test set.

## Section 2: Capacity, Overfitting, and Underfitting

The main challenge in machine learning is that the algorithm must perform well on new, unseen input data. This ability is called generalization. We usually have access to the training set, and we try to minimize some error measure called the training error. This is standard optimization. What differentiates machine learning from standard optimization is that we also want to minimize the generalization error, which is estimated by the error on the test set.

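The Data Generating Distribution $p_{\text{data}}$. Is minimizing the training set error guaranteed to provide parameters that minimize the test set error? Under the i.i.d. assumption on train and test examples, the answer is yes in expectation for parameters fixed in advance: the expected training error of such a model equals its expected test error. Once the parameters are chosen to minimize the training error, the expected test error is greater than or equal to the expected training error.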

The factors that determine how well a machine learning algorithm performs are its ability to:

- Make the training error small.
- Make the gap between training and test error small.

Overfitting, Underfitting, and Capacity.

- Underfitting occurs when the model is not able to obtain a sufficiently low error value on the training set.
- Overfitting occurs when the gap between the training error and test error is too large.
- Capacity is a model's ability to fit a wide variety of functions.

Overfitting, Underfitting, and Capacity. There is a direct relation between the model's capacity and whether it will overfit or underfit. Models with low capacity may struggle to fit the training set. Models with high capacity can overfit by memorizing properties of the training set that do not serve them well on the test set.

Controlling Capacity: The Hypothesis Space. The hypothesis space is the set of functions that the learning algorithm is allowed to select as being the solution. A model's capacity can be increased by expanding the hypothesis space.

Controlling Capacity: The Hypothesis Space. From statistical learning theory: the discrepancy between training error and generalization error is bounded from above by a quantity that grows as the model capacity grows but shrinks as the number of training examples increases (Vapnik and Chervonenkis, 1971). This is an intellectual justification that machine learning algorithms can work. Note: while simpler functions are more likely to generalize (to have a small gap between training and test error), we must still choose a sufficiently complex hypothesis to achieve low training error.

Bayes Error. The ideal model is an oracle that simply knows the true probability distribution that generates the data. The error incurred by an oracle making predictions from the true distribution $p(x, y)$ is called the Bayes error. Example: in supervised learning, the mapping from $x$ to $y$ may be inherently stochastic, or $y$ may be a deterministic function that involves other variables besides those included in $x$.

The No Free Lunch Theorem. Averaged over all possible data generating distributions, every classification algorithm has the same error rate when classifying previously unobserved points. What are the consequences of this theorem?

Controlling The Capacity: Regularization. The behavior of our algorithm is strongly affected not just by how large we make the set of functions allowed in its hypothesis space, but by the specific identity of those functions. Regularization can be used as a way to give preference to one solution in our hypothesis space (more general than restricting the space itself). Example: weight decay, which adds the penalty $\lambda w^\top w$ to the training objective.
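To make the weight decay penalty concrete, here is a minimal sketch for the simplest possible case: a one-dimensional linear model $y \approx w x$ with no bias term, trained with squared error. Under that assumption the penalized objective $\sum_i (w x_i - y_i)^2 + \lambda w^2$ has the closed-form minimizer $w = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$. The function name and the data are hypothetical.

```python
def fit_ridge_1d(xs, ys, lam):
    """Minimise sum_i (w*x_i - y_i)^2 + lam * w^2 over a scalar weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Toy data generated by y = 2x (illustration only).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for lam in (0.0, 1.0, 14.0):
    # Larger lam pulls the fitted weight towards zero.
    print(lam, fit_ridge_1d(xs, ys, lam))
```

With `lam = 0.0` the fit recovers the generating weight 2.0, and with `lam = 14.0` (equal to the sum of squared inputs here) the weight is halved to 1.0. The hypothesis space is unchanged; the penalty only expresses a preference for smaller weights within it.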

Controlling The Capacity: Regularization. More formally, regularization is any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error.

## Section 3: Hyperparameters and Validation Sets

Hyperparameters. Hyperparameters are any variables that affect the behavior of the learning algorithm, but are not adapted by the algorithm itself.

Importance of the Validation Set. In a train-val-test split, learning is performed on the train set, and hyperparameters are chosen by evaluating on the val set. Construction of a train-val-test split: split the dataset into train and test at a 1:1 ratio, then split the train set into train and val at a 4:1 ratio.
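The split construction described here can be sketched in a few lines of standard-library Python. The function name, the fixed seed, and the stand-in dataset of 100 integers are assumptions made for illustration; the ratios are the ones quoted on the slide.

```python
import random


def train_val_test_split(examples, seed=0):
    """Split train-test at 1:1, then split train into train-val at 4:1."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, reproducibly

    half = len(shuffled) // 2
    train_full, test = shuffled[:half], shuffled[half:]

    n_val = len(train_full) // 5           # 4:1 means one fifth goes to val
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test


data = list(range(100))                    # stand-in for 100 examples
train, val, test = train_val_test_split(data)
print(len(train), len(val), len(test))     # 40 10 50
```

Shuffling before splitting matters when the dataset is stored in a meaningful order (for example sorted by class), since otherwise the three subsets would not follow the same distribution.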

What happens when the same test set has been used repeatedly to evaluate the performance of different algorithms over many years?

## Section 4: Estimators, Bias and Variance

Point Estimation. Point estimation is an attempt to provide the single best prediction $\hat{\theta}$ of some quantity of interest $\theta$. This quantity might be a scalar, vector, matrix, or even a function. Usually, point estimation is done using a set of data points: $\hat{\theta} = g(x^{(1)}, \dots, x^{(m)})$. Note that $g$ does not need to return a value close to $\theta$; it might not even have the same set of allowable values.

Bias. The bias of an estimator is $\operatorname{bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$. Bias measures the expected deviation of the estimate from the true value of the function or parameter. We say an estimator is unbiased if its bias is 0. We say an estimator is asymptotically unbiased if $\lim_{m \to \infty} \operatorname{bias}(\hat{\theta}) = 0$.
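A standard illustration of these definitions, not taken from the slides, is the variance of a Gaussian: dividing the sum of squared deviations by $m$ gives an estimator whose expectation is $\frac{m-1}{m}\sigma^2$ (biased, but asymptotically unbiased), while dividing by $m - 1$ gives an unbiased one. The sketch below estimates both expectations by simulation; the sample size, number of trials, and seed are arbitrary assumptions.

```python
import random


def sample_variance(xs, unbiased):
    """Variance estimate: divide by m - 1 if unbiased, else by m."""
    m = len(xs)
    mean = sum(xs) / m
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / (m - 1) if unbiased else ss / m


def average_estimate(unbiased, m=5, trials=20000, seed=0):
    """Approximate E[estimator] by averaging over many resampled datasets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(m)]  # true variance is 1
        total += sample_variance(xs, unbiased)
    return total / trials


biased_avg = average_estimate(unbiased=False)
unbiased_avg = average_estimate(unbiased=True)
print(biased_avg)    # close to (m - 1) / m = 0.8, so the bias is about -0.2
print(unbiased_avg)  # close to the true variance 1.0
```

The printed values fluctuate slightly around 0.8 and 1.0 because they are Monte Carlo averages, not exact expectations.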

Variance. The variance $\operatorname{Var}(\hat{\theta})$ of an estimator provides a measure of how we would expect the estimate we compute from data to vary as we independently resample the dataset from the underlying data generating process.

The Bias-Variance Trade-Off. How do we choose between two estimators, one with large bias and the other with large variance? One answer is the mean squared error of the estimates: $\operatorname{MSE} = E[(\hat{\theta} - \theta)^2] = \operatorname{bias}(\hat{\theta})^2 + \operatorname{Var}(\hat{\theta})$. The MSE incorporates both bias and variance components.
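The decomposition of the MSE into squared bias plus variance follows from adding and subtracting $E[\hat{\theta}]$ inside the square. A short derivation, assuming for simplicity that $\theta$ is a fixed scalar:

```latex
\begin{aligned}
E\big[(\hat{\theta} - \theta)^2\big]
  &= E\big[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta)^2\big] \\
  &= E\big[(\hat{\theta} - E[\hat{\theta}])^2\big]
     + 2\,(E[\hat{\theta}] - \theta)\,E\big[\hat{\theta} - E[\hat{\theta}]\big]
     + (E[\hat{\theta}] - \theta)^2 \\
  &= \operatorname{Var}(\hat{\theta}) + \operatorname{bias}(\hat{\theta})^2 .
\end{aligned}
```

The cross term vanishes because $E[\hat{\theta} - E[\hat{\theta}]] = 0$, and $E[\hat{\theta}] - \theta$ is a constant that can be taken outside the expectation.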

Relation To Machine Learning. The relationship between bias and variance is tightly linked to the machine learning concepts of capacity, underfitting, and overfitting. How?

Consistency. Consistency is a desirable property of estimators. It ensures that as the number of data points in our dataset increases, our point estimate converges to the true value of $\theta$. More formally, consistency states that $\operatorname{plim}_{m \to \infty} \hat{\theta}_m = \theta$, where the convergence is in probability. Consistency of an estimator ensures that the bias will diminish as our training dataset grows. It is better to choose consistent estimators with large bias over estimators with small bias and large variance. Why?

## Section 5: ML and MAP Estimators

Maximum Likelihood Estimation. Maximum likelihood (ML) is a principle used to derive estimators. Given $m$ examples $X = \{x^{(1)}, \dots, x^{(m)}\}$ drawn independently from the data generating distribution $p_{\text{data}}$: $\theta_{\text{ML}} = \arg\max_{\theta} p_{\text{model}}(X; \theta)$. Here $p_{\text{model}}(x; \theta)$ maps any configuration $x$ to a real number that estimates the true probability $p_{\text{data}}(x)$.

Maximum Likelihood Estimation. After some mathematical manipulation: $\theta_{\text{ML}} = \arg\max_{\theta} E_{x \sim \hat{p}_{\text{data}}} \log p_{\text{model}}(x; \theta)$. Ideally, we would like to take this expectation over $p_{\text{data}}$. Unfortunately, we only have access to the empirical distribution $\hat{p}_{\text{data}}$ defined by the training data. Maximum likelihood can be viewed as a minimization of the dissimilarity between $\hat{p}_{\text{data}}$ and $p_{\text{model}}$. How?
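One standard way to answer the question is to measure dissimilarity with the KL divergence, an assumption in the sense that the slide does not name the measure. Because the examples are independent, the likelihood is a product; taking the logarithm and dividing by $m$ does not change the maximizer and yields the expectation form. The KL divergence then differs from the negative expected log-likelihood only by a term that does not depend on $\theta$:

```latex
\begin{aligned}
\theta_{\text{ML}}
  &= \arg\max_{\theta} \prod_{i=1}^{m} p_{\text{model}}(x^{(i)}; \theta)
   = \arg\max_{\theta} \frac{1}{m} \sum_{i=1}^{m} \log p_{\text{model}}(x^{(i)}; \theta)
   = \arg\max_{\theta} E_{x \sim \hat{p}_{\text{data}}} \log p_{\text{model}}(x; \theta), \\
D_{\text{KL}}\big(\hat{p}_{\text{data}} \,\|\, p_{\text{model}}\big)
  &= E_{x \sim \hat{p}_{\text{data}}}\big[\log \hat{p}_{\text{data}}(x) - \log p_{\text{model}}(x; \theta)\big], \\
\arg\min_{\theta} D_{\text{KL}}\big(\hat{p}_{\text{data}} \,\|\, p_{\text{model}}\big)
  &= \arg\min_{\theta} \, -E_{x \sim \hat{p}_{\text{data}}} \log p_{\text{model}}(x; \theta)
   = \theta_{\text{ML}} .
\end{aligned}
```

The term $E_{x \sim \hat{p}_{\text{data}}} \log \hat{p}_{\text{data}}(x)$ depends only on the data, so dropping it leaves the minimizer unchanged.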

Maximum Likelihood Estimation. Maximum likelihood can be shown to be the best estimator asymptotically, in terms of its rate of convergence as $m \to \infty$. The estimator derived by ML is consistent. However, certain conditions are required for consistency to hold:

- The true distribution $p_{\text{data}}$ must lie within the model family $p_{\text{model}}(\cdot\,; \theta)$. Otherwise, no estimator can recover $p_{\text{data}}$ even with infinite training examples.
- The true distribution must correspond to a unique value of $\theta$. Otherwise, ML will recover $p_{\text{data}}$ but will not be able to determine the true value of $\theta$ used in the data generation process.

Under these conditions, you are guaranteed to improve the performance of your estimator with more training data.

Maximum A Posteriori Estimation. Bayesian statistics: the dataset is directly observed and so is not random. On the other hand, the true parameter $\theta$ is unknown or uncertain and thus is represented as a random variable. Before observing data, we represent our knowledge of $\theta$ using the prior probability distribution $p(\theta)$. After observing data, we use Bayes' rule to compute the posterior distribution $p(\theta \mid x^{(1)}, \dots, x^{(m)})$.

Maximum A Posteriori Estimation. Usually, priors are chosen to be high-entropy distributions such as uniform or Gaussian distributions. These distributions are described as broad. From Bayes' rule we have:

$$p(\theta \mid x^{(1)}, \dots, x^{(m)}) = \frac{p(x^{(1)}, \dots, x^{(m)} \mid \theta)\, p(\theta)}{p(x^{(1)}, \dots, x^{(m)})}$$

Maximum A Posteriori Estimation. To predict the distribution over new input data, marginalize over $\theta$:

$$p(x_{\text{new}} \mid x^{(1)}, \dots, x^{(m)}) = \int p(x_{\text{new}} \mid \theta)\, p(\theta \mid x^{(1)}, \dots, x^{(m)})\, d\theta$$

Example: Bayesian linear regression.

Maximum A Posteriori Estimation. Maximum a posteriori (MAP) estimation tries to overcome the intractability of the full Bayesian treatment by providing point estimates using the posterior probability:

$$\theta_{\text{MAP}} = \arg\max_{\theta} p(\theta \mid x) = \arg\max_{\theta} \big[\log p(x \mid \theta) + \log p(\theta)\big]$$

MAP Bayesian inference has the advantage of leveraging information that is brought by the prior and cannot be found in the training data.
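A small worked case may help contrast the two point estimates. The sketch below assumes a Bernoulli model (coin flips with success probability $\theta$) and a Beta($a$, $b$) prior; neither appears on the slides, and both are chosen only because the estimates have closed forms. With $k$ successes in $m$ trials, the ML estimate is $k/m$ and the MAP estimate is the mode of the Beta($k + a$, $m - k + b$) posterior.

```python
def bernoulli_mle(k, m):
    """ML estimate of the success probability: the observed frequency."""
    return k / m


def bernoulli_map(k, m, a, b):
    """MAP estimate under a Beta(a, b) prior (posterior mode).

    Valid when k + a > 1 and m - k + b > 1, so that the mode is interior.
    """
    return (k + a - 1) / (m + a + b - 2)


k, m = 3, 3                            # three successes in three trials
print(bernoulli_mle(k, m))             # 1.0
print(bernoulli_map(k, m, a=2, b=2))   # 0.8
print(bernoulli_map(k, m, a=1, b=1))   # 1.0 (uniform prior: MAP equals ML)
```

With only three observations the ML estimate claims that failure is impossible, whereas the prior pulls the MAP estimate to 0.8. As $m$ grows with the same success frequency, the two estimates converge, since the likelihood term dominates the fixed prior term.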

## Section 6: Gradient Based Optimization

Optimization. Optimization refers to the task of either minimizing or maximizing some function $f(x)$ by altering the value of $x$. $f(x)$ is called an objective function. In the context of machine learning, it is also called the loss, cost, or error function. Notation: $x^* = \arg\min_{x} f(x)$ is the value of $x$ that minimizes $f(x)$.

Using The Derivative For Optimization. The derivative of a function specifies how to scale a small change in input in order to obtain the corresponding change in output: $f(x + \epsilon) \approx f(x) + \epsilon \nabla_x f(x)$. The derivative is useful for optimization because it tells us how to change $x$ to improve $f(x)$. Example: $f(x - \epsilon \operatorname{sign}(\nabla_x f(x))) < f(x)$ for small enough $\epsilon$.

Critical Points. A critical point or stationary point is a point $x$ with $\nabla_x f(x) = 0$.

[Figure: Global vs. Local Optimal Points]

Gradient Descent. Gradient descent proposes to update the parameter according to $x \leftarrow x - \epsilon \nabla_x f(x)$, where $\epsilon$ is referred to as the learning rate. Gradient descent converges when all the elements of the gradient are almost equal to zero.
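The update rule and stopping criterion can be sketched for a scalar parameter as follows. The objective $f(x) = (x - 3)^2$, the learning rate, and the tolerance are assumptions chosen for illustration; the function name is hypothetical.

```python
def gradient_descent(grad, x0, lr=0.1, tol=1e-8, max_steps=10_000):
    """Repeat x <- x - lr * grad(x) until the gradient is almost zero."""
    x = x0
    for _ in range(max_steps):
        g = grad(x)
        if abs(g) < tol:      # convergence test from the slide
            break
        x = x - lr * g        # step against the gradient
    return x


def f(x):
    return (x - 3.0) ** 2     # toy objective with its minimum at x = 3


def grad_f(x):
    return 2.0 * (x - 3.0)    # derivative of f


x_star = gradient_descent(grad_f, x0=0.0)
print(x_star)                 # very close to 3.0
```

On this objective each step shrinks the distance to the minimum by a constant factor of 0.8, so the loop stops after roughly a hundred steps. A learning rate above 1.0 would make the iterates diverge on this function, which is why the learning rate is treated as a hyperparameter.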

Stochastic Gradient Descent. Nearly all of deep learning is powered by one optimization algorithm: stochastic gradient descent (SGD). Motivation behind SGD: the cost function used by a machine learning algorithm often decomposes as a sum over training examples of some per-example loss function:

$$J(\theta) = E_{x, y \sim \hat{p}_{\text{data}}} L(x, y, \theta) = \frac{1}{m} \sum_{i=1}^{m} L(x^{(i)}, y^{(i)}, \theta)$$

Stochastic Gradient Descent. To minimize the loss over $\theta$, the gradient needs to be computed:

$$\nabla_{\theta} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \nabla_{\theta} L(x^{(i)}, y^{(i)}, \theta)$$

What is the computational cost of computing the gradient above?

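Stochastic Gradient Descent. SGD relies on the fact that the gradient is an expectation, and hence can be approximated with a small set of samples. Let $m'$ be the size of a minibatch drawn uniformly from the training data. Then:

$$\nabla_{\theta} J(\theta) \approx \frac{1}{m'} \sum_{i=1}^{m'} \nabla_{\theta} L(x^{(i)}, y^{(i)}, \theta)$$

The SGD update rule becomes $\theta \leftarrow \theta - \epsilon \nabla_{\theta} J(\theta)$, with the minibatch estimate used in place of the full gradient. The minus sign matches the gradient descent update, since we are minimizing the loss.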
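A minimal sketch of minibatch SGD, assuming a one-dimensional linear model $y \approx w x + b$ with squared-error loss. The model, the noise-free toy data, the hyperparameter values, and the function name are all assumptions made for illustration.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.05, batch_size=4, steps=2000, seed=0):
    """Fit y ~ w*x + b by SGD on the per-example loss (w*x + b - y)^2."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = range(len(xs))
    for _ in range(steps):
        batch = rng.sample(indices, batch_size)   # uniformly drawn minibatch
        grad_w = grad_b = 0.0
        for i in batch:
            err = w * xs[i] + b - ys[i]
            grad_w += 2.0 * err * xs[i]           # d/dw of the squared error
            grad_b += 2.0 * err                   # d/db of the squared error
        # Average over the minibatch, then step against the gradient.
        w -= lr * grad_w / batch_size
        b -= lr * grad_b / batch_size
    return w, b


xs = [i / 10 for i in range(20)]                  # inputs 0.0, 0.1, ..., 1.9
ys = [2.0 * x + 1.0 for x in xs]                  # targets from y = 2x + 1
w, b = sgd_linear_regression(xs, ys)
print(w, b)                                       # close to 2.0 and 1.0
```

Each step touches only `batch_size` examples, so its cost does not grow with the size of the training set, whereas the exact gradient requires a pass over all $m$ examples.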

## Section 7: Challenges That Motivate Deep Learning

Major Obstacles For Traditional Machine Learning. The development of deep learning was motivated by the failure of traditional ML algorithms when applied to central problems in AI, for two reasons:

- The mechanisms used to achieve generalization in traditional machine learning are insufficient to learn complicated functions in high-dimensional spaces.
- The challenge of generalizing to new examples becomes exponentially more difficult when working with high-dimensional data.

The Curse Of Dimensionality. Many machine learning problems become exceedingly difficult when the number of dimensions in the data is high. This is because the number of distinct configurations of a set of variables increases exponentially as the number of variables increases. How does that affect ML algorithms?
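The exponential growth can be made concrete with a small counting sketch. It assumes each variable is discretized into 10 bins, so $d$ variables define $10^d$ grid cells, and asks what fraction of those cells a fixed budget of training examples could possibly cover. The function names and the numbers are hypothetical.

```python
def grid_cells(bins_per_axis, n_dims):
    """Number of distinct cells when each axis is cut into equal bins."""
    return bins_per_axis ** n_dims


def max_coverage(n_examples, bins_per_axis, n_dims):
    """Upper bound on the fraction of cells containing at least one example."""
    return min(1.0, n_examples / grid_cells(bins_per_axis, n_dims))


for d in (1, 2, 3, 10):
    # With 1000 examples, coverage collapses as the dimension grows.
    print(d, grid_cells(10, d), max_coverage(1000, 10, d))
```

With 1000 examples every cell can be covered up to three dimensions, but in ten dimensions at most one cell in ten million contains an example. An algorithm that can only say something about cells it has seen has nothing to say about almost all of the input space.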

Local Constancy And Smoothness Regularization. In order to generalize well, machine learning algorithms need to be guided by prior beliefs about what kind of function they should learn. Among the most widely used priors is the smoothness or local constancy prior. A function is said to have local constancy if it does not change much within a small region of space. The simpler a machine learning algorithm is, the more extensively it tends to rely on this prior. Example: k-nearest neighbors.
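The k-nearest neighbors example can be sketched as below: the prediction for a query point is simply the majority label among the closest training points, which is the local constancy prior in its purest form. The function name, the Euclidean distance, and the two-cluster toy data are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k training points nearest to query."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Two well-separated clusters (illustration only).
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_x, train_y, (0.5, 0.5)))  # a
print(knn_predict(train_x, train_y, (5.5, 5.5)))  # b
```

The classifier has no parameters to fit; all of its generalization comes from assuming that nearby inputs share a label, so it can only be right in regions where training examples exist.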

Local Constancy And Smoothness Regularization. In general, traditional learning algorithms require $O(k)$ examples to distinguish $O(k)$ regions in space. Is there a way to represent a complex function that has many more regions to be distinguished than the number of training examples?

Local Constancy And Smoothness Regularization. Key insight: even though the number of regions of a function can be very large, say $O(2^k)$, the function can be defined with $O(k)$ examples as long as we introduce additional dependencies between regions via generic assumptions. Result: non-local generalization is actually possible.

Local Constancy And Smoothness Regularization. Example assumption: the data was generated by the composition of factors or features, potentially at multiple levels in a hierarchy (the core idea in deep learning). To a certain point, the exponential advantages conferred by the use of deep, distributed representations counter the exponential challenges posed by the curse of dimensionality. Many other generic, mild assumptions allow an exponential gain in the relationship between the number of examples and the number of regions that can be distinguished.

Manifold Learning. A manifold is a connected region in space. Mathematically, it is a set of points, each associated with a neighborhood around it. From any point, the surface of the manifold appears to be a Euclidean space. Example: we observe the world as a 2-D plane, whereas in fact it is a spherical manifold in 3-D space.

Manifold Learning. Most AI problems seem hopeless if we expect algorithms to learn interesting variations over all of $\mathbb{R}^n$. Manifold learning assumes that most of $\mathbb{R}^n$ consists of invalid input, and that interesting input occurs only along a collection of manifolds embedded in $\mathbb{R}^n$. Conclusion: probability mass is highly concentrated.

Manifold Learning. Fortunately, there is evidence to support the above assumptions.

- Observation 1: Probability distributions in natural data (images, text strings, and sound) are highly concentrated.
- Observation 2: Examples encountered in natural data are connected to each other by other examples, with each example being surrounded by similar data.

[Figure: Training examples from the QMUL Multiview Face Dataset]

Conclusion. Deep learning presents a framework to solve tasks that cannot be solved by traditional ML algorithms. Next lecture: Feed Forward Neural Networks.


Introduction to Causal Inference Problem Set 1 Professor: Teppei Yamamoto Due Friday, July 15 (at beginning of class) Only the required problems are due on the above date. The optional problems will not

### Deep search. Enhancing a search bar using machine learning. Ilgün Ilgün & Cedric Reichenbach

#BaselOne7 Deep search Enhancing a search bar using machine learning Ilgün Ilgün & Cedric Reichenbach We are not researchers Outline I. Periscope: A search tool II. Goals III. Deep learning IV. Applying

### Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling

Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013 Andrés Alfonso Caurcel Díaz 1 and José María Gómez Hidalgo 2 1 Universidad

### AP Calculus AB. Nevada Academic Standards that are assessable at the local level only.

Calculus AB Priority Keys Aligned with Nevada Standards MA I MI L S MA represents a Major content area. Any concept labeled MA is something of central importance to the entire class/curriculum; it is a

### Comparison of EM and Two-Step Cluster Method for Mixed Data: An Application

International Journal of Medical Science and Clinical Inventions 4(3): 2768-2773, 2017 DOI:10.18535/ijmsci/ v4i3.8 ICV 2015: 52.82 e-issn: 2348-991X, p-issn: 2454-9576 2017, IJMSCI Research Article Comparison

### Math-U-See Correlation with the Common Core State Standards for Mathematical Content for Third Grade

Math-U-See Correlation with the Common Core State Standards for Mathematical Content for Third Grade The third grade standards primarily address multiplication and division, which are covered in Math-U-See

### Word Segmentation of Off-line Handwritten Documents

Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department

### Multi-Dimensional, Multi-Level, and Multi-Timepoint Item Response Modeling.

Multi-Dimensional, Multi-Level, and Multi-Timepoint Item Response Modeling. Bengt Muthén & Tihomir Asparouhov In van der Linden, W. J., Handbook of Item Response Theory. Volume One. Models, pp. 527-539.

### Toward Probabilistic Natural Logic for Syllogistic Reasoning

Toward Probabilistic Natural Logic for Syllogistic Reasoning Fangzhou Zhai, Jakub Szymanik and Ivan Titov Institute for Logic, Language and Computation, University of Amsterdam Abstract Natural language

### Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification

Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Tomi Kinnunen and Ismo Kärkkäinen University of Joensuu, Department of Computer Science, P.O. Box 111, 80101 JOENSUU,

### Attributed Social Network Embedding

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, MAY 2017 1 Attributed Social Network Embedding arxiv:1705.04969v1 [cs.si] 14 May 2017 Lizi Liao, Xiangnan He, Hanwang Zhang, and Tat-Seng Chua Abstract Embedding

### AUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION

JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 11/2007, ISSN 1642-6037 Marek WIŚNIEWSKI *, Wiesława KUNISZYK-JÓŹKOWIAK *, Elżbieta SMOŁKA *, Waldemar SUSZYŃSKI * HMM, recognition, speech, disorders

### Twitter Sentiment Classification on Sanders Data using Hybrid Approach

IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders

### A Model to Predict 24-Hour Urinary Creatinine Level Using Repeated Measurements

Virginia Commonwealth University VCU Scholars Compass Theses and Dissertations Graduate School 2006 A Model to Predict 24-Hour Urinary Creatinine Level Using Repeated Measurements Donna S. Kroos Virginia

### Introduction to Simulation

Introduction to Simulation Spring 2010 Dr. Louis Luangkesorn University of Pittsburgh January 19, 2010 Dr. Louis Luangkesorn ( University of Pittsburgh ) Introduction to Simulation January 19, 2010 1 /

### Time series prediction

Chapter 13 Time series prediction Amaury Lendasse, Timo Honkela, Federico Pouzols, Antti Sorjamaa, Yoan Miche, Qi Yu, Eric Severin, Mark van Heeswijk, Erkki Oja, Francesco Corona, Elia Liitiäinen, Zhanxing

### SARDNET: A Self-Organizing Feature Map for Sequences

SARDNET: A Self-Organizing Feature Map for Sequences Daniel L. James and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 dljames,risto~cs.utexas.edu

### Discriminative Learning of Beam-Search Heuristics for Planning

Discriminative Learning of Beam-Search Heuristics for Planning Yuehua Xu School of EECS Oregon State University Corvallis,OR 97331 xuyu@eecs.oregonstate.edu Alan Fern School of EECS Oregon State University

### Laboratorio di Intelligenza Artificiale e Robotica

Laboratorio di Intelligenza Artificiale e Robotica A.A. 2008-2009 Outline 2 Machine Learning Unsupervised Learning Supervised Learning Reinforcement Learning Genetic Algorithms Genetics-Based Machine Learning

### Phonetic- and Speaker-Discriminant Features for Speaker Recognition. Research Project

Phonetic- and Speaker-Discriminant Features for Speaker Recognition by Lara Stoll Research Project Submitted to the Department of Electrical Engineering and Computer Sciences, University of California

### Human Emotion Recognition From Speech

RESEARCH ARTICLE OPEN ACCESS Human Emotion Recognition From Speech Miss. Aparna P. Wanare*, Prof. Shankar N. Dandare *(Department of Electronics & Telecommunication Engineering, Sant Gadge Baba Amravati

### QuickStroke: An Incremental On-line Chinese Handwriting Recognition System

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System Nada P. Matić John C. Platt Λ Tony Wang y Synaptics, Inc. 2381 Bering Drive San Jose, CA 95131, USA Abstract This paper presents

### On the Combined Behavior of Autonomous Resource Management Agents

On the Combined Behavior of Autonomous Resource Management Agents Siri Fagernes 1 and Alva L. Couch 2 1 Faculty of Engineering Oslo University College Oslo, Norway siri.fagernes@iu.hio.no 2 Computer Science

### Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview

### Knowledge Transfer in Deep Convolutional Neural Nets

Knowledge Transfer in Deep Convolutional Neural Nets Steven Gutstein, Olac Fuentes and Eric Freudenthal Computer Science Department University of Texas at El Paso El Paso, Texas, 79968, U.S.A. Abstract

### Word learning as Bayesian inference

Word learning as Bayesian inference Joshua B. Tenenbaum Department of Psychology Stanford University jbt@psych.stanford.edu Fei Xu Department of Psychology Northeastern University fxu@neu.edu Abstract

### Reinforcement Learning by Comparing Immediate Reward

Reinforcement Learning by Comparing Immediate Reward Punit Pandey DeepshikhaPandey Dr. Shishir Kumar Abstract This paper introduces an approach to Reinforcement Learning Algorithm by comparing their immediate

### arxiv: v2 [cs.ir] 22 Aug 2016

Exploring Deep Space: Learning Personalized Ranking in a Semantic Space arxiv:1608.00276v2 [cs.ir] 22 Aug 2016 ABSTRACT Jeroen B. P. Vuurens The Hague University of Applied Science Delft University of

### Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link

### Data Fusion Through Statistical Matching

A research and education initiative at the MIT Sloan School of Management Data Fusion Through Statistical Matching Paper 185 Peter Van Der Puttan Joost N. Kok Amar Gupta January 2002 For more information,

### Extending Place Value with Whole Numbers to 1,000,000

Grade 4 Mathematics, Quarter 1, Unit 1.1 Extending Place Value with Whole Numbers to 1,000,000 Overview Number of Instructional Days: 10 (1 day = 45 minutes) Content to Be Learned Recognize that a digit

### Semantic Segmentation with Histological Image Data: Cancer Cell vs. Stroma

Semantic Segmentation with Histological Image Data: Cancer Cell vs. Stroma Adam Abdulhamid Stanford University 450 Serra Mall, Stanford, CA 94305 adama94@cs.stanford.edu Abstract With the introduction

### Notes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1

Notes on The Sciences of the Artificial Adapted from a shorter document written for course 17-652 (Deciding What to Design) 1 Ali Almossawi December 29, 2005 1 Introduction The Sciences of the Artificial

### Using Web Searches on Important Words to Create Background Sets for LSI Classification

Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract

### Speech Emotion Recognition Using Support Vector Machine

Speech Emotion Recognition Using Support Vector Machine Yixiong Pan, Peipei Shen and Liping Shen Department of Computer Technology Shanghai JiaoTong University, Shanghai, China panyixiong@sjtu.edu.cn,

A Correlation of to the Grades K - 6 G/M-223 Introduction This document demonstrates the high degree of success students will achieve when using Scott Foresman Addison Wesley Mathematics in meeting the

### Math Pathways Task Force Recommendations February Background

Math Pathways Task Force Recommendations February 2017 Background In October 2011, Oklahoma joined Complete College America (CCA) to increase the number of degrees and certificates earned in Oklahoma.

### FF+FPG: Guiding a Policy-Gradient Planner

FF+FPG: Guiding a Policy-Gradient Planner Olivier Buffet LAAS-CNRS University of Toulouse Toulouse, France firstname.lastname@laas.fr Douglas Aberdeen National ICT australia & The Australian National University

### arxiv:cmp-lg/ v1 22 Aug 1994

arxiv:cmp-lg/94080v 22 Aug 994 DISTRIBUTIONAL CLUSTERING OF ENGLISH WORDS Fernando Pereira AT&T Bell Laboratories 600 Mountain Ave. Murray Hill, NJ 07974 pereira@research.att.com Abstract We describe and

### Modeling function word errors in DNN-HMM based LVCSR systems

Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford

### Approaches to control phenomena handout Obligatory control and morphological case: Icelandic and Basque

Approaches to control phenomena handout 6 5.4 Obligatory control and morphological case: Icelandic and Basque Icelandinc quirky case (displaying properties of both structural and inherent case: lexically

### Australian Journal of Basic and Applied Sciences

AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean

### Grade 6: Correlated to AGS Basic Math Skills

Grade 6: Correlated to AGS Basic Math Skills Grade 6: Standard 1 Number Sense Students compare and order positive and negative integers, decimals, fractions, and mixed numbers. They find multiples and