When Dictionary Learning Meets Classification


Bufford, Teresa (UCLA, tdbufford@ucla.edu)
Chen, Yuxin (Dalhousie University, yuxinchen612@gmail.com)
Horning, Mitchell (Harvey Mudd College, mhorning@hmc.edu)
Shee, Liberty (UCLA, lshee@g.ucla.edu)
Supervised by: Prof. Yohann Tero (UCLA, tero@math.ucla.edu)

August 9, 2013

Abstract

This report details and extends the implementation of the methods proposed by Sprechmann et al. [16], which aim at clustering signals. Two setups are considered: supervised and unsupervised clustering. For unsupervised clustering, the algorithms proposed in [16] combine spectral clustering and dictionary learning. Two unsupervised algorithms are proposed, each with a variant. Thus, we have implemented the five algorithms of [16], as well as a k-means variant. Five of them are described thoroughly in this report, and a complete Matlab code is available at ~tero/code_archive_[8.9.13].zip, which should allow one to easily reproduce all our results. Our experiments agree with [16] for the supervised clustering case. Despite our efforts, we were unable to reproduce the unsupervised clustering results on MNIST using the similarity measure S_1; this is discussed in Section 7. Overall, our experiments agree with [16]. The unsupervised dictionary learning with k-means algorithm gave slightly better results than [16] for the unsupervised MNIST experiment (digits {0,...,4}). We have also shown, through experimentation, that the supervised clustering is robust with respect to Gaussian noise.

Contents

1 Introduction
2 Dictionary Learning
3 Related Work
4 Algorithms
   4.1 Supervised Clustering
   4.2 Semisupervised Clustering
   4.3 Unsupervised Clustering (by Signals)
   4.4 Unsupervised Clustering (by Atoms)
   4.5 Unsupervised Clustering (by Atoms/Split Initialization)
   4.6 Unsupervised Clustering (by Signals), kmeans
5 Experiments and Results
   5.1 Sprechmann Results
   5.2 Supervised (Non-centered and Non-normalized; Centered and Normalized)
   5.3 Semisupervised
   5.4 Unsupervised - Signals (Non-centered and Non-normalized; Centered and Normalized; Changes over Refinements)
   5.5 Unsupervised - Atoms (Non-centered and Non-normalized; Centered and Normalized; Changes over Refinements)
   5.6 Unsupervised - Atoms, split in 2 each iteration (Non-centered and Non-normalized; Centered and Normalized)
   5.7 Unsupervised (kmeans) - Signals (Non-centered and Non-normalized; Centered and Normalized)
   5.8 Gaussian Noise Experiments (Classifying Pure Gaussian Noise; Adding Noise to Test Images; Adding Noise to Training and Test Images)
6 Code and Toolboxes
7 Discussion
8 Conclusion

1 Introduction

Image recognition and classification is a common problem studied in computer vision research. The disparity between a human's and a computer's ability to recognize and classify images is the basis behind CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart), which determine whether or not a user is human based on the user's ability to identify objects in an image. The Asirra (Animal Species Image Recognition for Restricting Access) challenge, which was proposed at ACM CCS 2007, is a CAPTCHA that specifically uses cat and dog images to determine whether the user is human, by testing their ability to distinguish between the two classes of animals. According to [4], Asirra can be solved by humans 99.6% of the time in under 30 seconds. The challenge of teaching a computer to classify cats and dogs is a difficult one, due to both large intra-class variation (e.g. the physical differences between breeds of cats and dogs, and the flexibility of the animals, which allows even one animal to appear in all kinds of shapes and sizes) and small inter-class variation (e.g. the similarity of cats and dogs in general shape and size). In [6], the authors describe a classifier that combines support-vector machines (SVMs) to obtain an 82.7% rate of accuracy in distinguishing and classifying the images of cats and dogs used in Asirra.

Our research has been centered on using dictionary learning methods to classify cat and dog image signals. In dictionary learning methods, a dictionary is constructed from training signals and used to classify test signals. To measure the viability of various methods, we first concentrated on classifying a simpler data set: the MNIST dataset of handwritten digits. This report details the experiments that have been reimplemented so far, following the algorithms published in [16] for supervised dictionary learning (training signals labeled) and unsupervised dictionary learning (training signals unlabeled).

2 Dictionary Learning

The cornerstone of dictionary learning is to find a sparse representation α ∈ R^K of a signal x ∈ R^n in a dictionary D ∈ R^{n×K} such that the reconstructed image Dα is as close as possible to x. In the following, this is done by solving the problem

    arg min_{α ∈ R^K} ||x − Dα||_2^2 + λ||α||_1,    (1)

where x ∈ R^n is the signal being classified, D ∈ R^{n×K} is the dictionary, whose columns in R^n are called atoms (or features), and α ∈ R^K is the coefficient vector of the signal. The existence of this minimum will be justified shortly. The parameter λ balances the trade-off between the reconstruction error ||x − Dα||_2^2 and the sparsity of the decomposition, measured by ||α||_1.

Given a collection of dictionaries D_1,...,D_N ∈ R^{n×K}, classifying a signal x ∈ R^n consists of:

1. Using sparse coding to compute α_1,...,α_N ∈ R^K, the representations of x in each dictionary D_i;
2. Comparing the cost of the representations α_i in each dictionary and assigning x to the least costly dictionary D_i in the sense of (1).

In other words, in order to classify a signal x as one of N possible classes using a trained collection of dictionaries {D_1,...,D_N}, we calculate

    E_i(x) = min_{α ∈ R^K} ||x − D_i α||_2^2 + λ||α||_1    (2)

for i = 1,...,N. Assuming that arg min_{i ∈ {1,...,N}} E_i(x) is unique, we define i* = arg min_{i ∈ {1,...,N}} E_i(x) and assign x to class i*.
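
This classification rule maps directly onto a few lines of MATLAB. The sketch below is ours, not the code from the archive linked in the abstract; it assumes the SPAMS toolbox (mexLasso) is on the path, the helper name classify_by_energy is our own, and the λ/2 scaling reflects our reading of the SPAMS mode-2 objective (see also Section 5).

% Classify each test signal by the minimum-energy dictionary, Eq. (2).
% X is n-by-l (test signals as columns), Ds a cell array of dictionaries.
function labels = classify_by_energy(X, Ds, lambda)
    N = numel(Ds);
    l = size(X, 2);
    E = zeros(N, l);                      % E(i,h) = E_i(y_h)
    param.lambda  = lambda / 2;           % SPAMS mode 2 puts a 1/2 on the data term
    param.lambda2 = 0;
    param.mode    = 2;
    for i = 1:N
        A = mexLasso(X, Ds{i}, param);    % sparse K-by-l coefficient matrix
        R = X - Ds{i} * A;                % reconstruction residuals
        E(i, :) = sum(R.^2, 1) + lambda * sum(abs(A), 1);   % energy (2) per signal
    end
    [~, labels] = min(E, [], 1);          % assign each signal to the cheapest class
end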

We now show that the energy defined in (2),

    E(x) = min_{α ∈ R^k} ( ||x − Dα||_2^2 + λ||α||_1 ) : R^n → [0, ∞),  where λ ≥ 0,

is a well-defined function.

Proof. Since ||x − Dα||_2^2 ≥ 0 and ||α||_1 ≥ 0 for all α ∈ R^k and x ∈ R^n, we have E(R^n) ⊆ [0, ∞). It remains to show that G(α) = ||x − Dα||_2^2 + λ||α||_1 : R^k → [0, ∞), with λ ≥ 0, attains a global minimum. Because any locally optimal point of a convex optimization problem is globally optimal, we first show that G is convex. Note that the domain of G is R^k, which is closed and convex. Let x ∈ R^n, α_1, α_2 ∈ R^k, and t ∈ [0, 1]. By definition, we need to show that G(tα_1 + (1 − t)α_2) ≤ tG(α_1) + (1 − t)G(α_2):

    G(tα_1 + (1 − t)α_2) = ||x − tDα_1 − (1 − t)Dα_2||_2^2 + λ||tα_1 + (1 − t)α_2||_1
                         = ||t(x − Dα_1) + (1 − t)(x − Dα_2)||_2^2 + λ||tα_1 + (1 − t)α_2||_1
                         ≤ t||x − Dα_1||_2^2 + (1 − t)||x − Dα_2||_2^2 + λt||α_1||_1 + λ(1 − t)||α_2||_1
                           (by the triangle inequality and positive homogeneity of norms, the convexity of s ↦ s^2, and t ∈ [0, 1], λ ≥ 0)
                         = t( ||x − Dα_1||_2^2 + λ||α_1||_1 ) + (1 − t)( ||x − Dα_2||_2^2 + λ||α_2||_1 )
                         = tG(α_1) + (1 − t)G(α_2).

Hence G is convex on the closed convex set R^k, so every local minimizer is global. Moreover, G is continuous; for λ > 0 it is coercive, since G(α) ≥ λ||α||_1 → ∞ as ||α|| → ∞, and for λ = 0 the problem reduces to least squares, whose minimum is attained. In both cases the minimum is attained, so E(x) = min_{α ∈ R^k} ( ||x − Dα||_2^2 + λ||α||_1 ) exists and is a well-defined function.

3 Related Work

Several works towards the accurate classification of cats and dogs have been published over the past five years. Here we detail some of the more interesting approaches that we have encountered. [9] introduces a new approach that localizes the features used for classification at object parts, applying appearance-based sliding-window detectors and a probabilistic consensus of geometric models to detect fine-grained categories whose instances share common parts but vary in shape and appearance. [14] is responsible for the creation of the Oxford-IIIT-Pet dataset that we have been using, and concerns itself with the problem of fine-grained object categorization; the authors create a model that combines shape and appearance/texture features for the discrimination. [17] explains the process of object detection with simple rectangular Haar-like features, rather than with pixels, to provide rapid detection with high accuracy. Rectangular features are used instead of raw pixels because information is lost when using pixels alone and because the system operates faster on features than on pixels. In [5], a discriminative approach to object detection is introduced. This approach induces classifiers directly from training data without a data model. In general, the system learns a pose-specific binary classifier and applies it many times to different sub-windows of an image, checking whether the target is present in each sub-window. [1] introduces Nearest-Neighbor image classification, which is faster than other image classification methods. The authors propose a Nearest-Neighbor-based classifier called Naive-Bayes Nearest-Neighbor, which uses image-to-class distances without descriptor quantization.

We are also interested in previous research in dictionary learning. [13] extends the generalization of discriminative image understanding tasks, such as texture segmentation and feature selection, by proposing a multi-scale method to minimize least-squares reconstruction errors and discriminative cost functions under l_0 or l_1 regularization constraints. [13] learns multiple dictionaries which are simultaneously reconstructive and discriminative, and uses the reconstruction errors of these dictionaries on image patches to derive a pixelwise classification. The method's novelty is twofold: first, redundant non-parametric dictionaries are learned; second, the sparse local representations are learned with an explicit discriminative goal. [8] presents a label consistent K-SVD algorithm to learn a discriminative dictionary for sparse coding.
Label consistent K-SVD is a supervised algorithm that incorporates a discriminative sparse-coding error criterion and an optimal classification performance criterion into the objective function and optimizes them jointly. Because the learned dictionary provides discriminative sparse representations of signals, good accuracy on object classification is achieved even with a simple multi-class linear classifier. [19] extends the K-SVD algorithm to learn an overcomplete dictionary from a set of labeled training face images. The authors also propose a corresponding classification algorithm based on the learned dictionary, incorporating the classification stage directly into the dictionary-learning procedure.

[1] provides formulations for dictionary learning algorithms that are adapted to performing tasks other than data reconstruction. Specifically, they provide a dictionary learning algorithm for multi-class classification which achieves state-of-the-art results. Along with this, they investigate a theory for dictionary learning algorithms and show that the problem is smooth under a set of three assumptions. Thirdly, they support the use of semi-supervised dictionary learning algorithms, which can make use of unlabeled data along with labeled data when learning the dictionaries. [7] applies modularity to classifying the MNIST digits 0 to 9 and achieves a 3.6% misclassification rate.

4 Algorithms

4.1 Supervised Clustering

For supervised dictionary learning, the MNIST training images were clustered according to their digit label {0,...,9}. Each cluster was used to construct its respective dictionary {D_0,...,D_9}, all of which were then used to classify the test images.

Input: labeled MNIST training images x_j ∈ R^n, j = 1,...,m; labeled MNIST test images y_h ∈ R^n, h = 1,...,l; N, the number of classes;
Output: dictionaries {D_0,...,D_{N−1}}; misclassification rate; error histogram;
Step 1 Set parameters and extract data:
    Set SPAMS parameters (see SPAMS documentation): param.K (number of atoms in the dictionaries), param.lambda = λ/2, param.iter = 1, param.mode = 2, param.lambda2 = 0;
    Set digitvector (digits to be classified);
    Set the decisions on "center data" and "l2 normalize data" as true or false;
    Load images from the MNIST dataset;
    Use the MNIST labels to extract the data corresponding to the indicated digitvector;
    if center data == true then subtract from each x_j and y_h its respective mean, so the mean of each signal is 0;
    if l2 normalize data == true then divide each x_j and y_h by its respective l_2 norm, so ||x_j||_2 = 1 and ||y_h||_2 = 1;
Step 2 Train the dictionaries {D_0,...,D_{N−1}} ⊂ R^{n×K} independently;
Step 3 Classify each test image y_h:
    for h = 1 to l do
        Compute E_i(y_h) = min_{α ∈ R^K} ||y_h − D_i α||_2^2 + λ||α||_1 for i = 0,...,N−1;
        Take i* = arg min_{i ∈ {0,...,N−1}} E_i(y_h); y_h is classified as class i*;
Step 4 Get results:
    Compute the misclassification rate;
    Show the error histograms;
Algorithm 1: Supervised Dictionary Learning on MNIST images
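
A minimal MATLAB sketch of Algorithm 1 follows. It is ours, not the archived implementation; it assumes SPAMS (mexTrainDL), the classify_by_energy helper sketched in Section 2, and labels already remapped to 0,...,N−1. The iteration count is our choice.

% Train one dictionary per class, then classify the test set by minimum energy.
function [Ds, rate] = supervised_dictionaries(Xtrain, ytrain, Xtest, ytest, N, K, lambda)
    param.K = K; param.lambda = lambda/2;    % lambda/2: our reading of the report's SPAMS convention
    param.lambda2 = 0; param.mode = 2; param.iter = 100;   % iteration count is ours
    Ds = cell(1, N);
    for i = 0:N-1
        Ds{i+1} = mexTrainDL(Xtrain(:, ytrain == i), param);   % dictionary for class i
    end
    labels = classify_by_energy(Xtest, Ds, lambda) - 1;        % back to labels 0,...,N-1
    rate = mean(labels(:) ~= ytest(:));                        % misclassification rate
end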

4.2 Semisupervised Clustering

For semisupervised dictionary learning, a percentage of the MNIST training images had labels which were known to be correct, and the remaining percentage was assigned random labels. The training images were then clustered according to their digit label {0,...,9}. Each cluster was used to construct its respective dictionary {D_0,...,D_9}, all of which were then used to classify the test images.

Input: labeled MNIST training images x_j ∈ R^n, j = 1,...,m; labeled MNIST test images y_h ∈ R^n, h = 1,...,l; N, the number of classes;
Output: dictionaries {D_0,...,D_{N−1}}; misclassification rate; error histogram;
Step 1 Set parameters and extract data:
    Set SPAMS parameters (see SPAMS documentation): param.K (number of atoms in the dictionaries), param.lambda = λ/2, param.iter = 1, param.mode = 2, param.lambda2 = 0;
    Set digitvector (digits to be classified);
    Set perturbation percent (percentage of training images to be assigned random labels);
    Set the decisions on "center data" and "l2 normalize data" as true or false;
    Load images from the MNIST dataset;
    Use the MNIST labels to extract the data corresponding to the indicated digitvector;
    Assign random labels from digitvector to perturbation percent of the MNIST training images;
    if center data == true then subtract from each x_j and y_h its respective mean, so the mean of each signal is 0;
    if l2 normalize data == true then divide each x_j and y_h by its respective l_2 norm, so ||x_j||_2 = 1 and ||y_h||_2 = 1;
Step 2 Train the dictionaries {D_0,...,D_{N−1}} ⊂ R^{n×K} independently;
Step 3 Refine the initial set of dictionaries:
    for iteration = 1 to max iter refinement do
        Classify the m training images using the current dictionaries by minimizing Equation (1):
        for j = 1 to m do
            Compute E_i(x_j) = min_{α ∈ R^K} ||x_j − D_i α||_2^2 + λ||α||_1 for i = 0,...,N−1;
            Take i* = arg min_{i ∈ {0,...,N−1}} E_i(x_j); x_j is classified as class i*;
        For each i = 0,...,N−1, train D_i ∈ R^{n×k_2} using the training images classified to class i;
Step 4 Get results: classify the test images; compute the misclassification rate; show the error histograms;
Algorithm 2: Semisupervised Dictionary Learning on MNIST images
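
The label-perturbation and refinement loop can be sketched as below (our code, assuming SPAMS and the classify_by_energy helper from Section 2; labels remapped to 0,...,N−1; the corrupted images receive uniformly random labels, a simplification of "random labels from digitvector").

% Perturb a fraction pct of the labels, build initial dictionaries, then refine.
function Ds = semisupervised_dictionaries(X, y, pct, N, K, lambda, nRefine)
    y = y(:)';                                        % row vector of labels
    m = size(X, 2);
    corrupt = randperm(m, round(pct * m));            % images whose labels are randomized
    y(corrupt) = randi([0, N-1], 1, numel(corrupt));
    param.K = K; param.lambda = lambda/2; param.lambda2 = 0;
    param.mode = 2; param.iter = 100;                 % iteration count is ours
    Ds = cell(1, N);
    for i = 0:N-1
        Ds{i+1} = mexTrainDL(X(:, y == i), param);    % initial dictionaries from noisy labels
    end
    for it = 1:nRefine                                % Step 3: alternate classify / retrain
        yhat = classify_by_energy(X, Ds, lambda) - 1;
        for i = 0:N-1
            Ds{i+1} = mexTrainDL(X(:, yhat == i), param);
        end
    end
end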

4.3 Unsupervised Clustering (by Signals)

For unsupervised dictionary learning (by image signals), spectral clustering was used to cluster the MNIST training images. In the initialization step, A = [α_1,...,α_m] ∈ R^{K×m}, where K denotes the number of atoms in D and m denotes the number of training images. If two signals belong to the same cluster, they are expected to have decompositions that use similar atoms. For unsupervised clustering by signals, the similarity matrix is defined as S_1 := |A|^T |A| ∈ R^{m×m}, where |A| denotes the element-wise absolute value of A and |A|^T its transpose. Each cluster was used to train its respective dictionary. These dictionaries were then refined and used to classify the test images. Since the unsupervised algorithm does not guarantee that the dictionaries are correctly ordered upon creation, we associate each dictionary to the class (label) that produces the highest number of correctly labeled training images.

Input: labeled MNIST training images x_j ∈ R^n, j = 1,...,m; labeled MNIST test images y_h ∈ R^n, h = 1,...,l; N, the number of classes;
Output: dictionaries {D_0,...,D_{N−1}}; misclassification rate; error histogram;
Step 1 Set parameters and extract data:
    Set SPAMS parameters (see SPAMS documentation): param.K (number of atoms in the initial dictionary), param.lambda = λ/2, param.iter = 1, param.mode = 2, param.lambda2 = 0;
    Set max iter refinement (number of refinements to the initial set of dictionaries) and dictionary sizes refinement (number of atoms in the refined dictionaries, k_2);
    Set digitvector (digits to be classified);
    Set the decisions on "center data" and "l2 normalize data" as true or false;
    Load images from the MNIST dataset;
    Use the MNIST labels to extract the data corresponding to the indicated digitvector;
    if center data == true then subtract from each x_j and y_h its respective mean, so the mean of each signal is 0;
    if l2 normalize data == true then divide each x_j and y_h by its respective l_2 norm, so ||x_j||_2 = 1 and ||y_h||_2 = 1;
Step 2 Create an initial set of dictionaries:
    Train a dictionary D ∈ R^{n×K} from all training images;
    Construct A = [α_1,...,α_m], where α_j is the minimum-energy sparse representation of x_j;
    Construct the similarity matrix to be used in spectral clustering, S_1 := |A|^T |A|;
    Perform spectral clustering on G_1 := {X, S_1} to assign each signal to one of the N classes;
    For each i = 0,...,N−1, train D_i ∈ R^{n×k_2} using the training images assigned to class i, giving our initial set of dictionaries;
Step 3 Refine the initial set of dictionaries:
    for iteration = 1 to max iter refinement do
        Classify the m training images using the current dictionaries by minimizing Equation (1):
        for j = 1 to m do
            Compute E_i(x_j) = min_{α ∈ R^K} ||x_j − D_i α||_2^2 + λ||α||_1 for i = 0,...,N−1;
            Take i* = arg min_{i ∈ {0,...,N−1}} E_i(x_j); x_j is classified as class i*;
        For each i = 0,...,N−1, train D_i ∈ R^{n×k_2} using the training images classified to class i;
Step 4 Reorder the dictionaries. Classify the test images. Compute the misclassification rate. Produce the error histograms;
Algorithm 3: Unsupervised Dictionary Learning by Signal Clusterization on MNIST images
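
Step 2 can be sketched as follows (ours, assuming SPAMS). A single dictionary is trained on all signals, S_1 = |A|^T|A| is built from the sparse codes, and a plain normalized-Laplacian spectral clustering (our own small stand-in for whatever routine the archived code used; kmeans requires the Statistics Toolbox) assigns each signal to one of N clusters. The dense m-by-m similarity matrix is only practical for a subset of MNIST.

% Initial signal clustering of Algorithm 3 via S1 = |A|'*|A|.
function idx = cluster_signals_S1(X, N, K, lambda)
    param.K = K; param.lambda = lambda/2; param.lambda2 = 0;
    param.mode = 2; param.iter = 100;               % iteration count is ours
    D = mexTrainDL(X, param);                       % one dictionary for all signals
    A = full(mexLasso(X, D, param));                % K-by-m sparse codes
    S = abs(A)' * abs(A);                           % m-by-m similarity matrix S1
    d = max(sum(S, 2), eps);                        % degrees (guard against isolated signals)
    L = eye(size(S, 1)) - diag(1./sqrt(d)) * S * diag(1./sqrt(d));   % normalized Laplacian
    [V, E] = eig((L + L') / 2);                     % symmetrize for numerical safety
    [~, order] = sort(diag(E), 'ascend');
    V = V(:, order(1:N));                           % N smallest eigenvectors
    V = V ./ max(vecnorm(V, 2, 2), eps);            % row-normalize the embedding
    idx = kmeans(V, N, 'Replicates', 5);            % cluster the embedded signals
end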

4.4 Unsupervised Clustering (by Atoms)

For unsupervised dictionary learning (by atoms, i.e., dictionary columns), spectral clustering was used to cluster the dictionary atoms. In the initialization step, A = [α_1,...,α_m] ∈ R^{K×m}, where K denotes the number of atoms in D and m denotes the number of training images. If two signals belong to the same cluster, they are expected to have decompositions that use similar atoms. For unsupervised clustering by atoms, the similarity matrix is defined as S_2 := |A| |A|^T ∈ R^{K×K}, where |A| denotes the element-wise absolute value of A and |A|^T its transpose. Each cluster was used to train its respective dictionary. These dictionaries were then refined and used to classify the test images. Since the unsupervised algorithm does not guarantee that the dictionaries are correctly ordered upon creation, we associate each dictionary to the class (label) that produces the highest number of correctly labeled training images.

Input: labeled MNIST training images x_j ∈ R^n, j = 1,...,m; labeled MNIST test images y_h ∈ R^n, h = 1,...,l; N, the number of classes;
Output: dictionaries {D_0,...,D_{N−1}}; misclassification rate; error histogram;
Step 1 Set parameters and extract data:
    Set SPAMS parameters (see SPAMS documentation): param.K (number of atoms in the initial dictionary), param.lambda = λ/2, param.iter = 1, param.mode = 2, param.lambda2 = 0;
    Set max iter refinement (number of refinements to the initial set of dictionaries);
    Set digitvector (digits to be classified);
    Set the decisions on "center data" and "l2 normalize data" as true or false;
    Load images from the MNIST dataset;
    Use the MNIST labels to extract the data corresponding to the indicated digitvector;
    if center data == true then subtract from each x_j and y_h its respective mean, so the mean of each signal is 0;
    if l2 normalize data == true then divide each x_j and y_h by its respective l_2 norm, so ||x_j||_2 = 1 and ||y_h||_2 = 1;
Step 2 Create an initial set of dictionaries:
    Train a dictionary D ∈ R^{n×K} from all training images;
    Construct A = [α_1,...,α_m], where α_j is the minimum-energy sparse representation of x_j;
    Construct the similarity matrix to be used in spectral clustering, S_2 := |A| |A|^T ∈ R^{K×K};
    Perform spectral clustering on G_2 := {D, S_2} to extract N classes of atoms;
    Collect the atoms of class i = 0,...,N−1 into D_i ∈ R^{n×k_i} to form the initial set of dictionaries;
Step 3 Refine the initial set of dictionaries:
    for iteration = 1 to max iter refinement do
        Classify the m training images using the current dictionaries by minimizing Equation (1):
        for j = 1 to m do
            Compute E_i(x_j) = min_{α ∈ R^K} ||x_j − D_i α||_2^2 + λ||α||_1 for i = 0,...,N−1;
            Take i* = arg min_{i ∈ {0,...,N−1}} E_i(x_j); x_j is classified as class i*;
        For each i = 0,...,N−1, train D_i ∈ R^{n×k_2} using the training images classified to class i;
Step 4 Reorder the dictionaries. Classify the test images. Compute the misclassification rate. Produce the error histograms;
Algorithm 4: Unsupervised Dictionary Learning by Atom Clusterization on MNIST images
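
A sketch of Step 2 of Algorithm 4 (ours, assuming SPAMS): the K atoms of one global dictionary are clustered through S_2 = |A||A|^T and the columns of D are split accordingly. Here spectral_cluster_similarity(S, N) is an assumed helper implementing the same normalized-Laplacian plus k-means steps as in the Algorithm 3 sketch, but taking a precomputed similarity matrix.

% Initial atom clustering of Algorithm 4 via S2 = |A|*|A|'.
function Ds = cluster_atoms_S2(X, N, K, lambda)
    param.K = K; param.lambda = lambda/2; param.lambda2 = 0;
    param.mode = 2; param.iter = 100;                % iteration count is ours
    D = mexTrainDL(X, param);
    A = full(mexLasso(X, D, param));                 % K-by-m sparse codes
    S2 = abs(A) * abs(A)';                           % K-by-K atom similarity
    atomClass = spectral_cluster_similarity(S2, N);  % assumed helper: label in 1..N per atom
    Ds = cell(1, N);
    for i = 1:N
        Ds{i} = D(:, atomClass == i);                % initial dictionary for class i-1
    end
end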

4.5 Unsupervised Clustering (by Atoms/Split Initialization)

Here we detail an alternative initialization proposed by [16], in which we start with a single, unpartitioned dictionary and then iteratively cluster one of the current partitions into two new ones. We choose the partition split which causes the largest decrease in energy, giving us one more partition than we had before. This process is continued until the desired number of clusters is reached. The algorithm is detailed below.

Input: labeled MNIST training images x_j ∈ R^n, j = 1,...,m; labeled MNIST test images y_h ∈ R^n, h = 1,...,l; N, the number of classes;
Output: dictionaries {D_0,...,D_{N−1}}; misclassification rate; error histogram;
Step 1 Set parameters and extract data:
    Set SPAMS parameters (see SPAMS documentation): param.K (number of atoms in the initial dictionary), param.lambda = λ/2, param.iter = 1, param.mode = 2, param.lambda2 = 0;
    Set max iter refinement (number of refinements to the initial set of dictionaries);
    Set digitvector (digits to be classified);
    Set the decisions on "center data" and "l2 normalize data" as true or false;
    Load images from the MNIST dataset;
    Use the MNIST labels to extract the data corresponding to the indicated digitvector;
    if center data == true then subtract from each x_j and y_h its respective mean, so the mean of each signal is 0;
    if l2 normalize data == true then divide each x_j and y_h by its respective l_2 norm, so ||x_j||_2 = 1 and ||y_h||_2 = 1;
Step 2 Cluster atoms by repeated splitting:
    Train a dictionary D ∈ R^{n×K} from all training images;
    for i = 1 to (total number of digit classes − 1) do
        for p = 1 to i do
            Construct A = [α_1,...,α_k], where α_j is the minimum-energy sparse representation of x_j in D_p;
            Compute the similarity matrix S_2 = |A| |A|^T;
            Apply spectral clustering on G_2 := {D_p, S_2} to extract atoms D_{i+1}, D_{i+2};
            Compute E_p (the total energy for split p);
        Take p* = arg min_{p ∈ {1,...,i}} E_p;
        Set D_1,...,D_{p*−1}, D_{p*+1},...,D_i, D_{i+1}, D_{i+2} as D_1,...,D_{i+1};
Step 3 Refine the set of dictionaries:
    for iteration = 1 to max iter refinement do
        Classify the m training images using the current dictionaries by minimizing Equation (1):
        for j = 1 to m do
            Compute E_i(x_j) = min_{α ∈ R^K} ||x_j − D_i α||_2^2 + λ||α||_1 for i = 0,...,N−1;
            Take i* = arg min_{i ∈ {0,...,N−1}} E_i(x_j); x_j is classified as class i*;
        For each i = 0,...,N−1, train D_i ∈ R^{n×k_2} using the training images classified to class i;
Step 4 Reorder the dictionaries. Classify the test images. Compute the misclassification rate. Produce the error histograms;
Algorithm 5: Unsupervised Dictionary Learning by Atoms/Split Initialization on MNIST images
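
A rough sketch of the split-initialization loop (ours, assuming SPAMS). Each pass tentatively splits every current partition into two atom clusters and keeps the single split with the lowest total energy, until N partitions exist. spectral_cluster_similarity and total_energy (the sum over training signals of min_i E_i(x_j)) are assumed helpers; the details differ from the archived implementation.

% Split-initialization of Algorithm 5, starting from one global dictionary D0.
function Ds = split_initialization(X, D0, N, lambda)
    lparam = struct('lambda', lambda/2, 'mode', 2, 'lambda2', 0);
    Ds = {D0};                                            % start from one partition
    while numel(Ds) < N
        bestE = inf; best = Ds;
        for p = 1:numel(Ds)
            A = full(mexLasso(X, Ds{p}, lparam));         % codes of the signals in partition p's dictionary
            c = spectral_cluster_similarity(abs(A) * abs(A)', 2);   % split the atoms of D_p in two
            trial = [Ds(1:p-1), {Ds{p}(:, c == 1), Ds{p}(:, c == 2)}, Ds(p+1:end)];
            E = total_energy(X, trial, lambda);           % total energy if this split is kept
            if E < bestE, bestE = E; best = trial; end
        end
        Ds = best;                                        % keep the cheapest split
    end
end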

4.6 Unsupervised Clustering (by Signals), kmeans

We have found that the reported performance of the unsupervised algorithm of [16] can be improved by simply doing away with the initialization given by spectral clustering and instead performing simple k-means clustering.

Input: labeled MNIST training images x_j ∈ R^n, j = 1,...,m; labeled MNIST test images y_h ∈ R^n, h = 1,...,l; N, the number of classes;
Output: dictionaries {D_0,...,D_{N−1}}; misclassification rate; error histogram;
Step 1 Set parameters and extract data:
    Set SPAMS parameters (see SPAMS documentation): param.K (number of atoms in the initial dictionary), param.lambda = λ/2, param.iter = 1, param.mode = 2, param.lambda2 = 0;
    Set max iter refinement (number of refinements to the initial set of dictionaries);
    Set digitvector (digits to be classified);
    Set the decisions on "center data" and "l2 normalize data" as true or false;
    Load images from the MNIST dataset;
    Use the MNIST labels to extract the data corresponding to the indicated digitvector;
Step 2 Create an initial set of dictionaries:
    Perform k-means clustering on the set of training images to obtain an initial N clusters of images;
    if center data == true then subtract from each x_j and y_h its respective mean, so the mean of each signal is 0;
    if l2 normalize data == true then divide each x_j and y_h by its respective l_2 norm, so ||x_j||_2 = 1 and ||y_h||_2 = 1;
    For each class i = 0,...,N−1, train the dictionary D_i on the images in that class to obtain the initial dictionaries D_0,...,D_{N−1};
Step 3 Refine the initial set of dictionaries:
    for iteration = 1 to max iter refinement do
        Classify the m training images using the current dictionaries by minimizing Equation (1):
        for j = 1 to m do
            Compute E_i(x_j) = min_{α ∈ R^K} ||x_j − D_i α||_2^2 + λ||α||_1 for i = 0,...,N−1;
            Take i* = arg min_{i ∈ {0,...,N−1}} E_i(x_j); x_j is classified as class i*;
        For each i = 0,...,N−1, train D_i ∈ R^{n×k_2} using the training images classified to class i;
Step 4 Reorder the dictionaries. Classify the test images. Compute the misclassification rate. Produce the error histograms;
Algorithm 6: Unsupervised Dictionary Learning by Signal Clusterization with k-means Initialization on MNIST images
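
The k-means initialization of Step 2 is simple to sketch (our code, assuming SPAMS; the optional centering/normalization before dictionary training is omitted here; kmeans requires the Statistics Toolbox).

% k-means initialization of Algorithm 6: cluster raw images, then one dictionary per cluster.
function Ds = kmeans_initial_dictionaries(X, N, K, lambda)
    idx = kmeans(X', N, 'Replicates', 5);            % k-means on raw pixel vectors (rows = images)
    param.K = K; param.lambda = lambda/2; param.lambda2 = 0;
    param.mode = 2; param.iter = 100;                % iteration count is ours
    Ds = cell(1, N);
    for i = 1:N
        Ds{i} = mexTrainDL(X(:, idx == i), param);   % one dictionary per k-means cluster
    end
end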

5 Experiments and Results

We performed experiments on the MNIST handwritten digits dataset using three types of algorithms: supervised dictionary learning, semisupervised dictionary learning, and unsupervised dictionary learning. Unsupervised dictionary learning was further split into two methods: spectral clustering (clustering signals, clustering atoms in one step, clustering atoms in multiple steps) and k-means. We ran the tests for the digit sets {0, 1}, {2, 3}, {0,...,4}, {0,...,5}, and {0,...,9}. For each algorithm (unless otherwise specified), we repeated the experiment with:

    param.K = 5, non-centered and non-normalized image signals
    param.K = 8, non-centered and non-normalized image signals
    param.K = 5, centered and normalized image signals
    param.K = 8, centered and normalized image signals

Following [16], for Algorithm 3 we set dictionary sizes refinement = 2 for the dictionaries once we are in the refinement step. The following parameters were used for all of our experiments:

    param.λ = .05
    param.λ2 = 0
    param.iter = 1 (unless otherwise specified)
    param.mode = 2

(Note: our param.λ = .05 is equivalent to the λ = .1 of [16]. Explanation in Section 6.)

5.1 Sprechmann Results

As a point of comparison, consider the following numbers from Sprechmann's [16] and Ramirez's [15] results:

Table 1: Sprechmann's [16] and Ramirez's [15] results for Supervised Dictionary Learning (Algorithm 1) and Unsupervised Dictionary Learning with signal clustering (Algorithm 3). We were unable to replicate their results with unsupervised dictionary learning.

Cluster Type | Centered & Normalized | Digits | K | Misclassification
Supervised | Unknown | {0,...,9} | 8 | 1.26%
Unsupervised - Signals | Unknown | {0,...,4} | 5 | 1.44%
Unsupervised - Signals | Unknown | {0,...,5} | 3 | 6.9%

Explanation of Table Numbers

After explaining the algorithm for supervised dictionary learning, the algorithm is applied in the following: Table 1 of [16] reports a misclassification rate of 1.26% and refers to Section 3 of the same document, which reads "MNIST... the usual training/testing split. In Table 1 we present the obtained results.... used a penalty parameter λ =.1... for a dictionary with k = 8."

After proposing the algorithm for unsupervised dictionary learning via signal clustering, the algorithm is applied in the following: Section 5 of [16] reads "We clustered the digits [from] 0 to 4 (K = 5) using the testing set of MNIST... used an initial dictionary of k = 5 atoms... We used G_1 for initialization... using during the iterations dictionaries of 2... had a misclassification rate of 1.44%."

Table 2 of [15] reports a misclassification rate of 6.9% and refers to Section 4.3 of the same document, which reads "We first clustered the digits [from] 0 to 5 (K = 6) from the testing set of MNIST... The size of the initial dictionaries are... k = 3 for MNIST... The initial clustering of the data was done using spectral clustering on the graph G_1."
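
The parameter conversion noted above comes, as we read the SPAMS documentation, from the objective selected by param.mode = 2, which places a factor 1/2 on the data-fitting term; halving λ then leaves the minimizers of the energy (1) unchanged. The report's own explanation is in Section 6; the following is only our sketch of it:

    \min_{\alpha\in\mathbb{R}^K}\; \tfrac12\|x-D\alpha\|_2^2
        + \lambda_{\mathrm{SPAMS}}\|\alpha\|_1
        + \tfrac{\lambda_2}{2}\|\alpha\|_2^2,
    \qquad
    \lambda_2 = 0,\ \ \lambda_{\mathrm{SPAMS}} = \tfrac{\lambda}{2}
    \;\;\Longrightarrow\;\;
    \arg\min_{\alpha}\; \|x-D\alpha\|_2^2 + \lambda\|\alpha\|_1 .

With the λ = 0.1 of [16], this gives param.lambda = 0.05.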

5.2 Supervised

Non-centered and Non-normalized

Table 2: Results from Supervised Dictionary Learning (Algorithm 1) with K = 5, non-centered and non-normalized image signals. 3.33% is far from Sprechmann's supervised result of 1.26% with K = 8 for digits {0,...,9} (see Table 1). The gap is slightly smaller when we run our experiments with K = 8 (see Table 3), and significantly smaller when we run the experiments with centered and normalized data (see Tables 4 and 5). It can also be seen that {2, 3} is more difficult to distinguish than {0, 1}, from the former's higher misclassification rate.

Cluster Type | Centered & Normalized | Digits | K | Misclassification
Supervised | False | {0, 1} | 5 | 0.19%
Supervised | False | {2, 3} | 5 | 0.54%
Supervised | False | {0,...,4} | 5 | 0.95%
Supervised | False | {0,...,5} | 5 | %
Supervised | False | {0,...,9} | 5 | 3.33%

[Five bar charts: repartition of the errors as a function of the class (per-class misclassification rates), one per digit set.]

Figure 1: Misclassification rate of results from Supervised Dictionary Learning (Algorithm 1) with K = 5, non-centered and non-normalized image signals, w.r.t. classes. When classifying digits 0 to 4, 2 is misclassified significantly more often than the other digits, roughly a 75% increase over the next highest.

Table 3: Results from Supervised Dictionary Learning (Algorithm 1) with K = 8, non-centered and non-normalized image signals. 3.11% is far from Sprechmann's result of 1.26% with K = 8 (see Table 1). The gap is significantly smaller when we run the experiments with centered and normalized data (see Tables 4 and 5). It can also be seen that {2, 3} is more difficult to distinguish than {0, 1}, from the former's higher misclassification rate.

Cluster Type | Centered & Normalized | Digits | K | Misclassification
Supervised | False | {0, 1} | 8 | 0.24%
Supervised | False | {2, 3} | 8 | 0.39%
Supervised | False | {0,...,4} | 8 | 0.78%
Supervised | False | {0,...,5} | 8 | %
Supervised | False | {0,...,9} | 8 | 3.11%

[Five bar charts: repartition of the errors as a function of the class (per-class misclassification rates), one per digit set.]

Figure 2: Misclassification rate of results from Supervised Dictionary Learning (Algorithm 1) with K = 8, non-centered and non-normalized image signals, w.r.t. classes. When classifying digits 0 to 4, 2 is misclassified significantly more often than the other digits, roughly two times more than the next highest.

Centered and Normalized

Table 4: Results from Supervised Dictionary Learning (Algorithm 1) with K = 5, centered and normalized image signals. 1.96% is close to Sprechmann's supervised result of 1.26% with K = 8 for digits {0,...,9} (see Table 1). It can also be seen that {2, 3} is more difficult to distinguish than {0, 1}, from the former's higher misclassification rate.

Cluster Type | Centered & Normalized | Digits | K | Misclassification
Supervised | True | {0, 1} | 5 | 0%
Supervised | True | {2, 3} | 5 | 0.29%
Supervised | True | {0,...,4} | 5 | 0.41%
Supervised | True | {0,...,5} | 5 | 0.63%
Supervised | True | {0,...,9} | 5 | 1.96%

[Five bar charts: repartition of the errors as a function of the class (per-class misclassification rates), one per digit set.]

Figure 3: Misclassification rate of results from Supervised Dictionary Learning (Algorithm 1) with K = 5, centered and normalized image signals, w.r.t. classes. When classifying digits 0 to 4, 2 is misclassified significantly more often than the other digits, at least two times more. When centering and normalizing the image signals, the comparative rate of misclassification for 0 increases, almost doubling (see Figure 1).

Table 5: Results from Supervised Dictionary Learning (Algorithm 1) with K = 8, centered and normalized image signals. 1.89% is close to Sprechmann's supervised result of 1.26% with K = 8 for digits {0,...,9} (see Table 1). It can also be seen that {2, 3} is more difficult to distinguish than {0, 1}, from the former's higher misclassification. In fact, there is no misclassification in {0, 1}.

Cluster Type | Centered & Normalized | Digits | K | Misclassification
Supervised | True | {0, 1} | 8 | 0%
Supervised | True | {2, 3} | 8 | 0.24%
Supervised | True | {0,...,4} | 8 | 0.41%
Supervised | True | {0,...,5} | 8 | 0.6%
Supervised | True | {0,...,9} | 8 | 1.89%

[Five bar charts: repartition of the errors as a function of the class (per-class misclassification rates), one per digit set.]

Figure 4: Misclassification rate of results from Supervised Dictionary Learning (Algorithm 1) with K = 8, centered and normalized image signals, w.r.t. classes. When classifying digits 0 to 4, 2 is misclassified significantly more often than the other digits, at least two times more. When centering and normalizing the image signals, the comparative rate of misclassification for 0 increases, almost doubling (see Figure 2).

5.3 Semisupervised

Table 6: Misclassification of the dictionary set in the final iteration for digits 0-4, perturbation percentages 20 to 90 (increments of 10). All misclassification rates are low, ranging between 0.85% and 1.5%. Misclassification rates mostly increase as the perturbation percent increases, though there are times when it decreases sharply (30% and 70%), likely due to the random nature of the perturbation.

Cluster Type | Centered & Normalized | Digits | K | Perturbation % | Misclassification
Semisupervised | True | {0,...,4} | | 20 | %
Semisupervised | True | {0,...,4} | | 30 | %
Semisupervised | True | {0,...,4} | | 40 | %
Semisupervised | True | {0,...,4} | | 50 | %
Semisupervised | True | {0,...,4} | | 60 | %
Semisupervised | True | {0,...,4} | | 70 | %
Semisupervised | True | {0,...,4} | | 80 | %
Semisupervised | True | {0,...,4} | | 90 | %

[Plots for perturbations 40%, 70%, and 90% (digits 0-4): energy, average misclassification rate, and number of training signals that changed classification at each refinement iteration.]

Figure 5: Semisupervised results for digits 0-4. Iteration 0 is the initial dictionary set; each iteration i (i > 0) is the ith refinement. Change i is the change between dictionary sets i-1 and i (dictionary set i is the ith refinement, dictionary set 0 is the initial set). Left to right: energy, misclassification, changes in training image classification. Note that as the perturbation increases, the payoff from the refinement process increases.

Table 7: Misclassification of the dictionary set in the final iteration for digits 0-9, perturbation percentages 20 to 90 (increments of 10). The misclassification rates fall within a consistent range of 4% to 6%, mostly increasing as the perturbation percent increases, though there was a decrease in misclassification for a perturbation of 70%.

Cluster Type | Centered & Normalized | Digits | K | Perturbation % | Misclassification
Semisupervised | True | {0,...,9} | | 20 | %
Semisupervised | True | {0,...,9} | | 30 | %
Semisupervised | True | {0,...,9} | | 40 | %
Semisupervised | True | {0,...,9} | | 50 | %
Semisupervised | True | {0,...,9} | | 60 | %
Semisupervised | True | {0,...,9} | | 70 | %
Semisupervised | True | {0,...,9} | | 80 | %
Semisupervised | True | {0,...,9} | | 90 | %

[Plots for perturbations 40%, 70%, and 90% (digits 0-9): energy, average misclassification rate, and number of training signals that changed classification at each refinement iteration.]

Figure 6: Semisupervised results for digits 0-9. Iteration 0 is the initial dictionary set; each iteration i (i > 0) is the ith refinement. Change i is the change between dictionary sets i-1 and i (dictionary set i is the ith refinement, dictionary set 0 is the initial set). Left to right: energy, misclassification, changes in training image classification. Misclassification corresponds better to changes in training image classifications than energy does. Note that as the perturbation increases, the payoff from the refinement process increases.

5.4 Unsupervised - Signals

Non-centered and Non-normalized

Table 8: Results from Unsupervised Dictionary Learning with signal clustering (Algorithm 3) with K = 5, non-centered and non-normalized image signals. 31.4% is significantly different from Sprechmann's unsupervised result of 1.44% with K = 5 for digits {0,...,4} (see Table 1). There is a high jump in the misclassification rate when the number of digits classified increases from two to five. It can also be seen that {2, 3} is more difficult to distinguish than {0, 1}, from the former's higher misclassification rate. Interestingly, it is also more difficult to distinguish {0,...,5} than {0,...,9}, though the latter has more classes.

Cluster Type | Centered & Normalized | Digits | K | Misclassification
Unsupervised - Signals | False | {0, 1} | 5 | %
Unsupervised - Signals | False | {2, 3} | 5 | %
Unsupervised - Signals | False | {0,...,4} | 5 | 31.4%
Unsupervised - Signals | False | {0,...,5} | 5 | %
Unsupervised - Signals | False | {0,...,9} | 5 | %

[Five bar charts: repartition of the errors as a function of the class (per-class misclassification rates), one per digit set.]

Figure 7: Misclassification rate of results from Unsupervised Dictionary Learning with signal clustering (Algorithm 3) with K = 5, non-centered and non-normalized image signals, w.r.t. classes. When classifying digits 0 to 4, 2 and 3 are misclassified significantly more often than the other digits, over three times as often. This is also seen in clustering atoms (see Figure 12).

Table 9: Results from Unsupervised Dictionary Learning with signal clustering (Algorithm 3) with K = 8, non-centered and non-normalized image signals. Our results are significantly different from Sprechmann's unsupervised result of 1.44% with K = 5 for digits {0,...,4} (see Table 1). The misclassification rate for 2 and 3 is significantly larger than the one for 0 and 1. Interestingly, it is also more difficult to distinguish {0,...,5} than {0,...,9}, though the latter has more classes.

Cluster Type | Centered & Normalized | Digits | K | Misclassification
Unsupervised - Signals | False | {0, 1} | 8 | %
Unsupervised - Signals | False | {2, 3} | 8 | %
Unsupervised - Signals | False | {0,...,4} | 8 | %
Unsupervised - Signals | False | {0,...,5} | 8 | %
Unsupervised - Signals | False | {0,...,9} | 8 | %

[Five bar charts: repartition of the errors as a function of the class (per-class misclassification rates), one per digit set.]

Figure 8: Misclassification rate of results from Unsupervised Dictionary Learning with signal clustering (Algorithm 3) with K = 8, non-centered and non-normalized image signals, w.r.t. classes. When classifying digits 0 to 4, 3 is misclassified significantly more often than the other digits. When classifying digits 0 to 5, 0 is misclassified significantly more often than the other digits, over three times as often. When classifying digits 0 to 9, 3 and 8 are misclassified significantly more often than the other digits, around twice as often.

Centered and Normalized

Table 10: Results from Unsupervised Dictionary Learning with signal clustering (Algorithm 3) with K = 5, centered and normalized image signals. Our results are significantly different from Sprechmann's unsupervised result of 1.44% with K = 5 for digits {0,...,4} (see Table 1). The misclassification rate for 2 and 3 is significantly larger than the one for 0 and 1. There is a high jump in the misclassification rate when the number of digits classified increases from two to five. The misclassification rate also jumps by around 15% when the digits being classified increase from {0,...,4} to {0,...,5}. Interestingly, it is also more difficult to distinguish {0,...,5} than {0,...,9}, though the latter has more classes.

Cluster Type | Centered & Normalized | Digits | K | Misclassification
Unsupervised - Signals | True | {0, 1} | 5 | 0.23%
Unsupervised - Signals | True | {2, 3} | 5 | %
Unsupervised - Signals | True | {0,...,4} | 5 | %
Unsupervised - Signals | True | {0,...,5} | 5 | %
Unsupervised - Signals | True | {0,...,9} | 5 | %

[Five bar charts: repartition of the errors as a function of the class (per-class misclassification rates), one per digit set.]

Figure 9: Misclassification rate of results from Unsupervised Dictionary Learning with signal clustering (Algorithm 3) with K = 5, centered and normalized image signals, w.r.t. classes. When classifying digits 0 to 4, 2 and 3 are misclassified significantly more often than the other digits, more than three times as often. When classifying digits 0 to 5, 3 is misclassified significantly more often than the other digits, over twice as often. When classifying digits 0 to 9, 8 is misclassified significantly more often than the other digits, over twice as often.

Table 11: Results from Unsupervised Dictionary Learning with signal clustering (Algorithm 3) with K = 8, centered and normalized image signals. 27.4% is significantly different from Sprechmann's unsupervised result of 1.44% with K = 5 for digits {0,...,4} (see Table 1). There is a high jump in the misclassification rate when the number of digits classified increases from two to five. It can also be seen that {2, 3} is more difficult to distinguish than {0, 1}, from the former's higher misclassification rate. Interestingly, it is also more difficult to distinguish {0,...,5} than {0,...,9}, though the latter has more classes.

Cluster Type | Centered & Normalized | Digits | K | Misclassification
Unsupervised - Signals | True | {0, 1} | 8 | 0.28%
Unsupervised - Signals | True | {2, 3} | 8 | %
Unsupervised - Signals | True | {0,...,4} | 8 | 27.4%
Unsupervised - Signals | True | {0,...,5} | 8 | %
Unsupervised - Signals | True | {0,...,9} | 8 | %

[Five bar charts: repartition of the errors as a function of the class (per-class misclassification rates), one per digit set.]

Figure 10: Misclassification rate of results from Unsupervised Dictionary Learning with atom clustering (Algorithm 4) with K = 5, centered and normalized image signals, w.r.t. classes. When classifying digits 0 to 4, 2 and 3 are misclassified significantly more often than the other digits, over three times as often. This is also seen in clustering signals (see Figure 9). When classifying digits 0 to 9, 3 is misclassified significantly more often than the other digits, over twice as often.

Changes over Refinements

Figure 11: Results from multiple refinement iterations for Unsupervised Dictionary Learning with signal clustering (Algorithm 3). Top to bottom: K = 5, non-centered and non-normalized image signals; K = 8, non-centered and non-normalized image signals; K = 5, centered and normalized image signals; K = 8, centered and normalized image signals. Left to right: energy, misclassification, changes in training image classification. Energy decreases gradually, roughly exponentially, with refinements. Misclassification increases slightly with refinements for all cases except K = 8, centered and normalized image signals. Changes in training images are small (one or no changes per refinement).

[Four rows of plots (digits 0-4): energy, average misclassification rate, and number of training signals that changed classification at each refinement iteration.]

5.5 Unsupervised - Atoms

Non-centered and Non-normalized

Table 12: Results from Unsupervised Dictionary Learning with atom clustering (Algorithm 4) with K = 5, non-centered and non-normalized image signals. Our results are significantly different from Sprechmann's unsupervised result of 1.44% with K = 5 for digits {0,...,4} (see Table 1). There is a high jump in the misclassification rate when the number of digits classified increases from two to five.

Cluster Type | Centered & Normalized | Digits | K | Misclassification
Unsupervised - Atoms | False | {0, 1} | 5 | %
Unsupervised - Atoms | False | {2, 3} | 5 | %
Unsupervised - Atoms | False | {0,...,4} | 5 | %
Unsupervised - Atoms | False | {0,...,5} | 5 | %
Unsupervised - Atoms | False | {0,...,9} | 5 | %

[Five bar charts: repartition of the errors as a function of the class (per-class misclassification rates), one per digit set.]

Figure 12: Misclassification rate of results from Unsupervised Dictionary Learning with atom clustering (Algorithm 4) with K = 5, non-centered and non-normalized image signals, w.r.t. classes. When classifying digits 0 to 4, 2 and 3 are misclassified significantly more often than the other digits, up to four times more often. This is also seen in clustering signals (see Figure 7). When classifying digits 0 to 5, 4 is misclassified significantly more often than the other digits, approximately an 85% increase from the next highest (compare to centered and normalized signals in Table 14, where 2 is significantly more misclassified). When classifying digits 0 to 9, 6 is misclassified significantly less often than the other digits, less than half as often as the next lowest.

Table 13: Results from Unsupervised Dictionary Learning with atom clustering (Algorithm 4) with K = 8, non-centered and non-normalized image signals. 29.5% is significantly different from Sprechmann's unsupervised result of 1.44% with K = 5 for digits {0,...,4} (see Table 1). There is a high jump in the misclassification rate when the number of digits classified increases from two to five.

Cluster Type | Centered & Normalized | Digits | K | Misclassification
Unsupervised - Atoms | False | {0, 1} | 8 | %
Unsupervised - Atoms | False | {2, 3} | 8 | %
Unsupervised - Atoms | False | {0,...,4} | 8 | 29.5%
Unsupervised - Atoms | False | {0,...,5} | 8 | %
Unsupervised - Atoms | False | {0,...,9} | 8 | %

[Five bar charts: repartition of the errors as a function of the class (per-class misclassification rates), one per digit set.]

Figure 13: Misclassification rate of results from Unsupervised Dictionary Learning with atom clustering (Algorithm 4) with K = 8, non-centered and non-normalized image signals, w.r.t. classes. When classifying digits 0 to 4, 2 and 3 are misclassified significantly more often than the other digits, over twice as often. When classifying digits 0 to 5, 4 is misclassified significantly more often than the other digits, over twice as often. This is also seen in clustering signals (see Figure 8). When classifying digits 0 to 9, 0 is misclassified significantly more often than the other digits, over three times as often. Meanwhile, 3 to 5 are misclassified significantly less often than with K = 5 (see Table 12).

Centered and Normalized

Table 14: Results from Unsupervised Dictionary Learning with atom clustering (Algorithm 4) with K = 5, centered and normalized image signals. Our results are significantly different from Sprechmann's unsupervised result of 1.44% with K = 5 for digits {0,...,4} (see Table 1). There is a high jump in the misclassification rate when the number of digits classified increases from two to five.

Cluster Type | Centered & Normalized | Digits | K | Misclassification
Unsupervised - Atoms | True | {0, 1} | 5 | 0.5%
Unsupervised - Atoms | True | {2, 3} | 5 | 1.8%
Unsupervised - Atoms | True | {0,...,4} | 5 | %
Unsupervised - Atoms | True | {0,...,5} | 5 | %
Unsupervised - Atoms | True | {0,...,9} | 5 | %

[Five bar charts: repartition of the errors as a function of the class (per-class misclassification rates), one per digit set.]

Figure 14: Misclassification rate of results from Unsupervised Dictionary Learning with atom clustering (Algorithm 4) with K = 5, centered and normalized image signals, w.r.t. classes. When classifying digits 0 to 4, 2 and 3 are misclassified significantly more often than the other digits, over four times as often. This is also seen in clustering signals (see Figure 9). When classifying digits 0 to 5, 2 is misclassified significantly more often than the other digits, over twice as often (compare to non-centered and non-normalized signals in Table 12, where 4 is significantly more misclassified).

Table 15: Results from Unsupervised Dictionary Learning with atom clustering (Algorithm 4) with K = 8, centered and normalized image signals. Our results are significantly different from Sprechmann's unsupervised result of 1.44% with K = 5 for digits {0,...,4} (see Table 1). There is a high jump in the misclassification rate when the number of digits classified increases from two to five.

Cluster Type | Centered & Normalized | Digits | K | Misclassification
Unsupervised - Atoms | True | {0, 1} | 8 | 0.9%
Unsupervised - Atoms | True | {2, 3} | 8 | 1.8%
Unsupervised - Atoms | True | {0,...,4} | 8 | %
Unsupervised - Atoms | True | {0,...,5} | 8 | %
Unsupervised - Atoms | True | {0,...,9} | 8 | %

[Five bar charts: repartition of the errors as a function of the class (per-class misclassification rates), one per digit set.]

Figure 15: Misclassification rate of results from Unsupervised Dictionary Learning with atom clustering (Algorithm 4) with K = 8, centered and normalized image signals, w.r.t. classes. When classifying digits 0 to 4, 2 and 3 are misclassified significantly more often than the other digits, over four times as often. This is also seen in clustering signals (see Figure 10). When classifying digits 0 to 5, 2 is misclassified significantly more often than the other digits, over twice as often (compare to non-centered and non-normalized signals in Table 12, where 4 is significantly more misclassified).

Changes over Refinements

Figure 16: Results from multiple refinement iterations for Unsupervised Dictionary Learning with atom clustering (Algorithm 4). Top to bottom: K = 5, non-centered and non-normalized image signals; K = 8, non-centered and non-normalized image signals; K = 5, centered and normalized image signals; K = 8, centered and normalized image signals. Left to right: energy, misclassification, changes in training image classification. Energy decreases gradually, roughly exponentially, with refinements. Misclassification increases with refinements for all cases except K = 8, non-centered and non-normalized image signals. Changes in training images decrease sporadically with refinements.

[Four rows of plots (digits 0-4): energy, average misclassification rate, and number of training signals that changed classification at each refinement iteration.]

5.6 Unsupervised - Atoms, split in 2 each iteration

Non-centered and Non-normalized

Table 16: Results from Unsupervised Dictionary Learning with atom clustering, one dictionary split each step (Algorithm 5), with K = 5, non-centered and non-normalized image signals. Performs worse than the other unsupervised algorithms (Tables 8 and 12).

Cluster Type | Centered & Normalized | Digits | K | Misclassification
Unsupervised - Atoms (split initialization) | False | {0, 1} | 5 | %
Unsupervised - Atoms (split initialization) | False | {2, 3} | 5 | %
Unsupervised - Atoms (split initialization) | False | {0,...,4} | 5 | %
Unsupervised - Atoms (split initialization) | False | {0,...,5} | 5 | %

[Four bar charts: repartition of the errors as a function of the class (per-class misclassification rates), one per digit set.]

Figure 17: Misclassification rate of results from Unsupervised Dictionary Learning with atom clustering, one dictionary split each step (Algorithm 5), with K = 5, non-centered and non-normalized image signals, w.r.t. classes.

Table 17: Results from Unsupervised Dictionary Learning with atom clustering, one dictionary split each step (Algorithm 5) with K = 8, non-centered and non-normalized image signals. Performs worse than the other unsupervised algorithms (Tables 9 and 13).

Cluster Type                                  Centered & Normalized   Digits       K   Misclassification
Unsupervised - Atoms (split initialization)   False                   {0, 1}       8
Unsupervised - Atoms (split initialization)   False                   {2, 3}       8   4.5%
Unsupervised - Atoms (split initialization)   False                   {0,...,4}    8
Unsupervised - Atoms (split initialization)   False                   {0,...,5}    8

Figure 18: Misclassification rate of results from Unsupervised Dictionary Learning with atom clustering, one dictionary split each step (Algorithm 5) with K = 8, non-centered and non-normalized image signals, w.r.t. classes.

Centered and Normalized

Table 18: Results from Unsupervised Dictionary Learning with atom clustering, one dictionary split each step (Algorithm 5) with K = 5, centered and normalized image signals. The rate for digits {0,...,4} is significantly different from Sprechmann's result of 1.44% with K = 5 (see Table 1). Performs worse than the other unsupervised algorithms (Tables 10 and 14).

Cluster Type                                  Centered & Normalized   Digits       K   Misclassification
Unsupervised - Atoms (split initialization)   True                    {0, 1}       5   0.19%
Unsupervised - Atoms (split initialization)   True                    {2, 3}       5
Unsupervised - Atoms (split initialization)   True                    {0,...,4}    5
Unsupervised - Atoms (split initialization)   True                    {0,...,5}    5

Figure 19: Misclassification rate of results from Unsupervised Dictionary Learning with atom clustering, one dictionary split each step (Algorithm 5) with K = 5, centered and normalized image signals, w.r.t. classes. 0 is the most misclassified digit, misclassified over three times as often as the other digits. 1 is the least misclassified, at least a quarter less often.

Table 19: Results from Unsupervised Dictionary Learning with atom clustering, one dictionary split each step (Algorithm 5) with K = 8, centered and normalized image signals. The rate for digits {0,...,4} is significantly different from Sprechmann's result of 1.44% with K = 5 (see Table 1). Performs worse than the other unsupervised algorithms (Tables 11 and 15).

Cluster Type                                  Centered & Normalized   Digits       K   Misclassification
Unsupervised - Atoms (split initialization)   True                    {0, 1}       8   0.19%
Unsupervised - Atoms (split initialization)   True                    {2, 3}       8
Unsupervised - Atoms (split initialization)   True                    {0,...,4}    8
Unsupervised - Atoms (split initialization)   True                    {0,...,5}    8

Figure 20: Misclassification rate of results from Unsupervised Dictionary Learning with atom clustering, one dictionary split each step (Algorithm 5) with K = 8, centered and normalized image signals, w.r.t. classes. 0 is the most misclassified digit, misclassified over three times as often as the other digits. 1 is the least misclassified, at least a quarter less often.

5.7 Unsupervised (kmeans) - Signals

Non-centered and Non-normalized

Table 20: Results from Unsupervised Dictionary Learning with signal clustering, clustering via kmeans (Algorithm 6) with K = 5 and param.iter = 2, non-centered and non-normalized image signals. 1.67% comes close to Sprechmann's unsupervised result of 1.44% with K = 5 (see Table 1).

Cluster Type                      Centered & Normalized   Digits       K   Iterations   Misclassification
Unsupervised - Signals (kmeans)   False                   {0,...,4}    5
Unsupervised - Signals (kmeans)   False                   {0,...,5}    5

Figure 21: Misclassification rate of results from Unsupervised Dictionary Learning with signal clustering, clustering via kmeans (Algorithm 6) with K = 5, non-centered and non-normalized image signals, w.r.t. classes. Left: 2 is the most misclassified, approximately twice as often as the rest. Right: 3 and 5 are the most misclassified digits, approximately eight times as often as the rest. (Compare to the centered and normalized case, Figure 23.)

Table 21: Results from Unsupervised Dictionary Learning with signal clustering, clustering via kmeans (Algorithm 6) with K = 5 and param.iter = 2, non-centered and non-normalized image signals. 2.14% approaches Sprechmann's unsupervised result of 1.44% with K = 5 (see Table 1).

Cluster Type                      Centered & Normalized   Digits       K   Iterations   Misclassification
Unsupervised - Signals (kmeans)   False                   {0,...,4}    5
Unsupervised - Signals (kmeans)   False                   {0,...,5}    5

Figure 22: Misclassification rate of results from Unsupervised Dictionary Learning with signal clustering, clustering via kmeans (Algorithm 6) with K = 5, centered and normalized image signals, w.r.t. classes. Left: 2 is the most misclassified, approximately 3% more than the second most. Right: 3 and 5 are misclassified often. Unlike other cases of classifying {0,...,5}, 1 is also misclassified often.

Centered and Normalized

Table 22: Results from Unsupervised Dictionary Learning with signal clustering, clustering via kmeans (Algorithm 6) with K = 5 and param.iter = 2, centered and normalized image signals. 1.13% beats Sprechmann's unsupervised result of 1.44% with K = 5 (see Table 1). There is a large increase from adding one class (going from classifying digits {0,...,4} to {0,...,5}).

Cluster Type            Centered & Normalized   Digits       K   Iterations   Misclassification
Unsupervised - kmeans   True                    {0,...,4}    5
Unsupervised - kmeans   True                    {0,...,5}    5

Figure 23: Misclassification rate of results from Unsupervised Dictionary Learning with signal clustering, clustering via kmeans (Algorithm 6) with K = 5, centered and normalized image signals, w.r.t. classes. Left: 2 is the most misclassified, approximately twice as often as the rest. Right: 3 and 5 are the most misclassified digits, approximately eight times as often as the rest. (Compare to the non-centered and non-normalized case, Figure 21.)

Figure 24: Results from Unsupervised Dictionary Learning with signal clustering, clustering via kmeans (Algorithm 6) with K = 5 and param.iter = 2, centered and normalized image signals. Left to right: energy, misclassification, changes in training image classification. Iteration 0 is the initial dictionary set; each iteration i (i > 0) is the ith refinement. Change i is the change between dictionary sets i - 1 and i (dictionary set i is the ith refinement, dictionary set 0 is the initial set). Misclassification corresponds better to changes in training image classifications than energy does, unlike in the spectral clustering cases.

Table 23: Results from Unsupervised Dictionary Learning with signal clustering, clustering via kmeans (Algorithm 6) with K = 5 and param.iter = 2, centered and normalized image signals. 0.54% beats Sprechmann's result of 1.44% with K = 5 (see Table 1).

Cluster Type            Centered & Normalized   Digits       K   Iterations   Misclassification
Unsupervised - kmeans   True                    {0,...,4}    5
Unsupervised - kmeans   True                    {0,...,5}    5

Figure 25: Misclassification rate of results from Unsupervised Dictionary Learning with signal clustering, clustering via kmeans (Algorithm 6) with K = 5, centered and normalized image signals, w.r.t. classes. Left: 2 is the most misclassified, approximately twice as often as the rest. Right: 3 and 5 are the most misclassified digits, approximately eight times as often as the rest.

Figure 26: Results from Unsupervised Dictionary Learning with signal clustering, clustering via kmeans (Algorithm 6) with K = 5 and param.iter = 2, centered and normalized image signals. Left to right: energy, misclassification, changes in training image classification. Iteration 0 is the initial dictionary set; each iteration i (i > 0) is the ith refinement. Change i is the change between dictionary sets i - 1 and i (dictionary set i is the ith refinement, dictionary set 0 is the initial set). Misclassification corresponds better to changes in training image classifications than energy does, unlike in the spectral clustering cases.
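For reference, the kmeans initialization behind Algorithm 6 can be sketched in a few lines of MATLAB. This is only an illustrative sketch, not the code of SCRIPT UnsupervisedDictionary kmeans: the variable names (X for the matrix of training signals stored as columns, N for the number of classes) and the parameter values are placeholder assumptions, and it relies on MATLAB's kmeans and the SPAMS function mexTrainDL (see Section 6).

    % Illustrative sketch (assumed names and values): cluster the training
    % signals with kmeans, then learn one dictionary per cluster with SPAMS.
    % X : n-by-m matrix of training signals (one signal per column)
    % N : number of classes/clusters
    labels = kmeans(X', N);             % MATLAB's kmeans clusters rows, hence X'
    param.K      = 5;                   % atoms per dictionary (K in the tables above)
    param.lambda = 0.1;                 % sparsity parameter (placeholder value)
    param.mode   = 2;                   % see Equation (3) and the NOTE in Section 6
    param.iter   = 100;                 % dictionary-learning iterations (placeholder)
    D = cell(N, 1);
    for i = 1:N
        D{i} = mexTrainDL(X(:, labels == i), param);  % one dictionary per cluster
    end

The refinement iterations tracked in Figures 24 and 26 then alternate between reassigning each training signal to the least costly dictionary in the sense of (2) and retraining the dictionaries on the resulting clusters.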

5.8 Gaussian Noise Experiments

Classifying Pure Gaussian Noise

For this experiment, we generate one thousand images of pure Gaussian noise and use dictionaries trained on the MNIST images to classify each image of noise as a digit 0-9. The purpose is to analyze how robust our algorithm is with respect to noisy images. The dictionaries used are trained with supervised dictionary learning, with 8 atoms per dictionary. We perform the experiment with and without centering and normalizing the training images, and we perform it for several different variances of the Gaussian noise: σ² = 0.1, 0.5, and 1.0. The histograms contain the classification rates for how often a signal of Gaussian noise was classified as each digit. The histograms for noise variances 0.1 and 1.0 are similar to the results displayed here for noise variance 0.5.

Table 24: Classification rate histograms for pure Gaussian noise (noise variance = 0.5), for dictionaries trained on centered and normalized data and for dictionaries trained on non-centered, non-normalized data.

Adding Noise to Test Images

For this experiment, we consider the case where there are clean training images but noisy test images. Gaussian noise is added to the MNIST test images, and then the supervised dictionaries with 8 atoms for digits 0-9 are used to classify the modified test images. The purpose, again, is to analyze the robustness of our algorithm with respect to noisy images. We perform the experiment with the Gaussian noise variance set to σ² = 0.1, 0.5, and 1.0, and we perform it with and without centering and normalizing the data. For the test images, we center and normalize after the noise has been added to the image. The histograms contain the misclassification rates for each digit. The histograms for noise variances 0.1 and 1.0 are similar to the results displayed here for noise variance 0.5.

Adding Noise to Training and Test Images

For this experiment, we consider the case where there is noise both in the training and test data. The same procedure as above is used again, with the modification that this time Gaussian noise is added to the MNIST training images before training the dictionaries. We perform this experiment first with the variance of the Gaussian noise set to be the same for both training and test images, and then with the training variance equal to twice the test variance.
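To make the classification step of these noise experiments concrete, the following MATLAB sketch generates pure Gaussian noise signals and assigns each one to the least costly dictionary in the sense of the energy (2). It is an illustration under assumed names (D a cell array of trained dictionaries, n the signal dimension, param the SPAMS parameters of Section 6, lambda the λ of Equation (1)), not the exact content of SCRIPT classifygaussiannoise; the sparse coding uses the SPAMS function mexLasso.

    % Illustrative sketch (assumed names): classify pure Gaussian noise signals
    % by the energy of Equation (2), using SPAMS's mexLasso for sparse coding.
    sigma2 = 0.5;                            % noise variance
    Xnoise = sqrt(sigma2) * randn(n, 1000);  % 1000 noise signals of dimension n
    % (in the centered and normalized setting, center and normalize Xnoise here,
    %  as is done above for the noisy test images)
    E = zeros(numel(D), size(Xnoise, 2));    % energy E_i(x) for each class i
    for i = 1:numel(D)
        A = mexLasso(Xnoise, D{i}, param);   % sparse codes for dictionary D{i}
        R = Xnoise - D{i} * A;               % reconstruction residuals
        E(i, :) = sum(R.^2, 1) + lambda * full(sum(abs(A), 1));  % Equation (2)
    end
    [~, assigned] = min(E, [], 1);           % assign to the least costly dictionary
    hist(assigned - 1, 0:numel(D) - 1);      % histogram over digit classes 0-9

The same assignment rule is applied to the noisy MNIST test images in the experiments of the following two subsections.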

Figure 27: Visualization of noisy MNIST images.

Table 25: Misclassification histograms for noise in test images only (test noise variance = 0.5).

                                Centered & Normalized   Not Centered & Not Normalized
Misclassification rate          8.21%                   87.5%

Table 26: Misclassification rates for noise in test images only.

Test noise variance             0.1      0.5      1.0
Centered & Normalized           2.31%    8.21%    21.98%
Not Centered & Not Normalized   57.5%    87.5%    88.77%

Table 27: Misclassification rates for training noise variance equal to test noise variance.

Training and test noise variance   0.1      0.5      1.0
Centered & Normalized              3.2%     17.33%   38.92%
Not Centered & Not Normalized      11.8%    57.56%   75.87%

Table 28: Misclassification rates for training noise variance equal to 2 times the test noise variance.

Noise variances                 training: 0.1   training: 0.5   training: 1.0
                                test: 0.05      test: 0.25      test: 0.5
Centered & Normalized           2.8%            9.15%           2.63%
Not Centered & Not Normalized   5.66%           31.97%          59.8%

6 Code and Toolboxes

The following folders of MATLAB code have been provided:

MNIST: MNIST database and MNIST MATLAB helper functions
spams-matlab: Sparse Modeling Software (SPAMS) toolbox [11][12]
Spectral Clustering: AFFECT MATLAB Toolbox for clustering dynamic data [18]
Dictionary Learning, with subfolders:
    Supervised Clustering
    Semisupervised Clustering
    Unsupervised Clustering atoms
    Unsupervised Clustering signals
    Unsupervised Clustering atoms split initialization
    Unsupervised Clustering kmeans
Gaussian Noise: experiments adding noise to MNIST

The experiments can be replicated using the provided MATLAB code, using the following steps:

I. Copy the contents of the MNIST, SPAMS, and Spectral Clustering folders into the desired Dictionary Learning subfolder.
II. Set up the SPAMS toolbox by following the directions in HOW TO INSTALL.txt.
III. Choose which experiment to perform. Set parameters and run the corresponding MATLAB script. Parameters can be set in STEP 0 of each script.
    i. Supervised Dictionary Learning: run SCRIPT SupervisedDictionary (in the Supervised Clustering folder)
    ii. Semisupervised Dictionary Learning: run SCRIPT SemisupervisedDictionary (in the Semisupervised Clustering folder)
    iii. Unsupervised Dictionary Learning (clustering by atoms, spectral clustering): run SCRIPT UnsupervisedDictionary atoms (in the Unsupervised Clustering atoms folder)
    iv. Unsupervised Dictionary Learning (clustering by signals, spectral clustering): run SCRIPT UnsupervisedDictionary signals (in the Unsupervised Clustering signals folder)
    v. Unsupervised Dictionary Learning (clustering by atoms, split-by-two each iteration, spectral clustering): run SCRIPT UnsupervisedDictionary atoms split initialization (in the Unsupervised Clustering atoms split initialization folder)
    vi. Unsupervised Dictionary Learning (clustering by signals, kmeans): run SCRIPT UnsupervisedDictionary kmeans (in the Unsupervised Clustering kmeans folder)
    vii. Classifying pure Gaussian noise: run SCRIPT classifygaussiannoise (in the Gaussian Noise folder)
    viii. Adding noise to the MNIST training/test images and classifying: run SCRIPT classifymnistwithnoise (in the Gaussian Noise folder)

NOTE: We mimicked Equation (1) when using SPAMS by setting param.mode = 2, which aims at solving

\min_{D, \{\alpha_i\}} \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{2} \| x_i - D \alpha_i \|_2^2 + \lambda \| \alpha_i \|_1 + \lambda_2 \| \alpha_i \|_2^2 \right),    (3)

setting λ2 = 0 and scaling λ by a factor of 1/2.
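A minimal MATLAB sketch of this parameter setup is given below. The field names (param.mode, param.lambda, param.lambda2, param.K, param.iter) are genuine SPAMS parameters, but the numerical values are placeholders and not necessarily the ones used in our scripts.

    % Sketch of the SPAMS settings implied by Equation (3) and the NOTE above.
    % lambda_eq1 denotes the lambda of Equation (1); its value here is a placeholder.
    lambda_eq1    = 0.2;
    param.mode    = 2;               % SPAMS solves the mode-2 formulation (3)
    param.lambda  = lambda_eq1 / 2;  % scale lambda by a factor of 1/2
    param.lambda2 = 0;               % set lambda2 = 0 to recover Equation (1)
    param.K       = 8;               % number of atoms per dictionary
    param.iter    = 100;             % training iterations (param.iter in Section 5)
    % The same structure is then passed to the SPAMS routines, e.g.
    %   D     = mexTrainDL(Xtrain, param);
    %   alpha = mexLasso(Xtest, D, param);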

7 Discussion

Our experiments agree with [16] (Table 1) for supervised dictionary learning (Table 4 and Table 5), as long as we center and normalize the data. It is possible to improve the results by increasing the number of iterations per dictionary training (param.iter), though doing so increases the running time tremendously. We were unable to mimic their result, with the algorithms described in [16], using unsupervised dictionary learning, as can be seen, for example, in Table 15.

Nevertheless, we deduce that the dictionary learning process (utilizing SPAMS [11, 12]) works as it should. Thus, the disparity between the supervised and unsupervised clustering results lies in the spectral clustering process. We tried multiple spectral clustering toolboxes ([2, 3, 18]), none of which produced desirable results. This can be illustrated visibly by Figure 28, which displays the initial clusterings used to produce the unsupervised dictionaries. These figures were produced with the SPAMS displayPatches function.

Figure 28: Clusters from centered and normalized signals of digits 0 to 4, K = 5. Left: initial cluster of atoms, result of Algorithm 4; right: initial cluster of signals, result of Algorithm 3. As can be seen, the clusters in both figures are not meaningful because each of them mixes 2, 3 and a few other digits together.

As can be seen in Figure 28, spectral clustering was unable to differentiate between 2 and 3, resulting in a high misclassification rate. This can further be seen, when clustering atoms, in Table 14, where there was an average misclassification rate of 0.946% when clustering {0, 1}, compared to a considerably higher rate when clustering {2, 3}. Moreover, for clustering signals as in Table 10, there was an average misclassification rate of 0.1891% when clustering {0, 1}, compared to a considerably higher rate when clustering {2, 3}.

One can wonder why {2, 3} is more difficult to classify than {0, 1}. Our first results point out a significant difference between the misclassification rates achieved with the dictionaries associated with these two sets. We first tried examining whether there were any specific atoms of our large unclustered dictionary which were used disproportionately often in constructing sparse representations for the images. If there was such an atom, its appearance in multiple classes may have helped cause many false classifications. Upon further examination, such an atom was not found (see Figure 29). We moved on to considering other potential problems: concerning the initial partition obtained in [16], the authors do not justify or explain the proposed choices of similarity measures. A very simple mathematical argument allows us to understand the undesired effects of these choices. Assume that the matrix A (see Section 4) contains only positive entries. Under that condition, S_1 := |A|^t |A| = A^t A. This means that two columns a_1, a_2 in R^K of A, or equivalently two sparse signal representations, are similar if their standard inner product \langle a_1, a_2 \rangle = \|a_1\|_2 \|a_2\|_2 \cos(a_1, a_2) on R^K is large. Thus, given two pairs of vectors a_1, a_2 and b_1, b_2 of R^K such that \cos(a_1, a_2) = \cos(b_1, b_2), the similarity measure proposed in [16] will favor the pair whose product of norms is bigger. As a corollary, nearly orthogonal vectors can be seen as more similar than co-linear vectors if the product of their norms is big enough. This surprising fact is not discussed in [16]. To cope with that fact, we tried to normalize the matrix A so that each of its columns has unit l2 norm. For an unknown reason, the norms of the columns of A are, with the MNIST training images, almost constant (see Figure 30 for an example).
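The normalization experiment described above is straightforward to reproduce. The MATLAB sketch below builds the similarity matrix from the sparse codes together with a column-normalized variant; it is only an illustration, assuming A stores one sparse code per column, and S1 follows our reading of the similarity measure discussed above rather than being taken verbatim from [16].

    % Illustrative sketch (assumed shapes): A is K-by-m, one sparse code per column,
    % with no column of A identically zero.
    Af = full(A);                                 % dense copy for simplicity
    S1 = abs(Af)' * abs(Af);                      % pairwise similarities between signals
    colNorms = sqrt(sum(Af.^2, 1));               % l2 norm of each column (cf. Figure 30)
    An = Af ./ repmat(colNorms, size(Af, 1), 1);  % rescale columns to unit l2 norm
    S1n = abs(An)' * abs(An);                     % variant depending only on the angles
    % hist(colNorms, 50)                          % reproduces the histogram of Figure 30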

Figure 29: Histogram of atom usage in sparse signal decomposition for dictionaries of 8 atoms. Left: non-centered, non-normalized. Right: centered, normalized.

Figure 30: The normalized histogram of the l2 norm of the columns of A (empirical probability per norm interval). Most of the column norms accumulate in the interval [3, 4], which accounts for around 66% of the columns.

Another point that is not discussed in [16] is the absolute value. For example, the two orthogonal vectors a_1 = (1, 1)^t and a_2 = (1, -1)^t become co-linear when their entries are replaced by their absolute values. Surprisingly, again, changing S_1 to A^t A did not dramatically improve our results. This is justified, empirically, by examining how few of the entries of A (over 10^6 entries in total) are negative numbers. The most puzzling sentences of [16] are the last few before the concluding remarks: "We observed that best results are obtained for all the experiments when the initial dictionaries in the learning stage are constructed by randomly selecting signals from the training set. If the size of the dictionary compared to the dimension of the data
