EUS SVMs: Ensemble of Under-Sampled SVMs for Data Imbalance Problems


Pilsung Kang and Sungzoon Cho
Seoul National University, San 56-1, Shillim-dong, Kwanak-gu, Seoul, Korea

Abstract. Data imbalance occurs when the number of patterns from one class is much larger than that from the other class, and it often degrades classification performance. In this paper, we propose an Ensemble of Under-Sampled SVMs, or EUS SVMs. We applied the proposed method to two synthetic and six real data sets and found that it outperformed other methods, especially when the number of patterns belonging to the minority class is very small.

1 Introduction

In classification, data imbalance occurs when the number of patterns of one class is much larger than that of the other class. Most classification algorithms are trained under the assumption that the class ratio is roughly equal. In real classification tasks, however, this assumption is often violated. Fraud detection[1], for instance, is the task of identifying the customers who are likely to commit fraud in a company's customer database. In this task, the number of fraudulent customers is much smaller than that of normal customers, so data imbalance occurs. Data imbalance has also been reported in a wide range of classification tasks, such as oil spill detection[2], response modeling[3], remote sensing[4], and scene classification[5].

Data imbalance is one of the causes that degrade the performance of machine learning algorithms, including Support Vector Machines (SVMs), in classification tasks. This stems from two major causes. First, simple accuracy, the objective function used in most classification tasks, is inadequate when data are imbalanced. For example, consider a two-class problem in which 1% of the patterns belong to the minority class and 99% belong to the majority class. If a classifier decided that all patterns should be classified into the majority class, it would achieve 99% accuracy. This may look like good performance in terms of simple accuracy, but the classifier is of no use, since it captures no information about the minority class patterns. The second cause comes from the distribution of the classes. Since the number of majority class patterns exceeds that of the minority class, the majority class is likely to invade the territory of the minority class, so that the class boundary becomes vulnerable to distortion.

In order to deal with the inappropriateness of simple accuracy in data imbalance problems, several alternative objective functions have been proposed in previous work[6][7][8]. Although their formulations differ, they all focus on considering the accuracy of both the majority class and the minority class. In order to deal with the problem caused by the skewed data distribution, three methods are commonly proposed. First, the under-sampling method[9][10] balances the class ratio by sampling a small number of patterns from the majority class. Under-sampling can not only improve classification performance but also reduce time complexity, since only a small subset of the majority class is used. However, under-sampling has a potential disadvantage: it may distort the distribution of the majority class. If the patterns sampled from the majority class do not represent the original distribution, classification performance may degrade. This potential drawback materializes when the number of minority class patterns is very small. Second, the over-sampling method[8][11] balances the class ratio by copying patterns from the minority class. Since over-sampling does not lose information about any pattern, it can achieve relatively high performance. However, the time required to train the classifier increases, since the number of training patterns becomes much larger than the number of original patterns. Third, the modifying-cost method[4] dictates that misclassified patterns belonging to the minority class receive a larger penalty than those belonging to the majority class. The modifying-cost method can handle data imbalance without changing the original data distribution. When data are highly imbalanced, however, its effect on classification performance is not as good as that of under-sampling or over-sampling. (A minimal code sketch of the two sampling strategies is given at the end of this section.)

In this paper, we propose an Ensemble of Under-Sampled SVMs, EUS SVMs. Although SVMs show good generalization ability in many pattern classification tasks, their performance can be boosted by adopting an ensemble scheme[12]. In addition, the ensemble scheme lowers the variance of each individual classifier, so the performance of the classifier becomes more stable. Thus, EUS SVMs integrate the strengths of both SVMs and the ensemble scheme. EUS SVMs build multiple different training sets by sampling patterns from the majority class and combining them with the minority class patterns. Each training set is used for training an individual SVM classifier. The output of the ensemble is produced by aggregating the outputs of all individual classifiers. By adopting the ensemble technique, EUS SVMs not only make up for the sampling dependency of the under-sampling method, but also achieve a reasonable time complexity compared to the over-sampling method. We apply EUS SVMs to two synthetic and six real data sets, and the results show that EUS SVMs outperform the other methods.

The rest of this paper is structured as follows. In Section 2, we demonstrate the effect of data imbalance with synthetic data sets and the performance of three existing approaches. In Section 3, we introduce the proposed method, EUS SVMs. In Section 4, we present the experimental settings and analyze the results. In Section 5, we conclude and discuss future work.
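To make the two sampling strategies concrete, the following minimal sketch (ours, not the paper's) balances a binary data set by random under-sampling and by random over-sampling. It assumes NumPy arrays X (patterns) and y (labels) with the minority class labeled 1; the function names and the fixed random seed are illustrative choices.

import numpy as np

def undersample(X, y, minority_label=1, seed=0):
    """Keep all minority patterns plus an equal-sized random subset of the majority class."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    keep = rng.choice(majority, size=minority.size, replace=False)
    idx = np.concatenate([minority, keep])
    return X[idx], y[idx]

def oversample(X, y, minority_label=1, seed=0):
    """Keep all patterns and add copies of minority patterns until the classes match in size."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    # assumes the majority class is at least as large as the minority class
    extra = rng.choice(minority, size=majority.size - minority.size, replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]

Under-sampling trains on roughly twice the minority class size, which is what makes it cheap but dependent on which majority patterns happen to be selected; over-sampling trains on roughly twice the majority class size, which is what makes it expensive.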

2 The Effect of Data Imbalance

Before we start, let us consider a performance measure appropriate for imbalanced data sets. Suppose that positive patterns are the patterns belonging to the minority class and that negative patterns are the patterns belonging to the majority class. Usual classification tasks use simple accuracy, computed as (TP + TN) / (TP + FN + FP + TN), where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. However, as mentioned in Section 1, simple accuracy relies heavily on TN rather than TP when data are imbalanced. Thus, the classifier tends to classify most patterns as negative in order to achieve a high simple accuracy. In order to prevent this, other performance measures have been considered[6][7][8]. In this paper, we adopt the geometric mean, which considers the accuracies of the minority class and the majority class equally. A+, the accuracy of the minority class, is computed as TP / (TP + FN). A-, the accuracy of the majority class, is computed as TN / (FP + TN). The geometric mean is then computed as sqrt(A+ x A-). (A short computational sketch of these measures follows Fig. 2.)

A synthetic data set was generated to understand the effect of data imbalance on the SVM classifier when the number of minority class patterns is not so small in absolute terms. Six 4x4 checkerboard data sets (Set A) were generated. The number of minority class patterns is 320 for all data sets, and the class ratio varies from 1:1 to 1:50. The class boundary of each data set, using an SVM as the base classifier, is shown in Fig. 1; the solid line represents the class boundary determined by the SVM classifier.

Fig. 1. The class boundary of the 4x4 checkerboard data sets (Set A) with an SVM base classifier

When the numbers of patterns in the two classes are not very different (Fig. 1(a)), the generated class boundary represents the original class boundary well. As the degree of imbalance increases (Fig. 1(b)), however, the majority class invades the area of the minority class. When data imbalance is extreme (Fig. 1(c)), the majority class pushes out the minority class, so the area assigned to the minority class by the classifier is very small. The performance of this experiment is shown in Fig. 2.

Fig. 2. The performance of the SVMs with various imbalance ratios
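As a small illustration of the measures above, the sketch below computes simple accuracy and the geometric mean from confusion-matrix counts. The counts in the example are made-up numbers reproducing the 1%/99% scenario from Section 1, not results from the paper.

import math

def simple_accuracy(tp, fn, tn, fp):
    return (tp + tn) / (tp + tn + fp + fn)

def geometric_mean(tp, fn, tn, fp):
    a_plus = tp / (tp + fn)    # A+: accuracy on the minority (positive) class
    a_minus = tn / (tn + fp)   # A-: accuracy on the majority (negative) class
    return math.sqrt(a_plus * a_minus)

# A classifier that labels everything negative on a 1:99 imbalanced set of 1,000 patterns:
print(simple_accuracy(tp=0, fn=10, tn=990, fp=0))  # 0.99 -- looks good
print(geometric_mean(tp=0, fn=10, tn=990, fp=0))   # 0.0  -- exposes the useless classifier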

As the degree of imbalance increases, the accuracy of the minority class (A+) decreases rapidly, and so does the geometric mean. Simple accuracy, however, tends to increase despite the decrease of A+. This is mainly because the effect of the majority class accuracy (A-) on simple accuracy is much greater than that of A+ when the degree of imbalance is high. This clearly shows that simple accuracy is inappropriate as a performance measure in data imbalance cases.

Fig. 3 shows the performances and the elapsed times of the existing methods: under-sampling, over-sampling, and modifying cost.

Fig. 3. Geometric mean and elapsed time of existing methods

Since Set A1, whose class ratio is 1:1, is perfectly balanced, we evaluated its geometric mean and elapsed time for no sampling only. The modifying-cost method, as shown in Fig. 3, has little effect on increasing the performance of the classifier in terms of geometric mean compared to no sampling. It even takes a very long time to train the classifier, compared with no sampling, when the degree of imbalance is very high. Both under-sampling and over-sampling seem to cope with the difficulties caused by data imbalance in terms of geometric mean, with over-sampling showing the highest values in all cases. In terms of time complexity, however, over-sampling is very sensitive to the number of patterns, while under-sampling is robust to it. Since over-sampling increases the number of minority class patterns until it equals the number of majority class patterns, the total number of training patterns becomes twice the number of majority class patterns. Under-sampling, on the other hand, decreases the number of majority class patterns until it equals the number of minority class patterns. Therefore, as the degree of imbalance increases, the training time of under-sampling does not increase. (A code sketch of the cost-modification approach is given below.)

When the number of minority class patterns is not sufficient, however, the patterns sampled from the majority class may not represent the entire distribution of the majority class. Therefore, under-sampling may perform badly when there are only a few minority class patterns. We therefore generated another synthetic data set to see what happens to under-sampling when the number of minority class patterns is small in absolute terms. To do this, we generated five 4x4 checkerboard data sets (Set B) having only 80 minority class patterns. Since 1:1 and 1:3 were found not to be seriously imbalanced, we removed these ratios and added a new ratio, 1:100.
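The modifying-cost idea can be sketched with class weights, as below. This is our illustration, not the paper's implementation: it assumes scikit-learn's SVC (whose class_weight parameter scales the misclassification penalty per class) and a small Gaussian toy data set; the 1:49 weight ratio simply mirrors the toy data's class ratio.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_major = rng.normal(loc=0.0, scale=1.0, size=(980, 2))  # majority class, label 0
X_minor = rng.normal(loc=2.0, scale=1.0, size=(20, 2))   # minority class, label 1
X = np.vstack([X_major, X_minor])
y = np.array([0] * 980 + [1] * 20)

# Give minority-class errors a penalty proportional to the imbalance (1:49 here),
# without resampling the data.
clf = SVC(kernel="rbf", C=1.0, class_weight={0: 1.0, 1: 49.0})
clf.fit(X, y)
print((clf.predict(X_minor) == 1).mean())  # fraction of minority training patterns recovered

Note that, unlike under-sampling, this keeps the full training set, which is consistent with the longer training times reported for the modifying-cost method above.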

The classification results of the under-sampling method on the two data sets, Set A (sufficient minority class patterns) and Set B (insufficient minority class patterns), are shown in Fig. 4.

Fig. 4. A+, A-, and geometric mean of the under-sampling method with (a) sufficient minority class patterns and (b) insufficient minority class patterns

The under-sampling method achieved good geometric means on both data sets regardless of the degree of imbalance. However, note that the high geometric means on Set B (insufficient minority class patterns) were achieved by sacrificing the accuracy of the majority class, whereas the accuracies of the majority class and the minority class are not so different on Set A (sufficient minority class patterns). Since the number of patterns sampled from the majority class was not enough to represent the whole distribution of the majority class, the minority class invaded those areas of the majority class from which no majority class patterns were selected. Thus, the classifier overestimated the area of the minority class, resulting in high A+ and low A-. This phenomenon usually happens when a high degree of imbalance is combined with a small number of minority class patterns.

3 EUS SVMs: Ensemble of Under-Sampled SVMs

Under-sampling uses only one training set, consisting of the sampled majority class patterns and all minority class patterns. In this case, the boundary between the two classes is vulnerable to the particular majority class patterns selected, which results in low and unstable performance.

If we employ multiple training sets, majority class patterns have better chances of being included in training. The more patterns included in the training sets, the less likely the data distribution is to be distorted. Thus, we propose an ensemble approach.

Given the two data sets of the minority class and the majority class, the majority class patterns are sampled without replacement to construct a majority subset. The number of patterns in the majority subset is equal to the number of minority class patterns (see Fig. 5 and Fig. 6). The sampling is repeated until a predetermined number (N) of majority class subsets have been built. Note that each majority subset is sampled from the entire majority class. Each majority subset is then combined with the minority class patterns to construct a training data subset, which is perfectly balanced. Each training data subset is used for constructing an individual classifier. Finally, the outputs of all individual classifiers are aggregated to produce the output of the ensemble.

Fig. 5. The procedure of EUS SVMs

    partition the training data into the majority class and the minority class
    for i = 1 to N (the ensemble size)
        build a majority subset by random sampling from the majority class,
            with size equal to that of the minority class
        construct a training subset by combining the majority subset and the minority class
        train an SVM with the training subset
    end
    combine the N outputs of the ensemble by a pre-determined rule

Fig. 6. EUS SVMs algorithm
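A minimal sketch of this training procedure is given below, assuming scikit-learn's SVC stands in for the SVM base classifier. The function name, kernel settings, and default ensemble size are ours; the paper does not prescribe them.

import numpy as np
from sklearn.svm import SVC

def train_eus_svms(X, y, minority_label=1, n_members=10, seed=0):
    """Train N SVMs, each on all minority patterns plus an equal-sized
    random sample (without replacement) from the majority class."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    members = []
    for _ in range(n_members):
        # each majority subset is drawn from the *entire* majority class
        subset = rng.choice(majority, size=minority.size, replace=False)
        idx = np.concatenate([minority, subset])
        clf = SVC(kernel="rbf", C=1.0)
        clf.fit(X[idx], y[idx])
        members.append(clf)
    return members

# hypothetical usage: members = train_eus_svms(X_train, y_train, n_members=10)

Each member sees a perfectly balanced subset, and together the members cover many more majority patterns than a single under-sampled training set would.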

Fig. 7. Class boundaries determined by the no-sampling method with SVMs on the 4x4 checkerboard data set (Set B) [(a)-(e)] and the spiral data set [(f)-(j)]

4 Experimental Settings and Results

4.1 Data

We used two synthetic problems and six real data sets to verify the effectiveness of EUS SVMs. Five 4x4 checkerboard data sets (Set B) and five spiral data sets were generated. Fig. 7 shows the class boundary of each set when a single SVM classifier is trained with no sampling. Six real data sets with imbalance problems were selected from the UCI Machine Learning Repository[13]. Many of these data sets have more than two classes. Since our objective is to deal with imbalance, we converted them into two-class problems. Vehicle 2 (3) refers to the problem where only class 2 (3) is treated as the minority class while the rest are treated as the majority class. Similarly, Ann-thyroid 13 (23) refers to the problem where class 1 (2) is the minority class while class 3 is treated as the majority class. Since the sick-euthyroid and mammography data sets originally consist of two classes, we used them without any class modification.¹ Only the Ann-thyroid data set has both a training set and a test set; the other data sets were evaluated using 5-fold cross-validation (a short sketch of this relabeling and evaluation protocol is given after Table 1). The overall description of the real data sets is shown in Table 1.

Table 1. Description of real data sets

Data Set          Minority Patterns   Majority Patterns   Total Patterns   Minority Ratio
Vehicle 2         ...                 ...                 ...              ...
Vehicle 3         ...                 ...                 ...              ...
Ann-thyroid 13    ...                 ...                 ...              ...
Ann-thyroid 23    ...                 ...                 ...              ...
Sick-euthyroid    238                 1,774               2,012            ...
Mammography       ...                 ...                 ...              ...

¹ We would like to thank Professor Nitesh V. Chawla for providing us with the mammography data set.
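A hedged sketch of the relabeling and evaluation protocol described above is shown below. Loading of the raw data is omitted; train_and_score is a placeholder callback (for example, one that trains the EUS SVMs sketch from Section 3 and returns its geometric mean on the held-out fold), and the use of scikit-learn's StratifiedKFold is our assumption, since the paper only states that 5-fold cross-validation was used.

import numpy as np
from sklearn.model_selection import StratifiedKFold

def to_binary(y_raw, minority_class):
    """Treat one original class as the minority (1) and all other classes as the majority (0)."""
    return (y_raw == minority_class).astype(int)

def cross_validated_score(X, y, train_and_score, n_splits=5, seed=0):
    """Stratified 5-fold CV; train_and_score(X_tr, y_tr, X_te, y_te) returns one score."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        scores.append(train_and_score(X[train_idx], y[train_idx], X[test_idx], y[test_idx]))
    return float(np.mean(scores))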

4.2 Ensemble Aggregation Methods

The output of EUS SVMs can differ depending on the aggregation method of the ensemble. In our experiments, we employed three aggregation methods to determine the output of EUS SVMs. The first is majority voting: each individual classifier votes for one of the candidate outputs, and the candidate output that receives the most votes becomes the output of the ensemble. The second is weighted voting: once all individual classifiers have finished training, each has its own training error, and the output of an individual classifier with a small training error contributes more to the output of the ensemble than that of an individual classifier with a large training error. The third is function value aggregation. Since the SVM is originally designed for two-class classification, it has a binary output, obtained by thresholding the value of the SVM objective function. When the objective function value is converted into a binary value, important information about the pattern is lost, namely how far the pattern lies from the class boundary: the larger the absolute value, the farther the pattern is from the class boundary. Therefore, in function value aggregation, the output of the ensemble is determined by adding the objective function values of all individual classifiers. (A code sketch of these three aggregation rules is given below, after Fig. 8.)

4.3 Experimental Results

Note that the geometric means of no sampling on the 4x4 checkerboard data set (Set B) are 0.732, 0.663, 0.498, ... for the imbalance ratios 1:5, 1:10, 1:30, 1:50, and 1:100, respectively. The geometric means of no sampling on the spiral data set are 0.756, 0.724, 0.700, 0.568, ... for the same ratios. The experimental results of the under-sampling method and the three EUS SVMs on the synthetic data sets and the real data sets are shown in Fig. 8 and Table 2, respectively.

Fig. 8. Geometric means of (a) the 4x4 checkerboard data set (Set B) and (b) the spiral data set (MV: majority voting, WV: weighted voting, FVA: function value aggregation)
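The three aggregation rules of Section 4.2 can be sketched as follows for a list of fitted members such as the one produced by the training sketch in Section 3 (labels 0/1, minority = 1). Using decision_function for the SVM objective value f(x), and weighting each vote by one minus its training error, are our assumptions; the paper does not give explicit formulas.

import numpy as np

def majority_vote(members, X):
    votes = np.stack([clf.predict(X) for clf in members])   # shape (N, n_patterns), entries 0/1
    return (votes.mean(axis=0) >= 0.5).astype(int)          # class receiving the most votes

def weighted_vote(members, X, training_errors):
    # members with smaller training error contribute more to the ensemble output
    weights = np.array([1.0 - e for e in training_errors])
    weights = weights / weights.sum()
    votes = np.stack([clf.predict(X) for clf in members])
    return (weights @ votes >= 0.5).astype(int)

def function_value_aggregation(members, X):
    # sum the signed decision values f(x) of all members before thresholding,
    # so patterns far from the boundary carry more weight
    scores = np.stack([clf.decision_function(X) for clf in members])
    return (scores.sum(axis=0) >= 0.0).astype(int)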

Table 2. Geometric means of no-sampling, under-sampling, and the three EUS SVMs on the real data sets (MV: majority voting, WV: weighted voting, FVA: function value aggregation)

Three results can be summarized as follows. First, both under-sampling and EUS SVMs are effective in dealing with data imbalance; they significantly outperform no sampling, especially as the degree of imbalance increases. Second, although both under-sampling and EUS SVMs work well on the imbalanced training sets, EUS SVMs outperform under-sampling in all cases, especially when the original class boundary is very complicated and the degree of imbalance is high, as with the spiral data sets. Third, there is no significant difference among the ensemble aggregation methods.

Two implicit characteristics of EUS SVMs lead to the better classification performance. First, EUS SVMs use multiple balanced training sets. This reduces the possibility that sampling distorts the data distribution, so the classifier is prevented from over-fitting to the minority class. Second, the ensemble pursues diversity to increase generalization ability by employing a number of individual classifiers. Thus, the classification performance can be better than that of a single classifier.

5 Conclusion

Data imbalance is one of the issues that have been widely researched in the pattern recognition and machine learning fields. In this paper, we investigated the effect of data imbalance on classifier performance using 2-dimensional synthetic data sets. Among the under-sampling, over-sampling, and modifying-cost methods, under-sampling was found to be the best in terms of both classification performance and time complexity. Under-sampling, however, is likely to distort the data distribution when there are very few minority class patterns in a highly imbalanced data set. In order to overcome this drawback of under-sampling, we proposed the Ensemble of Under-Sampled SVMs (EUS SVMs). On two synthetic and six real data sets with various degrees of imbalance, EUS SVMs outperformed under-sampling in all cases in terms of geometric mean.

There are some limitations of our work, which lead us to future work. First, we randomly sampled patterns from the majority class to build each ensemble training set; more sophisticated sampling methods could be considered to represent the data distribution better. Second, we did not focus on the minority class, since it contains only a small number of patterns.

In order to boost the classification performance, over-sampling methods, such as noise addition, could be applied when constructing the ensemble training sets.

Acknowledgement

This work was supported by grant No. R from the Basic Research Program of the Korea Science and Engineering Foundation and the Brain Korea 21 program in 2006, and partially supported by the Engineering Research Institute of SNU.

References

1. Fawcett, T., Provost, F.: Adaptive Fraud Detection. Data Mining and Knowledge Discovery 1(3) (1997)
2. Kubat, M., Holte, R., Matwin, S.: Machine Learning for the Detection of Oil Spills in Satellite Radar Images. Machine Learning 30(2) (1998)
3. Shin, H.J., Cho, S.Z.: Response Modeling with Support Vector Machine. Expert Systems with Applications 30(4) (1997)
4. Bruzzone, L., Serpico, S.B.: Classification of Imbalanced Remote-Sensing Data by Neural Networks. Pattern Recognition Letters 18(11-13) (1997)
5. Yan, R., Liu, Y., Jin, R., Hauptman, A.: On Predicting Rare Classes with SVM Ensembles in Scene Classification. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 03) (2003)
6. Kubat, M., Holte, R., Matwin, S.: Learning when Negative Examples Abound. In: Proceedings of the 9th European Conference on Machine Learning (ECML 97) (1997)
7. Dumais, S., Platt, J., Heckerman, D., Sahami, M.: Inductive Learning Algorithms and Representations for Text Categorization. In: Proceedings of the Seventh International Conference on Information and Knowledge Management (1998)
8. Chawla, N.V., Hall, L., Kegelmeyer, W.: SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16 (2002)
9. Kubat, M., Matwin, S.: Addressing the Curse of Imbalanced Training Sets: One-Sided Selection. In: Proceedings of the Fourteenth International Conference on Machine Learning (1997)
10. Drummond, C., Holte, R.C.: C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling Beats Over-Sampling. In: Proceedings of the ICML 2003 Workshop on Learning from Imbalanced Data Sets II (2003)
11. Chawla, N.V., Lazarevic, A., Hall, L.O., Bowyer, K.: SMOTEBoost: Improving Prediction of the Minority Class in Boosting. In: 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (2003)
12. Kim, H.C., Pang, S., Je, H.M., Kim, D.J., Bang, S.Y.: Constructing Support Vector Machine Ensemble. Pattern Recognition 36 (2003)
13. UCI Machine Learning Repository: mlearn/mlrepository.html
