Random Under-Sampling Ensemble Methods for Highly Imbalanced Rare Disease Classification


Dong Dai and Shaowen Hua

Dong Dai is Senior Manager with Advanced Analytics, IMS Health, Plymouth Meeting, PA 19462, USA (e-mail: ddai@us.imshealth.com). Shaowen Hua is Assistant Professor with the School of Business, La Salle University, Philadelphia, PA, USA (e-mail: hua@lasalle.edu).

Abstract—Classification on imbalanced data presents many challenges to researchers. In healthcare settings, rare disease identification is one of the most difficult kinds of imbalanced classification: it is hard to correctly identify the true positive rare disease patients among a much larger number of negative patients, and predictions from traditional models tend to be biased toward the much larger negative class. To gain better predictive accuracy, we select and test several modern imbalanced machine learning algorithms on an empirical rare disease dataset. The training data is constructed from real-world patient diagnosis and prescription data. We then compare the performance of the various algorithms. We find that the random under-sampling Random Forest algorithm achieves more than a 40% improvement over the traditional logistic model in this particular example. We also observe that not all bagging methods outperform traditional methods; for example, the random under-sampling LASSO is inferior to the benchmark in our results. Researchers need to test and select appropriate methods accordingly in real-world applications.

Index Terms—imbalanced, rare disease, random under-sampling, random forest

I. INTRODUCTION

Rare diseases have low prevalence rates and are difficult to diagnose and identify. The Rare Diseases Act of 2002 defines a rare disease as one affecting fewer than 200,000 patients, or about 1 in 1,500 people [1]. By this nature, a rare disease dataset is extremely imbalanced. Predicting or classifying rare diseases in a large population is a challenging task due to the low signal-to-noise ratio. This problem is called imbalanced learning in the data mining field [2]. Classic statistical methods and standard machine learning algorithms are biased toward the larger classes, so it is hard to positively identify rare disease patients, who are in the minority class. If the problem is not treated and measured properly, most rare disease patients will be misclassified into the other classes even though the overall accuracy rate may appear high. Dedicated algorithms and methods have been developed for classification on imbalanced datasets.

In this paper, we explore selecting and applying imbalanced machine learning algorithms to identify rare disease patients from real-world healthcare data. We study the performance improvement from using imbalanced algorithms and compare the results to traditional methods. The empirical application we select is the identification of Hereditary Angioedema (HAE), a rare genetic disease that causes episodic attacks of swelling under the skin. The prevalence of HAE is between 1 in 10,000 and 1 in 50,000 [3], [4]. Because it is rare, physicians have rarely encountered patients with this condition; in addition, HAE attacks resemble other forms of angioedema. Both factors make HAE hard to diagnose, and late or missed diagnosis can lead to incorrect treatment and unnecessary surgical intervention. Our objective in this application is to improve the HAE patient identification rate using patients' historical prescription and diagnosis information.
Given the extremely unbalanced classes in the dataset, it is interesting to see how existing algorithms perform under such conditions. We test a commonly used approach in imbalanced learning: the Random Under-Sampling method for ensemble learning, with LASSO or Random Forest as base learners. The remainder of the paper is organized as follows. Section 2 describes the data source used in this empirical example. Section 3 presents the rules and methodology for anonymized patient selection and variable construction. Section 4 introduces the imbalanced learning algorithms. The last two sections present and discuss the results.

II. DATA SOURCE DESCRIPTION

To study the potential application of imbalanced learning algorithms to rare disease identification, we select Hereditary Angioedema as an empirical use case. The data has been extracted from IMS longitudinal prescription (Rx) and diagnosis (Dx) medical claims data. The Rx data is derived from electronic data collected from pharmacies, payers, software providers, and transactional clearinghouses. This information represents activities that take place during the prescription transaction and contains information about the product, provider, payer, and geography. The Rx data is longitudinally linked to an anonymous patient token and can be linked to events within the data set itself and across other patient data assets. Common attributes and metrics within the Rx data include payer, payer type, product information, age, gender, and 3-digit ZIP code, as well as script-level information such as date of service, refill number, quantity dispensed, and days' supply. Additionally, prescription information can be linked to office-based claims data to obtain patient diagnosis information. The Rx data covers up to 88% of the retail channel, 48% of traditional mail order, and 40% of specialty mail order. The Dx data consists of electronic medical claims from office-based individual professionals, ambulatory sites, and general health care sites, including patient-level diagnosis and procedure information. This information represents nearly 65% of all electronically filed medical claims in the US. All data is anonymous at the patient level and HIPAA compliant to protect patient privacy.

III. METHODOLOGY

Since HAE is difficult to identify, many patients with this condition do not have a positive HAE diagnosis ICD-9 code associated with them. Our objective is to build predictive models that find such patients using their past prescription records and other diagnosis history. We extracted our training samples from the Rx and Dx data sources described above.

A. Sample selection

HAE patients are selected as those with an HAE diagnosis (ICD-9 code 277.6) and at least one HAE treatment (prescription or procedure) during the selection period (1/1/2012-7/31/2015). A patient's index date is defined as the earliest date of HAE diagnosis or HAE treatment. The lookback period is then defined as running from the earliest diagnosis or prescription date available from January 2010 onward until the day before the index date. Under this rule, the selected HAE patients have lookback periods of variable length. After further data cleaning to remove records without valid gender, age, region, etc., we have 1,233 HAE patients in the sample data.

Non-HAE patients are selected by randomly matching 200 non-HAE patients with a similar lookback period to each of the 1,233 HAE patients. For example, if an HAE patient has a lookback period from 1/15/2013 to 2/5/2014, then a non-HAE patient qualifies for matching if he or she had clinical activity (any prescription, procedure, or diagnosis) from any day in January 2013 to any day in February 2014; the lookback period for that non-HAE patient is then defined as running from the earliest clinical activity date in January 2013 to the latest clinical activity date in February 2014. The matching of non-HAE patients to HAE patients is done in a greedy manner: the non-HAE sample starts as an empty set, the 200 non-HAE patients matched to a given HAE patient are added to it, and this step is repeated for each HAE patient until we have 200 distinct non-HAE patients for every HAE patient. The above process yields 1,233 HAE patients and 246,600 (200 × 1,233) non-HAE patients in the final patient sample.

B. Predictors for HAE

The literature on HAE was reviewed, and from it a list of potential HAE predictors was prepared. Specifically, three classes of clinical indications are derived over the lookback period for each patient: prescriptions, procedures, and diagnoses. Each clinical indication yields three predictors for the lookback period: a flag (Yes/No: whether the patient had the prescription, procedure, or diagnosis), a frequency (how many times the prescription, procedure, or diagnosis occurred), and an average frequency (frequency divided by the length of the lookback period). These clinical predictors, together with demographic predictors (age, gender, and region), compose the final predictor list.
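
As an illustration of the predictor construction described above, the following Python sketch derives the flag, frequency, and average-frequency features from a long-format table of lookback-period clinical events. The table layout and column names (patient_id, event_code, lookback_days) are hypothetical assumptions for illustration, not the paper's actual data schema.

import pandas as pd

# Hypothetical long-format table of lookback-period clinical events: one row per
# observed event (prescription, procedure, or diagnosis) for each patient, plus the
# length of that patient's lookback period in days.
events = pd.DataFrame({
    "patient_id":    [1, 1, 1, 2, 2],
    "event_code":    ["RX_A", "RX_A", "DX_B", "RX_A", "PROC_C"],
    "lookback_days": [380, 380, 380, 210, 210],
})

# Frequency: how many times each clinical indication occurred during the lookback period.
freq = (events.groupby(["patient_id", "event_code"]).size()
              .unstack(fill_value=0)
              .add_prefix("freq_"))

# Flag: whether the indication occurred at all (1 = Yes, 0 = No).
flag = (freq > 0).astype(int).rename(columns=lambda c: c.replace("freq_", "flag_"))

# Average frequency: frequency divided by the length of the lookback period.
lookback = events.groupby("patient_id")["lookback_days"].first()
avg_freq = freq.div(lookback, axis=0).rename(columns=lambda c: c.replace("freq_", "avgfreq_"))

# Final clinical predictors; demographic predictors (age, gender, region) would be joined on.
predictors = pd.concat([flag, freq, avg_freq], axis=1)
print(predictors)
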
C. Model Performance Evaluation

In a binary decision problem, a classifier labels each data sample as either positive or negative. The decisions made by the classifier can be represented in a structure known as a confusion matrix (Table I; 1 denotes the positive class and 0 the negative class). The confusion matrix has four categories: True Positives (TP) are positive examples correctly labeled as positive; False Positives (FP) are negative examples incorrectly labeled as positive; True Negatives (TN) are negative examples correctly labeled as negative; and False Negatives (FN) are positive examples incorrectly labeled as negative.

TABLE I. CONFUSION MATRIX

                 Actual = 1            Actual = 0
Predicted = 1    True Positive (TP)    False Positive (FP)
Predicted = 0    False Negative (FN)   True Negative (TN)

Based on the confusion matrix, we can define several metrics to evaluate model performance, as listed in Table II.

TABLE II. MODEL PERFORMANCE METRICS

Metric                      Definition
AUC                         Area under the ROC curve
AUPR                        Area under the PR curve
Recall                      TP / (TP + FN)
Precision                   TP / (TP + FP)
True Positive Rate (TPR)    TP / (TP + FN)
False Positive Rate (FPR)   FP / (FP + TN)
Accuracy                    (TP + TN) / (TP + FP + FN + TN)

In ROC space, one plots the False Positive Rate (FPR) on the x-axis and the True Positive Rate (TPR) on the y-axis. The FPR measures the fraction of negative examples that are misclassified as positive; the TPR measures the fraction of positive examples that are correctly labeled. In PR space, one plots Recall on the x-axis and Precision on the y-axis. Recall is the same as TPR, whereas Precision measures the fraction of examples classified as positive that are truly positive.
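
To make the Table II definitions concrete, the short Python sketch below computes them from the four confusion-matrix counts; the toy labels and predictions are purely illustrative and not from the paper's data.

import numpy as np

def confusion_counts(y_true, y_pred):
    """Return TP, FP, FN, TN for binary labels (1 = positive class)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    return tp, fp, fn, tn

def table_ii_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "Recall / TPR": tp / (tp + fn),
        "Precision":    tp / (tp + fp),
        "FPR":          fp / (fp + tn),
        "Accuracy":     (tp + tn) / (tp + fp + fn + tn),
    }

# Toy example: 1 = HAE, 0 = non-HAE (values are illustrative only).
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]
print(table_ii_metrics(y_true, y_pred))
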

D. Imbalanced Classification

Imbalanced data sets (IDS) correspond to domains where there are many more instances of some classes than of others. Classification on IDS causes problems because standard machine learning algorithms tend to be overwhelmed by the large classes and ignore the small ones: most classifiers operate on data drawn from the same distribution as the training data and assume that maximizing accuracy is the principal goal [5], [6]. Many solutions have therefore been proposed to deal with this problem, both for standard learning algorithms and for ensemble techniques. They can be categorized into three major groups [7], [8]: (i) data sampling, in which the training data are modified in order to produce a more balanced dataset that allows classifiers to perform as they would in standard classification [9], [10]; (ii) algorithmic modification, which adapts base learning methods to be more attuned to class imbalance issues [11]; and (iii) cost-sensitive learning, which incorporates approaches at the data level, the algorithmic level, or both, assigning higher costs to misclassification of positive-class examples than of negative-class examples and thereby trying to minimize the higher-cost errors [12], [13].

In this paper we implement the Random Under-Sampling (RUS) approach on the majority class. We first randomly under-sample the majority class data and combine it with the minority class data to build an artificially balanced dataset, upon which a machine learning algorithm is applied. This process is repeated for several iterations, with each iteration generating a model; the final model is an aggregation of the models over all iterations (see Algorithm 1, which is similar to bagging [14]).

Algorithm 1. A general framework of Random Under-Sampling (RUS) ensemble learning
    Let S be the original training set.
    for k = 1, 2, ..., K do
        Construct a subset S_k containing the same number of instances from every class by
        randomly sampling instances with (or without) replacement at the rate N_c / N_i,
        where N_c denotes the desired sample size and N_i the original sample size of class i.
        Train a classifier f_k on subset S_k.
    end for
    Output the final classifier g = sign( sum_{k=1..K} f_k ).

We apply RUS with LASSO [15] and Random Forest [16] as base learners (denoted Bagging LASSO and Bagging RF, respectively). Specifically, we first randomly under-sample the majority pool (non-HAE) and combine it with all HAE patients to build an artificially balanced dataset; LASSO or Random Forest is then applied to the balanced sample, and the models are aggregated over iterations of random samples to learn the predictive pattern of HAE while accounting for possible interactions among predictors. Conventional logistic regression (denoted Logit hereafter) is implemented as the benchmark for model performance comparison.

The usual model performance metrics, such as prediction accuracy and area under the ROC curve (AUC), are not appropriate for imbalanced classification problems. For example, the imbalance ratio in our data is 1/200 (each HAE patient has 200 matched non-HAE patients), so a classifier that tries to maximize the accuracy of its classification rule can obtain an accuracy of 99.5% simply by ignoring the HAE patients and classifying all patients as non-HAE. Instead, we use AUPR and Precision at various Recall levels for model performance comparison in this paper. An important difference between ROC space and PR space is the visual representation of the curves: looking at PR curves can expose differences between algorithms that are not apparent in ROC space [17].

For validation purposes, the data is split into 80% for training and 20% for testing. Model performance for five-fold cross-validation on the training data and for further validation on the testing data are both reported. The results and validation are presented in the next section.
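
The paper does not include code, but Algorithm 1 with a Random Forest base learner (the Bagging RF variant) maps naturally onto scikit-learn. The sketch below is one possible realization under stated assumptions: the number of iterations K, the forest size, and the aggregation by averaging predicted probabilities (rather than the sign of summed outputs) are illustrative choices, not the authors' settings. Replacing the base estimator with an L1-penalized logistic regression, e.g. LogisticRegression(penalty="l1", solver="liblinear"), would give a Bagging LASSO analogue.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rus_ensemble_fit(X, y, K=25, random_state=0):
    """Fit K base learners, each on a balanced random under-sample of the majority class."""
    rng = np.random.RandomState(random_state)
    pos_idx = np.where(y == 1)[0]          # minority (HAE) patients
    neg_idx = np.where(y == 0)[0]          # majority (non-HAE) patients
    models = []
    for k in range(K):
        # Under-sample the majority class down to the size of the minority class.
        neg_sample = rng.choice(neg_idx, size=len(pos_idx), replace=False)
        idx = np.concatenate([pos_idx, neg_sample])
        clf = RandomForestClassifier(n_estimators=200, random_state=k)
        clf.fit(X[idx], y[idx])
        models.append(clf)
    return models

def rus_ensemble_score(models, X):
    """Aggregate base learners by averaging their predicted positive-class probabilities."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)

# Toy usage with random data (shapes only; not the paper's dataset).
X = np.random.rand(2010, 30)
y = np.r_[np.ones(10, dtype=int), np.zeros(2000, dtype=int)]
models = rus_ensemble_fit(X, y, K=10)
scores = rus_ensemble_score(models, X)
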
IV. RESULTS

With the 80% training data, we have n = 986 HAE patients and their 200-times-matched non-HAE patients (n = 197,200). We then perform five-fold cross-validation on the training data: the training data is split into five folds, and for each fold we train a model on the remaining four folds and calculate the performance metrics on the held-out fold; the final performance outputs are the metrics averaged over the five folds. A summary of the results is listed in Table III.

TABLE III. MODEL PERFORMANCE (FIVE-FOLD CROSS-VALIDATION)

Metric                   Logit     Bagging LASSO   Bagging RF
AUC                      79.81%    83.03%          82.52%
AUPR                     9.59%     9.21%           11.61%
Precision (Recall=50%)   4.49%     5.84%           5.99%
Precision (Recall=45%)   6.33%     7.23%           7.55%
Precision (Recall=40%)   7.86%     8.85%           10.04%
Precision (Recall=35%)   10.67%    10.10%          13.44%
Precision (Recall=30%)   13.61%    11.74%          16.98%
Precision (Recall=25%)   15.11%    13.81%          21.74%
Precision (Recall=20%)   18.46%    15.27%          24.69%
Precision (Recall=15%)   22.82%    18.95%          30.25%
Precision (Recall=10%)   29.02%    27.90%          31.83%
Precision (Recall=5%)    36.60%    34.02%          38.63%

Bagging RF shows a 21% improvement in AUPR over Logistic Regression. In particular, at the 25% recall level, Logistic Regression has a precision of 15.11% while Bagging RF reaches 21.74%, more than a 40% improvement. The PR curve (Fig. 1) also differentiates the algorithms better than the ROC curve (Fig. 2); the dashed lines in the two figures reflect the performance of a random classifier.

Fig. 1. Five-fold cross-validation PR curve (Precision vs. Recall for Logit, Bagging LASSO, Bagging RF, and a random classifier).
Fig. 2. Five-fold cross-validation ROC curve (true positive rate vs. false positive rate for Logit, Bagging LASSO, Bagging RF, and a random classifier).

We further validate model performance by applying the models trained on all training data to the testing data, which contains n = 247 HAE patients and their 200-times-matched non-HAE patients (n = 49,400). A summary of the testing results is listed in Table IV. It further confirms that Bagging RF outperforms Logistic Regression in this imbalanced classification problem. Bagging LASSO is inferior to Logistic Regression in our case; the reason could be that the high-dimensional features are highly correlated, so a less sparse model is preferred here.

TABLE IV. MODEL PERFORMANCE (TESTING)

Metric                   Logit     Bagging LASSO   Bagging RF
AUC                      82.77%    84.33%          85.12%
AUPR                     13.02%    11.37%          14.09%
Precision (Recall=50%)   6.09%     5.63%           7.17%
Precision (Recall=45%)   8.35%     8.35%           8.43%
Precision (Recall=40%)   9.71%     9.02%           11.19%
Precision (Recall=35%)   12.63%    10.36%          14.48%
Precision (Recall=30%)   16.78%    12.85%          20.16%
Precision (Recall=25%)   21.91%    16.76%          22.55%
Precision (Recall=20%)   24.50%    20.00%          30.82%
Precision (Recall=15%)   24.03%    20.67%          37.76%
Precision (Recall=10%)   44.64%    37.31%          38.46%
Precision (Recall=5%)    54.55%    63.16%          57.14%

In searching for better performance, we also tested cost-sensitive learning methods such as weighted SVM and weighted LASSO on the same training data. Neither shows improvement over the standard or random under-sampling ensemble methods.
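
For reference, AUPR and precision at fixed recall levels of the kind reported in Tables III and IV can be computed from a model's predicted scores with scikit-learn. The sketch below is not the authors' evaluation code; the interpolation rule (best precision among operating points with recall at least the target level) and the synthetic scores are assumptions made for illustration.

import numpy as np
from sklearn.metrics import precision_recall_curve, auc

def aupr_and_precision_at_recall(y_true, scores, recall_levels=(0.50, 0.25, 0.05)):
    """Area under the PR curve and the precision achieved at given recall levels."""
    precision, recall, _ = precision_recall_curve(y_true, scores)
    aupr = auc(recall, precision)
    prec_at = {}
    for r in recall_levels:
        # Highest precision among operating points whose recall is at least r.
        prec_at[r] = precision[recall >= r].max()
    return aupr, prec_at

# Toy example with synthetic scores on a 1:200-style imbalanced sample (illustrative only).
rng = np.random.RandomState(0)
y_true = np.r_[np.ones(50, dtype=int), np.zeros(10000, dtype=int)]
scores = np.r_[rng.beta(3, 2, 50), rng.beta(2, 5, 10000)]
aupr, prec_at = aupr_and_precision_at_recall(y_true, scores)
print(f"AUPR = {aupr:.4f}", prec_at)
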

V. CONCLUSION AND DISCUSSION

In this rare disease identification problem, we test under-sampling and cost-sensitive learning algorithms on a real-world patient data case. The training sample contains only about 0.5% positive patients, making it a good real-world example of the imbalanced learning challenge. We find that in this empirical example, the random under-sampling Random Forest method boosts precision at various recall levels compared to standard models; in the reported cross-validation results it achieves a 21% improvement in AUPR and more than a 40% improvement in precision at the 25% recall level over the benchmark model. In real-world applications, due to high data dimensionality and case complexity, random under-sampling Random Forest is not guaranteed to always be the recommended method. Researchers should try various methods and select the most appropriate algorithm based on performance.

REFERENCES

[1] Rare Diseases Act of 2002, Public Law 107-280, 107th Congress, 2002.
[2] H. He and Y. Ma, Eds., Imbalanced Learning: Foundations, Algorithms, and Applications. John Wiley and Sons, 2013.
[3] M. Cicardi and A. Agostoni, "Hereditary angioedema," New England Journal of Medicine, vol. 334, no. 25, pp. 1666-1667, 1996.
[4] L. C. Zingale, L. Beltrami, A. Zanichelli, L. Maggioni, E. Pappalardo, B. Cicardi, and M. Cicardi, "Angioedema without urticaria: a large clinical survey," Canadian Medical Association Journal, vol. 175, no. 9, pp. 1065-1070, 2006.
[5] N. V. Chawla, N. Japkowicz, and A. Kotcz, "Editorial: special issue on learning from imbalanced data sets," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 1-6, 2004.
[6] S. Visa and A. Ralescu, "Issues in mining imbalanced data sets - a review paper," in Proceedings of the Sixteenth Midwest Artificial Intelligence and Cognitive Science Conference, April 2005.
[7] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463-484, 2012.

[8] V. López, A. Fernández, S. García, V. Palade, and F. Herrera, "An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics," Information Sciences, vol. 250, pp. 113-141, 2013.
[9] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
[10] G. E. Batista, R. C. Prati, and M. C. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20-29, 2004.
[11] B. Zadrozny and C. Elkan, "Learning and making decisions when costs and probabilities are both unknown," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2001, pp. 204-213.
[12] P. Domingos, "MetaCost: A general method for making classifiers cost-sensitive," in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 1999, pp. 155-164.
[13] B. Zadrozny, J. Langford, and N. Abe, "Cost-sensitive learning by cost-proportionate example weighting," in Proceedings of the Third IEEE International Conference on Data Mining (ICDM), November 2003, pp. 435-442.
[14] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.
[15] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B (Methodological), vol. 58, no. 1, pp. 267-288, 1996.
[16] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[17] J. Davis and M. Goadrich, "The relationship between precision-recall and ROC curves," in Proceedings of the 23rd International Conference on Machine Learning, June 2006, pp. 233-240.
