Investigation of Property Valuation Models Based on Decision Tree Ensembles Built over Noised Data


Tadeusz Lasota 1, Tomasz Łuczak 2, Michał Niemczyk 2, Michał Olszewski 2, Bogdan Trawiński 2

1 Wrocław University of Environmental and Life Sciences, Dept. of Spatial Management, ul. Norwida 25/27, Wrocław, Poland
2 Wrocław University of Technology, Institute of Informatics, Wybrzeże Wyspiańskiego 27, Wrocław, Poland
tadeusz.lasota@up.wroc.pl, {tomasz.luczak, bogdan.trawinski}@pwr.wroc.pl, {michal.niemczyk, michal.olszewski}@student.pwr.wroc.pl

Abstract. Ensemble machine learning methods incorporating bagging, random subspace, random forest, and rotation forest, employing decision trees (i.e. pruned model trees) as base learning algorithms, were developed in the WEKA environment. The methods were applied to the real-world regression problem of predicting the prices of residential premises based on historical data of sales/purchase transactions. The accuracy of the ensembles generated by these methods was compared for several levels of noise injected into an attribute, the output, and both the attribute and the output. Ensembles built using rotation forest outperformed the other models. In turn, the random subspace method resulted in the models that were the most resistant to noised data.

Keywords: pruned model trees, bagging, random subspaces, random forest, rotation forest, cross-validation, property valuation, noised data

1 Introduction

The issue of dealing with noisy data is one of the key aspects of supervised machine learning when creating reliable data-driven models. Noisy data may strongly affect the accuracy of the resulting models and can decrease system performance in terms of predictive accuracy, processing efficiency, and the size of the learner. Several works on the impact of noise, mainly in the context of classification problems and class noise, have been published. In [1] increasing the size of the training set by adding noise to the training objects was explored for different amounts and directions of noise injection. It was shown theoretically and empirically that k-nearest neighbors directed noise injection was preferable to Gaussian spherical noise injection when using multilayer perceptrons. In [2] noise was injected into both input attributes and output classes. The results varied depending on the noise type and the specific data set being processed. Naïve Bayes turned out to be the most robust algorithm, and SMO (support vector machine) the least. In [3] it was observed that attribute noise was less harmful than class noise. Moreover, the higher the correlation between an attribute and the class, the more negative impact the attribute noise may

have. The authors recommend handling noisy instances before a learner is generated. In [4] two different class noise types were applied to training sets. Fuzzy Rule Based Classification Systems revealed good tolerance to class noise in comparison with the C4.5 crisp algorithm, which is considered resistant to noise. In [5] the performance of several ensemble models learned from imbalanced and noisy binary-class data was compared; a clear preference of bagging over boosting was shown. We have recently studied the impact of noised data on the performance of ensemble models for a regression problem [6]. We injected noise into output values and showed that the random subspace and random forest techniques, where the diversity of component models is achieved by manipulating features, were more resistant to noise than classic resampling techniques such as bagging, repeated holdout, and repeated cross-validation.

For a few years we have been investigating techniques for developing an intelligent system to assist with real estate appraisal, addressed to a broad spectrum of users interested in premises management. The outline of the system, to be exploited on a cloud computing platform, is presented in Fig. 1. Public registers and cadastral systems create a complex data source for the intelligent system of the real estate market. The core of the system consists of valuation models, including models constructed according to professional standards as well as data-driven models generated using machine learning algorithms. So far, we have investigated several methods to construct ensembles of regression models to be incorporated into the system, including various resampling techniques, random subspaces, random forests, and rotation forests. As base learning algorithms, weak learners such as evolutionary fuzzy systems, neural networks, and decision trees were employed [7], [8], [9], [10], [11], [12], [13].

Fig. 1. Outline of the intelligent system of the real estate market

The first goal of the investigation presented in this paper is to compare empirically ensemble machine learning methods incorporating bagging, random subspace, random forest, and rotation forest, employing decision trees as base learners. Bagging, which stands for bootstrap aggregating, devised by Breiman [14], is one of the most intuitive and simplest ensemble algorithms providing good performance. Another approach to ensemble learning, called random subspaces and also known as attribute bagging, seeks learner diversity through feature space subsampling [15]. The method called random forest, which merges these two approaches, was worked out by

Breiman [16]. Random forest uses bootstrap selection to supply each individual learner with training data and limits the feature space by random selection. Rodríguez et al. [17] proposed in 2006 a new classifier ensemble method, called rotation forest, which applies Principal Component Analysis (PCA) to rotate the original feature axes in order to obtain different training sets for learning base classifiers.

The second goal is to examine the performance of the ensemble methods when dealing with noisy data. The noise was artificially injected into an attribute, the output, and both the attribute and the output. The susceptibility to noised data can be an important criterion for the selection of appropriate machine learning methods for our automated valuation system. We do not know the purpose of a property valuation; for example, the prices estimated to secure loans may differ substantially from the prices appraised to calculate taxes. We do not know what sorts of properties and which locations were in vogue at the moment of sale. Moreover, market instability and uncertainty cause investors to take irrational sales/purchase decisions. Hence, we may assume that the historical data we use to create real estate valuation models contain much noise.

2 Methods Used and Experimental Setup

We conducted a series of experiments to compare bagging (Bag), random subspace (RaS), random forest (RaF), and rotation forest (RtF) models with single models (Sgl) with respect to their predictive accuracy, using cadastral data on sales/purchase transactions of residential premises. All tests were accomplished using WEKA (Waikato Environment for Knowledge Analysis), a non-commercial and open source data mining system [18]. WEKA contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization, and it is also well suited for developing new machine learning schemes. The decision tree WEKA algorithm very often used for building and exploring ensemble models, namely the Pruned Model Tree (M5P), was employed to carry out the experiments. M5P implements routines for generating M5 model trees. The algorithm is based on decision trees; however, instead of having values at the tree's nodes, it contains a multivariate linear regression model at each node. The input space is divided into cells using the training data and their outcomes, and then a regression model is built in each cell as a leaf of the tree.

The real-world dataset used in the experiments was drawn from an unrefined dataset of records referring to residential premises transactions accomplished in one big Polish city within 14 years starting from 1998. The final dataset comprised 9795 samples. The following four attributes were pointed out as main price drivers by professional appraisers: usable area of a flat (Area), age of the building (Age), number of storeys in the building (Storeys), and the distance of the building from the city centre (Centre); in turn, the price of premises (Price) was the output variable. For the random subspace, random forest, and rotation forest approaches four more features were employed: number of rooms in the flat including the kitchen (Rooms), geodetic coordinates of the building (Xc and Yc), and its distance from the nearest shopping centre (Shopping).

Due to the fact that the prices of premises change substantially in the course of time, the whole 14-year dataset cannot be used to create data-driven models using machine learning. Therefore, it was split into subsets covering individual years, and we might assume that within one year the prices of premises with similar attributes were roughly comparable. Starting from the beginning of 1998, the prices were updated to the last day of subsequent years using trends modelled by polynomials of degree four.
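The paper does not specify exactly how this trend-based update was carried out. One possible realization, shown as a minimal sketch below, assumes that transaction dates are expressed as fractional years and that a single market-wide degree-four trend is fitted with NumPy; each price is then rescaled by the ratio of the trend value at the end of its transaction year to the trend value at the transaction date. The function name and data layout are illustrative only.

import numpy as np

def update_prices_to_year_end(dates, prices, degree=4):
    """Scale each price to the level of the last day of its transaction year,
    using a polynomial trend fitted to all (date, price) pairs.

    dates  : array of transaction times as fractional years, e.g. 1999.37
    prices : array of unit prices observed at those times
    """
    coeffs = np.polyfit(dates, prices, degree)       # degree-4 market trend
    trend = np.poly1d(coeffs)
    year_end = np.floor(dates) + 1.0                 # end of the transaction year
                                                     # (approximated as start of the next year)
    return prices * trend(year_end) / trend(dates)   # rescale by the trend ratio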

We might assume that the one-year datasets differed from each other and might constitute different observation points for comparing the accuracy of ensemble models in our study and for carrying out statistical tests. The sizes of the 14 one-year datasets are given in Table 1.

Table 1. Number of instances in one-year datasets

The following methods were applied in the experiments (an illustrative configuration sketch is given below, after Fig. 2):

Sgl: the M5P algorithm with the number of features equal to 4. In this case single models were built, so there was only one iteration of the algorithm.
Bag: bagging with the M5P algorithm; the size of each bag was set to 100% of the training set, and the number of bagging iterations was set to 50.
RaS: random subspace with the M5P algorithm; the size of each subspace was set to 75% of all attributes, and the number of iterations was set to 50.
RaF: random forest, realized as bagging with M5P as the filtered classifier and a random attribute subset as the filter; the size of each bag was set to 100% of the training set, the number of bagging iterations was set to 50, and the number of attributes in each random subset was set to 75% of all attributes.
RtF: rotation forest with M5P set as the classifier; the maximum and minimum number of groups was set to 4, the percentage of instances to be removed was set to 20%, principal components with default parameters was used as the projection filter, and the number of iterations was set to 50.

Fig. 2. Outline of the experiment with the random forest method within the 10cv frame. The procedure is repeated 10 times according to the 10cv schema.

For each method, 10-fold cross-validation repeated ten times was used as the result generator. The schema of an experiment using RaF within the WEKA 10cv frame is shown in Fig. 2.
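The experiments themselves were configured and run in WEKA. The snippet below is only a rough, non-equivalent Python analogue: scikit-learn provides neither M5P model trees nor rotation forest, so a plain regression tree stands in for M5P and the RtF variant is omitted. It is meant to illustrate the bagging, random subspace, and random-forest-style configurations and the 10-times-repeated 10-fold cross-validation with RMSE.

import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

base = DecisionTreeRegressor()  # stand-in for WEKA's M5P model trees (not equivalent)

models = {
    # Bag: bootstrap samples of 100% size, 50 iterations, all attributes
    "Bag": BaggingRegressor(base, n_estimators=50, max_samples=1.0, bootstrap=True),
    # RaS: no resampling of instances, 75% of attributes per learner
    "RaS": BaggingRegressor(base, n_estimators=50, bootstrap=False, max_features=0.75),
    # RaF-like: bootstrap sampling combined with 75% random attribute subsets
    "RaF": BaggingRegressor(base, n_estimators=50, bootstrap=True, max_features=0.75),
}

def evaluate(model, X, y):
    """10-fold cross-validation repeated 10 times; returns the mean RMSE."""
    cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()

In this sketch the RaS and RaF variants differ only in whether the training instances are bootstrap-resampled; the 75% attribute fraction mirrors the subspace size listed above.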

As the performance function the root mean square error (RMSE) was used, and as the aggregation function of the ensembles the arithmetic mean was employed.

During our research we analyzed the impact of data noise on the performance of the described ensemble methods. During the first run of the experiment no values were changed. Next, we replaced 1%, 5%, 10%, 20%, 30%, 40%, and 50% of randomly selected input values (Area) in the training and testing sets with noised values. Then, we did the same processing with the output value (Price). Finally, we replaced in the same way both the input values (Area) and the output values (Price) simultaneously. The noised values were generated randomly from the range [Q1 - 1.5·IQR, Q3 + 1.5·IQR], where Q1 and Q3 denote the first and third quartiles and IQR stands for the interquartile range. This assured that the numbers replacing the original values were not outliers. The schemata illustrating the three modes of noise injection are given in Figures 3, 4, and 5.

Fig. 3. Schema illustrating injection of noise into the input variable (A)
Fig. 4. Schema illustrating injection of noise into the output variable (O)
Fig. 5. Schema illustrating injection of noise into both an input and the output variable (AO)
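A minimal sketch of this noise-injection step is given below, assuming the values of a single variable (e.g. Area or Price) are held in a NumPy array; replacement values are drawn uniformly from the interquartile-based range defined above, so they never fall into the outlier region. The function name and random-number handling are illustrative only.

import numpy as np

def inject_noise(values, fraction, rng=None):
    """Replace a given fraction of values with uniform random numbers drawn
    from [Q1 - 1.5*IQR, Q3 + 1.5*IQR], i.e. within the non-outlier range."""
    rng = np.random.default_rng() if rng is None else rng
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    noisy = np.asarray(values, dtype=float).copy()
    n_replace = int(round(fraction * len(noisy)))
    idx = rng.choice(len(noisy), size=n_replace, replace=False)  # which values to noise
    noisy[idx] = rng.uniform(low, high, size=n_replace)          # draw replacements
    return noisy

# e.g. 10% noise injected into the Area attribute and the Price output:
# area_noisy  = inject_noise(area,  0.10)
# price_noisy = inject_noise(price, 0.10)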

3 Results of Experiments

The accuracy of the Sgl, Bag, RaS, RaF, and RtF models created using M5P for non-noised data, and for data with 10% noise injected into the attribute Area (A), the output Price (O), and both the attribute and the output (AO), is shown in Figures 6-9, respectively. The charts clearly show that the RtF ensembles reveal the best performance, whereas the biggest RMSE values are produced by the Sgl and RaF models. Moreover, noise injected into the output results in a higher error rate than noise introduced into the attribute.

Fig. 6. Performance of single and ensemble models for non-noised data
Fig. 7. Performance of single and ensemble models for 10% noise injected into the attribute (A)

The Friedman tests performed with respect to the RMSE values of all models built over the 14 one-year datasets showed that there are significant differences among the models for each noise injection mode considered. The average rank positions, determined by the Friedman test, of the single and ensemble models for different levels of noise injected into the attribute (A), the output (O), and both the attribute and the output (AO) are shown in Tables 2, 3, and 4, respectively. In all tables, the lower the rank value, the better the model. In each table the RtF models are in the first place, and the Sgl and RaF ones occupy the last positions. Further Wilcoxon paired tests indicated that there were no statistically significant differences between the RtF and Bag models for (O), between the RtF and RaS models for (AO), or between the RaF and Sgl models for any noise injection mode.
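The statistical procedure described above, a Friedman test over the 14 per-dataset RMSE values followed by pairwise Wilcoxon signed-rank tests, can be reproduced with SciPy roughly as follows; rmse is assumed to be a dictionary mapping each method name to its 14 per-dataset RMSE values, and no correction for multiple comparisons is applied in this sketch.

from itertools import combinations
from scipy.stats import friedmanchisquare, wilcoxon

def compare_models(rmse):
    """rmse: dict like {"Sgl": [...], "Bag": [...], ...}, one RMSE per one-year dataset."""
    # Friedman test: are there significant differences among the five models?
    stat, p = friedmanchisquare(*rmse.values())
    print(f"Friedman test: chi2 = {stat:.3f}, p = {p:.4f}")

    # Post-hoc pairwise Wilcoxon signed-rank tests
    for a, b in combinations(rmse, 2):
        w, p_pair = wilcoxon(rmse[a], rmse[b])
        print(f"{a} vs {b}: p = {p_pair:.4f}")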

Fig. 8. Performance of single and ensemble models for 10% noise injected into the output (O)
Fig. 9. Performance of models for 10% noise injected into both attribute and output (AO)

Table 2. Average rank positions of single and ensemble models for different levels of noise injected into the attribute (A), determined during the Friedman test

Noise  1st         2nd         3rd         4th         5th
0%     RtF (1.29)  Bag (2.36)  RaS (3.36)  Sgl (3.79)  RaF (4.21)
5%     RtF (1.14)  RaS (2.07)  Bag (3.00)  RaF (4.36)  Sgl (4.43)
10%    RtF (1.07)  RaS (1.93)  Bag (3.21)  RaF (4.14)  Sgl (4.64)
20%    RtF (1.21)  RaS (1.79)  Bag (3.29)  RaF (4.00)  Sgl (4.71)
30%    RtF (1.14)  RaS (1.86)  Bag (3.36)  RaF (3.79)  Sgl (4.86)
40%    RtF (1.14)  RaS (1.86)  Bag (3.29)  RaF (3.86)  Sgl (4.86)
50%    RtF (1.00)  RaS (2.00)  Bag (3.29)  RaF (3.93)  Sgl (4.79)

Table 3. Average rank positions of single and ensemble models for different levels of noise injected into the output (O), determined during the Friedman test

Noise  1st         2nd         3rd         4th         5th
0%     RtF (1.29)  Bag (2.36)  RaS (3.36)  Sgl (3.79)  RaF (4.21)
5%     RtF (1.43)  Bag (2.21)  RaS (3.43)  Sgl (3.79)  RaF (4.14)
10%    RtF (1.43)  Bag (2.00)  RaS (3.50)  Sgl (3.50)  RaF (4.57)
20%    RtF (1.57)  Bag (2.43)  Sgl (3.36)  RaS (3.43)  RaF (4.21)
30%    RtF (1.79)  Bag (2.36)  Sgl (3.21)  RaS (3.50)  RaF (4.14)
40%    RtF (2.00)  Bag (2.07)  Sgl (3.14)  RaS (3.50)  RaF (4.29)
50%    RtF (2.00)  Bag (2.14)  Sgl (3.07)  RaS (3.64)  RaF (4.14)

Table 4. Average rank positions of single and ensemble models for different levels of noise injected into both the attribute and the output (AO), determined during the Friedman test

Noise  1st         2nd         3rd         4th         5th
0%     RtF (1.29)  Bag (2.36)  RaS (3.36)  Sgl (3.79)  RaF (4.21)
5%     RtF (1.07)  RaS (2.50)  Bag (2.86)  RaF (4.14)  Sgl (4.43)
10%    RtF (1.29)  RaS (1.86)  Bag (3.07)  RaF (4.29)  Sgl (4.50)
20%    RtF (1.36)  RaS (1.64)  Bag (3.57)  RaF (3.86)  Sgl (4.57)
30%    RtF (1.14)  RaS (1.86)  Bag (3.43)  RaF (4.07)  Sgl (4.50)
40%    RtF (1.36)  RaS (1.64)  Bag (3.43)  RaF (3.79)  Sgl (4.79)
50%    RtF (1.14)  RaS (1.86)  Bag (3.21)  RaF (3.86)  Sgl (4.93)

Table 5. Median percentage loss of performance for data with noise vs. non-noised data, for different levels of noise injected into the attribute (A)

Noise  Sgl     Bag     RaS     RaF     RtF
1%     4.7%    4.7%    3.6%    3.9%    4.3%
5%     13.8%   14.2%   8.3%    10.3%   10.9%
10%    22.4%   22.9%   12.7%   18.1%   16.0%
20%    35.8%   37.0%   17.7%   30.5%   22.1%
30%    44.0%   44.8%   22.6%   39.1%   27.3%
40%    54.4%   52.4%   27.1%   46.8%   29.0%
50%    54.8%   57.9%   28.4%   49.5%   33.5%

Table 6. Median percentage loss of performance for data with noise vs. non-noised data, for different levels of noise injected into the output (O)

Noise  Sgl     Bag     RaS     RaF     RtF
1%     2.6%    2.7%    2.8%    2.8%    2.9%
5%     16.4%   15.6%   14.8%   13.8%   16.4%
10%    23.6%   25.4%   28.6%   26.4%   32.5%
20%    45.3%   45.7%   41.4%   41.8%   46.7%
30%    61.8%   64.2%   61.4%   58.3%   67.1%
40%    72.9%   75.7%   72.1%   68.3%   80.3%
50%    84.6%   86.2%   82.8%   79.4%   88.9%

Table 7. Median percentage loss of performance for data with noise vs. non-noised data, for different levels of noise injected into both the attribute and the output (AO)

Noise  Sgl     Bag     RaS     RaF     RtF
1%     7.2%    6.0%    5.9%    5.9%    6.4%
5%     22.9%   22.7%   21.3%   21.6%   23.4%
10%    41.1%   41.9%   37.4%   39.8%   40.1%
20%    64.7%   68.1%   55.9%   60.4%   62.1%
30%    79.6%   80.1%   68.1%   71.9%   75.6%
40%    90.6%   91.5%   80.1%   84.5%   86.2%
50%    93.7%   90.7%   82.5%   89.9%   90.5%

As for the susceptibility to noise of the individual ensemble methods, the general outcome is as follows. Injecting subsequent levels of noise results in progressively worse accuracy. The percentage loss of performance for data with 1%, 5%, 10%, 20%, 30%, 40%, and 50% noise versus non-noised data was computed for each one-year dataset. The aggregate results, in terms of the median over all datasets, are presented in Tables 5, 6, and 7. The amount of loss differs between individual datasets and increases with the growth of the percentage of noise.
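The entries of Tables 5-7 were obtained, for each method and noise level, as the median over the one-year datasets of the relative increase in RMSE caused by the noise. A sketch of this computation, assuming per-dataset RMSE arrays for the non-noised and noised runs of one method, is given below; the exact formula is not stated in the text, so the relative-increase form is an assumption.

import numpy as np

def median_performance_loss(rmse_clean, rmse_noised):
    """Median percentage loss of accuracy over all one-year datasets.

    rmse_clean, rmse_noised: per-dataset RMSE values for one method,
    without noise and with a given noise level, respectively."""
    clean = np.asarray(rmse_clean, dtype=float)
    noised = np.asarray(rmse_noised, dtype=float)
    loss = (noised - clean) / clean          # relative RMSE increase per dataset
    return 100.0 * np.median(loss)           # reported as a percentage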

The most important observation is that in each case the average loss of accuracy for RaS is lower than for the other models. We obtained similar results in our previous research into the susceptibility to noise of ensemble models built with genetic fuzzy systems as base learning methods [6].

4 Conclusions and Future Work

A series of experiments aimed at comparing ensemble machine learning methods encompassing bagging, random subspace, random forest, and rotation forest was conducted. The ensemble models were created using a decision tree algorithm over real-world data taken from a cadastral system. Moreover, the susceptibility to noise of these ensemble methods was examined. The noise was injected into an attribute, the output, and both the attribute and the output by replacing the original values with numbers randomly drawn from the range of values excluding outliers.

The overall results of our investigation were as follows. Ensembles built using rotation forest outperformed all other models. On the other hand, single models and ensembles created with random forests revealed the worst performance. In turn, the random subspace method resulted in the models most resistant to noised data. We intend to continue our research into the resilience to noise of regression algorithms employing other machine learning techniques such as neural networks and support vector regression. We also plan to inject noise into the data using different probability distributions.

Acknowledgments. This paper was partially supported by the Polish National Science Centre under grant no. N N

References

1. Skurichina, M., Raudys, S., Duin, R.P.W.: K-Nearest Neighbors Directed Noise Injection in Multilayer Perceptron Training. IEEE Transactions on Neural Networks 11(2) (2000)
2. Nettleton, D.F., Orriols-Puig, A., Fornells, A.: A study of the effect of different types of noise on the precision of supervised learning techniques. Artificial Intelligence Review 33(4) (2010)
3. Zhu, X., Wu, X.: Class Noise vs. Attribute Noise: A Quantitative Study of Their Impacts. Artificial Intelligence Review 22 (2004)
4. Sáez, J.A., Luengo, J., Herrera, F.: Fuzzy Rule Based Classification Systems versus Crisp Robust Learners Trained in Presence of Class Noise's Effects: a Case of Study. In: 11th International Conference on Intelligent Systems Design and Applications (ISDA 2011), Córdoba, Spain (2011)
5. Khoshgoftaar, T.M., Van Hulse, J., Napolitano, A.: Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans 41(3) (2011)
6. Lasota, T., Telec, Z., Trawiński, B., Trawiński, G.: Investigation of Random Subspace and Random Forest Regression Models Using Data with Injected Noise. In M. Graña et al. (Eds.): KES 2012, LNAI 7828. Springer, Heidelberg (2013)

7. Graczyk, M., Lasota, T., Trawiński, B., Trawiński, K.: Comparison of Bagging, Boosting and Stacking Ensembles Applied to Real Estate Appraisal. In N.T. Nguyen, M.T. Le, J. Świątek (Eds.): ACIIDS 2010, LNAI 5991. Springer, Heidelberg (2010)
8. Kempa, O., Lasota, T., Telec, Z., Trawiński, B.: Investigation of bagging ensembles of genetic neural networks and fuzzy systems for real estate appraisal. In N.T. Nguyen, C.-G. Kim, A. Janiak (Eds.): ACIIDS 2011, LNAI 6592. Springer, Heidelberg (2011)
9. Lasota, T., Telec, Z., Trawiński, G., Trawiński, B.: Empirical Comparison of Resampling Methods Using Genetic Fuzzy Systems for a Regression Problem. In H. Yin et al. (Eds.): IDEAL 2011, LNCS 6936. Springer, Heidelberg (2011)
10. Lasota, T., Telec, Z., Trawiński, G., Trawiński, B.: Empirical Comparison of Resampling Methods Using Genetic Neural Networks for a Regression Problem. In E. Corchado et al. (Eds.): HAIS 2011, LNAI 6679. Springer, Heidelberg (2011)
11. Lasota, T., Łuczak, T., Trawiński, B.: Investigation of Random Subspace and Random Forest Methods Applied to Property Valuation Data. In P. Jędrzejowicz et al. (Eds.): ICCCI 2011, Part I, LNCS 6922. Springer, Heidelberg (2011)
12. Lasota, T., Telec, Z., Trawiński, B., Trawiński, G.: Investigation of Rotation Forest Ensemble Method Using Genetic Fuzzy Systems for a Regression Problem. In J.-S. Pan, S.-M. Chen, N.T. Nguyen (Eds.): ACIIDS 2012, LNAI 7196. Springer, Heidelberg (2012)
13. Lasota, T., Łuczak, T., Trawiński, B.: Investigation of Rotation Forest Method Applied to Property Price Prediction. In L. Rutkowski et al. (Eds.): ICAISC 2012, Part I, LNCS 7267. Springer, Heidelberg (2012)
14. Breiman, L.: Bagging Predictors. Machine Learning 24(2) (1996)
15. Ho, T.K.: The Random Subspace Method for Constructing Decision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8) (1998)
16. Breiman, L.: Random Forests. Machine Learning 45(1) (2001)
17. Rodríguez, J.J., Kuncheva, L.I., Alonso, C.J.: Rotation forest: A new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(10) (2006)
18. Witten, I.H., Frank, E., Hall, M.A.: Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition. Morgan Kaufmann, San Francisco (2011)
