Student Performance Prediction and Risk Analysis by Using Data Mining Approach
|
|
- Scott Blake
- 5 years ago
- Views:
Transcription
1 Student Performance Prediction and Risk Analysis by Using Data Mining Approach Bilal Mehboob, Rao Muzamal Liaqat, Nazar Abbas CEME, NUST Pakistan ABSTRACT: Today we are surrounding with large data related to student performance (class participation, attendance, pre student history, quiz result, subject dependency, student CGPA till to final semester). In this paper we will evaluate the reason of student failure basis on the previous data, predict the risk of failure for next course so that students may be mentally prepare for offered course as well dependency level of the course. In engineering it is common practice if a student doesn t knows about the basic course he/she can t perform well in advance courses of same scopes. In this paper we will back trace the failure cause with the help of decision tree. This work wills also help out to estimate the risk in early phase, which can help the teachers to design an effective planning for the students who are at risk. We have used the six algorithms for prediction and risk analysis ID3 gives the best results as compared to other. In this paper we have used the data set of CEME, NUST. Our dataset consists of 450 records extracted from five degrees (DE_29, DE_30, DE_31, DE_32, and DE_33). Keywords: Data Mining, Risk Analysis, ID3, Performance Prediction Received: 10 January 2017, Revised 19 February 2017, Accepted 1 March DLINE. All Rights Reserved 1. Introduction Education plays an important role for the development of a country, especially for underdeveloped countries like Pakistan. However it is important to find out the failure reason of students to improve educational growth as well as gap wholes in this domain. Traditionally teachers predict the student performance on the basis of their experience; they had well understood the student nature and temperament. In this paper we will use data mining approach to predict the student performance as well identification of risk failure. Data mining techniques and methods can be applied in various fields such as marketing, sales, trades, business, real states, web engineering etc. Use of data mining in education domain is rapidly increasing. Data mining is also related with knowledge discovery, in which we extract useful information or pattern from the data. We have different tools Journal of Intelligent Computing Volume 8 Number 2 June
2 that are used in data mining such as rapid miner, weka, spsss etc. In data mining two important concepts unsupervised learning and supervised learning is used respectively. In unsupervised learning we use clustering mechanism to extract the useful information from the data. Famous algorithm for unsupervised learning are K-NN, K-Mean, DBSCAN, SVM (Support Vector Machine) and Expectation Maximization Clustering (EMC). Our main focus in this paper is on supervised learning in we will use the decision tree, naïve Bayes, ID3 and random forest algorithms for student performance prediction and risk analysis. Data mining concept in education is known as EDM (Educational Data Mining), it was defined by International society of educational data mining [1], and this society also deals with different types of data that come from educational domain. In this paper data mining approach is used to predict the student early performance and risk level. We have used the logistic regression, naïve Bayes, ID3 and CART algorithm to predict the student performance and reason of their failure. In this paper we select the CART algorithm on the basis of accuracy, AUC, sensitivity and Error Rate. We have performed the comparative analysis of these algorithms to evaluate our work. Rest of the paper is divided into 5 sections. Section 2 describes the related work in this domain. Proposed methodology is described in Section 3. Experimental result and evaluation is described in Section 4. Comparative analysis of different algorithm is carried out in Section 5. Conclusion and future work is discussed in Section Related Work Data mining has emerged as an active research in educational domain because we can expose lot of things by using data mining approach. In this way we can analyze the students performance, faculty performance, difficulty level among different subjects as well as the failure reason among students. According to survey thousands of students are dropped out due to their poor performances in academics [15]. We can extract the strength and weakness of students by using data mining approach [2]. Data mining has been used to extract the hidden pattern and useful information from the data [3]. Remero and Ventura give us an exhaustive overview of using data mining approach by different researcher from 1995 to 2005 for educational data mining [4]. Rayan Baker has described the state of data mining in educational data [5]. Traditional statistical method have been used to calculate the student performance, these methods doesn t give satisfactory results in student performance prediction based on previous data [6]. Researcher has used the tree based structure to obtain the useful information from data, in tree like structure we have root nodes and leaf nodes information between these nodes is depicted in the form of layers [5]. Clustering algorithm has been used for performance prediction [7]. Dr. S. Hari and J James Manoharan has used the K-Mean clustering to predict the student performance, they have divided the data into different sets of clusters [7]. Dr. V.Shirividya and Keerthana have used both classification and clustering techniques to predict the student performance [8]. Proposed Methodology Figure 1. Proposed Methodology D A Carnegie and C Watterson has used the high school record to predict the students GPA in first year of engineering [9]. Dekker and M Pechenizkiy have used the data mining approach to find out the dropout information and student performance [10]. H. Bydzovska and Bayer give the concept to use the student data along with social data to predict the student performance more precisely [11]. Dr. Sangeeta and T Mishra has used the has used the Random Tree and J48 algorithms to predict the student 50 Journal of Intelligent Computing Volume 8 Number 2 June 2017
3 performance [12]. Lopez and C Romero introduce the concept of meta classifier for clustering and used the EM (Expectation Maximization) algorithm to predict the academic performance of students [13]. Pallamreddy et al. have used (DT) Decision Tree algorithm on dataset, this algorithms give the tree like model that helps to understand the decisions and consequences based on the nature of data [14]. 3. Data Selection Now we will explain the each step in the proposed methodology one by one. In this paper we have used the data set provided by College of Electrical and Mechanical Engineering CEME, NUST. Data set consist of 450 records extracted from degree DE_29, DE_30 DE_33.We have selected these degrees due to complete access of profile data as well as individual academic records. 4. Preprocessing & Cleaning Figure 2. Filtering Mechanism Figure 3. Data Formatting In this step we covert raw data into machine understandable data by applying some preprocess steps. We have to covert the data according to nature of algorithms e.g. to run ID3 and CHAID we convert the data into polynomial form. To run DT we have to assign the label attribute. In preprocessing we exclude the missing value to make the data compatible with algorithms. We have applied the filtering mechanism to remove the missing values in the data, filters also give us facility to extract the data Journal of Intelligent Computing Volume 8 Number 2 June
4 of our interest by using the built in operators in filters. If a column reveals low information according to label attributes then we can exclude it from data. We have used the rapid miner studio 7 to perform preprocessing steps. Filtering mechanism and data formatting are represented by in figure2 and 3 respectively. 5. Feature Selection and Extraction It is an important step in which we select the most important attributes that have the direct effect on label attribute. There are many methods that are used for feature selection and extraction. We use the entropy, IG (information Gain), reducts and core to extract the useful features. In this paper we will use the correlation matrix to find out the most important attribute. In correlation matric each attribute is assigned a numeric value known as weight. Minimum value of assigned weight is 0 and maximum value is 1. Higher the value of weight reveals the importance of attribute or discriminability level of that attributes. On the basis of assigned weights we can apply the threshold value to reduce the attributes. Weight assigned to different attributes by using correlation attributes are shown by table 1. Figure 4. Correlation Matrix Attribute Weight Attribute Weight Category 1 Database Engineering Mathematics_ OOP.172 Mathematics_ Algorithm and computing.329 Mathematics_ Data Structure.298 Mathematics_ PL&E.247 Mathematics_ Computer Networks.156 Engineering Mechanics Mobile networks.223 Pakistan Studies Network Analysis.226 Digital system Design Digital communication.064 Electronic Circuit Computer Aided drawing.546 Control System Design project.390 AI Metric/O level.867 Engineering Economics FSC/A level.886 Software Engineering Games/Activities Data Mining Table 1. Weights by Using Correlation 52 Journal of Intelligent Computing Volume 8 Number 2 June 2017
5 Now we will apply the DT (decision tree), Random forest, Random tree, ID3, CHAID to extract the useful information from the data. To get the better understanding from data we have mapped the CGPA of 8 th semester in Student performance prediction Label. We have calculated the performance of each algorithm on the basis of Accuracy, Sensitivity and Recall. Comparative analysis of each algorithm is shown in table 2. Algorithm Accuracy Average Precision Risk Precision Below Average Above Average Precession Precision Decision Tree Random Tree Random Forest ID CHAID Decision Stump Results & Interpretation Table 2. Comparative Analysis of Algorithms Journal of Intelligent Computing Volume 8 Number 2 June
6 54 Journal of Intelligent Computing Volume 8 Number 2 June 2017
7 Journal of Intelligent Computing Volume 8 Number 2 June
8 8. Conclusion & Future Work In this paper we have used the six algorithms; results obtained by using each algorithm are mentioned in comparative analysis table. ID3 gives the best results as compare to other on the basis of accuracy and precision for performance prediction and risk analysis. A descriptive result of ID3 helps us to extract following assumptions from the data. Students who were at Risk in Mathematics_1 were also at Risk in Mathematic_2. Students who were at Average in Digital Image processing were also at Average in Digital Signal Processing. Students who were above average in DBMS, they performed well Design Projects. Students who have the status of Hafiz e Quran performed well as compared to others. Students who score low grades in DBMS, they also got low grades in Database Engineering. Mostly students were above average in Microprocessor Based Design, those were above average in computer Architecture. Students who were below average or at risk in Math_1, they got low grades in Math_5. Each result is extracted from the ID3 algorithm description. We have divided the student result into four categories on the basis of CGPA of 8 th semester. If the CGPA was >=3.0 we have categorize them as Above Average, CGPA (2.3 to 2.5) named as Below Average, CGPA (2.6 to 2.9) labeled as Average and students who have CGPA <=2.2 classified as Risk. In future we will design a tool that will predict the next semester subject s grades and Risk Analysis on the base of current result. References [1] Educational Data Mining Society, Available: [2] Personal learning plan, The Glossary of Educational Reform, Great Schools Partnership, Portland, ME, USA. [Online]. Available: [3] Fayyad, U., Piatetsky-Shapiro, G., Smyth, P. (1996). From Data Mining to Knowledge Discovery in Databases, AI Magazine, Fall [4] Romero, C., Ventura, S. (2007). Educational data mining: a survey from 1995 to 2005, Expert Systems with Applications, 33, p [5] Baker, R.S.J.D., Yacef, K. (2009). The State of Educational Data Mining in 2009: A Review and Future Visions, Journal of Educational Data Mining, 1, 1 (1). [6] Sharabiani, Ashkan., Karim, Fazle., Sharabiani, Anooshiravan., Atanasov, Mariya., Darabi, Houshang (2014). An Enhanced Bayesian Network Model for Prediction of Students Academic Performance in Engineering Programs, 2014 IEEE Global Engineering Education Conference, p [7] Jamesmanoharan, J., Ganesh, S.H., Felciah, M.L.P., Shafreenbanu, A.K. (2014). Discovering Students Academic Performance Based on GPA Using K-Means Clustering Algorithm Computing and Communication Technologies (WCCCT), 2014 World Congress on Feb March , p [8] Keerthana, G., Srividhya, Dr.V. (2014). Performance Enhancement of Classifiers using Integration of Clustering and Classification Techniques in International Journal of Computer Science Engineering (IJCSE), 03 May, p [9] Carnegie, D. A., Watterson, C., Andreae, P., Browne, W. N. Prediction of success in engineering study, In: 2012 IEEE Global Engineering Education Conference (EDUCON), 2012, p [10] Dekker, G. W., Pechenizkiy, M., Vleeshouwers, J. M. (2009). Predicting Students Drop Out: A Case Study, In: Proceedings of the 2nd International Conference on Educational Data Mining, Cordoba, Spain, vol. 9, p [11] Bayer, J., Bydzovská, H., Géryk, J., Obs1vac, T., Popel1nskó, L. (2012). Predicting drop-out from social behaviour of students, In: Proceedings of the 5th International Conference on Educational Data Mining-EDM 2012, Chania, Greece, p [12] Piatetsky-Shapiro, Gregory (1991). Discovery, analysis, and presentation of strong rules, in Piatetsky-Shapiro, Gregory; and Frawley, William J.; eds., Knowledge Discovery in Databases, AAAI/MIT Press, Cambridge, MA. 56 Journal of Intelligent Computing Volume 8 Number 2 June 2017
9 [13] Lopez, M.I., Romero, C., Ventura, S., Luna, J.M. (2012). Classification via clustering for predicting final marks starting from the student participation in Forums, In: Proc. EDM, 2012, p [14]venkatasubbareddy, Pallamreddy., Sreenivasarao, Vuda (2010). The Result Oriented Process for Students Based On Distributed Data Mining, International Journal of Advanced Computer Science and Applications, 1 (5) November 2010, p [15] Hammond, L. D., Zielezinski, M. B., Goldman, S. (2014). Using technology to support at-risk students learning, In: Alliance for Excellent Education, Stanford Center for Opportunity Policy in Education, [Online]. Available: stanford.edu/ sites/default/files/scope-pub-using-technologyreport.pdf Journal of Intelligent Computing Volume 8 Number 2 June
ScienceDirect. A Framework for Clustering Cardiac Patient s Records Using Unsupervised Learning Techniques
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 98 (2016 ) 368 373 The 6th International Conference on Current and Future Trends of Information and Communication Technologies
More informationMining Association Rules in Student s Assessment Data
www.ijcsi.org 211 Mining Association Rules in Student s Assessment Data Dr. Varun Kumar 1, Anupama Chadha 2 1 Department of Computer Science and Engineering, MVN University Palwal, Haryana, India 2 Anupama
More informationPython Machine Learning
Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled
More informationThe 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X
The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,
More informationLearning From the Past with Experiment Databases
Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University
More informationTwitter Sentiment Classification on Sanders Data using Hybrid Approach
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders
More informationRule Learning With Negation: Issues Regarding Effectiveness
Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United
More informationA Case Study: News Classification Based on Term Frequency
A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview
More informationAssignment 1: Predicting Amazon Review Ratings
Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for
More informationModule 12. Machine Learning. Version 2 CSE IIT, Kharagpur
Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should
More informationReducing Features to Improve Bug Prediction
Reducing Features to Improve Bug Prediction Shivkumar Shivaji, E. James Whitehead, Jr., Ram Akella University of California Santa Cruz {shiv,ejw,ram}@soe.ucsc.edu Sunghun Kim Hong Kong University of Science
More informationIssues in the Mining of Heart Failure Datasets
International Journal of Automation and Computing 11(2), April 2014, 162-179 DOI: 10.1007/s11633-014-0778-5 Issues in the Mining of Heart Failure Datasets Nongnuch Poolsawad 1 Lisa Moore 1 Chandrasekhar
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationBusiness Analytics and Information Tech COURSE NUMBER: 33:136:494 COURSE TITLE: Data Mining and Business Intelligence
Business Analytics and Information Tech COURSE NUMBER: 33:136:494 COURSE TITLE: Data Mining and Business Intelligence COURSE DESCRIPTION This course presents computing tools and concepts for all stages
More informationRule discovery in Web-based educational systems using Grammar-Based Genetic Programming
Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de
More informationAnalyzing sentiments in tweets for Tesla Model 3 using SAS Enterprise Miner and SAS Sentiment Analysis Studio
SCSUG Student Symposium 2016 Analyzing sentiments in tweets for Tesla Model 3 using SAS Enterprise Miner and SAS Sentiment Analysis Studio Praneth Guggilla, Tejaswi Jha, Goutam Chakraborty, Oklahoma State
More informationPredicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks
Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com
More informationAustralian Journal of Basic and Applied Sciences
AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean
More informationRule Learning with Negation: Issues Regarding Effectiveness
Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX
More informationCS Machine Learning
CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing
More informationSTUDYING ACADEMIC INDICATORS WITHIN VIRTUAL LEARNING ENVIRONMENT USING EDUCATIONAL DATA MINING
STUDYING ACADEMIC INDICATORS WITHIN VIRTUAL LEARNING ENVIRONMENT USING EDUCATIONAL DATA MINING Eng. Eid Aldikanji 1 and Dr. Khalil Ajami 2 1 Master Web Science, Syrian Virtual University, Damascus, Syria
More informationLecture 1: Machine Learning Basics
1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3
More informationCSL465/603 - Machine Learning
CSL465/603 - Machine Learning Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Introduction CSL465/603 - Machine Learning 1 Administrative Trivia Course Structure 3-0-2 Lecture Timings Monday 9.55-10.45am
More informationUniversidade do Minho Escola de Engenharia
Universidade do Minho Escola de Engenharia Universidade do Minho Escola de Engenharia Dissertação de Mestrado Knowledge Discovery is the nontrivial extraction of implicit, previously unknown, and potentially
More informationLecture 1: Basic Concepts of Machine Learning
Lecture 1: Basic Concepts of Machine Learning Cognitive Systems - Machine Learning Ute Schmid (lecture) Johannes Rabold (practice) Based on slides prepared March 2005 by Maximilian Röglinger, updated 2010
More informationLinking Task: Identifying authors and book titles in verbose queries
Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,
More informationSpeech Emotion Recognition Using Support Vector Machine
Speech Emotion Recognition Using Support Vector Machine Yixiong Pan, Peipei Shen and Liping Shen Department of Computer Technology Shanghai JiaoTong University, Shanghai, China panyixiong@sjtu.edu.cn,
More informationHuman Emotion Recognition From Speech
RESEARCH ARTICLE OPEN ACCESS Human Emotion Recognition From Speech Miss. Aparna P. Wanare*, Prof. Shankar N. Dandare *(Department of Electronics & Telecommunication Engineering, Sant Gadge Baba Amravati
More informationLearning Methods in Multilingual Speech Recognition
Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex
More informationSINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)
SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,
More informationModeling function word errors in DNN-HMM based LVCSR systems
Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford
More informationCourse Outline. Course Grading. Where to go for help. Academic Integrity. EE-589 Introduction to Neural Networks NN 1 EE
EE-589 Introduction to Neural Assistant Prof. Dr. Turgay IBRIKCI Room # 305 (322) 338 6868 / 139 Wensdays 9:00-12:00 Course Outline The course is divided in two parts: theory and practice. 1. Theory covers
More informationIterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages
Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages Nuanwan Soonthornphisaj 1 and Boonserm Kijsirikul 2 Machine Intelligence and Knowledge Discovery Laboratory Department of Computer
More informationApplications of data mining algorithms to analysis of medical data
Master Thesis Software Engineering Thesis no: MSE-2007:20 August 2007 Applications of data mining algorithms to analysis of medical data Dariusz Matyja School of Engineering Blekinge Institute of Technology
More informationCS 446: Machine Learning
CS 446: Machine Learning Introduction to LBJava: a Learning Based Programming Language Writing classifiers Christos Christodoulopoulos Parisa Kordjamshidi Motivation 2 Motivation You still have not learnt
More informationMachine Learning from Garden Path Sentences: The Application of Computational Linguistics
Machine Learning from Garden Path Sentences: The Application of Computational Linguistics http://dx.doi.org/10.3991/ijet.v9i6.4109 J.L. Du 1, P.F. Yu 1 and M.L. Li 2 1 Guangdong University of Foreign Studies,
More informationSystem Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks
System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering
More informationCOMPUTER-ASSISTED INDEPENDENT STUDY IN MULTIVARIATE CALCULUS
COMPUTER-ASSISTED INDEPENDENT STUDY IN MULTIVARIATE CALCULUS L. Descalço 1, Paula Carvalho 1, J.P. Cruz 1, Paula Oliveira 1, Dina Seabra 2 1 Departamento de Matemática, Universidade de Aveiro (PORTUGAL)
More informationAnalysis of Emotion Recognition System through Speech Signal Using KNN & GMM Classifier
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 10, Issue 2, Ver.1 (Mar - Apr.2015), PP 55-61 www.iosrjournals.org Analysis of Emotion
More informationIntroduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition
Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition Todd Holloway Two Lecture Series for B551 November 20 & 27, 2007 Indiana University Outline Introduction Bias and
More informationUsing Web Searches on Important Words to Create Background Sets for LSI Classification
Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract
More informationImpact of Cluster Validity Measures on Performance of Hybrid Models Based on K-means and Decision Trees
Impact of Cluster Validity Measures on Performance of Hybrid Models Based on K-means and Decision Trees Mariusz Łapczy ski 1 and Bartłomiej Jefma ski 2 1 The Chair of Market Analysis and Marketing Research,
More informationUsing dialogue context to improve parsing performance in dialogue systems
Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,
More informationData Fusion Models in WSNs: Comparison and Analysis
Proceedings of 2014 Zone 1 Conference of the American Society for Engineering Education (ASEE Zone 1) Data Fusion s in WSNs: Comparison and Analysis Marwah M Almasri, and Khaled M Elleithy, Senior Member,
More informationProduct Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments
Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM s Imperial College of Engineering &
More informationOn-Line Data Analytics
International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE III, SEPTEMBER 2011] [ISSN: 2231-4946] On-Line Data Analytics Yugandhar Vemulapalli #, Devarapalli Raghu *, Raja Jacob
More informationPREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES
PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES Po-Sen Huang, Kshitiz Kumar, Chaojun Liu, Yifan Gong, Li Deng Department of Electrical and Computer Engineering,
More informationMining Student Evolution Using Associative Classification and Clustering
Mining Student Evolution Using Associative Classification and Clustering 19 Mining Student Evolution Using Associative Classification and Clustering Kifaya S. Qaddoum, Faculty of Information, Technology
More informationIndian Institute of Technology, Kanpur
Indian Institute of Technology, Kanpur Course Project - CS671A POS Tagging of Code Mixed Text Ayushman Sisodiya (12188) {ayushmn@iitk.ac.in} Donthu Vamsi Krishna (15111016) {vamsi@iitk.ac.in} Sandeep Kumar
More informationCalibration of Confidence Measures in Speech Recognition
Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 1 Calibration of Confidence Measures in Speech Recognition Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE
More informationSwitchboard Language Model Improvement with Conversational Data from Gigaword
Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword
More informationContent-based Image Retrieval Using Image Regions as Query Examples
Content-based Image Retrieval Using Image Regions as Query Examples D. N. F. Awang Iskandar James A. Thom S. M. M. Tahaghoghi School of Computer Science and Information Technology, RMIT University Melbourne,
More informationComparison of EM and Two-Step Cluster Method for Mixed Data: An Application
International Journal of Medical Science and Clinical Inventions 4(3): 2768-2773, 2017 DOI:10.18535/ijmsci/ v4i3.8 ICV 2015: 52.82 e-issn: 2348-991X, p-issn: 2454-9576 2017, IJMSCI Research Article Comparison
More informationMachine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler
Machine Learning and Data Mining Ensembles of Learners Prof. Alexander Ihler Ensemble methods Why learn one classifier when you can learn many? Ensemble: combine many predictors (Weighted) combina
More informationModeling function word errors in DNN-HMM based LVCSR systems
Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford
More informationLearning Methods for Fuzzy Systems
Learning Methods for Fuzzy Systems Rudolf Kruse and Andreas Nürnberger Department of Computer Science, University of Magdeburg Universitätsplatz, D-396 Magdeburg, Germany Phone : +49.39.67.876, Fax : +49.39.67.8
More informationTest Effort Estimation Using Neural Network
J. Software Engineering & Applications, 2010, 3: 331-340 doi:10.4236/jsea.2010.34038 Published Online April 2010 (http://www.scirp.org/journal/jsea) 331 Chintala Abhishek*, Veginati Pavan Kumar, Harish
More informationActivity Recognition from Accelerometer Data
Activity Recognition from Accelerometer Data Nishkam Ravi and Nikhil Dandekar and Preetham Mysore and Michael L. Littman Department of Computer Science Rutgers University Piscataway, NJ 08854 {nravi,nikhild,preetham,mlittman}@cs.rutgers.edu
More informationOCR for Arabic using SIFT Descriptors With Online Failure Prediction
OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,
More informationUSER ADAPTATION IN E-LEARNING ENVIRONMENTS
USER ADAPTATION IN E-LEARNING ENVIRONMENTS Paraskevi Tzouveli Image, Video and Multimedia Systems Laboratory School of Electrical and Computer Engineering National Technical University of Athens tpar@image.
More informationSARDNET: A Self-Organizing Feature Map for Sequences
SARDNET: A Self-Organizing Feature Map for Sequences Daniel L. James and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 dljames,risto~cs.utexas.edu
More informationA study of speaker adaptation for DNN-based speech synthesis
A study of speaker adaptation for DNN-based speech synthesis Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, Simon King The Centre for Speech Technology Research (CSTR) University of Edinburgh,
More informationPp. 176{182 in Proceedings of The Second International Conference on Knowledge Discovery and Data Mining. Predictive Data Mining with Finite Mixtures
Pp. 176{182 in Proceedings of The Second International Conference on Knowledge Discovery and Data Mining (Portland, OR, August 1996). Predictive Data Mining with Finite Mixtures Petri Kontkanen Petri Myllymaki
More informationOPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS
OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS Václav Kocian, Eva Volná, Michal Janošek, Martin Kotyrba University of Ostrava Department of Informatics and Computers Dvořákova 7,
More informationCLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH
ISSN: 0976-3104 Danti and Bhushan. ARTICLE OPEN ACCESS CLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH Ajit Danti 1 and SN Bharath Bhushan 2* 1 Department
More informationSemi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.
Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link
More informationWelcome to. ECML/PKDD 2004 Community meeting
Welcome to ECML/PKDD 2004 Community meeting A brief report from the program chairs Jean-Francois Boulicaut, INSA-Lyon, France Floriana Esposito, University of Bari, Italy Fosca Giannotti, ISTI-CNR, Pisa,
More informationINPE São José dos Campos
INPE-5479 PRE/1778 MONLINEAR ASPECTS OF DATA INTEGRATION FOR LAND COVER CLASSIFICATION IN A NEDRAL NETWORK ENVIRONNENT Maria Suelena S. Barros Valter Rodrigues INPE São José dos Campos 1993 SECRETARIA
More information*Net Perceptions, Inc West 78th Street Suite 300 Minneapolis, MN
From: AAAI Technical Report WS-98-08. Compilation copyright 1998, AAAI (www.aaai.org). All rights reserved. Recommender Systems: A GroupLens Perspective Joseph A. Konstan *t, John Riedl *t, AI Borchers,
More informationModeling user preferences and norms in context-aware systems
Modeling user preferences and norms in context-aware systems Jonas Nilsson, Cecilia Lindmark Jonas Nilsson, Cecilia Lindmark VT 2016 Bachelor's thesis for Computer Science, 15 hp Supervisor: Juan Carlos
More informationA Decision Tree Analysis of the Transfer Student Emma Gunu, MS Research Analyst Robert M Roe, PhD Executive Director of Institutional Research and
A Decision Tree Analysis of the Transfer Student Emma Gunu, MS Research Analyst Robert M Roe, PhD Executive Director of Institutional Research and Planning Overview Motivation for Analyses Analyses and
More information(Sub)Gradient Descent
(Sub)Gradient Descent CMSC 422 MARINE CARPUAT marine@cs.umd.edu Figures credit: Piyush Rai Logistics Midterm is on Thursday 3/24 during class time closed book/internet/etc, one page of notes. will include
More informationLarge-Scale Web Page Classification. Sathi T Marath. Submitted in partial fulfilment of the requirements. for the degree of Doctor of Philosophy
Large-Scale Web Page Classification by Sathi T Marath Submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy at Dalhousie University Halifax, Nova Scotia November 2010
More informationExperiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling
Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013 Andrés Alfonso Caurcel Díaz 1 and José María Gómez Hidalgo 2 1 Universidad
More informationIdentification of Opinion Leaders Using Text Mining Technique in Virtual Community
Identification of Opinion Leaders Using Text Mining Technique in Virtual Community Chihli Hung Department of Information Management Chung Yuan Christian University Taiwan 32023, R.O.C. chihli@cycu.edu.tw
More informationDeveloping True/False Test Sheet Generating System with Diagnosing Basic Cognitive Ability
Developing True/False Test Sheet Generating System with Diagnosing Basic Cognitive Ability Shih-Bin Chen Dept. of Information and Computer Engineering, Chung-Yuan Christian University Chung-Li, Taiwan
More informationCross-lingual Short-Text Document Classification for Facebook Comments
2014 International Conference on Future Internet of Things and Cloud Cross-lingual Short-Text Document Classification for Facebook Comments Mosab Faqeeh, Nawaf Abdulla, Mahmoud Al-Ayyoub, Yaser Jararweh
More informationPOS tagging of Chinese Buddhist texts using Recurrent Neural Networks
POS tagging of Chinese Buddhist texts using Recurrent Neural Networks Longlu Qin Department of East Asian Languages and Cultures longlu@stanford.edu Abstract Chinese POS tagging, as one of the most important
More informationFeature Selection based on Sampling and C4.5 Algorithm to Improve the Quality of Text Classification using Naïve Bayes
Feature Selection based on Sampling and C4.5 Algorithm to Improve the Quality of Text Classification using Naïve Bayes Viviana Molano 1, Carlos Cobos 1, Martha Mendoza 1, Enrique Herrera-Viedma 2, and
More informationHIERARCHICAL DEEP LEARNING ARCHITECTURE FOR 10K OBJECTS CLASSIFICATION
HIERARCHICAL DEEP LEARNING ARCHITECTURE FOR 10K OBJECTS CLASSIFICATION Atul Laxman Katole 1, Krishna Prasad Yellapragada 1, Amish Kumar Bedi 1, Sehaj Singh Kalra 1 and Mynepalli Siva Chaitanya 1 1 Samsung
More informationWE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT
WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT PRACTICAL APPLICATIONS OF RANDOM SAMPLING IN ediscovery By Matthew Verga, J.D. INTRODUCTION Anyone who spends ample time working
More informationIEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH 2009 423 Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition George
More informationIntegrating E-learning Environments with Computational Intelligence Assessment Agents
Integrating E-learning Environments with Computational Intelligence Assessment Agents Christos E. Alexakos, Konstantinos C. Giotopoulos, Eleni J. Thermogianni, Grigorios N. Beligiannis and Spiridon D.
More informationGenerative models and adversarial training
Day 4 Lecture 1 Generative models and adversarial training Kevin McGuinness kevin.mcguinness@dcu.ie Research Fellow Insight Centre for Data Analytics Dublin City University What is a generative model?
More informationNetpix: A Method of Feature Selection Leading. to Accurate Sentiment-Based Classification Models
Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models 1 Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models James B.
More informationChinese Language Parsing with Maximum-Entropy-Inspired Parser
Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art
More informationClass-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification
Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Tomi Kinnunen and Ismo Kärkkäinen University of Joensuu, Department of Computer Science, P.O. Box 111, 80101 JOENSUU,
More informationChapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA. 1. Introduction. Alta de Waal, Jacobus Venter and Etienne Barnard
Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA Alta de Waal, Jacobus Venter and Etienne Barnard Abstract Most actionable evidence is identified during the analysis phase of digital forensic investigations.
More informationA Case-Based Approach To Imitation Learning in Robotic Agents
A Case-Based Approach To Imitation Learning in Robotic Agents Tesca Fitzgerald, Ashok Goel School of Interactive Computing Georgia Institute of Technology, Atlanta, GA 30332, USA {tesca.fitzgerald,goel}@cc.gatech.edu
More informationMultivariate k-nearest Neighbor Regression for Time Series data -
Multivariate k-nearest Neighbor Regression for Time Series data - a novel Algorithm for Forecasting UK Electricity Demand ISF 2013, Seoul, Korea Fahad H. Al-Qahtani Dr. Sven F. Crone Management Science,
More informationExperiment Databases: Towards an Improved Experimental Methodology in Machine Learning
Experiment Databases: Towards an Improved Experimental Methodology in Machine Learning Hendrik Blockeel and Joaquin Vanschoren Computer Science Dept., K.U.Leuven, Celestijnenlaan 200A, 3001 Leuven, Belgium
More informationUnsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model
Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Xinying Song, Xiaodong He, Jianfeng Gao, Li Deng Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.
More informationEmpowering Students Learning Achievement Through Project-Based Learning As Perceived By Electrical Instructors And Students
Edith Cowan University Research Online EDU-COM International Conference Conferences, Symposia and Campus Events 2006 Empowering Students Learning Achievement Through Project-Based Learning As Perceived
More informationModel Ensemble for Click Prediction in Bing Search Ads
Model Ensemble for Click Prediction in Bing Search Ads Xiaoliang Ling Microsoft Bing xiaoling@microsoft.com Hucheng Zhou Microsoft Research huzho@microsoft.com Weiwei Deng Microsoft Bing dedeng@microsoft.com
More informationCircuit Simulators: A Revolutionary E-Learning Platform
Circuit Simulators: A Revolutionary E-Learning Platform Mahi Itagi Padre Conceicao College of Engineering, Verna, Goa, India. itagimahi@gmail.com Akhil Deshpande Gogte Institute of Technology, Udyambag,
More informationA survey of multi-view machine learning
Noname manuscript No. (will be inserted by the editor) A survey of multi-view machine learning Shiliang Sun Received: date / Accepted: date Abstract Multi-view learning or learning with multiple distinct
More informationPhysics 270: Experimental Physics
2017 edition Lab Manual Physics 270 3 Physics 270: Experimental Physics Lecture: Lab: Instructor: Office: Email: Tuesdays, 2 3:50 PM Thursdays, 2 4:50 PM Dr. Uttam Manna 313C Moulton Hall umanna@ilstu.edu
More informationContent-free collaborative learning modeling using data mining
User Model User-Adap Inter DOI 10.1007/s11257-010-9095-z ORIGINAL PAPER Content-free collaborative learning modeling using data mining Antonio R. Anaya Jesús G. Boticario Received: 23 April 2010 / Accepted
More informationSemi-Supervised Face Detection
Semi-Supervised Face Detection Nicu Sebe, Ira Cohen 2, Thomas S. Huang 3, Theo Gevers Faculty of Science, University of Amsterdam, The Netherlands 2 HP Research Labs, USA 3 Beckman Institute, University
More information