Introduction to Machine Learning applied to genomic selection
O. González-Recio
Dpto Mejora Genética Animal, INIA, Madrid
O. González-Recio (INIA) Machine Learning UPV Valencia, Sept
Outline
1.
2. Learning System Design: Description; Types of designs
3. Ensemble methods: Overview; Bagging; Boosting; Random Forest; Examples
4. Regularization: Bias-Variance trade off; Model complexity in ensembles
5. Remarks
MACHINE LEARNING
What is Learning?
  "Making useful changes in our minds." (Marvin Minsky)
  "Denotes changes in the system that enable the system to do the same task more effectively the next time." (Herbert Simon)
Machine Learning
  Multidisciplinary field: bioinformatics, statistics, genomics, data mining, astronomy, the web, ...
  Avoids rigid parametric models that may be far away from our observations.
MACHINE LEARNING
Machine Learning in genomic selection
  Massive amount of information.
  Need to extract knowledge from large, noisy, redundant, missing and fuzzy data.
  ML is able to extract hidden relationships that exist in these huge volumes of data and do not follow a particular parametric design.
  Supervised Learning: we have a target output (phenotypes).
MACHINE LEARNING
Massive Genomic Information
  "What does information consume in an information-rich world? It consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it." (Herbert Simon, Nobel Prize in Economics)
Overview
  Develop algorithms to extract knowledge from a set of data in an effective and efficient fashion, in order to predict yet-to-be-observed data following certain rules.
INTRO
What is Learning?
  Given: a collection of examples (data) E (phenotypes and covariates).
  Produce: an equation or description (T) that covers all or most examples, and predicts (P) the value, class or category of a yet-to-be-observed example.
  The algorithm learns relationships and associations between already observed examples in order to predict phenotypes when only their covariates are observed.
MOTIVATION
Definition: a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
INTRO
Machine Learning is one piece in the process of acquiring new knowledge.
[Figure: workflow in Data Mining tasks. From Inza et al. (2010).]
OUTLINE OF THE COURSE
In this course:
  Basic concepts in Machine Learning.
  Design of a learning system.
  Regularization and bias-variance trade off.
  Ensemble methods: Boosting, Random Forest.
Learning System Design: Description
Why is it important?
  Vital for implementing effective learning.
What should be considered
  Ask what question we want to answer.
  Ask what scenario is expected.
  Design the learning and validation sets accordingly.
Learning system in genomic selection
Genome-wide association studies
  Goal: find genetic variants associated with a given trait.
  What is the phenotype distribution in our population?
  Prediction of genetic merit in future generations is less important.
  Diseases: case-control, case-case-control designs.
Genomic selection
  Goal: predict the genomic merit of individuals without phenotype.
  We expect DNA recombinations in subsequent generations.
  Re-phenotyping every x generations.
  Overlapping or discrete generations.
Select training and testing sets according to the characteristics of our population.
Learning System Design: Types of designs
Learning design: same learning and validation set.
Learning design: k-fold cross validation.
Learning design: training and testing sets.
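The k-fold cross validation design can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names (`k_fold_splits`, `cv_mse_of_mean`) and the trivial "predict the training mean" learner are assumptions made for the example, not part of the course material.

```python
import random

def k_fold_splits(n, k, seed=0):
    """Yield (train, valid) index lists for k-fold cross validation on n records."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold j takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [idx[j::k] for j in range(k)]
    for j in range(k):
        valid = folds[j]
        train = [i for f in folds[:j] + folds[j + 1:] for i in f]
        yield train, valid

def cv_mse_of_mean(y, k, seed=0):
    """Cross-validated MSE of a placeholder learner that predicts the training mean."""
    sq_err = 0.0
    for train, valid in k_fold_splits(len(y), k, seed):
        mu = sum(y[i] for i in train) / len(train)
        sq_err += sum((y[i] - mu) ** 2 for i in valid)
    return sq_err / len(y)
```

Every record is used for validation exactly once and never appears in the training set of its own fold. In genomic selection the random shuffle would typically be replaced by a split that respects generations or families, as discussed in the learning system design.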
Ensemble methods: Introduction
Wide variety of competing methods
  Bayes alphabet, Bayesian LASSO, Ridge regression, Logistic regression, Neural networks, ...
  The comparative accuracy depends strongly on the trait, the problem addressed or the genetic architecture.
  A priori we don't know which method is better for a new problem.
Ensembles
  Ensembles are combinations of different methods (usually simple models).
  They have very good predictive ability because they exploit the complementarity and additivity of the models' performances.
  Ensembles have better predictive ability than the methods separately.
  They have known statistical properties (no "black boxes").
  "In a multitude of counselors there is safety."
Ensembles
$y = c_0 + c_1 f_1(y,x) + c_2 f_2(y,x) + \dots + c_M f_M(y,x) + e$
Building Ensembles: Two steps
1. Developing a population of varied models
  Also called base learners.
  May be weak models: slightly better than a random guess.
  Same or different method.
  Feature Subset Selection (FSS).
  May capture non-linearities and interactions.
  Partition of the input space.
2. Combining them to form a composite predictor
  Voting.
  Estimated weight.
  Averaging.
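The three ways of combining base learners listed in step 2 can be sketched as follows. The function names are hypothetical, and the weights in `combine_by_weights` are assumed to have been estimated elsewhere (for example by regressing the phenotype on the base learners' predictions).

```python
from collections import Counter

def combine_by_voting(predictions):
    """Majority vote over the class labels predicted by the base learners.

    Ties are broken in favour of the label that was seen first.
    """
    return Counter(predictions).most_common(1)[0][0]

def combine_by_averaging(predictions):
    """Plain average of the base learners' numeric predictions."""
    return sum(predictions) / len(predictions)

def combine_by_weights(predictions, weights, intercept=0.0):
    """Weighted sum c_0 + sum_m c_m * f_m of the base learners' predictions."""
    return intercept + sum(c * f for c, f in zip(weights, predictions))
```

Voting suits categorical traits, while averaging and estimated weights suit continuous ones; averaging is the special case of equal weights 1/M and no intercept.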
Examples
Most common ensembles
  Model averaging (e.g. Bayesian model averaging).
  Bagging.
  Boosting.
  Random Forest.
Can be worse
  Most ensembles use variations of one kind of model, but complex and heterogeneous ensembles may be imagined.
Boosting and Random Forest
  High dimensional heuristic search algorithms to detect signal covariates.
  Do not model any particular gene action or genetic architecture.
  Do not provide a simple estimate of effect size.
Bagging
Bootstrap aggregating: bootstrap the data and average the results.
$\hat{y} = \frac{1}{M} \sum_{m=1}^{M} f_m(\Psi_m)$,
with $\Psi_m$ being a bootstrapped sample of the N records of (y, x), and $f_m(\cdot)$ the model of choice applied to the bootstrapped data.
Bagging
Bootstrap aggregating: $e \sim N(0, \sigma^2_e)$ i.i.d.
Averaging residuals, $\hat{e}_i = \frac{1}{M} \sum_{m=1}^{M} (y_i - \hat{y}_{im})$, we expect e to approach zero by a factor of M.
Unfortunately, the e are not independent during the process and a limit is usually reached.
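A minimal sketch of bagging as defined above: draw M bootstrap samples of the N records, fit the model of choice to each, and average the predictions. The base learner here (a least-squares line on a single covariate) and all function names are assumptions chosen to keep the example short; any other model could take its place.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single covariate."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        # Degenerate bootstrap sample (one distinct x): fall back to the mean.
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def bagging_fit(xs, ys, n_models=50, seed=0):
    """Fit one base learner per bootstrap sample of the N records."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Sample N record indices with replacement.
        sample = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in sample], [ys[i] for i in sample]))
    return models

def bagging_predict(models, x):
    """Average the predictions of the M bootstrapped models."""
    return sum(a + b * x for a, b in models) / len(models)
```

Because the bootstrap samples overlap, the M fitted models are correlated, which is why the gain from averaging levels off rather than growing indefinitely with M.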
Boosting
Properties
  Based on AdaBoost (Freund and Schapire, 1996).
  May be applied to both continuous and categorical traits.
  Bühlmann and Yu (2003) proposed a version for high dimensional problems:
    Covariate selection.
    Small step gradient descent.
Boosting
In genomic selection
  Apply each base learner to the residuals of the previous one.
  Implement feature selection at each step.
  Apply a small weight to each learner and train a new learner on the residuals.
  It does not require specification of the inheritance model (additivity, epistasis, dominance, ...).
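The steps above can be sketched as a componentwise L2-boosting loop. This is an illustrative sketch only: it assumes a single-covariate least-squares fit as the base learner, picks at each step the covariate that most reduces the residual sum of squares, and adds it with a small weight `nu`. The function names and default values are hypothetical.

```python
def l2_boost(X, y, n_steps=200, nu=0.1):
    """X: list of records (rows of covariates); y: list of phenotypes."""
    n, p = len(X), len(X[0])
    # Center each covariate so that single-covariate OLS needs no intercept.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    ss = [sum(v * v for v in col) for col in cols]
    intercept = sum(y) / n
    coef = [0.0] * p
    resid = [yi - intercept for yi in y]
    for _ in range(n_steps):
        best_j, best_b, best_gain = None, 0.0, 0.0
        for j in range(p):
            if ss[j] == 0.0:
                continue  # monomorphic covariate carries no information
            sxy = sum(c * r for c, r in zip(cols[j], resid))
            gain = sxy * sxy / ss[j]  # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_b, best_gain = j, sxy / ss[j], gain
        if best_j is None:
            break  # no covariate explains any remaining residual
        # Small step: add only a fraction nu of the selected learner.
        coef[best_j] += nu * best_b
        resid = [r - nu * best_b * c for r, c in zip(resid, cols[best_j])]
    return intercept, means, coef

def l2_boost_predict(model, row):
    """Predict a phenotype from the covariates of one record."""
    intercept, means, coef = model
    return intercept + sum(b * (v - m) for b, v, m in zip(coef, row, means))
```

Covariates that are never selected keep a coefficient of exactly zero, which is how the procedure performs feature selection; the number of steps plays the role of the regularization parameter.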
Random Forest
Properties
  Based on classification and regression trees (CART).
  Analyzes discrete or continuous traits.
  Implements feature selection.
  Exploits randomization.
  Massively non-parametric.
Random Forest
Advantages in genomic selection
  It does not require specification of the inheritance model (additivity, epistasis, dominance, ...).
  It is able to capture complex interactions in the data.
  Implements bagging (Breiman, 1996).
  Reduces prediction error by a factor of the number of trees.
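The two sources of randomization (bootstrapped records and a random subset of covariates) can be illustrated with a deliberately simplified forest. This sketch is not the full algorithm: each tree is reduced to a single split (a regression stump), whereas a real Random Forest grows deep trees and draws a new covariate subset at every node. The names `fit_stump`, `forest_fit`, `forest_predict` and `mtry` are assumptions for the example.

```python
import random

def fit_stump(X, y, features):
    """Best single split 'x_j <= t' over the candidate features (least squares)."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in features:
        # Every distinct value except the largest leaves both sides non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:
        # No valid split: predict the sample mean everywhere.
        m = sum(y) / len(y)
        return (0, float("inf"), m, m)
    return best[1:]

def forest_fit(X, y, n_trees=100, mtry=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and mtry random covariates."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap the records
        feats = rng.sample(range(p), mtry)           # random covariate subset
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest

def forest_predict(forest, row):
    """Average the predictions of all trees (bagging)."""
    preds = [ml if row[j] <= t else mr for j, t, ml, mr in forest]
    return sum(preds) / len(preds)
```

No inheritance model is specified anywhere: each tree only asks whether a covariate is above or below a threshold, and the averaging over trees does the rest.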
Ensemble methods: Examples
L2-Boosting algorithm applied to high-dimensional problems in genomic selection (Genetics Research, 2010). Gonzalez-Recio O., K.A. Weigel, D. Gianola, H. Naya and G.J.M. Rosa.
Prediction accuracy for productive lifetime in a testing set in dairy cattle (3304 training/1398 testing; 32,611 SNPs).
[Table: Pearson correlation, MSE and bias for Boosting_OLS, Bayes A and Bayesian LASSO.]
Prediction accuracy for progeny average feed conversion rate in a testing set in broilers (333 training/61 testing; 3481 SNPs).
[Table: Pearson correlation, MSE and bias for Boosting_NPR, Boosting_OLS, Bayes A and Bayesian LASSO.]
Examples
Analysis of discrete traits in a genomic selection context using Bayesian regressions and Machine Learning (reviewing). Gonzalez-Recio O. and S. Forni.
Prediction accuracy (cor(y, ŷ)) for Scrotal Hernia incidence from three lines of PIC.
[Figure: methods TBA, BTL, RanFor, L2B and LhB in Line A (923 purebred), Line B (919 purebred) and Line C (700 crossbred).]
Area under the ROC curve for Scrotal Hernia incidence from three lines of PIC.
[Figure: methods TBA, BTL, RanFor, L2B and LhB in Line A (923 purebred), Line B (919 purebred) and Line C (700 crossbred).]
Prediction accuracy for Scrotal Hernia incidence from a nucleus line of PIC.
Regularization: Bias-Variance trade off
Background
  Analysis of high throughput genotyping data: large p, small n problem.
  Models without regularization or feature subset selection (FSS) are prone to overfitting, which decreases predictive ability.
  Including all covariates increases the complexity of the model.
  Follow Occam's Razor: "entities must not be multiplied beyond necessity", or "when the accuracy of two hypotheses is similar, prefer the simpler one".
  Generalization is hurt by complexity.
  All new assumptions introduce possibilities for error, so keep it simple.
Model complexity
Bias-variance trade off
  Low complexity: high bias, low variance.
  High complexity: low bias, high variance.
  The optimum is intermediate.
[Figure: variance, bias^2 and MSE as a function of model complexity.]
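The trade off can be made explicit with the standard decomposition of the expected squared prediction error at a point x. The notation below is an assumption made for illustration: $\hat{f}(x)$ is the fitted predictor, $f(x)$ the true signal, and $\sigma^2_e$ the residual variance.

```latex
\mathrm{E}\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathrm{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{Bias}^2}
  + \underbrace{\mathrm{E}\left[\left(\hat{f}(x) - \mathrm{E}[\hat{f}(x)]\right)^2\right]}_{\text{Variance}}
  + \sigma^2_e
```

Increasing model complexity tends to lower the first term and raise the second, so the MSE is smallest at an intermediate complexity; the last term does not depend on the model.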
Regularization in shrinkage models
Penalization term or prior assumptions
  Ridge Regression: penalize $\sum_{s=1}^{p} \beta_s^2$.
  Bayes B (C, D, ...): set the SNP variance/coefficient to zero with probability π; the remaining SNP variances are assigned an inverted chi-squared prior distribution.
  Bayes A: assume an inverted chi-squared prior distribution for the SNP variance.
  LASSO: penalize $\lambda \sum_{s=1}^{p} |\beta_s|$.
  Bayesian LASSO: double exponential prior distribution (controlled by λ) on the SNP coefficients.
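The difference between the ridge and LASSO penalties is easiest to see on a single coefficient. The sketch below computes both penalties and, under the simplifying assumption of one coefficient with least-squares estimate `b_ols` and the objective stated in each docstring, the resulting shrunken estimate. Function names and the scaling of the objectives are choices made for this example.

```python
def ridge_penalty(beta):
    """Sum of squared SNP coefficients."""
    return sum(b * b for b in beta)

def lasso_penalty(beta, lam):
    """lam times the sum of absolute SNP coefficients."""
    return lam * sum(abs(b) for b in beta)

def ridge_shrink(b_ols, lam):
    """Minimizer of 0.5*(b_ols - b)**2 + 0.5*lam*b**2: proportional shrinkage."""
    return b_ols / (1.0 + lam)

def lasso_shrink(b_ols, lam):
    """Minimizer of 0.5*(b_ols - b)**2 + lam*abs(b): soft thresholding."""
    if b_ols > lam:
        return b_ols - lam
    if b_ols < -lam:
        return b_ols + lam
    return 0.0  # small effects are set exactly to zero
```

Ridge shrinks every coefficient by the same proportion and never reaches zero, while the LASSO subtracts a constant and removes small effects entirely, which is why it also acts as a feature selection method.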
Regularization: Model complexity in ensembles
Complexity of ensembles
  Use simple models.
  Use many models.
  Interpretation of many models, even simple ones, may be much harder than with a single model.
  Ensembles are competitive in accuracy, though at a probable loss of interpretability.
  Overly complex ensembles may lead to overfitting.
Complexity of ensembles
Are ensembles truly complex? They appear so, but do they act so?
  Controlling complexity in ensembles is not as simple as merely counting coefficients or assuming prior distributions.
  Many ensembles do not show overfitting (Bagging, Random Forest).
  Control the complexity of the ensembles using cross-validation (more complicated ways exist).
  Tune the number of ensembles constructed.
  Use more or less complex base learners.
  In general, ensembles are rather robust to overfitting.
Complexity of ensembles
[Figure: Mean Squared Error in the training set (2 different base learners).]
[Figure: Mean Squared Error in the testing set (2 different base learners).]
Remarks
Machine Learning
  New data and concepts are frequently generated in molecular biology and genomics, and ML can efficiently adapt to this fast-evolving nature.
  ML is able to deal with missing and noisy data from many scenarios.
  ML is able to deal with huge volumes of data generated by novel high-throughput devices, extracting hidden relationships not noticeable to experts.
  ML can adjust its internal structure to the data, producing accurate estimates.
  ML uses algorithms that learn from the data (combinations of artificial intelligence and statistics).
  Careful data preprocessing and design of the learning system are needed.
Remarks
Ensembles
  Ensembles are combinations of several base learners, improving accuracy substantially.
  Ensembles may seem complex, but they do not act so.
  They perform extremely well in a variety of possibly complex domains.
  They have desirable statistical properties.
  They scale well computationally.
  We will learn how to implement ensembles in a genomic selection context.
To take home
  The inherent complexity of genetic/biological systems has unknown properties and rules that may not be parametrized.
  Learn from experiences, interpret from knowledge.
  If worried about shrinkage, use boosting.
  If you still believe in the state of nature, use Random Forest.
More informationIssues in the Mining of Heart Failure Datasets
International Journal of Automation and Computing 11(2), April 2014, 162-179 DOI: 10.1007/s11633-014-0778-5 Issues in the Mining of Heart Failure Datasets Nongnuch Poolsawad 1 Lisa Moore 1 Chandrasekhar
More informationIndividual Differences & Item Effects: How to test them, & how to test them well
Individual Differences & Item Effects: How to test them, & how to test them well Individual Differences & Item Effects Properties of subjects Cognitive abilities (WM task scores, inhibition) Gender Age
More informationMGT/MGP/MGB 261: Investment Analysis
UNIVERSITY OF CALIFORNIA, DAVIS GRADUATE SCHOOL OF MANAGEMENT SYLLABUS for Fall 2014 MGT/MGP/MGB 261: Investment Analysis Daytime MBA: Tu 12:00p.m. - 3:00 p.m. Location: 1302 Gallagher (CRN: 51489) Sacramento
More informationMachine Learning from Garden Path Sentences: The Application of Computational Linguistics
Machine Learning from Garden Path Sentences: The Application of Computational Linguistics http://dx.doi.org/10.3991/ijet.v9i6.4109 J.L. Du 1, P.F. Yu 1 and M.L. Li 2 1 Guangdong University of Foreign Studies,
More informationCooperative evolutive concept learning: an empirical study
Cooperative evolutive concept learning: an empirical study Filippo Neri University of Piemonte Orientale Dipartimento di Scienze e Tecnologie Avanzate Piazza Ambrosoli 5, 15100 Alessandria AL, Italy Abstract
More informationThe 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X
The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,
More informationSARDNET: A Self-Organizing Feature Map for Sequences
SARDNET: A Self-Organizing Feature Map for Sequences Daniel L. James and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 dljames,risto~cs.utexas.edu
More informationLahore University of Management Sciences. FINN 321 Econometrics Fall Semester 2017
Instructor Syed Zahid Ali Room No. 247 Economics Wing First Floor Office Hours Email szahid@lums.edu.pk Telephone Ext. 8074 Secretary/TA TA Office Hours Course URL (if any) Suraj.lums.edu.pk FINN 321 Econometrics
More informationLecture 1: Basic Concepts of Machine Learning
Lecture 1: Basic Concepts of Machine Learning Cognitive Systems - Machine Learning Ute Schmid (lecture) Johannes Rabold (practice) Based on slides prepared March 2005 by Maximilian Röglinger, updated 2010
More informationHonors Mathematics. Introduction and Definition of Honors Mathematics
Honors Mathematics Introduction and Definition of Honors Mathematics Honors Mathematics courses are intended to be more challenging than standard courses and provide multiple opportunities for students
More informationLarge-Scale Web Page Classification. Sathi T Marath. Submitted in partial fulfilment of the requirements. for the degree of Doctor of Philosophy
Large-Scale Web Page Classification by Sathi T Marath Submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy at Dalhousie University Halifax, Nova Scotia November 2010
More informationMathematics subject curriculum
Mathematics subject curriculum Dette er ei omsetjing av den fastsette læreplanteksten. Læreplanen er fastsett på Nynorsk Established as a Regulation by the Ministry of Education and Research on 24 June
More informationHierarchical Linear Modeling with Maximum Likelihood, Restricted Maximum Likelihood, and Fully Bayesian Estimation
A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to Practical Assessment, Research & Evaluation. Permission is granted to distribute
More informationAustralian Journal of Basic and Applied Sciences
AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean
More informationAn Empirical Comparison of Supervised Ensemble Learning Approaches
An Empirical Comparison of Supervised Ensemble Learning Approaches Mohamed Bibimoune 1,2, Haytham Elghazel 1, Alex Aussem 1 1 Université de Lyon, CNRS Université Lyon 1, LIRIS UMR 5205, F-69622, France
More informationNotes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1
Notes on The Sciences of the Artificial Adapted from a shorter document written for course 17-652 (Deciding What to Design) 1 Ali Almossawi December 29, 2005 1 Introduction The Sciences of the Artificial
More informationTheory of Probability
Theory of Probability Class code MATH-UA 9233-001 Instructor Details Prof. David Larman Room 806,25 Gordon Street (UCL Mathematics Department). Class Details Fall 2013 Thursdays 1:30-4-30 Location to be
More informationA Comparison of Charter Schools and Traditional Public Schools in Idaho
A Comparison of Charter Schools and Traditional Public Schools in Idaho Dale Ballou Bettie Teasley Tim Zeidner Vanderbilt University August, 2006 Abstract We investigate the effectiveness of Idaho charter
More informationActivity Recognition from Accelerometer Data
Activity Recognition from Accelerometer Data Nishkam Ravi and Nikhil Dandekar and Preetham Mysore and Michael L. Littman Department of Computer Science Rutgers University Piscataway, NJ 08854 {nravi,nikhild,preetham,mlittman}@cs.rutgers.edu
More informationSystem Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks
System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering
More informationMulti-label classification via multi-target regression on data streams
Mach Learn (2017) 106:745 770 DOI 10.1007/s10994-016-5613-5 Multi-label classification via multi-target regression on data streams Aljaž Osojnik 1,2 Panče Panov 1 Sašo Džeroski 1,2,3 Received: 26 April
More informationA Pipelined Approach for Iterative Software Process Model
A Pipelined Approach for Iterative Software Process Model Ms.Prasanthi E R, Ms.Aparna Rathi, Ms.Vardhani J P, Mr.Vivek Krishna Electronics and Radar Development Establishment C V Raman Nagar, Bangalore-560093,
More informationStatewide Framework Document for:
Statewide Framework Document for: 270301 Standards may be added to this document prior to submission, but may not be removed from the framework to meet state credit equivalency requirements. Performance
More informationMulti-Dimensional, Multi-Level, and Multi-Timepoint Item Response Modeling.
Multi-Dimensional, Multi-Level, and Multi-Timepoint Item Response Modeling. Bengt Muthén & Tihomir Asparouhov In van der Linden, W. J., Handbook of Item Response Theory. Volume One. Models, pp. 527-539.
More informationMISSISSIPPI STATE UNIVERSITY SUG FACULTY SALARY DATA BY COLLEGE BY DISCIPLINE
MISSISSIPPI STATE UNIVERSITY Agriculture & Life Sciences Agricultural & Biological Eng. Professor $74,571 $103,068 $86,417 $92,026 $77,927 $110,675 $91,048 $95,693 $80,265 $116,208 $94,119 $99,749 /140301
More informationWHEN THERE IS A mismatch between the acoustic
808 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 Optimization of Temporal Filters for Constructing Robust Features in Speech Recognition Jeih-Weih Hung, Member,
More informationSTT 231 Test 1. Fill in the Letter of Your Choice to Each Question in the Scantron. Each question is worth 2 point.
STT 231 Test 1 Fill in the Letter of Your Choice to Each Question in the Scantron. Each question is worth 2 point. 1. A professor has kept records on grades that students have earned in his class. If he
More informationAxiom 2013 Team Description Paper
Axiom 2013 Team Description Paper Mohammad Ghazanfari, S Omid Shirkhorshidi, Farbod Samsamipour, Hossein Rahmatizadeh Zagheli, Mohammad Mahdavi, Payam Mohajeri, S Abbas Alamolhoda Robotics Scientific Association
More informationMining Association Rules in Student s Assessment Data
www.ijcsi.org 211 Mining Association Rules in Student s Assessment Data Dr. Varun Kumar 1, Anupama Chadha 2 1 Department of Computer Science and Engineering, MVN University Palwal, Haryana, India 2 Anupama
More information*Net Perceptions, Inc West 78th Street Suite 300 Minneapolis, MN
From: AAAI Technical Report WS-98-08. Compilation copyright 1998, AAAI (www.aaai.org). All rights reserved. Recommender Systems: A GroupLens Perspective Joseph A. Konstan *t, John Riedl *t, AI Borchers,
More informationSchool Size and the Quality of Teaching and Learning
School Size and the Quality of Teaching and Learning An Analysis of Relationships between School Size and Assessments of Factors Related to the Quality of Teaching and Learning in Primary Schools Undertaken
More informationPROFESSIONAL TREATMENT OF TEACHERS AND STUDENT ACADEMIC ACHIEVEMENT. James B. Chapman. Dissertation submitted to the Faculty of the Virginia
PROFESSIONAL TREATMENT OF TEACHERS AND STUDENT ACADEMIC ACHIEVEMENT by James B. Chapman Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment
More informationGDP Falls as MBA Rises?
Applied Mathematics, 2013, 4, 1455-1459 http://dx.doi.org/10.4236/am.2013.410196 Published Online October 2013 (http://www.scirp.org/journal/am) GDP Falls as MBA Rises? T. N. Cummins EconomicGPS, Aurora,
More informationDetailed course syllabus
Detailed course syllabus 1. Linear regression model. Ordinary least squares method. This introductory class covers basic definitions of econometrics, econometric model, and economic data. Classification
More informationCLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH
ISSN: 0976-3104 Danti and Bhushan. ARTICLE OPEN ACCESS CLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH Ajit Danti 1 and SN Bharath Bhushan 2* 1 Department
More informationBusiness Analytics and Information Tech COURSE NUMBER: 33:136:494 COURSE TITLE: Data Mining and Business Intelligence
Business Analytics and Information Tech COURSE NUMBER: 33:136:494 COURSE TITLE: Data Mining and Business Intelligence COURSE DESCRIPTION This course presents computing tools and concepts for all stages
More informationPeer Influence on Academic Achievement: Mean, Variance, and Network Effects under School Choice
Megan Andrew Cheng Wang Peer Influence on Academic Achievement: Mean, Variance, and Network Effects under School Choice Background Many states and municipalities now allow parents to choose their children
More informationHuman Emotion Recognition From Speech
RESEARCH ARTICLE OPEN ACCESS Human Emotion Recognition From Speech Miss. Aparna P. Wanare*, Prof. Shankar N. Dandare *(Department of Electronics & Telecommunication Engineering, Sant Gadge Baba Amravati
More informationData Fusion Through Statistical Matching
A research and education initiative at the MIT Sloan School of Management Data Fusion Through Statistical Matching Paper 185 Peter Van Der Puttan Joost N. Kok Amar Gupta January 2002 For more information,
More informationSeminar - Organic Computing
Seminar - Organic Computing Self-Organisation of OC-Systems Markus Franke 25.01.2006 Typeset by FoilTEX Timetable 1. Overview 2. Characteristics of SO-Systems 3. Concern with Nature 4. Design-Concepts
More informationLearning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for
Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Email Marilyn A. Walker Jeanne C. Fromer Shrikanth Narayanan walker@research.att.com jeannie@ai.mit.edu shri@research.att.com
More informationProfessor Christina Romer. LECTURE 24 INFLATION AND THE RETURN OF OUTPUT TO POTENTIAL April 20, 2017
Economics 2 Spring 2017 Professor Christina Romer Professor David Romer LECTURE 24 INFLATION AND THE RETURN OF OUTPUT TO POTENTIAL April 20, 2017 I. OVERVIEW II. HOW OUTPUT RETURNS TO POTENTIAL A. Moving
More informationThe Good Judgment Project: A large scale test of different methods of combining expert predictions
The Good Judgment Project: A large scale test of different methods of combining expert predictions Lyle Ungar, Barb Mellors, Jon Baron, Phil Tetlock, Jaime Ramos, Sam Swift The University of Pennsylvania
More informationOCR for Arabic using SIFT Descriptors With Online Failure Prediction
OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,
More informationInstructor: Mario D. Garrett, Ph.D. Phone: Office: Hepner Hall (HH) 100
San Diego State University School of Social Work 610 COMPUTER APPLICATIONS FOR SOCIAL WORK PRACTICE Statistical Package for the Social Sciences Office: Hepner Hall (HH) 100 Instructor: Mario D. Garrett,
More informationCalibration of Confidence Measures in Speech Recognition
Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 1 Calibration of Confidence Measures in Speech Recognition Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE
More informationINPE São José dos Campos
INPE-5479 PRE/1778 MONLINEAR ASPECTS OF DATA INTEGRATION FOR LAND COVER CLASSIFICATION IN A NEDRAL NETWORK ENVIRONNENT Maria Suelena S. Barros Valter Rodrigues INPE São José dos Campos 1993 SECRETARIA
More informationA survey of multi-view machine learning
Noname manuscript No. (will be inserted by the editor) A survey of multi-view machine learning Shiliang Sun Received: date / Accepted: date Abstract Multi-view learning or learning with multiple distinct
More informationData Structures and Algorithms
CS 3114 Data Structures and Algorithms 1 Trinity College Library Univ. of Dublin Instructor and Course Information 2 William D McQuain Email: Office: Office Hours: wmcquain@cs.vt.edu 634 McBryde Hall see
More informationAnalysis of Enzyme Kinetic Data
Analysis of Enzyme Kinetic Data To Marilú Analysis of Enzyme Kinetic Data ATHEL CORNISH-BOWDEN Directeur de Recherche Émérite, Centre National de la Recherche Scientifique, Marseilles OXFORD UNIVERSITY
More informationUsing Web Searches on Important Words to Create Background Sets for LSI Classification
Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract
More information