STATS216v Introduction to Statistical Learning Stanford University, Summer Final (Solutions) Duration: 3 hours

STATS216v Introduction to Statistical Learning
Stanford University, Summer 2016
Final (Solutions)
Duration: 3 hours

Instructions: Remember the university honor code. Write your name and SUNet ID (ThisIsYourSUNetID@stanford.edu) on each page. There are 25 questions in total. All questions are of equal value and are meant to elicit fairly short answers: each question can be answered using 1-5 sentences. You may not access the internet during the exam. You are allowed to use a calculator, though calculations, if any, do not have to be carried through to obtain full credit. You may refer to your course textbook and notes, and you may use your laptop provided that internet access is disabled. Please write neatly.

1. Heart rates for mice typically decrease with age. A scientist hypothesizes that the rate of this decrease changes dramatically at age 5. The scientist gathers heart rate data for mice of many different ages. What is one way to statistically investigate this hypothesis? State any assumptions your method requires.

We can fit a linear regression with heart rate as the response and two features: $X_1 = \text{age}$ and $X_2 = \text{age} \cdot I[\text{age} > 5]$. We assume the errors of the model are normally distributed. If the coefficient corresponding to $X_2$ is statistically significant, then we have evidence that the scientist is correct.

2. A web company has a large dataset of movie reviews. A review is represented as a vector counting the number of appearances of the most common English words. The company has access only to this data, and they would like to know which reviews are positive and which are negative. The dataset is very large and all reviews are long, so they want to avoid paying someone to read them. A data scientist suggests using a support vector machine to automatically classify each review as 1 if it is positive and as 0 otherwise, thus avoiding the need for a human to read even a single movie review. Do you think that this is a good solution? Explain.

Fitting a classifier is not a reasonable idea, because the company does not have any labeled data on which to train it. A support vector machine is a supervised method, so at least some reviews would first have to be read and labeled.
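For question 1, the two features can be built and the regression fitted as in the following minimal sketch. The ages, the synthetic noise-free heart rates, and the helper names (`design_row`, `solve`, `least_squares`) are illustrative assumptions and not part of the exam; the significance test on the $X_2$ coefficient is not shown.

```python
def design_row(age, threshold=5.0):
    """Return [1, X1, X2] with X1 = age and X2 = age * I[age > threshold]."""
    indicator = 1.0 if age > threshold else 0.0
    return [1.0, age, age * indicator]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


ages = [float(a) for a in range(1, 11)]          # hypothetical ages
rows = [design_row(a) for a in ages]
# Made-up, noise-free heart rates: the slope changes by -3 after age 5.
heart_rate = [600.0 - 5.0 * r[1] - 3.0 * r[2] for r in rows]

beta = least_squares(rows, heart_rate)
print("intercept, age, age*I[age>5]:", [round(b, 6) for b in beta])
```

With real data one would also compute the standard error of the third coefficient and test whether it differs from zero, which is the step the answer relies on.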

3. A researcher fits a Lasso regression model based on a dataset with 300 predictors, picking λ via cross-validation, and ends up with 10 nonzero coefficients. On a whim, she also decides to run best subset selection to find the best linear model with 10 of the original predictors. This results in a model with much lower training error than the Lasso fit. Excited by her results, she obtains some new data, and notices that the Lasso vastly outperforms the best subset model on this validation data. Explain why this is in fact not surprising.

This is not surprising because best subset selection searches over a very large number of models: in fact, a total of $\binom{300}{10}$ possible models. With so many possibilities, best subset selection is likely to overfit the training data. As a consequence, it is not surprising that this method yields high test error, and in particular higher test error than the Lasso.

4. A scientist is studying ring-width series in trees from semiarid environments of western North America. He asks you to help him fit a predictive model of the effect of tree age on ring width. He provides a large dataset of ring-width series and he believes that the width should be a smooth function of age, but the relationship may behave very differently across different (but unknown) age ranges. Suggest a reasonable method to solve the scientist's problem.

Since the relationship may be very different across different age ranges, linear or polynomial regression is not appropriate. Regression splines require one to specify the positions of the knots, which are unknown to the scientist. Since all he wants to assume is smoothness, a smoothing spline is a reasonable answer.
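For question 3, the size of the search space that best subset selection explores can be computed directly. This is a small illustration using only the numbers given in the question (300 predictors, subsets of size 10); the variable names are illustrative.

```python
import math

n_predictors = 300   # predictors available to the researcher
subset_size = 10     # size of the model chosen by best subset selection

# Number of distinct 10-predictor linear models that best subset selection compares.
n_models = math.comb(n_predictors, subset_size)
print(f"{n_models:.3e} candidate models")
```

Picking the single best training fit among this many candidates is what makes the low training error an unreliable guide to test error.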

5. A researcher fitted a local regression model of y on x, as displayed in Figure 1. The corresponding prediction curve is shown in red on the same figure. Being unsatisfied with the outcome, he then asks a consulting data scientist for some advice on how to improve the fit. After some thought, the data scientist suggests that the model may be suffering from high bias and recommends that the span parameter should be decreased. Given the information at your disposal, explain whether you would agree with her. In practice, how would you confirm your intuition about the optimal span parameter?

FIGURE 1. Data for question 5 (y versus x).

You should not agree with the data scientist. It appears from the picture that the model suffers from high variance, not high bias. The red curve is very wiggly, while the data seem to follow a smooth periodic trend. Therefore, it seems that the span should be increased, not decreased. In practice, you should select the span parameter for local regression via cross-validation.

6. A researcher applies random forests to a very large set of genetic data, to predict whether or not individuals have prostate cancer. The researcher would like to use 10-fold cross-validation to estimate the test error of the random forest model. Suggest a more computationally efficient way of estimating the test error. Would your answer change if the researcher were using bagging instead of random forests?

There is no need to use cross-validation. Random forests are based on bagging, and thus one can estimate the test error with the out-of-bag error. The same applies to bagging.
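For question 6, the out-of-bag estimate is available because each bootstrap sample leaves out a sizeable share of the observations, which can then serve as test points for the tree grown on that sample. The short sketch below computes the probability that a given observation is left out of a bootstrap sample of size n; the function name `prob_out_of_bag` is illustrative.

```python
import math


def prob_out_of_bag(n):
    """Probability that a given observation is absent from a bootstrap sample of size n."""
    return (1.0 - 1.0 / n) ** n


for n in (10, 100, 1000):
    print(n, round(prob_out_of_bag(n), 4))

limit = math.exp(-1)  # limiting value as n grows, roughly one third
print("limit:", round(limit, 4))
```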

7. A famous dispute in Sociology has to do with whether the number of books in a child's home is relevant to the child's educational outcome, commonly measured by a score on a standardized test. All sociologists agree that family income and average age of parents are also relevant, so the question is whether number of books is important after accounting for family income and parent age. Unfortunately, sociologists also agree that the effects of books, family income and average age of parents will be highly nonlinear. Provide a statistically valid way to settle this dispute, and mention any assumptions your method requires.

They can use a generalized additive model (GAM). For instance, they could try $\text{stand\_score} = \beta_0 + f_1(\text{books}) + f_2(\text{income}) + f_3(\text{parent\_age}) + \varepsilon$, with $f_1$, $f_2$ and $f_3$ cubic splines (or of higher degree if needed). Assuming the errors $\varepsilon$ are normal, we can perform an F-test to check whether the coefficients of the basis functions in $f_1$ are all zero.

8. A friend of yours suggests a new ensemble tree method. First, he fits N decision trees for a variable y on features X; then, given a new observation, he uses the average of the predictions of the N trees. Two of your STATS216v classmates claim this is a useless method: friend A says the predictions coming from this method are exactly what you would get using a random forest, and friend B says the predictions are exactly the same as using bagging. With which of your friends A and B, if any, do you agree?

With neither. The proposed algorithm will fit the same tree N times, since there is no inherent randomness in fitting a tree. Bagging and random forests, however, will both fit N different trees by resampling observations (bagging) or resampling observations and subsampling features (random forests).

9. A geologist is interested in prediction and she believes that her data follows the model $y = \beta_0 + \beta_1 b_1(x) + \beta_2 b_2(x) + \varepsilon$, where the basis functions $b_1(x)$ and $b_2(x)$ are, respectively, defined as $b_1(x) = (x - 1)\, I(x \geq 1)$ and $b_2(x) = (x - 4)^2\, I(x \geq 4)$, where, as usual, $I(x \geq 1)$ equals 1 for $x \geq 1$ and 0 otherwise. You fit a linear regression to the above model and obtain the coefficient estimates $\hat\beta_0 = 2$, $\hat\beta_1 = 1$ and $\hat\beta_2 = 2$. Sketch the estimated curve between $x = -3$ and $x = 6$. Note the intercepts, slopes, and the values of the curve at $x = -3$ and $x = 6$.

The answer is shown in Figure 2.

FIGURE 2. Answer to question 9 (y versus x; segments labeled slope = 1, slope = 0, and quadratic).

10. A doctor would like to classify whether someone has diabetes or not, using p = 1000 gene expression levels and n = 2000 patients. He wants to pick between bagging and random forest for his prediction method. As is common in the medical area, however, we expect many of the 1000 predictors to be irrelevant in determining whether the patient has diabetes or not. In light of this, would you recommend that he use random forest or bagging? Explain.

Bagging is a good idea here, whereas random forest isn't. Many of the splits in the random forest trees will be restricted to mostly irrelevant features, and so won't make the predictions any better. Each bagged tree, on the other hand, can use all features and will therefore be able to use the few relevant features in every tree.
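For question 9, the fitted curve can be evaluated at a few points to check a sketch. The code below is a sketch under stated assumptions: it takes the basis functions to be $b_1(x) = (x - 1)\, I(x \geq 1)$ and $b_2(x) = (x - 4)^2\, I(x \geq 4)$ and the estimates to be 2, 1 and 2, all with the signs assumed here; the function name `fitted_curve` is illustrative.

```python
def fitted_curve(x, beta0=2.0, beta1=1.0, beta2=2.0):
    """Evaluate beta0 + beta1 * b1(x) + beta2 * b2(x) under the assumed basis functions."""
    b1 = (x - 1.0) if x >= 1.0 else 0.0          # assumed: (x - 1) * I(x >= 1)
    b2 = (x - 4.0) ** 2 if x >= 4.0 else 0.0     # assumed: (x - 4)^2 * I(x >= 4)
    return beta0 + beta1 * b1 + beta2 * b2


for x in (-3.0, 1.0, 4.0, 6.0):
    print(x, fitted_curve(x))
```

Under these assumptions the curve is flat up to x = 1, rises with slope 1 between x = 1 and x = 4, and gains a quadratic term from x = 4 onward.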

11. You are asked by a bank to create a regression model for credit scores based on n = 100 clients and p = 500 features. For various reasons, you are considering using either random forests or the Lasso. You are then told by the bank that, for the new data your method is about to see, we expect about 20% of the features for each data point to be missing at random. In light of this new fact, which of the two methods would you choose?

Random forests, since they can deal with missing predictors by using surrogate variables. The Lasso, on the other hand, would only pick a few of the p = 500 predictors available, resulting in a problem if one of these predictors ends up missing.

12. Consider the dataset shown in Figure 3. You would like to use 5-fold cross-validation to compare the performance of logistic regression and QDA on this dataset, but a friend of yours says that cross-validation would not work properly in this case. What could go wrong? Justify.

FIGURE 3. Dataset for Problem 12 (axes X1 and X2).

Using cross-validation with logistic regression here is a bad idea. Note that the two classes are almost linearly separable. As cross-validation only uses a subset of the observations for training, it is likely that the classes in the training data will end up being perfectly separable by a straight line. In that case the logistic regression maximum-likelihood estimates do not exist (the coefficients diverge), so the fit is unstable.

13. Consider a random forest regression algorithm that randomly samples m out of p total predictors at each split. Suppose that $m = p^{\alpha}$, for some constant value $\alpha \in [0, 1]$. Which of the following statements are always true (for any given dataset)? Briefly justify your answers.
(a) The test error is an increasing function of α.
(b) The out-of-bag error is an increasing function of α.
(c) If α = 1, this procedure is known as bagging.
(d) The test error is always minimized at α = 1/2.

(a) False, nothing can be said about the test error in general.
(b) False, much like the test error.
(c) True. When α = 1, there is no randomization over predictors, so random forest becomes bagging.
(d) False. This is a popular choice, but it does not always minimize the test error.

14. An environmental engineer is using a classification tree to predict the location of bird nest sites from some ecological data. She knows that one of the predictors is much stronger than all others. Would you recommend she use bagging or boosting to improve the prediction accuracy of her classification tree? Explain why, and suggest a third alternative method that is also suitable.

Since one of the predictors is very strong, the bagged trees would probably be highly correlated. Random forest would decorrelate the trees by allowing the bagging algorithm to consider only a random subset of the predictors at each split, so that provides a third method that should be suitable for this dataset. On the other hand, boosting grows the trees sequentially. By fitting small trees to the residuals, we can expect to slowly improve $\hat f$ and build differently shaped trees to attack the residuals that cannot be explained by the most powerful predictor.
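For question 13, it can help to see how the number of predictors sampled per split varies with α. The sketch below assumes the relationship is m = p^α, rounded to a whole number; the function name `predictors_per_split` and the value p = 100 are illustrative.

```python
def predictors_per_split(p, alpha):
    """Number of predictors sampled at each split, m = p ** alpha (rounded, at least 1)."""
    return max(1, round(p ** alpha))


p = 100  # hypothetical number of predictors
for alpha in (0.0, 0.5, 1.0):
    print(alpha, predictors_per_split(p, alpha))
```

At α = 1 every predictor is a candidate at every split, which is bagging; at α = 1/2 one gets m equal to the square root of p.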

15. Suppose that you want to cluster 5 observations, using hierarchical clustering. You have computed the dissimilarity between each pair of observations, and summarized it in the matrix below: This means, for instance, that the dissimilarity between the first and second observations is 0.3, and the dissimilarity between the second and third observations is 0.5.
(a) On the basis of this dissimilarity matrix, sketch the dendrogram that results from hierarchically clustering these four observations using single linkage. Be sure to indicate on the plot the height at which each fusion occurs, as well as the observations corresponding to each leaf in the dendrogram.
(b) Would the dendrogram change if you used hierarchical clustering with complete linkage instead of single linkage? If yes, how?

(a) The dendrogram is shown in Figure 4(a).
(b) In this case, only the height of the last fusion would change. The new dendrogram is shown in Figure 4(b).

FIGURE 4. Answers to question 15 (a) and (b): cluster dendrograms (height on the vertical axis) produced by hclust with single linkage and with complete linkage.
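The difference between single and complete linkage in question 15 can be illustrated with a small pure-Python sketch. The dissimilarity values below are hypothetical: only the 0.3 and 0.5 entries come from the question text, and the rest were made up so that, as in the stated answer, only the last fusion height differs. The function name `fusion_heights` is illustrative.

```python
def fusion_heights(dist, linkage):
    """Return the fusion heights of agglomerative clustering, in merge order.

    dist maps frozenset({i, j}) to the dissimilarity between observations i and j;
    linkage is min for single linkage and max for complete linkage.
    """
    points = sorted({i for pair in dist for i in pair})
    clusters = [frozenset([i]) for i in points]
    heights = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(dist[frozenset({i, j})]
                            for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        heights.append(d)
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return heights


# Hypothetical dissimilarities for four observations (not the exam's matrix).
pairs = {(1, 2): 0.3, (1, 3): 0.6, (1, 4): 0.7,
         (2, 3): 0.5, (2, 4): 0.8, (3, 4): 0.4}
dist = {frozenset(k): v for k, v in pairs.items()}

single = fusion_heights(dist, min)
complete = fusion_heights(dist, max)
print("single:  ", single)
print("complete:", complete)
```

For this made-up matrix both linkages merge the same pairs in the same order, and only the final fusion height differs.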

16. A computer scientist has collected a very large dataset of labeled pictures of the handwritten digit 3. Each picture is represented as a gray-scale image (that is, $16 \times 16 = 256$ real values representing the intensity of each pixel). A subset of 130 handwritten digits is shown in the figure below, as an example. The data scientist notes that all pictures are different, showing a variety of writing styles. However, her dataset is so large that she cannot afford to store all the pictures. Using one of the methods learned in this course, how would you suggest she compress her dataset, in order to retain both the nature of the digit 3 and some important differences in writing styles?

FIGURE 5. Data for question 16.

She can average all pictures (pixel by pixel) to obtain a simple description of what a 3 looks like on average. The differences in writing style can be compressed using PCA. In particular, she can represent each picture as a point in $\mathbb{R}^{256}$ and perform PCA on the differences between the individual images and the average image. Then, she can keep only the first few components. This is reasonable because the pixels are inherently correlated. Therefore, one should expect a small number of components to provide a good low-dimensional representation of each image.

17. TRUE or FALSE: for a fixed dataset, if we use a decision tree algorithm for prediction, then the more terminal nodes the tree has (equivalently, the more splits there are), the likelier it is that our prediction algorithm suffers from high variance but low bias.

True: more splits means we have more flexibility in choosing the decision function, but since we have fewer points in each region, variance becomes a problem.

18. For each of the following, suggest one method learned in this class which fits the description (no explanations are necessary; assume all predictors are quantitative):
(a) A classification method that is affected by standardizing the predictors.
(b) A classification method that is not affected by standardizing the predictors.
(c) A dimensionality reduction method that is affected by standardizing the variables.
(d) A regression method that is not affected by applying an increasing function to the predictors.

There are many possible answers. An example is:
(a) Linear SVM.
(b) Logistic regression.
(c) PCA.
(d) Decision trees.

19. You are interested in predicting whether an individual will vote for Democrats, Republicans or Independents in an election year, and for this you gather many features on each person, such as economic and social status, ideological leanings and where they live. You have reasons to believe that a linear decision boundary would be appropriate to separate Democrats from Republicans, but a quadratic decision boundary would be more suitable to distinguish Democrats from Independents, and Republicans from Independents. Suggest a modification to one of the algorithms seen in class that would make it ideal for this scenario.

We could fit a modified version of quadratic discriminant analysis in which the Republican and Democrat categories have the same covariance matrix and the Independent category has its own covariance matrix. This would induce a linear boundary between Republicans and Democrats and quadratic boundaries between Independents and either of the two other categories.

20. A friend of yours who is working in finance develops a model to predict whether the market will go up or down tomorrow. He uses cross-validation to validate his model, and obtains the left curve below, which is discouraging: it tells him that it's best for him to use his model with no parameters at all. To investigate this issue, he decides to bootstrap his method by creating N datasets, each being sampled with replacement from his original dataset, and running cross-validation N times. He averages all the cross-validation curves and obtains the right curve in the figure below. From this, he claims it's better to use 5 parameters, not 0. Would you agree with him?

FIGURE 6. Plots for Problem 20.

No. Because the bootstrap samples contain duplicates, it can often happen that the same data point is in both his training and test sets. Hence, the CV curve will be too optimistic.

21. A friend of yours claims he can accurately determine the probability that a startup company will be successful. He collected data on n = 200 past startups, along with p = 30 relevant predictors and a binary outcome, stating whether the company was acquired/IPO-ed within 10 years. To estimate the success probabilities, he says he used SVMs with a polynomial kernel, for added flexibility. Would you trust his claim? Explain.

I would not. SVMs output class labels rather than probability estimates, so his claim must be false.
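The leakage described in the answer to question 20 can be checked with a short simulation. This is a sketch under simple assumptions: one bootstrap sample of n = 200 original indices, the first of 5 equal folds held out, and a fixed random seed; the function name `leaked_test_points` is illustrative.

```python
import random


def leaked_test_points(n, n_folds, seed=0):
    """Count held-out draws whose original observation also appears in the training folds."""
    rng = random.Random(seed)
    boot = [rng.randrange(n) for _ in range(n)]  # original indices drawn with replacement
    fold_size = n // n_folds
    test = boot[:fold_size]           # first fold is held out
    train = set(boot[fold_size:])     # remaining folds are used for training
    return sum(1 for i in test if i in train)


leaked = leaked_test_points(n=200, n_folds=5)
print("held-out points that also appear in training:", leaked, "of", 200 // 5)
```

Any held-out point that also sits in the training folds is effectively being predicted from itself, which is why the averaged cross-validation curve looks better than it should.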

22. A sports researcher is interested in predicting whether an athlete will win a medal in the next Olympic Games. She has several measurements for each athlete: 10 variables measuring their current health level, such as oxygen flow and muscle strength, 12 variables measuring their endorsement and popularity, and 21 variables measuring relevant physical attributes. She has data from past competitions and, by using a multiple linear regression along with cross-validation, finds that her model is extremely good at prediction. However, none of her predictors have a statistically significant coefficient. Assuming the normality assumption holds, what is a reasonable explanation for this outcome?

She is using many correlated predictors, which does not hurt the model's predictive ability but inflates the standard errors of the coefficients, so the individual p-values are high. Indeed, since any variable in the model can be dropped and replaced by a strong correlate, linear regression cannot confidently say which of the predictors are statistically significant.

23. You are hired by the government to help them decide which wells in a given county provide potable water. To do so, they determined which of n = 58 wells have potable water. Besides these 58 wells, they have the location of 43 others, and would like to use location data to decide whether these new wells are potable or not. Geologists tell you that wells that are close in distance usually have a similar type of water, and so they frequently employ SVMs for the classification task. Do you think this is a good idea? If so, which SVM kernel should they use? If not, state why and suggest a better classifier.

SVMs are a good idea, if we use them with the radial kernel. Indeed, the radial kernel has very local behavior, in the sense that only nearby training observations have an effect on the class label of a test observation, which is exactly what the geologists' expertise suggests.
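The local behavior of the radial kernel mentioned in the answer to question 23 can be seen numerically. The sketch below uses the usual form exp(-γ · squared distance); the well coordinates and the value γ = 1 are made up for illustration, and the function name `radial_kernel` is illustrative.

```python
import math


def radial_kernel(a, b, gamma=1.0):
    """Radial kernel exp(-gamma * squared Euclidean distance) between points a and b."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


well = (0.0, 0.0)   # hypothetical location of a new well
near = (0.1, 0.1)   # hypothetical nearby training well
far = (5.0, 5.0)    # hypothetical distant training well

print("nearby well :", radial_kernel(well, near))
print("distant well:", radial_kernel(well, far))
```

The kernel value is close to 1 for the nearby well and essentially 0 for the distant one, so distant training wells contribute almost nothing to the classification of the new well.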

24. Three friends are discussing which variants of hierarchical clustering could have produced Figure 7 below. Friend A claims it is the result of a hierarchical clustering with complete linkage, friend B argues this is due to using single linkage, and friend C claims it is due to using average linkage instead. With which of your three friends, if any, do you agree?

FIGURE 7. Figure for Problem 24.

With none of them: no agglomerative hierarchical clustering could have resulted in the figure above. Indeed, they would all first merge the two closest points into a cluster, and in this case the closest points in the figure belong to different clusters.

25. An astronomer has image data from a telescope and is interested in locating the six galaxy clusters he expects to find within those images. For each pixel in the picture, he knows the temperature, radiation, and gravitational force at the pixel location. However, it is known that astronomical data routinely contains several outliers. Explain why this should make him reconsider the use of k-means, and suggest an adaptation of this algorithm to attenuate the problem.

The k-means algorithm relies on the means of the clusters, which are very sensitive to outliers. One way to attenuate the problem is to replace the means with more robust centers, such as medians (the k-medians algorithm) or medoids (the k-medoids algorithm).
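The sensitivity of the mean that motivates the answer to question 25 is easy to see on a toy example. The numbers below are made up for illustration.

```python
from statistics import mean, median

clean = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical measurements in one cluster
with_outlier = clean[:-1] + [500.0]      # same cluster with one value replaced by an outlier

print("mean:  ", mean(clean), "->", mean(with_outlier))
print("median:", median(clean), "->", median(with_outlier))
```

A single outlier drags the mean far from the bulk of the data while the median stays put, which is why a median-based or medoid-based center is less affected.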

Python Machine Learning

Python Machine Learning Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled

More information

Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition

Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition Todd Holloway Two Lecture Series for B551 November 20 & 27, 2007 Indiana University Outline Introduction Bias and

More information

Learning From the Past with Experiment Databases

Learning From the Past with Experiment Databases Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University

More information

Lecture 1: Machine Learning Basics

Lecture 1: Machine Learning Basics 1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3

More information

Machine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler

Machine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler Machine Learning and Data Mining Ensembles of Learners Prof. Alexander Ihler Ensemble methods Why learn one classifier when you can learn many? Ensemble: combine many predictors (Weighted) combina

More information

Assignment 1: Predicting Amazon Review Ratings

Assignment 1: Predicting Amazon Review Ratings Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for

More information

Probability and Statistics Curriculum Pacing Guide

Probability and Statistics Curriculum Pacing Guide Unit 1 Terms PS.SPMJ.3 PS.SPMJ.5 Plan and conduct a survey to answer a statistical question. Recognize how the plan addresses sampling technique, randomization, measurement of experimental error and methods

More information

CS Machine Learning

CS Machine Learning CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing

More information

Algebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview

Algebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview Algebra 1, Quarter 3, Unit 3.1 Line of Best Fit Overview Number of instructional days 6 (1 day assessment) (1 day = 45 minutes) Content to be learned Analyze scatter plots and construct the line of best

More information

(Sub)Gradient Descent

(Sub)Gradient Descent (Sub)Gradient Descent CMSC 422 MARINE CARPUAT marine@cs.umd.edu Figures credit: Piyush Rai Logistics Midterm is on Thursday 3/24 during class time closed book/internet/etc, one page of notes. will include

More information

Twitter Sentiment Classification on Sanders Data using Hybrid Approach

Twitter Sentiment Classification on Sanders Data using Hybrid Approach IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders

More information

STT 231 Test 1. Fill in the Letter of Your Choice to Each Question in the Scantron. Each question is worth 2 point.

STT 231 Test 1. Fill in the Letter of Your Choice to Each Question in the Scantron. Each question is worth 2 point. STT 231 Test 1 Fill in the Letter of Your Choice to Each Question in the Scantron. Each question is worth 2 point. 1. A professor has kept records on grades that students have earned in his class. If he

More information

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should

More information

Australian Journal of Basic and Applied Sciences

Australian Journal of Basic and Applied Sciences AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean

More information

Artificial Neural Networks written examination

Artificial Neural Networks written examination 1 (8) Institutionen för informationsteknologi Olle Gällmo Universitetsadjunkt Adress: Lägerhyddsvägen 2 Box 337 751 05 Uppsala Artificial Neural Networks written examination Monday, May 15, 2006 9 00-14

More information

Business Analytics and Information Tech COURSE NUMBER: 33:136:494 COURSE TITLE: Data Mining and Business Intelligence

Business Analytics and Information Tech COURSE NUMBER: 33:136:494 COURSE TITLE: Data Mining and Business Intelligence Business Analytics and Information Tech COURSE NUMBER: 33:136:494 COURSE TITLE: Data Mining and Business Intelligence COURSE DESCRIPTION This course presents computing tools and concepts for all stages

More information

Statewide Framework Document for:

Statewide Framework Document for: Statewide Framework Document for: 270301 Standards may be added to this document prior to submission, but may not be removed from the framework to meet state credit equivalency requirements. Performance

More information

Chapters 1-5 Cumulative Assessment AP Statistics November 2008 Gillespie, Block 4

Chapters 1-5 Cumulative Assessment AP Statistics November 2008 Gillespie, Block 4 Chapters 1-5 Cumulative Assessment AP Statistics Name: November 2008 Gillespie, Block 4 Part I: Multiple Choice This portion of the test will determine 60% of your overall test grade. Each question is

More information

The Good Judgment Project: A large scale test of different methods of combining expert predictions

The Good Judgment Project: A large scale test of different methods of combining expert predictions The Good Judgment Project: A large scale test of different methods of combining expert predictions Lyle Ungar, Barb Mellors, Jon Baron, Phil Tetlock, Jaime Ramos, Sam Swift The University of Pennsylvania

More information

AP Statistics Summer Assignment 17-18

AP Statistics Summer Assignment 17-18 AP Statistics Summer Assignment 17-18 Welcome to AP Statistics. This course will be unlike any other math class you have ever taken before! Before taking this course you will need to be competent in basic

More information

STA 225: Introductory Statistics (CT)

STA 225: Introductory Statistics (CT) Marshall University College of Science Mathematics Department STA 225: Introductory Statistics (CT) Course catalog description A critical thinking course in applied statistical reasoning covering basic

More information

OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS

OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS Václav Kocian, Eva Volná, Michal Janošek, Martin Kotyrba University of Ostrava Department of Informatics and Computers Dvořákova 7,

More information

Machine Learning and Development Policy

Machine Learning and Development Policy Machine Learning and Development Policy Sendhil Mullainathan (joint papers with Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, Ziad Obermeyer) Magic? Hard not to be wowed But what makes

More information

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System QuickStroke: An Incremental On-line Chinese Handwriting Recognition System Nada P. Matić John C. Platt Λ Tony Wang y Synaptics, Inc. 2381 Bering Drive San Jose, CA 95131, USA Abstract This paper presents

More information

Go fishing! Responsibility judgments when cooperation breaks down

Go fishing! Responsibility judgments when cooperation breaks down Go fishing! Responsibility judgments when cooperation breaks down Kelsey Allen (krallen@mit.edu), Julian Jara-Ettinger (jjara@mit.edu), Tobias Gerstenberg (tger@mit.edu), Max Kleiman-Weiner (maxkw@mit.edu)

More information

OCR for Arabic using SIFT Descriptors With Online Failure Prediction

OCR for Arabic using SIFT Descriptors With Online Failure Prediction OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,

More information

Axiom 2013 Team Description Paper
