Gene-Expression Microarrays Classification using Feature Selection and Support Vector Machines


Darcy Davis, Allison Hanuschak, Alina Lazar
Department of Computer Science and Information Systems
Youngstown State University

1. Introduction

Every living organism carries genetic material inside its cells, and this material is transmitted from one generation to the next. The genetic material in each cell is made of deoxyribonucleic acid (DNA), and the DNA molecule is organized into segments called genes. An organism has the same genes in all of its cells, but those genes can be expressed at different levels in different cells and at different times. The genetic information stored in DNA may be transcribed into complementary RNA molecules, which in turn may be translated into proteins. Many complex human diseases, especially cancer, are correlated with abnormal activity at this level.

After 1996, a new technology called DNA microarrays gave researchers the ability to obtain a global picture of gene expression in the cell. Gene-expression microarrays are a relatively new development in biology that allows thousands of genes to be studied simultaneously. Each microarray is a silicon chip on which gene probes are aligned in a grid pattern, and measurements are made by fluorescence detection. Because a single microarray can now be used to measure all the human genes, the technology has led to advances in the diagnosis and prognosis of diseases and in drug discovery [1]. However, the amount of data in each microarray is too large for manual analysis, since a single sample often contains measurements for around 10,000 genes. Producing results efficiently from this volume of information therefore requires automated, computer-based analysis. Using machine learning techniques [2, 3], a computer can be trained to recognize patterns that classify the microarrays biologically.
Assuming that half of the instances come from healthy patients and half from patients who have a disease such as cancer, machine learning algorithms can be used to find gene combinations that distinguish the healthy patients from the sick ones. The main data analysis techniques [4] currently used in biomedical applications of microarrays are classification, clustering and gene selection. In contrast with data sets from other fields, a typical microarray data set has a large number of genes (~10,000) and a small number of samples (~100). However, not all genes can be expected to carry information relevant to a particular classification task. The process of selecting only the important components is called gene or feature selection [5]. Supervised learning (classification) algorithms can be used to classify samples and predict disease outcome. Unlike classification, clustering does not use tissue annotations as class labels; it is used to discover new biological classes. Several machine learning algorithms [6] have previously been used to classify microarray data sets, including decision trees, Fisher linear discriminant analysis, nearest neighbor, neural networks, Bayesian networks and support vector machines. Support vector machines (SVMs) [7, 8], a supervised learning method, have been used to analyze existing microarray data sets and diagnose cancer. Because perfect diagnosis cannot reasonably be expected given our limited knowledge of cancer, the goal is to maximize the correctness of the diagnosis by employing different methods for using and training the SVM. Applying machine learning algorithms to DNA microarray data sets is of great importance for future medical research on gene expression analysis for disease classification, and on genotyping for diagnosis and drug discovery.

2. Specific Questions

The goal of our proposed research is to use supervised learning to classify and predict cancer or other diseases based on gene expression measurements collected from microarrays. Microarrays give us information about the level at which a given gene is expressed, that is, the rate at which its DNA is being transcribed into messenger RNA, which may then be translated into the corresponding protein. Today, many public microarray data sets are freely available to analyze and use in our research. Table 1 summarizes the data sets that will be used in the present research.

Table 1. Publicly Available Microarray Datasets
- National Center for Biotechnology Information
- Stanford Microarray Database
- University of Pittsburgh Microarray Dataset Collection
- Kent Ridge Bio-medical Data Set Repository

Data sets with known labels will be used to train the machine learning models to categorize cancer patients according to their prognosis, and the accuracy of the resulting models will then be tested on a separate set of labeled data. The outcome of this study will provide information about the efficiency of machine learning techniques, in particular SVM methods, in discovering patterns related to genetic disorders, and will also allow the identification of relevant types of gene expression. These could be abnormal expression rates for a particular gene, the presence or absence of a particular gene or sequence of genes, or a pattern of unusual expression across a subset of genes. SVM methods with different parameters will then be applied to identify the best ones in terms of accuracy, efficiency and fewest false positives [9]. We envision that this could help guide physicians in determining the best treatment for a patient, for example how aggressive a course of treatment should be.

3. Methods

Two of the most important and difficult problems in microarray data analysis are the high dimensionality of the data and noise. Because many data analysis techniques involve exhaustive search over the object space, their running time is very sensitive to the size of the data. In the case of microarrays, one solution is to reduce the search space vertically (in terms of genes) by using a feature selection method.
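The gene selection step can be illustrated with a simple filter method. The Python sketch below is an illustrative assumption rather than the method this proposal commits to: it scores each gene with the signal-to-noise statistic (mean_1 - mean_0) / (sd_1 + sd_0), computed from the two class means and standard deviations, keeps the k genes with the largest absolute scores, and builds the reduced table. The function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def snr_scores(X, y):
    """Signal-to-noise score of every gene (column) in X.

    X is a list of samples, each a list of expression values (one per gene);
    y holds the class labels, 1 for diseased and 0 for healthy samples.
    The score of a gene is (mean_1 - mean_0) / (sd_1 + sd_0), using
    population standard deviations within each class.
    """
    scores = []
    for g in range(len(X[0])):
        diseased = [row[g] for row, label in zip(X, y) if label == 1]
        healthy = [row[g] for row, label in zip(X, y) if label == 0]
        spread = pstdev(diseased) + pstdev(healthy)
        # Avoid division by zero for genes that are constant within both classes.
        spread = spread if spread > 0 else 1e-12
        scores.append((mean(diseased) - mean(healthy)) / spread)
    return scores


def select_top_genes(X, y, k):
    """Indices of the k genes with the largest absolute score, best first."""
    scores = snr_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda g: abs(scores[g]), reverse=True)
    return ranked[:k]


def reduce_table(X, genes):
    """Keep only the selected gene columns of every sample."""
    return [[row[g] for g in genes] for row in X]


# Toy data: 6 samples x 3 genes; gene 0 clearly separates the two classes.
X = [
    [5.0, 1.0, 3.0],
    [6.0, 1.2, 2.9],
    [5.5, 0.9, 3.1],
    [1.0, 1.1, 3.0],
    [1.5, 1.0, 3.2],
    [0.5, 1.2, 2.8],
]
y = [1, 1, 1, 0, 0, 0]

top = select_top_genes(X, y, 2)
print(top)  # [0, 1]
print(reduce_table(X, top[:1]))
```

Because the ranking looks at the class labels, in a cross-validated experiment it should be recomputed on each training fold only; ranking genes on all samples before splitting lets the test samples influence the selection and makes the estimated accuracy optimistic.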
The other problem is that errors occur during data collection; these errors are referred to as noise in the data. Supervised learning methods for classification and regression that are based on statistical learning theory provide good generalization and classification accuracy on real data, but their inherent trade-off is computational expense. Support vector machines (SVMs) [10] have recently become a popular learning method because they implicitly map the input data into a higher-dimensional feature space in which the instances can become linearly separable. In SVM methods, a kernel, which can be regarded as a similarity measure, is used to recode the input data. Each kernel K corresponds to a mapping function Φ into the feature space, with K(x, z) = Φ(x) · Φ(z), so the mapping never has to be computed explicitly. Although the mathematics behind the SVM is straightforward, choosing the best kernel function and parameters can be challenging when the method is applied to real data sets. We will use LIBSVM, developed by Chang and Lin [11]. The kernel function usually recommended [12] for nonlinear problems is the Gaussian radial basis function (RBF), K(x, z) = exp(-γ ||x - z||^2), because it behaves like the sigmoid kernel for certain parameter values and requires fewer parameters than a polynomial kernel. The kernel parameter γ and the parameter C, which controls the trade-off between the complexity of the decision function and the minimization of the training error, can be determined by a two-dimensional grid search: pairs of parameter values (C, γ) are generated over a predefined interval with a fixed step, the performance of each combination is computed, and the best pair is selected. When the solution is not sparse, that is, when many training instances become support vectors, evaluating the decision function is slow. For microarray data sets, the data can therefore be reduced [13] in terms of the genes (features) of the data set. Redundant or highly correlated features can be replaced with a smaller number of uncorrelated features that capture most of the information.
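The RBF kernel and the (C, γ) grid search described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's code: the grid uses a fixed step of 2 in the base-2 exponent, and the scoring function is supplied by the caller; in practice it would train an SVM with LIBSVM on the training folds and return cross-validated accuracy. The helper names and the toy scorer are hypothetical.

```python
import math
from itertools import product


def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def log2_grid(lo, hi, step):
    """Values 2**lo, 2**(lo + step), ..., 2**hi: a fixed step in the exponent."""
    return [2.0 ** e for e in range(lo, hi + 1, step)]


def grid_search(score, c_values, gamma_values):
    """Evaluate every (C, gamma) pair and return (best_score, best_C, best_gamma).

    `score(C, gamma)` is supplied by the caller. In the planned experiments it
    would train an RBF-kernel SVM (e.g. with LIBSVM) on the training folds and
    return the cross-validated accuracy. Ties keep the pair evaluated first.
    """
    best = None
    for c, gamma in product(c_values, gamma_values):
        s = score(c, gamma)
        if best is None or s > best[0]:
            best = (s, c, gamma)
    return best


# Exponentially spaced ranges of the kind suggested in the LIBSVM practical guide.
C_VALUES = log2_grid(-5, 15, 2)       # 2^-5, 2^-3, ..., 2^15  (11 values)
GAMMA_VALUES = log2_grid(-15, 3, 2)   # 2^-15, 2^-13, ..., 2^3 (10 values)


def toy_score(c, gamma):
    """Hypothetical stand-in for cross-validated accuracy; peaks at C = 2^3, gamma = 2^-5."""
    return 1.0 / (1.0 + (math.log2(c) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)


print(grid_search(toy_score, C_VALUES, GAMMA_VALUES))  # best pair: C = 8.0, gamma = 0.03125
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))   # exp(-1), about 0.3679
```

With scikit-learn, whose `SVC` class is built on LIBSVM, the same search is usually written with `GridSearchCV` over the `C` and `gamma` parameters; the explicit loop only makes the procedure visible. Each (C, γ) pair requires a full cross-validation run, and each kernel evaluation costs time proportional to the number of genes, which is one motivation for first replacing redundant or highly correlated features with a smaller set of uncorrelated ones.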
This can be done by applying a method called Principal Component Analysis (PCA) before running the SVM algorithm. PCA is performed by solving an eigenvector problem or by using iterative algorithms, and the result is a set of orthogonal vectors called principal components. The larger set of features is mapped onto the new, smaller set by projecting the initial instances onto the principal components. The first principal component is the direction along which the input data have the maximum variance; equivalently, it is the direction of the line through the mean of the data that best fits the data in the least-squares sense, with distances measured perpendicular to the line rather than along one axis as in ordinary linear regression. The second component is orthogonal to the first, uncorrelated with it, and chosen to maximize the remaining variance. This procedure is repeated until the last component is obtained.

The envisioned research will follow the main steps of the knowledge discovery process:
- Gene selection - irrelevant attributes (genes) are removed and the selected data are represented as a two-dimensional table.
- Preprocessing - if the selected table contains missing values or empty cells, it must be preprocessed to remove some of the incompleteness. Descriptive statistics should be computed to obtain more information about the data.
- Training and validation samples - the initial table is divided into at least two tables by a cross-validation procedure. One is used in the training step, the other in the validation or testing step.
- Interpretation and evaluation - the validation or test data set is used to evaluate the classification performance of the methods in terms of efficiency and accuracy.

A projected timeline for the project is given in Table 2.

Table 2. Project Time Table (twelve months, September through August)
- Literature review: research on gene expression data, support vector machine techniques and feature selection algorithms.
- Developing programs that automatically test machine learning algorithms for classification and prediction.
- Full-scale application of the successful algorithms to large gene expression data sets.
- Dissemination of results through papers and presentations at relevant conferences.
- Evaluation of the applicability of the developed algorithms to other data sets.

4. References

[1] R. Burbidge, M. Trotter, B. Buxton, and S. Holden, Drug design by machine learning: support vector machines for pharmaceutical data analysis. Computers and Chemistry 26:5-14.
[2] M. Molla, M. Waddell, D. Page, and J. Shavlik, Using Machine Learning to Design and Interpret Gene-Expression Microarrays. AI Magazine 25:23-44.
[3] Z. Wang, Y. Wang, J. Lu, S. Kung, J. Zhang, R. Lee, J. Xuan, et al., Discriminatory Mining of Gene Expression Microarray Data. The Journal of VLSI Signal Processing 35.
[4] W. Dubitzky, M. Granzow, and D. Berrar, Data Mining and Machine Learning Methods for Microarray Analysis. In: S. M. Lin and K. F. Johnson (eds.), Methods of Microarray Data Analysis: Papers from CAMDA 2000, Kluwer Academic Publishers, Boston.
[5] P. S. Bradley and O. L. Mangasarian, Feature Selection via Concave Minimization and Support Vector Machines. In J. Shavlik (ed.), Machine Learning: Proceedings of the Fifteenth International Conference (ICML '98), Morgan Kaufmann, San Francisco, California, pp. 82-90.
[6] S. Cho and H. Won, Machine Learning in DNA Microarray Analysis for Cancer Classification. APBC 2003.

[7] B. Schölkopf and A. Smola, Learning with Kernels. MIT Press, Cambridge, Massachusetts.
[8] V. N. Vapnik, The Nature of Statistical Learning Theory, 2nd edition. Springer-Verlag, New York, NY.
[9] J. B. Tobler, M. N. Molla, E. F. Nuwaysir, R. D. Green, and J. W. Shavlik, Evaluating machine learning approaches for aiding probe selection for gene-expression arrays. Bioinformatics 18.
[10] T. Joachims, Making large-scale SVM learning practical. In B. Schölkopf, C. J. C. Burges and A. J. Smola (eds.), Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA.
[11] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines. Software available at
[12] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, Cambridge, England.
[13] Y.-J. Lee and O. L. Mangasarian, RSVM: Reduced Support Vector Machines. Proc. of the First SIAM International Conference on Data Mining, Chicago, April 5-7.

5. Impact on the Goal of CREU

The foremost goal of the CREU project is to encourage women and minorities to pursue graduate study in computer science. This project will provide a realistic research experience for the two female undergraduates through active involvement in the planning, execution and interpretation of scientific research. Well-developed research projects can significantly enrich the educational experience of undergraduate students. Working on this project, the students will be able to improve their computer and programming skills, apply those skills to scientific problems, learn how to formulate questions and problems, and participate in the discovery of new knowledge. A good research experience can foster an enthusiasm for lifelong learning and a desire to continue education beyond the baccalaureate.
Successful science instruction should develop in students a sense of wonder and curiosity about the world. The students will be exposed to both sides of scientific investigation: hypothesis testing and the development of theoretical explanations of observations. No science education is complete without research-related activities, technical writing and oral presentations.

Darcy Davis is confident that this project will support the goals of CREU. As a female undergraduate who intends to pursue graduate work in computer science, she expects the project to give her a useful introduction to the practical research applications of her studies, with a focus on artificial intelligence. The project could become the foundation for her senior thesis, and its mathematical concepts will be a good basis for the presentations she intends to give at this year's mathematics conferences.

Allison Hanuschak believes that this research project will introduce her to the world of graduate research and will be an exceptional opportunity to gain valuable research experience. In addition, she thinks that completing the CREU project will give her a distinct advantage in admission to the graduate school of her choice. She also hopes to encourage fellow female students by setting an example and being a positive role model for continuing study in computer science. All in all, this experience will benefit her and will help her pursue graduate study.

Both students intend to present this project at the 2005 YSU QUEST conference.

6. Student Activity and Responsibilities

Specific tasks for the two participating students will include: literature search and review, reading and discussing research articles, designing and implementing data mining and machine learning algorithms, data processing, data analysis and interpretation, summarizing and preparing results for presentations and publications, participation in YSU QUEST 2005, and writing the final report. The primary responsibility of the two students is to participate in all phases of the project: proposal, development, experiments and dissemination. The students will be required to do independent work each week and to schedule team meetings. It is important that they work together as a team. The faculty advisor will meet with the students every other week. will be used for questions, announcements and document exchange.

7. Faculty Activity and Responsibilities

As faculty advisor for the proposed project, Dr. Alina Lazar will actively mentor the two students and continuously supervise their progress during the one-year period. She will meet with the students on a regular basis to guide their activities and answer their questions about the project. Dr. Lazar has extensive experience in data mining, machine learning and artificial intelligence, and she has written several papers related to the subject of this proposal. Her knowledge will make this project an enjoyable research experience for the undergraduate students. The department will supply a small computer lab, and Dr. Lazar will provide the necessary software from funds previously obtained through the university. She guided the students in developing and writing the present proposal, and she will help them with the final report and with the preparation of a conference paper. The overall guidance and mentoring will not be limited to this project; it will also provide insight into how to apply to and succeed in graduate school, what it means to be a female computer scientist, and what options are available after graduate school.

8. Budget

For the proposed project we are requesting $2000 for the two participating female students. An additional $500 will be used to buy computer media, books and other materials necessary for the project. While working on the project, the students will be encouraged to apply for the Undergraduate Research Grant Award sponsored by Youngstown State University and for other scholarships.


More information

BENCHMARK TREND COMPARISON REPORT:

BENCHMARK TREND COMPARISON REPORT: National Survey of Student Engagement (NSSE) BENCHMARK TREND COMPARISON REPORT: CARNEGIE PEER INSTITUTIONS, 2003-2011 PREPARED BY: ANGEL A. SANCHEZ, DIRECTOR KELLI PAYNE, ADMINISTRATIVE ANALYST/ SPECIALIST

More information

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM s Imperial College of Engineering &

More information

Dana Carolyn Paquin Curriculum Vitae

Dana Carolyn Paquin Curriculum Vitae Dana Carolyn Paquin Curriculum Vitae Education 2007 Ph.D., Mathematics, Stanford University. Thesis: Multiscale methods for image registration. 2002 B.S., Mathematics (Magna Cum Laude), Davidson College.

More information

Biomedical Sciences (BC98)

Biomedical Sciences (BC98) Be one of the first to experience the new undergraduate science programme at a university leading the way in biomedical teaching and research Biomedical Sciences (BC98) BA in Cell and Systems Biology BA

More information

ABSTRACT. A major goal of human genetics is the discovery and validation of genetic polymorphisms

ABSTRACT. A major goal of human genetics is the discovery and validation of genetic polymorphisms ABSTRACT DEODHAR, SUSHAMNA DEODHAR. Using Grammatical Evolution Decision Trees for Detecting Gene-Gene Interactions in Genetic Epidemiology. (Under the direction of Dr. Alison Motsinger-Reif.) A major

More information

CLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH

CLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH ISSN: 0976-3104 Danti and Bhushan. ARTICLE OPEN ACCESS CLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH Ajit Danti 1 and SN Bharath Bhushan 2* 1 Department

More information

ScienceDirect. A Framework for Clustering Cardiac Patient s Records Using Unsupervised Learning Techniques

ScienceDirect. A Framework for Clustering Cardiac Patient s Records Using Unsupervised Learning Techniques Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 98 (2016 ) 368 373 The 6th International Conference on Current and Future Trends of Information and Communication Technologies

More information

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za

More information

A Neural Network GUI Tested on Text-To-Phoneme Mapping

A Neural Network GUI Tested on Text-To-Phoneme Mapping A Neural Network GUI Tested on Text-To-Phoneme Mapping MAARTEN TROMPPER Universiteit Utrecht m.f.a.trompper@students.uu.nl Abstract Text-to-phoneme (T2P) mapping is a necessary step in any speech synthesis

More information

Time series prediction

Time series prediction Chapter 13 Time series prediction Amaury Lendasse, Timo Honkela, Federico Pouzols, Antti Sorjamaa, Yoan Miche, Qi Yu, Eric Severin, Mark van Heeswijk, Erkki Oja, Francesco Corona, Elia Liitiäinen, Zhanxing

More information

A Comparison of Two Text Representations for Sentiment Analysis

A Comparison of Two Text Representations for Sentiment Analysis 010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational

More information

Software Maintenance

Software Maintenance 1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories

More information

Agent-Based Software Engineering

Agent-Based Software Engineering Agent-Based Software Engineering Learning Guide Information for Students 1. Description Grade Module Máster Universitario en Ingeniería de Software - European Master on Software Engineering Advanced Software

More information

Program in Molecular Medicine

Program in Molecular Medicine Graduate Program in Life Sciences Program in Molecular Medicine Student and Faculty Handbook 2017-2018 UNIVERSITY OF MARYLAND GRADUATE SCHOOL UNIVERSITY OF MARYLAND SCHOOL OF MEDICINE Graduate Program

More information

A project-based learning approach to protein biochemistry suitable for both face-to-face and distance education students

A project-based learning approach to protein biochemistry suitable for both face-to-face and distance education students A project-based learning approach to protein biochemistry suitable for both face-to-face and distance education students R.J. Prior, School of Health Studies, University of Canberra, Australia J.K. Forwood,

More information

PH.D. IN COMPUTER SCIENCE PROGRAM (POST M.S.)

PH.D. IN COMPUTER SCIENCE PROGRAM (POST M.S.) PH.D. IN COMPUTER SCIENCE PROGRAM (POST M.S.) OVERVIEW ADMISSION REQUIREMENTS PROGRAM REQUIREMENTS OVERVIEW FOR THE PH.D. IN COMPUTER SCIENCE Overview The doctoral program is designed for those students

More information

CS4491/CS 7265 BIG DATA ANALYTICS INTRODUCTION TO THE COURSE. Mingon Kang, PhD Computer Science, Kennesaw State University

CS4491/CS 7265 BIG DATA ANALYTICS INTRODUCTION TO THE COURSE. Mingon Kang, PhD Computer Science, Kennesaw State University CS4491/CS 7265 BIG DATA ANALYTICS INTRODUCTION TO THE COURSE Mingon Kang, PhD Computer Science, Kennesaw State University Self Introduction Mingon Kang, PhD Homepage: http://ksuweb.kennesaw.edu/~mkang9

More information

Twitter Sentiment Classification on Sanders Data using Hybrid Approach

Twitter Sentiment Classification on Sanders Data using Hybrid Approach IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders

More information

Classroom Assessment Techniques (CATs; Angelo & Cross, 1993)

Classroom Assessment Techniques (CATs; Angelo & Cross, 1993) Classroom Assessment Techniques (CATs; Angelo & Cross, 1993) From: http://warrington.ufl.edu/itsp/docs/instructor/assessmenttechniques.pdf Assessing Prior Knowledge, Recall, and Understanding 1. Background

More information

Evolutive Neural Net Fuzzy Filtering: Basic Description

Evolutive Neural Net Fuzzy Filtering: Basic Description Journal of Intelligent Learning Systems and Applications, 2010, 2: 12-18 doi:10.4236/jilsa.2010.21002 Published Online February 2010 (http://www.scirp.org/journal/jilsa) Evolutive Neural Net Fuzzy Filtering:

More information

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17. Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link

More information

WHEN THERE IS A mismatch between the acoustic

WHEN THERE IS A mismatch between the acoustic 808 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 Optimization of Temporal Filters for Constructing Robust Features in Speech Recognition Jeih-Weih Hung, Member,

More information

A survey of multi-view machine learning

A survey of multi-view machine learning Noname manuscript No. (will be inserted by the editor) A survey of multi-view machine learning Shiliang Sun Received: date / Accepted: date Abstract Multi-view learning or learning with multiple distinct

More information

CS Machine Learning

CS Machine Learning CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing

More information

(Sub)Gradient Descent

(Sub)Gradient Descent (Sub)Gradient Descent CMSC 422 MARINE CARPUAT marine@cs.umd.edu Figures credit: Piyush Rai Logistics Midterm is on Thursday 3/24 during class time closed book/internet/etc, one page of notes. will include

More information

arxiv: v1 [cs.lg] 3 May 2013

arxiv: v1 [cs.lg] 3 May 2013 Feature Selection Based on Term Frequency and T-Test for Text Categorization Deqing Wang dqwang@nlsde.buaa.edu.cn Hui Zhang hzhang@nlsde.buaa.edu.cn Rui Liu, Weifeng Lv {liurui,lwf}@nlsde.buaa.edu.cn arxiv:1305.0638v1

More information

Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming

Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de

More information

A Vector Space Approach for Aspect-Based Sentiment Analysis

A Vector Space Approach for Aspect-Based Sentiment Analysis A Vector Space Approach for Aspect-Based Sentiment Analysis by Abdulaziz Alghunaim B.S., Massachusetts Institute of Technology (2015) Submitted to the Department of Electrical Engineering and Computer

More information

Comparison of EM and Two-Step Cluster Method for Mixed Data: An Application

Comparison of EM and Two-Step Cluster Method for Mixed Data: An Application International Journal of Medical Science and Clinical Inventions 4(3): 2768-2773, 2017 DOI:10.18535/ijmsci/ v4i3.8 ICV 2015: 52.82 e-issn: 2348-991X, p-issn: 2454-9576 2017, IJMSCI Research Article Comparison

More information

Massachusetts Institute of Technology Tel: Massachusetts Avenue Room 32-D558 MA 02139

Massachusetts Institute of Technology Tel: Massachusetts Avenue  Room 32-D558 MA 02139 Hariharan Narayanan Massachusetts Institute of Technology Tel: 773.428.3115 LIDS har@mit.edu 77 Massachusetts Avenue http://www.mit.edu/~har Room 32-D558 MA 02139 EMPLOYMENT Massachusetts Institute of

More information

Computational Data Analysis Techniques In Economics And Finance

Computational Data Analysis Techniques In Economics And Finance Computational Data Analysis Techniques In Economics And Finance If searched for a ebook Computational Data Analysis Techniques in Economics and Finance in pdf format, in that case you come on to correct

More information

Biology 10 - Introduction to the Principles of Biology Spring 2017

Biology 10 - Introduction to the Principles of Biology Spring 2017 Biology 10 - Introduction to the Principles of Biology Spring 2017 Welcome to Bio 10! Lecture: Monday and Wednesday Lab: Monday 7:00 10:00pm or 5:30-7:00pm Wednesday 7:00 10:00pm Room: 2004 Lark Hall Room:

More information

Rule-based Expert Systems

Rule-based Expert Systems Rule-based Expert Systems What is knowledge? is a theoretical or practical understanding of a subject or a domain. is also the sim of what is currently known, and apparently knowledge is power. Those who

More information

Experiment Databases: Towards an Improved Experimental Methodology in Machine Learning

Experiment Databases: Towards an Improved Experimental Methodology in Machine Learning Experiment Databases: Towards an Improved Experimental Methodology in Machine Learning Hendrik Blockeel and Joaquin Vanschoren Computer Science Dept., K.U.Leuven, Celestijnenlaan 200A, 3001 Leuven, Belgium

More information

Evolution of Symbolisation in Chimpanzees and Neural Nets

Evolution of Symbolisation in Chimpanzees and Neural Nets Evolution of Symbolisation in Chimpanzees and Neural Nets Angelo Cangelosi Centre for Neural and Adaptive Systems University of Plymouth (UK) a.cangelosi@plymouth.ac.uk Introduction Animal communication

More information

Knowledge Elicitation Tool Classification. Janet E. Burge. Artificial Intelligence Research Group. Worcester Polytechnic Institute

Knowledge Elicitation Tool Classification. Janet E. Burge. Artificial Intelligence Research Group. Worcester Polytechnic Institute Page 1 of 28 Knowledge Elicitation Tool Classification Janet E. Burge Artificial Intelligence Research Group Worcester Polytechnic Institute Knowledge Elicitation Methods * KE Methods by Interaction Type

More information

Observing Teachers: The Mathematics Pedagogy of Quebec Francophone and Anglophone Teachers

Observing Teachers: The Mathematics Pedagogy of Quebec Francophone and Anglophone Teachers Observing Teachers: The Mathematics Pedagogy of Quebec Francophone and Anglophone Teachers Dominic Manuel, McGill University, Canada Annie Savard, McGill University, Canada David Reid, Acadia University,

More information

*Net Perceptions, Inc West 78th Street Suite 300 Minneapolis, MN

*Net Perceptions, Inc West 78th Street Suite 300 Minneapolis, MN From: AAAI Technical Report WS-98-08. Compilation copyright 1998, AAAI (www.aaai.org). All rights reserved. Recommender Systems: A GroupLens Perspective Joseph A. Konstan *t, John Riedl *t, AI Borchers,

More information

Jeff Walker Office location: Science 476C (I have a phone but is preferred) 1 Course Information. 2 Course Description

Jeff Walker Office location: Science 476C   (I have a phone but  is preferred) 1 Course Information. 2 Course Description BIO 221 Human Physiology I Jeff Walker Office location: Science 476C E-mail: walker@maine.edu (I have a phone but e-mail is preferred) Fall 2017 1 Course Information Room Science 105 Class meetings are

More information

CS 446: Machine Learning

CS 446: Machine Learning CS 446: Machine Learning Introduction to LBJava: a Learning Based Programming Language Writing classifiers Christos Christodoulopoulos Parisa Kordjamshidi Motivation 2 Motivation You still have not learnt

More information

Beyond the Blend: Optimizing the Use of your Learning Technologies. Bryan Chapman, Chapman Alliance

Beyond the Blend: Optimizing the Use of your Learning Technologies. Bryan Chapman, Chapman Alliance 901 Beyond the Blend: Optimizing the Use of your Learning Technologies Bryan Chapman, Chapman Alliance Power Blend Beyond the Blend: Optimizing the Use of Your Learning Infrastructure Facilitator: Bryan

More information

Linking Task: Identifying authors and book titles in verbose queries

Linking Task: Identifying authors and book titles in verbose queries Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,

More information

What can I learn from worms?

What can I learn from worms? What can I learn from worms? Stem cells, regeneration, and models Lesson 7: What does planarian regeneration tell us about human regeneration? I. Overview In this lesson, students use the information that

More information

To link to this article: PLEASE SCROLL DOWN FOR ARTICLE

To link to this article:  PLEASE SCROLL DOWN FOR ARTICLE This article was downloaded by: [Dr Brian Winkel] On: 19 November 2014, At: 04:59 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer

More information

Master's Programme Biomedicine and Biotechnology

Master's Programme Biomedicine and Biotechnology Master's Programme Biomedicine and Biotechnology Translation of the curriculum, published June 2 nd, 2009 in the bulletin ( Mitteilungsblatt ) of the University of Veterinary Medicine, Vienna. University

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

Machine Learning from Garden Path Sentences: The Application of Computational Linguistics

Machine Learning from Garden Path Sentences: The Application of Computational Linguistics Machine Learning from Garden Path Sentences: The Application of Computational Linguistics http://dx.doi.org/10.3991/ijet.v9i6.4109 J.L. Du 1, P.F. Yu 1 and M.L. Li 2 1 Guangdong University of Foreign Studies,

More information

Mathematics. Mathematics

Mathematics. Mathematics Mathematics Program Description Successful completion of this major will assure competence in mathematics through differential and integral calculus, providing an adequate background for employment in

More information

Multi-tasks Deep Learning Model for classifying MRI images of AD/MCI Patients

Multi-tasks Deep Learning Model for classifying MRI images of AD/MCI Patients Multi-tasks Deep Learning Model for classifying MRI images of AD/MCI Patients S.Sambath Kumar 1, Dr M. Nandhini 2, 1 Research scholar, 2 Assistant Professor 1,2 Department of Computer Science, Pondicherry

More information

Stephanie Ann Siler. PERSONAL INFORMATION Senior Research Scientist; Department of Psychology, Carnegie Mellon University

Stephanie Ann Siler. PERSONAL INFORMATION Senior Research Scientist; Department of Psychology, Carnegie Mellon University Stephanie Ann Siler PERSONAL INFORMATION Senior Research Scientist; Department of Psychology, Carnegie Mellon University siler@andrew.cmu.edu Home Address Office Address 26 Cedricton Street 354 G Baker

More information

Semi-Supervised Face Detection

Semi-Supervised Face Detection Semi-Supervised Face Detection Nicu Sebe, Ira Cohen 2, Thomas S. Huang 3, Theo Gevers Faculty of Science, University of Amsterdam, The Netherlands 2 HP Research Labs, USA 3 Beckman Institute, University

More information

Ph.D in Advance Machine Learning (computer science) PhD submitted, degree to be awarded on convocation, sept B.Tech in Computer science and

Ph.D in Advance Machine Learning (computer science) PhD submitted, degree to be awarded on convocation, sept B.Tech in Computer science and Name Qualification Sonia Thomas Ph.D in Advance Machine Learning (computer science) PhD submitted, degree to be awarded on convocation, sept. 2016. M.Tech in Computer science and Engineering. B.Tech in

More information