Categorical Data Analysis and Generalized Linear Models (CDA)

Unit of Study Notes, Semester 2, 2015
Prepared by Prof Annette Dobson and Dr Mark Jones
School of Population Health, The University of Queensland
Copyright School of Population Health, The University of Queensland

Contact details

Welcome to CDA. The unit co-ordinator is Mark Jones and the UQ program co-ordinator is Annette Dobson. Mark Jones will be communicating with you throughout the unit and will be marking the assessments. Our contact details are as follows:

Dr Mark Jones
School of Population Health, Public Health Building
University of Queensland
Herston Road, Herston, QLD 4006
E-mail: m.jones@sph.uq.edu.au
Office phone: 07-3365 5116; Fax: 07-3365 5540

Professor Annette Dobson
School of Population Health, Public Health Building
University of Queensland
Herston Road, Herston, QLD 4006
E-mail: a.dobson@sph.uq.edu.au
Office phone: 07-3365 5346; Fax: 07-3365 5540

Background

This unit, Categorical Data Analysis and Generalized Linear Models (CDA), is about statistical methods for analysing data when the response or outcome variable is categorical. Methods for contingency tables have a long history but are often somewhat ad hoc. Most methods for analysing categorical data, however, are special cases of generalized linear models (GLMs). These include models for count data (e.g., Poisson regression), binary data (logistic regression), data in more than two nominal categories (nominal or multinomial regression) and data in more than two ordered categories (ordinal logistic regression). GLMs provide a unifying framework that you will meet again in other units such as SVA and LCD.

Much of the material in CDA is similar to Annette Dobson's and Adrian Barnett's book An Introduction to Generalized Linear Models (third edition, Chapman & Hall/CRC, 2008). An early version of CDA was derived from an early version of the book, but the material was adapted over several years specifically for CDA. The revised (third) edition of the book was based on the CDA version, with changes. The CDA notes are designed specifically for distance delivery using the BCA model and are independent of, and different from, the book.

Aim of the Unit

The aim of the unit is to enable you to use generalized linear models and other methods to analyse categorical data with proper attention to the underlying assumptions. There is an emphasis on the practical interpretation and communication of results to colleagues and clients who may not be statisticians.

Objectives

On completion of this unit you should:
1. be able to explain and use standard methods for analysing data in contingency tables, including matched and stratified data;
2. understand the theory of GLMs and statistical inference based on GLMs for categorical data;
3. correctly use logistic regression models for binary, multinomial and ordinal categorical data; and
4. correctly analyse count data using Poisson regression.
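The statement that contingency-table methods are special cases of GLMs can be checked on a single 2x2 table. The following is a minimal sketch, not part of the unit materials: it uses Python rather than the unit's default Stata, the counts are hypothetical, and the fitting routine is a hand-written Newton-Raphson loop (for the canonical logit link this is the same as iteratively reweighted least squares). Because a logistic regression with one binary exposure is saturated for a 2x2 table, exp(beta1) should reproduce the cross-product odds ratio ad/(bc), and beta0 should equal the log odds in the unexposed group.

```python
import math

# Hypothetical 2x2 table (illustrative counts only, not from the unit):
#                  cases   non-cases
# exposed (x=1)    a=30    b=70
# unexposed (x=0)  c=15    d=85
a, b, c, d = 30, 70, 15, 85

# Grouped binomial data: (x, number of cases, group size)
groups = [(1, a, a + b), (0, c, c + d)]


def fit_logistic(groups, iterations=25):
    """Fit logit(pi) = beta0 + beta1 * x by Newton-Raphson.

    For the canonical logit link, Newton-Raphson coincides with
    iteratively reweighted least squares (IRLS).
    """
    beta0, beta1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector U = (u0, u1) and 2x2 information matrix I
        u0 = u1 = 0.0
        i00 = i01 = i11 = 0.0
        for x, y, n in groups:
            pi = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
            resid = y - n * pi          # observed minus fitted cases
            w = n * pi * (1.0 - pi)     # binomial variance = IRLS weight
            u0 += resid
            u1 += resid * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: beta_new = beta + I^{-1} U (explicit 2x2 inverse)
        beta0 += (i11 * u0 - i01 * u1) / det
        beta1 += (-i01 * u0 + i00 * u1) / det
    return beta0, beta1


beta0, beta1 = fit_logistic(groups)
print(f"OR from table:       {(a * d) / (b * c):.4f}")
print(f"exp(beta1) from GLM: {math.exp(beta1):.4f}")
```

With these counts both lines print 2.4286. The same equivalence is what allows the Module 1 methods to be revisited as GLMs in Modules 4-6.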

Assumed knowledge

The following BCA units are recommended pre-requisites:
MBB: Mathematical Background for Biostatistics
EPI: Epidemiology
PDT: Probability and Distribution Theory
PSI: Principles of Statistical Inference
LMR: Linear Models

Modules 2 and 3 in particular build on material presented in Principles of Statistical Inference (PSI). Some students in previous years have commented that their PSI notes were useful for refreshing their memory of concepts such as Wald, score and likelihood ratio tests. The statistical foundations that you develop in CDA will be invaluable to you in your career as a statistician and in subsequent BCA units such as Longitudinal and Correlated Data (LCD), Bayesian Statistical Methods (BAY) and Bioinformatics (BIF).

LMR is a recommended pre-requisite, but due to timetabling constraints some students may be taking CDA and LMR concurrently. The extent to which this is a problem depends on each student's prior knowledge and experience of statistical modelling, including multiple regression, analysis of variance and the use of diagnostics. If you have already done LMR, you may want to refresh your knowledge of strategies for analysis and the vagaries of model building.

Overview

The unit is organised into six Modules, each taking 2 weeks. The modules fall into three distinct groups.

Module 1 is a refresher of the disparate methods for analysing categorical data that you have encountered previously in introductory statistics and epidemiology units. We think it is important to revise this material so you are better able to link it to the approach presented in CDA. As there are many excellent textbooks on these topics, we have used excerpts from one of them for this module: Agresti's book Categorical Data Analysis (second edition, John Wiley & Sons, Inc., 2002).

Modules 2 and 3 are very different. They establish the statistical foundations for a unified approach to modelling categorical (and other forms of) data, namely generalized linear models (GLMs). These modules rely heavily on PSI (and hence MBB and PDT). Beware: some students experience shock at the gear change between Module 1 and Modules 2 and 3. Do not panic!

Modules 4-6 bring it all together. They work through the specific GLMs needed for the types of problems introduced in Module 1, but in a unified way that also links closely to LMR.

Contents

Module 1. July 27 to August 9
Introduction to and revision of conventional methods for contingency tables, especially in epidemiology: odds ratios and relative risks, chi-squared tests for independence, Mantel-Haenszel methods for stratified tables, and methods for paired data.

Module 2. August 10 to August 23
The exponential family of distributions, generalized linear models (GLMs) and parameter estimation for GLMs.

Module 3. August 24 to September 6
Inference for GLMs, including the use of score, Wald and deviance statistics for confidence intervals and hypothesis tests, and residuals.

Module 4. September 7 to September 20
Binary variables and logistic regression models, including methods for assessing model adequacy.

Module 5. September 21 to September 26 and October 5 to October 11
Nominal and ordinal logistic regression for categorical response variables with more than two categories.

Module 6. October 12 to October 25
Count data and Poisson regression.
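The Module 1 quantities listed above have simple closed forms. The sketch below computes them for 2x2 tables; it is illustrative only, written in Python rather than Stata, uses hypothetical counts, and the function names are our own rather than those of any statistics package. The assumed layout is a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases, and the Mantel-Haenszel pooled odds ratio is sum(a_k d_k / n_k) / sum(b_k c_k / n_k) over strata k.

```python
# Each 2x2 table is a tuple (a, b, c, d):
#   a = exposed cases,   b = exposed non-cases
#   c = unexposed cases, d = unexposed non-cases


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio ad / (bc)."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (a / (a + b)) / (c / (c + d))


def pearson_chi2(a, b, c, d):
    """Pearson chi-squared statistic for independence (1 df, no continuity correction)."""
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    chi2 = 0.0
    for o, r, k in zip(observed, row_totals, col_totals):
        expected = r * k / n  # expected count under independence
        chi2 += (o - expected) ** 2 / expected
    return chi2


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across a list of 2x2 tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical data: the same exposure-disease comparison in two strata
strata = [(10, 40, 5, 45), (20, 30, 10, 40)]

for k, table in enumerate(strata, start=1):
    print(f"stratum {k}: OR={odds_ratio(*table):.3f} "
          f"RR={relative_risk(*table):.3f} chi2={pearson_chi2(*table):.3f}")
print(f"Mantel-Haenszel OR = {mantel_haenszel_or(strata):.3f}")  # prints Mantel-Haenszel OR = 2.500
```

Because the Mantel-Haenszel estimator is a weighted average of the stratum-specific odds ratios (with weights b_k c_k / n_k), the pooled value of 2.5 lies between the stratum estimates of 2.25 and about 2.67.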

Reference books

Agresti A. An Introduction to Categorical Data Analysis. Wiley InterScience, 1996. ISBN 0-471-11338-7.
Agresti A. Categorical Data Analysis (second edition). Wiley, 2002. ISBN 0-471-36093-7.
Agresti A. Analysis of Ordinal Categorical Data. Wiley, 1984.
Dobson AJ, Barnett AG. An Introduction to Generalized Linear Models (third edition). Chapman & Hall/CRC, 2008. ISBN 978-1-58488-950-2.
Hilbe JM. Logistic Regression Models. Chapman & Hall/CRC Press, 2010.
Kirkwood BR, Sterne JAC. Essential Medical Statistics (second edition). Blackwell, 2003. ISBN 0-86542-871-9.
Le CT. Applied Categorical Data Analysis. Wiley, 1998.
Woodward M. Epidemiology: Study Design and Data Analysis (second edition). Chapman & Hall/CRC, 2005. ISBN 978-1-58488-415-6.
Hardin JW, Hilbe JM. Generalized Linear Models and Extensions (second edition). Stata Press, 2007. ISBN 1597180149, 9781597180146.

Software

You will need to use statistical software for the exercises and assignments. Stata is the default for this unit, but you may use whichever package you prefer. Hilbe's book has detailed R commands corresponding to most of the Stata commands used in the book. Woodward's book and the supplementary materials on the web include examples using SAS and Stata. R code for many of the examples and exercises in Modules 2-6 is given in the book by Dobson and Barnett. Agresti's book includes an appendix on SAS (and some other software) commands for methods covered in CDA. For some exercises, Excel may be a suitable tool.

Timetable

Week beginning (Monday)   Module     Co-ordinator   Assignment to be submitted
27 July                   Module 1   Mark Jones
3 August
10 August                 Module 2
17 August
24 August                 Module 3
31 August
7 September               Module 4                  Assignment 1
14 September
21 September              Module 5
28 September              Break
5 October                 Module 5   Mark Jones
12 October                Module 6                  Assignment 2
19 October
26 October                Study
2 November                                          Assignment 3

Method of Delivery & Communication

At the start of semester we will send a welcome email and ask whether you wish to receive a hard copy of the unit materials. If you say yes, the unit materials will be posted to you with your copy of this guide. The course notes are also available on the BCA elearning site, along with the data sets for exercises and assignments. However, the readings may not be available on the BCA elearning site, so these may be emailed to you.

We would like to encourage the use of the discussion board facilities on the elearning site to help reduce the isolation of studying by distance. First, you will see a Student Introductions forum on the discussion board. You can add your own information to this forum, if you wish, so that others in the course can contact you. For example:

Jonathan Bloggs
j.bloggs@ctc.edu.au
ph: 02-9999-9999
NHMRC Clinical Trials Centre, Sydney
Jonathan is a trainee biostatistician at the Clinical Trials Centre. He is currently working with trials of new medications for diabetes and heart disease.

This is entirely optional. If you would like to be part of the forum, but without your contact details, that will be fine as well. When you log in to the elearning site, you will see various forum headings under Discussions. We will include some general discussion points in each module to encourage discussion amongst the group, but we would like you to discuss matters and help each other as much as you can. Some students in the past have said they haven't used the discussion board as much as they would have liked, as they didn't want to be seen to be colluding in the preparation of assignments. We encourage discussion about the course material and assignments, as long as worked answers are not given.

Assessment

The assessment is based entirely on assignments. There is no examination, and no marks are awarded for online discussions.
There are two assignments, each worth 35% of the marks, and one assignment worth 30%. These will involve analysing real data sets and will give you scope to demonstrate insight and flair! The due dates for the assessment items are shown in the Timetable above; all assignments are due on Mondays.

Assignments will be posted on the elearning site. Write your assignment in any style (e.g. journal article, technical report), but make sure that the layout is clear and that all the questions are answered. Marks will be allocated for presentation. Do not be afraid to use long, but descriptive and specific, headings or sub-headings (e.g. "Methods for assessing statistical interactions" instead of "Methods"). Remember to define any acronyms you use, and briefly explain any new terminology or assumptions. Marks will be lost (from the style section) for assignments that are too long or that include irrelevant material indicating that you did not understand the question. Raw computer output is not acceptable.

The following two documents, available on the BCA website as resources for current students, may be helpful:
Guide for Reporting Statistical Results
Referencing Style Guide
They are available at www.bca.edu.au/currentstudents.html

Before commencing the course, you should read the BCA assessment guide (Appendix) and the information about the plagiarism policy of your home university. Assessment deadlines are important.

Extensions or late submissions policy

Requests for an extension to an assignment must be made before the due date, directly to the module coordinator by email. The module coordinator will reply with the decision as to whether an extension has been granted and, if so, the new due date. Extensions can cause delays in feedback for other students who submitted on time. Also, because of prerequisites, late results may preclude you from studying subsequent units: different universities have different result submission deadlines, and BCA results have to be transmitted between universities, which shortens the available time.

Feedback

Your assignments will be returned to you via the elearning site.

Outline answers for the exercises in each Module will be posted on the elearning site after students have had a chance to attempt the exercises without access to the solutions. Model answers for Assignments are not really appropriate, as there is hardly ever a unique best solution. With the permission of the students concerned, I would like to adopt a system used in CDA in previous years: for each Assignment, the work of two students who received high marks is posted on the elearning site.

Complaints policy

Please see the BCA complaints policy in the Assessment Guide and on the online assessment submission pages.

Summary of changes to materials and/or procedures since last delivery

The main issues for CDA identified by previous students and the BCA peer review process are the jumps in concepts and methods between Module 1 and Modules 2-3, and again for Modules 4-6, although by the end it does come together. BCA peer reviewers tend not to like the inclusion of Module 1, but students do like it, and we have tried to connect it to the other modules whenever we can.

We did a major revision in 2012, in which we changed the textbook used for Module 1 so that it would better connect with the material presented in Module 2. We also edited Module 3 to make it clearer. In addition, we developed completely new assignments and created PowerPoint slides with audio to enhance learning. In 2014 we made three additional PowerPoint files for Modules 4-6 that analyse the same large data set to illustrate fitting and evaluating models for binary/binomial outcomes (Module 4), nominal/ordinal outcomes (Module 5) and count outcomes (Module 6). You will get access to the data set so that you can run your own analyses.

This year we have made additional changes based on feedback from last year's students. We have further simplified Module 3 by removing non-essential material showing the derivation of the sampling distributions of the various statistics presented in the module. We had also planned to create additional videos for the more mathematical material presented in Modules 2 and 3. However, on searching the internet we found many relevant online videos that provide good explanations of the concepts, so we have collated a list of recommended online videos for students to access to enhance their understanding. We also plan to introduce new assignments for 2015.

Appendix

BCA Assessment Guide
Can be downloaded from: http://www.bca.edu.au/linked%20docs/student%20resources/bca_assessment_guide_student.pdf