Enterprise Computing Community Conference 2011 Marist College, Poughkeepsie, NY June 12-14, 2011


Eitel J.M. Lauría, School of Computer Science & Mathematics, Marist College, Poughkeepsie, NY 12601, Eitel.Lauria@marist.edu
Joshua Baron, Senior Academic Technology Officer, Marist College, Poughkeepsie, NY 12601, Josh.Baron@marist.edu

In 2001, only 36% of students at four-year institutions graduated within four years (US Dept. of Education); allowing six or more years raises the figure to 58%. For Black and Hispanic students, the four-year completion rate drops to 21% and 25%, respectively. Similarly, only 28% of students pursuing certificates or associate degrees at two-year institutions in 2004 completed their programs within three years. As a result, the United States now ranks 12th in the world in the percentage of 25- to 34-year-olds with an associate degree or higher.

Academic analytics is the application of data mining techniques to build predictive models that help monitor and anticipate student performance and support action on issues related to teaching and learning. By combining selected institutional data, statistical analysis, and predictive modeling to create intelligence on which students, instructors, or administrators can act, academic analytics holds great potential for providing new and innovative technological tools to improve course and degree completion.

2004: Talavera used clustering to discover patterns reflecting user behavior in learning management systems.
2005: Dringus and Ellis used data mining as a strategy for assessing asynchronous discussion forums in online courses.
2005: Researchers at the University of Georgia predicted, with up to 74% accuracy and based on high school GPA and SAT math scores, the likelihood that a student would successfully complete an online course.
2007: Campbell combined factor analysis and logistic regression to develop predictive models trained with data extracted from CMS usage and student demographics.
Recently, Purdue University, building on Campbell's seminal dissertation, implemented a practical application: Course Signals (now supported by SunGard), an early academic warning system that determines in real time which students might be at risk. Once identified, these students can receive interventions via notifications sent by their instructor, which guide them to appropriate academic support resources, such as online practice exams or tutoring assistance, along with encouragement to use them.

Objective: expand the use of academic analytics to improve course completion rates.
Marist is the lead institution, working with 6 partners and in collaboration with IBM and SGHE.
Funding ($250k) from the Gates and Hewlett Foundations, administered by the EDUCAUSE Next Generation Learning Challenges (NGLC) program.

Open Academic Analytics Initiative (OAAI) Logic Model (SED = Student Effort Data)
Organizational capacity: Sakai leadership; open-source experience; existing relationships; ed tech knowledge; retention expertise; innovators.
Project resources: Open Educational Resources (OER); TTP Sakai instance hosted at Marist; in-kind contributions (graduate students); Campbell's research on CMS-based analytics; partnerships with IBM and SGHE; open-source CMS/LMS and BI tools suite.
Human capital: Mr. Baron (ed tech and Sakai community leadership); Dr. Lauría (data mining and business intelligence expert); Dr. Regan (educational technology research experience); Ms. Ruiz-Grech (minority student support expert); Ms. Fiore (tech-supported learning services expert); Ms. Cullen (academic advising expertise); Mr. Dashew (instructional design and ed tech expert); Mr. Harris (technology implementation at HBCUs); Mr. Gillman (Sakai technical development expert).
Project activities and short-term outcomes: develop and release the Sakai SED API; develop the OAAI predictive model based on the Marist dataset; release the enhanced OAAI predictive model under an open license; run academic analytics course pilots in diverse academic contexts; conduct research on predictive model portability; pilot and research use of the Online Academic Support Environment (OASE); publish research results on portability and use of the OASE; publish best practices for using Sakai and Pentaho for academic analytics; demonstrate a 14% increase in students receiving a B/C grade and an 8% increase in course completion rates between control and treatment groups.
Long-term impact: four of the six OAAI institutions to scale academic analytics; 20% of the Sakai community (55-65 schools) to deploy academic analytics by 2016.

Sakai (www.sakaiproject.org): open-source virtual learning environment.
The Sakai Project started in 2004 with Michigan, Indiana, Stanford, and MIT (and Berkeley), supported by a Mellon Foundation grant.
Currently 160+ production and 140 pilot instances on 7 continents; version 2.7 about to be released.
Marist College adopted Sakai in 2006 and has since become a prominent member of the Sakai community.

[Figure: Sakai site types: Courses, Portfolios, Projects, My Workspace]

Modeling process: Collect Data → Reduce Data → Rescale / Transform Data → Build Models → Evaluate Models → Apply Selected Models
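The slides do not say how the Rescale / Transform Data step is carried out. One common choice is z-score standardization, sketched below in Python as an assumption; the function name `standardize` is hypothetical.

```python
from statistics import mean, pstdev


def standardize(values: list[float]) -> list[float]:
    """Rescale a numeric feature to zero mean and unit (population) standard deviation.

    Assumption: z-scoring is one plausible choice for "Rescale / Transform Data";
    the source does not name a specific method.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

For example, `standardize([2.0, 3.0, 4.0])` maps a small set of cumulative GPAs to values centered on zero, so features measured on different scales (GPA, SAT scores, visit counts) become comparable for scale-sensitive models such as support vector machines.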

Data flow: course event data from Sakai and student data (demographics and course enrollment) from Banner → data extraction (course event data aggregated, student data added, student identity removed) → data pre-processing (missing values, outliers, incomplete records, derived features) → data set.
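The extraction step (aggregate course event data, add student data, remove student identity) can be sketched in Python as below. The record layouts, the `event_type` values, and the `name` field are hypothetical stand-ins, not the actual Sakai or Banner schemas.

```python
from collections import Counter


def build_dataset(events, students):
    """Aggregate Sakai course events, add Banner student data, then drop identity.

    Hypothetical record layouts (not the project's actual schema):
      events:   list of dicts with keys "student_id", "course_id", "event_type"
      students: dict mapping student_id -> dict of demographic/enrollment fields
    """
    # Count events per (student, course, event type).
    counts = Counter((e["student_id"], e["course_id"], e["event_type"]) for e in events)

    # One row per (student, course), with one column per event type.
    rows = {}
    for (sid, cid, etype), n in counts.items():
        row = rows.setdefault((sid, cid), {"course_id": cid})
        row[etype] = n

    dataset = []
    for (sid, _cid), row in rows.items():
        # Skip students with no Banner record (an incomplete record).
        if sid not in students:
            continue
        record = {**students[sid], **row}
        # Remove identifying fields before the record leaves extraction.
        record.pop("student_id", None)
        record.pop("name", None)
        dataset.append(record)
    return dataset
```

Aggregating before joining keeps the raw event log out of the data set entirely; only per-course counts survive.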

Identifying student information is removed during the data extraction process. The data collection process must comply with the regulations of Marist College's Human Subjects Institutional Review Board (IRB) regarding the protection of human subjects. Beyond the IRB, Family Educational Rights and Privacy Act (FERPA) issues must also be addressed.
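The slides do not describe how identity is removed. One common technique, shown here only as an illustrative assumption, is keyed pseudonymization: each student ID is replaced by an HMAC-SHA256 digest, so Sakai and Banner records can still be linked during extraction without the real ID entering the data set. The function name `pseudonymize` is hypothetical, and how the key is stored and retired is a policy question for the IRB/FERPA review.

```python
import hashlib
import hmac


def pseudonymize(student_id: str, secret_key: bytes) -> str:
    """Replace a student identifier with a keyed hash (HMAC-SHA256, hex digest).

    Assumption: the same key is used for both Sakai and Banner extracts so the
    pseudonyms match; the key is kept outside the research data set.
    """
    return hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

A keyed hash is used rather than a plain hash because student IDs come from a small, guessable space; without the key, an unkeyed digest could be reversed by hashing every possible ID.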

Feature: Description
High School Rank: the student's high school rank, expressed as a percentile.
SAT Verbal Score: the numeric SAT verbal score.
SAT Math Score: the numeric SAT mathematics score.
SAT Composite Score: the sum of the SAT verbal and SAT math scores.
ACT Composite Score: the ACT composite score.
Aptitude Score: the SAT composite score, or the ACT score converted to the SAT scale. When a student has both SAT and ACT scores, the SAT score is used.
Birth Date: the birth date of the student.
Age: derived from the birth date, expressed in years.
Race: the race of the student (self-reported).
Gender: the gender of the student (self-reported).
Full-time or Part-time Status: code for full-time or part-time status, based on the number of credit hours currently enrolled.
Class Code: the student's current academic standing, expressed as the semester of coursework the student is in. Ranges from 1 to 8 for undergraduates: 1 indicates a first-semester freshman; 4 indicates a second-semester sophomore.
Cumulative GPA: cumulative university grade point average (four-point scale).
Semester GPA: semester university grade point average (four-point scale).
University Standing: current university standing, such as probation, dean's list, or semester honors.
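The derived features above (SAT composite, aptitude score with SAT precedence, age) can be sketched in Python as follows. The source does not say which ACT-to-SAT concordance was used, so `act_to_sat` is a hypothetical caller-supplied function; likewise the reference date `as_of` for computing age is an assumption.

```python
from datetime import date
from typing import Callable, Optional


def sat_composite(verbal: Optional[int], math: Optional[int]) -> Optional[int]:
    """SAT composite = SAT verbal + SAT math (None if either part is missing)."""
    if verbal is None or math is None:
        return None
    return verbal + math


def aptitude_score(sat_comp: Optional[int], act_comp: Optional[int],
                   act_to_sat: Callable[[int], int]) -> Optional[int]:
    """SAT composite if available, otherwise the ACT composite converted to the SAT scale.

    `act_to_sat` is a hypothetical conversion supplied by the caller.
    """
    if sat_comp is not None:
        return sat_comp  # SAT takes precedence when both scores exist
    if act_comp is not None:
        return act_to_sat(act_comp)
    return None


def age_in_years(birth_date: date, as_of: date) -> int:
    """Age in whole years on the (assumed) reference date `as_of`."""
    before_birthday = (as_of.month, as_of.day) < (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - int(before_birthday)
```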

Feature: Description
Subject: the department offering the course.
Course: the course identification.
Course Size: the number of students in the course/section.
Course Length: the length of the course, in weeks.
Course Grade: the student's final course grade. Entries are A, B, C, D, F, I, or W. If the student drops the course within the official drop/add window, the course grade field is null.
Course Completion: students who completed the course within the normal semester timeframe, i.e., who did not withdraw or receive an incomplete.
Academic Success: students who completed the course within the normal timeframe and received a grade of C or better.
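One way to derive the two outcome labels from the Course Grade entries is sketched below in Python. Returning no label for a null grade (a drop within the drop/add window) is an assumption about how such enrollments are handled, and so is treating only A, B, and C as "C or better", since the source lists no plus/minus grades. The function and constant names are hypothetical.

```python
from typing import Optional

# Assumption: only the letter grades listed in the source; no +/- variants.
PASSING_GRADES = {"A", "B", "C"}


def outcome_labels(grade: Optional[str]) -> Optional[dict]:
    """Derive course completion and academic success labels from a final grade.

    Returns None for a null grade (course dropped within the drop/add window),
    since such enrollments have no outcome to predict.
    """
    if grade is None:
        return None
    grade = grade.strip().upper()
    completed = grade not in {"W", "I"}  # did not withdraw or receive an incomplete
    success = completed and grade in PASSING_GRADES
    return {"completed": completed, "academic_success": success}
```

Note that academic success implies completion, so a model of academic success is predicting a strictly narrower outcome.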

Feature: Description
Avg Site Visits per Week: the average number of times per week the student enters the course site.
Percent Lesson Content Accessed: the number of times the student accesses a section in the Lessons tool / the total number of times sections in the Lessons tool are accessed in the course.
Percent Discussion Postings: the number of discussion postings by the student / the total number of discussion postings in the course.
Percent Discussion Postings Read: the number of discussion postings opened by the student / the total number of discussion postings opened in the course.
Percent Assessments Completed: the number of assessments completed by the student / the number of assessments completed by all students in the course.
Percent Assessments Opened: the number of assessments opened by the student / the number of assessments opened by all students in the course. If a student opens the same assessment multiple times, the system records each entry.
Percent Assignments Completed: the number of assignments completed by the student / the number of assignments completed by all students in the course.
Percent Assignments Opened: the number of assignments opened by the student / the number of assignments opened by all students in the course. If a student opens the same assignment multiple times, the system records each entry.
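Each "Percent ..." feature divides a student's count by the course-wide count for the same activity, so it measures the student's share of course activity rather than a completion rate. A minimal Python sketch follows; the function names are hypothetical, the result is a fraction (multiply by 100 for a percentage), and returning None when a course records no activity (for example, a tool that was not used) is an assumption. Computing the weekly average as total visits divided by course length in weeks is also an assumption.

```python
from typing import Optional


def share_of_course_total(student_counts: dict[str, int]) -> dict[str, Optional[float]]:
    """Each student's count divided by the total count for all students in the course.

    Input maps a (pseudonymous) student key to that student's count for one
    activity in one course. Returns None for every student if the course total
    is zero, since the ratio is undefined.
    """
    total = sum(student_counts.values())
    if total == 0:
        return {s: None for s in student_counts}
    return {s: n / total for s, n in student_counts.items()}


def avg_site_visits_per_week(total_visits: int, course_length_weeks: int) -> float:
    """Average weekly course-site visits over the course length (in weeks)."""
    if course_length_weeks <= 0:
        raise ValueError("course length must be positive")
    return total_visits / course_length_weeks
```

Because every share is relative to the student's own course, the features stay comparable across courses with very different levels of Sakai use.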

Candidate models: C4.5/C5.0 decision tree, logistic regression, support vector machines, Bayesian networks.

Modeling workflow (Reduce Data → Transform & Discretize → Partition Data → Build Models → Evaluate and Choose Models):
Logistic regression (prediction/classification): linear feature transformation (factor analysis); transform; 70% train, 20% validate, 10% test; predictive accuracy, validated with held-out data.
C4.5/C5.0 decision tree (prediction/classification): embedded feature selection; no transformation; 70% train, 20% validate, 10% test; predictive accuracy, validated with held-out data.
Support vector machines (prediction/classification): embedded feature selection; no transformation; 70% train, 20% validate, 10% test; predictive accuracy, validated with held-out data.
Bayesian networks (inference: prediction, diagnosis, causal explanation): embedded feature selection with linear and nonlinear feature transformation; transform and discretize; 70% train, 20% validate, 10% test; model selection by searching the space of BNs; average predictive accuracy over nodes, validated with held-out data.
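The 70% train / 20% validate / 10% test partition and the predictive-accuracy measure can be sketched in Python as below. Reading the roles as "fit on train, compare and choose on validate, report once on test" is an assumption, as are the function names; stratifying the split by outcome class is a possible refinement not shown.

```python
import random


def partition(rows, train=0.7, validate=0.2, seed=0):
    """Shuffle rows and split them into train / validation / test sets.

    Defaults give the 70/20/10 split; whatever remains after train and
    validate goes to test. A fixed seed makes the split reproducible.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * validate)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def accuracy(predicted, actual):
    """Fraction of held-out cases whose predicted label matches the actual label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty label lists")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

Keeping the test set untouched until a model has been chosen on the validation set avoids an optimistic accuracy estimate for the selected model.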

Data mining and predictive modeling are affected by input data of varying quality; a predictive model is usually only as good as its training data.
Good: lots of data.
Not so good: missing data (tools not used, data not entered); variability in Sakai tool usage; variability in instructors' assessment criteria; variability in workload criteria.
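Missing data from unused tools or unentered values has to be handled before modeling. The slides do not say how; the Python sketch below shows one common option, assumed here for illustration: measure the missing rate per feature, fill gaps with the observed median, and add a hypothetical `<feature>_missing` indicator so a model can still tell "tool not used" apart from a typical value.

```python
from statistics import median


def missing_rate(rows, feature):
    """Fraction of rows in which `feature` is missing (absent or None)."""
    return sum(r.get(feature) is None for r in rows) / len(rows)


def impute_median(rows, feature):
    """Return copies of rows with missing `feature` values set to the observed median.

    Adds an indicator column `<feature>_missing` (hypothetical name) recording
    which values were filled in. Original rows are left unchanged.
    """
    observed = [r[feature] for r in rows if r.get(feature) is not None]
    fill = median(observed) if observed else None
    out = []
    for r in rows:
        r2 = dict(r)
        is_missing = r.get(feature) is None
        r2[f"{feature}_missing"] = is_missing
        if is_missing:
            r2[feature] = fill
        out.append(r2)
    return out
```

The indicator matters here because, for Sakai activity features, missingness itself may be informative about a student's engagement.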

This research is motivated by the need to introduce alternative research methods and model-development approaches capable of producing tools that can be used in practical settings to predict academic performance and to detect students at risk early. The methodology presented will initially be applied to real-world data extracted from Marist College transactional systems: its open-source course management system (Sakai / iLearn) and its student demographics and course enrollment data. We hope other higher education institutions will use this methodological framework as a template for developing predictive models of academic success from Sakai data.