SDTM Implementation within Data Management
CDISC Users Group Meeting
Lauren Shinaberry, MSc, CCDM
Data Management and SDTM
- SDTM can be seen as the end-point representation of all the work done within DM
- Data Management takes care of getting the data ready for a submission
- At the time of submission the data is stable; within DM the data is in flux
Understanding the purpose
- For those responsible for mapping CRF data to SDTM, it is critical to understand the topic of the data
  - Is it being collected to assist with cleaning?
  - Is it being collected to calculate exposure to the drug, or to calculate accountability for the drug?
CDISC Implementation at PRA
- In 2004, PRA began considering the benefits of implementing CDISC standards
- Decided SDTM would be where the largest benefit could be gained
- A cross-functional group was formed: Biostatistics, Medical Writing, Safety, Monitoring, and Data Management
- CRF templates were designed to support SDTM requirements
What we expected
- That DM would be most affected by the change and would primarily drive it
- That CRF design could be pared down to only what was required by SDTM
- That there would be efficiencies in screen design and all related programming activities
- That an introduction to the SDTM Implementation Guide would be sufficient and DM staff would be equipped to find the answers there
- That this change would fit within our current processes
What really happened
- There was a much bigger impact on Stats and Analysis Programming than expected
- In practice, CRF design did not radically change
- Efficiencies were not immediately apparent
- Those creating SDTM datasets needed much more support than just the IG
- Any difficulty that occurred during a study was blamed on SDTM, regardless of the true issue
- Our processes didn't always fit what needed to happen
Biggest complaint?
- Our analysis programmers spent a huge amount of effort trying to create listings from SDTM that mirrored the layout of the CRF
- Ex.: list all inclusion/exclusion criteria responses for all subjects, but SDTM stores only the exceptions
The Listings Issue
- Still one of the biggest points of education internally and with clients
- The FDA specifically states that if SDTM is provided, no listings are needed
- For non-FDA submissions, listings are still necessary, but it is not necessary to undo the SDTM structure simply to make the listings cosmetically match the CRF
Blurring the line between DM and Stats
- SDTM contains some data that traditionally belongs to the Analysis and Reporting group, not DM:
  - Population flags
  - Baseline flags
  - Questionnaire scoring
  - Conversion of lab units
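As a concrete illustration of one such derivation, the sketch below flags the last pre-dose vital-signs record per subject and test as baseline (VSBLFL = "Y"). The variable names follow SDTM conventions, but the data, the function name, and the selection rule (last result on or before first dose, taken here from a per-subject RFSTDTC lookup) are illustrative assumptions; the actual baseline rule is study-specific.

```python
# Hypothetical sketch: deriving a baseline flag (VSBLFL) on VS records.
# Assumption: baseline = last result dated on or before first dose.

records = [
    {"USUBJID": "001", "VSTESTCD": "SYSBP", "VSDTC": "2011-01-05", "VSSTRESN": 120},
    {"USUBJID": "001", "VSTESTCD": "SYSBP", "VSDTC": "2011-01-10", "VSSTRESN": 118},
    {"USUBJID": "001", "VSTESTCD": "SYSBP", "VSDTC": "2011-02-01", "VSSTRESN": 125},
]
first_dose = {"001": "2011-01-15"}  # per-subject first-dose date (RFSTDTC)

def flag_baseline(records, first_dose):
    """Set VSBLFL='Y' on the last pre-dose record per subject/test."""
    latest = {}
    for rec in records:
        rec["VSBLFL"] = ""  # SDTM leaves non-baseline records blank, not 'N'
        # ISO 8601 dates compare correctly as strings
        if rec["VSDTC"] <= first_dose[rec["USUBJID"]]:
            key = (rec["USUBJID"], rec["VSTESTCD"])
            if key not in latest or rec["VSDTC"] > latest[key]["VSDTC"]:
                latest[key] = rec
    for rec in latest.values():
        rec["VSBLFL"] = "Y"
    return records

flag_baseline(records, first_dose)
```

Because this logic sits on the boundary described above, either group could own it; what mattered in practice was agreeing on who adds the field to the shared dataset.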
Blurring of lines
- Challenged preconceptions in both groups
- Historically, AnRep didn't touch any DM datasets
- Needed to reassure groups that it was OK to add some fields to what DM provided as a starting point
- Concept of SDTM: SDTM deliverables are a shared responsibility
Unblinding issues
- The DM domain contains ARM and ARMCD, which (for blinded studies) are not known until after the study is locked
- What should go into ARM and ARMCD until then?
ARM and ARMCD
- Leave them blank until lock?
  - Concern that we would not be able to tell whether they were blank on purpose or accidentally
- Populate them with dummy codes? For example, every other record is Placebo or Study Drug
  - Concern that dummy codes might be confused with actual unblinding
ARM/ARMCD solution
- Until lock, ARM = Blinded and ARMCD = BLIND
- Makes it clear to anyone viewing the dataset that it has not been unblinded
- No ambiguity from missing data
- Replaced with actual arm data once the study is locked
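The convention above can be sketched as a single population step: sentinel values before lock, real assignments after. The function name, record layout, and randomization lookup are illustrative assumptions; only the ARM = "Blinded" / ARMCD = "BLIND" convention comes from the slides.

```python
# Hypothetical sketch of the pre-lock ARM/ARMCD convention described above.

def apply_arm(dm_records, randomization=None):
    """Fill ARM/ARMCD: BLIND sentinels pre-lock, real arms after lock."""
    for rec in dm_records:
        if randomization is None:       # study still blinded
            rec["ARMCD"], rec["ARM"] = "BLIND", "Blinded"
        else:                           # locked: apply actual assignments
            rec["ARMCD"], rec["ARM"] = randomization[rec["USUBJID"]]
    return dm_records

dm = [{"USUBJID": "001"}, {"USUBJID": "002"}]
apply_arm(dm)                           # interim run: everyone is Blinded
pre_lock = [r["ARMCD"] for r in dm]
# after database lock, rerun with the real randomization
apply_arm(dm, {"001": ("PBO", "Placebo"), "002": ("DRG", "Study Drug")})
```

Any interim viewer of the dataset sees BLIND explicitly, so there is no risk of mistaking it for missing data or for a real unblinded assignment.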
What did we learn?
- We underestimated the skill level required to successfully create SDTM-compliant datasets
- We did not prepare the project teams for the initial loss in efficiency while a new way of doing things was begun
- We focused primarily on DM training and neglected to include Analysis and Reporting, PM, and senior management
- We didn't consider how submission-ready data differs from data in the process of being collected and cleaned
What did we improve?
- All database programmers, analysis programmers, and statisticians are required to attend a 2-day training session on the use of SDTM, including exercises to apply what was learned
- Communicated clearly to senior management what SDTM could and could not address for a project, and which issues cannot be blamed on SDTM implementation
Additional lessons learned
- As a CRO, we have an easier time implementing CDISC standards because we are already used to implementing many different client-specific standards
- We can start to reuse work across client studies, not just within client programs
- Many of our clients look to us for education and guidance on CDISC standards, and we freely share the lessons we've learned
Process issues
- Traditionally, AnRep never modified the datasets provided to them by DM
- SDTM is a shared set of datasets
- Unstable data has a significant impact on the stability of the SDTM datasets
- The previous validation process was not suited to the complex conversions required by many SDTM dataset rules
What do we mean by stability?
- SUPPxx datasets may not exist unless a specific response to a question was received
- SDTM does not permit topic variables to be missing, but during cleaning this may occur
  - Should those records be dropped?
  - Should some temporary placeholder be used?
Missing topics
- For special cases, placeholders are used for missing topics
- Ex.: on the end-of-study page, "Did subject complete?" is answered "No", but a specific reason has not been given
- The site is queried for the reason, but for an interim analysis, DSTERM = DISCONTINUED REASON UNKNOWN
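The placeholder rule above can be sketched as a small defensive step in the conversion program: rather than dropping records or leaving a topic variable blank (which SDTM forbids), an agreed placeholder is substituted until the site answers the query. The function name and record layout are illustrative assumptions; the placeholder text comes from the slides.

```python
# Hypothetical sketch of the missing-topic placeholder rule for DS records.

PLACEHOLDER = "DISCONTINUED REASON UNKNOWN"

def fill_missing_topics(ds_records):
    """Replace blank DSTERM (topic) values with the agreed placeholder."""
    for rec in ds_records:
        if not rec.get("DSTERM"):       # blank or absent topic variable
            rec["DSTERM"] = PLACEHOLDER
    return ds_records

ds = [
    {"USUBJID": "001", "DSTERM": "ADVERSE EVENT"},
    {"USUBJID": "002", "DSTERM": ""},   # site queried, reason still pending
]
fill_missing_topics(ds)
```

The placeholder is deliberately loud, so an interim reviewer knows the record is awaiting a query response rather than silently incomplete.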
Current status
- A specialist group was formed to create the SDTM datasets
- Education has been expanded to a wider audience of functions
- We are finally able to reuse work from previous studies, needing only minor changes from client to client
- SDTM datasets are treated as derived datasets, and validation now includes double programming
Additional improvements
- Program more defensively and implement rules on how to handle incomplete data
- Run conversion programs more frequently to revalidate the output, instead of waiting until right before a deliverable
- Keep a library of custom SDTM datasets created on previous studies to better manage the amount of rework
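The double-programming validation mentioned under "Current status" reduces to comparing two independently produced copies of a dataset. The sketch below is a minimal version of that comparison; the function name, key variables, and record layout are assumptions, as real keys follow each domain's specification.

```python
# Hypothetical sketch of double programming: the production and QC copies of
# a dataset are built independently, then compared record by record.

def compare_datasets(production, qc, key_vars):
    """Return keys whose records differ or exist in only one copy."""
    key = lambda rec: tuple(rec[k] for k in key_vars)
    prod = {key(r): r for r in production}
    check = {key(r): r for r in qc}
    return sorted(k for k in set(prod) | set(check)
                  if prod.get(k) != check.get(k))

prod = [{"USUBJID": "001", "VSTESTCD": "SYSBP", "VSSTRESN": 120}]
qc   = [{"USUBJID": "001", "VSTESTCD": "SYSBP", "VSSTRESN": 121}]
diffs = compare_datasets(prod, qc, ["USUBJID", "VSTESTCD"])
```

Running the comparison frequently, rather than just before a deliverable, surfaces conversion problems while the data is still in flux.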