A Basic Guide To SDTM Mapping


The Study Data Tabulation Model (SDTM) is a standard for organizing and formatting clinical trial data. It was created by the Clinical Data Interchange Standards Consortium (CDISC) Submission Data Standards Team. The main aim of the model is to provide a standard structure for the tabulation data from clinical studies that is submitted to regulatory authorities, so that the data can be reviewed, analyzed, and reported consistently.

SDTM mapping is an integral part of submitting clinical data. It translates data from a source dataset into the SDTM standard format and structure. Most data collected during clinical trials is not in a CDISC-standard structure and must be converted to the CDISC SDTM structure. This process is one of the most challenging parts of clinical data management and requires both technical expertise and time.

This article covers the basic steps of SDTM mapping:

1. Identify The Datasets To Map 

Datasets are the basic building blocks of SDTM mapping. The first step is to identify which source datasets must be mapped and converted into the SDTM format. The conversion can be done manually or with software that automates it. For manual mapping, it is crucial to understand the CDISC data standards and how they relate to the source dataset.

It’s important to note that some source variables have no matching SDTM standard variable in the domain they belong to. Such non-standard variables are stored in supplemental qualifier datasets, which hold the extra data in a ‘variable name – variable value’ format (the QNAM and QVAL variables). These datasets are named with SUPP followed by the two-letter code of the SDTM domain they supplement, for example, SUPPEX for the Exposure (EX) dataset. Researchers therefore need to distinguish at this stage which data belongs in a standard domain and which is supplemental.
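To make the ‘variable name – variable value’ layout concrete, here is a minimal Python sketch that derives the SUPP dataset name and turns non-standard columns into supplemental qualifier records. The function names, the source column EXNOTE, and the study identifier are hypothetical, and the record is simplified: a complete supplemental qualifier dataset also carries variables such as QLABEL and QORIG.

```python
def supp_dataset_name(domain):
    """Return the supplemental qualifier dataset name for a two-letter domain code."""
    return "SUPP" + domain.upper()


def to_supp_records(rows, domain, nonstandard_vars, study_id):
    """Write one name/value record per parent record and non-standard variable."""
    records = []
    for row in rows:
        for var in nonstandard_vars:
            value = row.get(var)
            if value in (None, ""):
                continue  # no record is written for a missing value
            records.append({
                "STUDYID": study_id,
                "RDOMAIN": domain,                     # parent domain of the value
                "USUBJID": row["USUBJID"],
                "IDVAR": f"{domain}SEQ",               # key linking back to the parent record
                "IDVARVAL": str(row[f"{domain}SEQ"]),
                "QNAM": var,                           # variable name
                "QVAL": str(value),                    # variable value
            })
    return records


# Hypothetical exposure rows with one non-standard column, EXNOTE.
exposure_rows = [
    {"USUBJID": "ABC-001", "EXSEQ": 1, "EXNOTE": "Dose taken with food"},
    {"USUBJID": "ABC-002", "EXSEQ": 1, "EXNOTE": ""},
]
supp_records = to_supp_records(exposure_rows, "EX", ["EXNOTE"], "ABC")
```

Only the first row produces a record here, because the second row has no value to carry over.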

2. Identify SDTM Datasets That Correspond To Those Datasets 

Once the source datasets have been identified, it’s essential to determine which SDTM datasets they correspond to. This is done by reviewing the domain definitions in the CDISC Library and assigning each source dataset to the domain whose definition it fits. For example, a source dataset that contains information about the patient’s age should be mapped to the Demographics (DM) domain.

Once every source dataset has been assigned to an SDTM domain, researchers can begin building the SDTM datasets.
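The outcome of this step can be recorded as a simple lookup from source dataset to SDTM domain. The sketch below is illustrative only: the source dataset names (demog, vitals, and so on) are hypothetical, and a real study would fill the table from its own data review.

```python
# Hypothetical source dataset names mapped to real SDTM domain codes.
SOURCE_TO_DOMAIN = {
    "demog": "DM",     # demographics, including age
    "vitals": "VS",    # vital signs
    "adverse": "AE",   # adverse events
    "conmeds": "CM",   # concomitant medications
}


def plan_domains(source_names):
    """Group source datasets by target domain and list those with no assignment."""
    mapped, unmapped = {}, []
    for name in source_names:
        domain = SOURCE_TO_DOMAIN.get(name.lower())
        if domain is None:
            unmapped.append(name)
        else:
            mapped.setdefault(domain, []).append(name)
    return mapped, unmapped


mapped, unmapped = plan_domains(["DEMOG", "vitals", "device_log"])
```

Anything that lands in the unmapped list needs a decision before mapping continues, such as assigning it to another standard domain or treating it as a candidate for a custom domain.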

3. Gather Metadata Of The Datasets And The Corresponding SDTM Metadata 

Metadata is vital for SDTM mapping because it describes the structure and format of the datasets. It includes variable names, labels, data types, lengths, number of values, and units of measure. Knowing this information allows researchers to match source datasets with the corresponding SDTM domains more quickly.
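As a rough illustration, basic metadata can be profiled directly from the source records. This sketch assumes the source data has already been read into a list of dictionaries; the function name and the columns SUBJ and HT_CM are hypothetical.

```python
def profile_variables(rows):
    """Collect observed types, maximum text length, and distinct-value count per variable."""
    profile = {}
    for row in rows:
        for name, value in row.items():
            info = profile.setdefault(
                name, {"types": set(), "max_length": 0, "values": set()}
            )
            if value is None:
                continue  # missing values do not contribute to the profile
            info["types"].add(type(value).__name__)
            info["max_length"] = max(info["max_length"], len(str(value)))
            info["values"].add(value)
    return {
        name: {
            "types": sorted(info["types"]),
            "max_length": info["max_length"],
            "n_values": len(info["values"]),
        }
        for name, info in profile.items()
    }


source_rows = [
    {"SUBJ": "001", "HT_CM": 172.5},
    {"SUBJ": "002", "HT_CM": 180},
]
metadata = profile_variables(source_rows)
```

A variable that reports more than one type, as HT_CM does here, is an early sign that the source needs cleaning before it is mapped.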

Commonly used SDTM domains include demographics (DM), adverse events (AE), laboratory (LB), vital signs (VS), medical history (MH), and concomitant medications (CM). These domains hold essential clinical trial data that must be mapped accurately to produce quality data and reliable results.

4. Map Variables In The Datasets To The Corresponding SDTM Domains 

After gathering the metadata and identifying the corresponding SDTM domains, the next step is to map each variable in the source dataset to its matching variable in the target SDTM domain. This process is known as ‘variable mapping’ and requires researchers to thoroughly review the variables in both datasets and match them accordingly. For example, a source variable containing patient height should be mapped to the VS domain.
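The height example might look like the following in Python. This is a sketch under stated assumptions: the source columns SUBJ, HT_CM, and VISITDT are hypothetical, the unit is assumed to be centimeters, and building USUBJID from the study and subject identifiers is one common convention rather than a requirement. The target variable names (VSTESTCD, VSORRES, and so on) are standard VS variables.

```python
def map_height_to_vs(source_rows, study_id):
    """Map hypothetical height columns to Vital Signs (VS) records."""
    records = []
    seq_by_subject = {}
    for row in source_rows:
        if row.get("HT_CM") is None:
            continue  # nothing to map for this row
        usubjid = f"{study_id}-{row['SUBJ']}"  # assumed subject ID convention
        seq = seq_by_subject.get(usubjid, 0) + 1
        seq_by_subject[usubjid] = seq
        records.append({
            "STUDYID": study_id,
            "DOMAIN": "VS",
            "USUBJID": usubjid,
            "VSSEQ": seq,                    # sequence number within subject
            "VSTESTCD": "HEIGHT",
            "VSTEST": "Height",
            "VSORRES": str(row["HT_CM"]),    # original result stored as text
            "VSORRESU": "cm",                # assumed source unit
            "VSDTC": row["VISITDT"],         # assumed to be ISO 8601 already
        })
    return records


vs_records = map_height_to_vs(
    [
        {"SUBJ": "001", "HT_CM": 172.5, "VISITDT": "2023-01-15"},
        {"SUBJ": "002", "HT_CM": None, "VISITDT": "2023-01-16"},
    ],
    "ABC",
)
```

Note that one source column becomes several SDTM variables: the value goes into VSORRES, while the test code, test name, and unit are supplied by the mapping itself.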

Once the mapping is done, the data is checked for accuracy, quality, and interoperability. Common errors include incorrect data types, incorrect lengths, and missing or invalid values. Once the errors have been corrected, the mapped datasets are ready for analysis and reporting.
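The error types listed above can be checked with a small rule table. The sketch below is not a substitute for a conformance tool; the rule table VS_SPEC, its length limits, and its allowed values are hypothetical and only illustrate the idea.

```python
# Hypothetical rules for a few VS variables.
VS_SPEC = {
    "USUBJID": {"type": str, "max_length": 40, "required": True},
    "VSTESTCD": {"type": str, "max_length": 8, "required": True,
                 "allowed": {"HEIGHT", "WEIGHT"}},
    "VSORRES": {"type": str, "max_length": 20, "required": False},
}


def check_records(records, spec):
    """Return (record index, variable, problem) tuples for each rule violation."""
    issues = []
    for index, record in enumerate(records):
        for name, rule in spec.items():
            value = record.get(name)
            if value in (None, ""):
                if rule["required"]:
                    issues.append((index, name, "missing value"))
                continue
            if not isinstance(value, rule["type"]):
                issues.append((index, name, "incorrect data type"))
                continue
            if len(str(value)) > rule["max_length"]:
                issues.append((index, name, "value too long"))
            if "allowed" in rule and value not in rule["allowed"]:
                issues.append((index, name, "invalid value"))
    return issues


issues = check_records(
    [
        {"USUBJID": "ABC-001", "VSTESTCD": "HEIGHT", "VSORRES": "172.5"},
        {"USUBJID": "", "VSTESTCD": "HEIGHTCM1", "VSORRES": 172.5},
    ],
    VS_SPEC,
)
```

The first record passes every rule; the second is reported for a missing subject identifier, a test code that is both too long and not in the allowed set, and a numeric result where text is expected.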

5. Create Custom And SUPPQUAL Domains If Needed  

In some cases, researchers may need to create custom and SUPPQUAL domains for additional data not represented in the core SDTM domains. This process requires a deep understanding of the CDISC standards and the ability to accurately create and validate these new domains.  

Custom domains are created when the data doesn’t fit into any of the existing SDTM domains, and SUPPQUAL domains are created for non-standard variables that supplement a record in a parent domain. Once these domains have been defined, the remaining source dataset variables can be mapped to them.

6. Validate SDTM Datasets And Generate Define.xml Files  

Once all source datasets have been mapped to their corresponding SDTM domains, the data must be validated. Validation is supported by generating a define.xml file, which documents the variables, labels, and permitted values in each domain. This file helps confirm that all the data has been mapped correctly.

Once the define.xml file has been generated, it can be used to validate the SDTM datasets. This is done by comparing each variable to its corresponding define.xml entry and confirming that the two agree. If any errors are found, they must be corrected before the dataset can be used for analysis and reporting.
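The comparison described above can be sketched as follows. The XML fragment is a deliberately simplified stand-in with hypothetical element and attribute names; it does not follow the real Define-XML schema, which is considerably more elaborate. The function name and the sample record are also hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for define metadata, NOT the real Define-XML structure.
DEFINE_FRAGMENT = """
<dataset name="VS">
  <variable name="USUBJID" length="40"/>
  <variable name="VSTESTCD" length="8"/>
  <variable name="VSORRES" length="20"/>
</dataset>
"""


def compare_to_define(records, define_xml):
    """List differences between dataset records and the documented variables."""
    root = ET.fromstring(define_xml.strip())
    defined = {v.get("name"): int(v.get("length")) for v in root.iter("variable")}
    present = set()
    for record in records:
        present.update(record)  # collect every variable name that occurs

    findings = []
    for name in sorted(present - set(defined)):
        findings.append(f"{name}: in dataset but not in define metadata")
    for name in sorted(set(defined) - present):
        findings.append(f"{name}: in define metadata but not in dataset")
    for index, record in enumerate(records):
        for name, value in record.items():
            if name in defined and value is not None and len(str(value)) > defined[name]:
                findings.append(
                    f"{name}: record {index} exceeds defined length {defined[name]}"
                )
    return findings


findings = compare_to_define(
    [{"USUBJID": "ABC-001", "VSTESTCD": "HEIGHTCM1", "VSPOS": "STANDING"}],
    DEFINE_FRAGMENT,
)
```

Each finding points to a mismatch to resolve, either by correcting the dataset or by updating the metadata so that both describe the same structure.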


SDTM mapping is a critical step in clinical trial data analysis and reporting. It ensures that the data is correctly mapped to its corresponding SDTM domains so that accurate results can be generated. By following the steps above, researchers can accurately map their source datasets and generate reliable SDTM datasets for further analysis and reporting.