Data Collection | Research Registry Toolkit

Deciding What Data to Collect

Researchers often want to collect as much data as possible; however, some data may be expensive or impractical to obtain. Consider the data you need to meet the objectives of your registry and the costs and benefits of collecting these and additional data.

Questions to consider include:

How will we use these data in analysis?
How difficult is it to get these data? From where do we obtain these data?
Are the data likely to have missing values or fields?
How burdensome to staff or participants will it be to collect these data?
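
One way to explore the question about missing values is to check a small pilot extract before committing to a data element. The following is a minimal sketch, not a prescribed method: the field names (age, smoking_status, last_hba1c), the sample records, and the function name missing_rate_by_field are all hypothetical, and it assumes that a missing value appears as None, as an empty string, or as an absent field.

```python
from typing import Any


def missing_rate_by_field(records: list[dict[str, Any]]) -> dict[str, float]:
    """Return the fraction of records in which each field is missing.

    A value counts as missing if the field is absent, None, or an empty string.
    """
    # Collect every field name that appears in at least one record.
    fields = sorted({name for record in records for name in record})
    total = len(records)
    rates: dict[str, float] = {}
    for name in fields:
        missing = sum(1 for record in records if record.get(name) in (None, ""))
        rates[name] = missing / total if total else 0.0
    return rates


# Hypothetical pilot extract; these values are illustrative only.
pilot_records = [
    {"age": 34, "smoking_status": "never", "last_hba1c": None},
    {"age": 51, "smoking_status": "", "last_hba1c": 6.1},
    {"age": 47, "smoking_status": "former"},
    {"age": None, "smoking_status": "current", "last_hba1c": None},
]

rates = missing_rate_by_field(pilot_records)
for field_name, rate in rates.items():
    print(f"{field_name}: {rate:.0%} missing")
```

A field with a high missing rate in a pilot extract may be a candidate for a different data source, or a topic to raise with a biostatistician during planning.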

Study teams should consider consulting with a biostatistician in the planning phase of the project to discuss which data are necessary to answer substantive research questions.

Selecting Data Sources

Data sources for research registries fall into two categories: data collected specifically for the registry (primary data), and existing datasets originally collected for another purpose (secondary data). Selection of data sources depends on the needs of your project, and in many cases, a registry project will use multiple data sources. This section provides an overview of primary and secondary data sources. For a more comprehensive comparison of advantages and disadvantages of research data sources, check out “Commonly used data-collection approaches in clinical research”.¹

What about Biological Specimens?

Biological specimens are important for many registries and can be useful for understanding the underlying causes of diseases and conditions. Biological specimens may include urine, blood, saliva, or tissue samples, and can be collected specifically for the repository or be leftover (residual) samples obtained initially for clinical purposes. When a registry includes biological specimens, it is usually referred to as a repository. Please review the Regulatory section for important policy guidelines when using biological specimens.

Primary Data Collection

Primary data may include information collected through surveys and interviews, such as participant attitudes and beliefs, behavior, health history, and patient-reported outcomes (reports of health-related status that come directly from the patient). In addition to participant-reported data, registries may also record observations about participants or administer laboratory tests, such as a glucose test, to participants.

Primary data collection allows registry teams to control the data collected to a greater degree than when using secondary data sources. However, primary data collection requires staff time, associated costs, and participant effort.

Primary data collection often means collecting data directly from participants. When deciding which data elements to collect directly from participants, consider the following:

Responsiveness: It may be hard to achieve good response rates in surveys sent to patients; low response rates can affect your ability to generalize results. Consider strategies you can use to increase responsiveness, including offering participant incentives¹ and dedicating time and resources to participant engagement.
Burden: Extensive surveys or interactions can be burdensome to patients. As you identify data sources and create a data collection plan, it is important to be conscious of time and effort and balance your data needs with potential participant burden.
Accuracy: Participants may not be the most accurate source of information for measures you wish to collect. For example, a participant may not know the result of recent lab tests.

¹ Saczynski JS, McManus DD, Goldberg RJ. Commonly used data-collection approaches in clinical research. Am. J. Med. 2013;126(11):946-950. doi:10.1016/j.amjmed.2013.04.016.

When you find that primary data collection may not be appropriate or feasible for collecting certain data elements, consider using secondary data sources.

Secondary Data Collection

Secondary data sources, or existing datasets originally collected for another purpose, are useful resources for registries. One of the more commonly used existing data sources is the electronic health record.

Electronic health records (EHRs) contain data collected for routine clinical care and billing purposes. Data from EHRs can be a valuable information source for registries. Study teams may use an EHR to identify potential registry participants or to gain additional information about patients after they enroll. EHR data are most often collected through chart abstraction or through data sets provided by a research data warehouse (a collection of data from a health system’s electronic health record). The use of EHR data in research raises important regulatory, consent, and HIPAA considerations.

The Carolina Data Warehouse for Health at UNC-Chapel Hill

The Carolina Data Warehouse for Health (CDW-H) is a central repository containing clinical, research, and administrative data from the UNC Health Care system. Investigators who have appropriate IRB approval can request data sets through the CDW-H to identify potentially eligible patients and conduct secondary data analyses.

i2b2@UNC is a self-service cohort discovery tool that researchers can use to find counts of patients from UNC Health Care before obtaining IRB approval.

EHR data are a valuable resource, but they have some limitations. Data from EHRs are collected primarily for clinical purposes, so the data you are interested in may not be available or may be incomplete.¹ Chart abstraction allows study teams to explore more detail in the medical record. A successful chart abstraction requires a clear, defined study protocol and a team well trained in that protocol.²,³ REDCap can be used as a standardized data collection tool for chart abstraction.

Apply it! Building an Allergy Registry

A study team is building an allergy registry. The team requests a dataset from their institution’s research data warehouse for patients with certain clinical criteria related to allergies. After appropriate IRB and regulatory approvals, the study team is provided with a list of medical record numbers (MRNs), names, phone numbers, and demographic data for individual patients.

Study staff use the phone numbers to contact patients and ask them if they want to participate in the registry. After a patient consents to be a part of the registry, their name and relevant medical data are added to the registry. At this stage, the study staff ask participants questions about their allergy experiences and enter those responses into the registry record. They also conduct chart abstraction to gather additional information from the patient’s medical record and record those data in the registry.

The data in this registry are used to identify patients who may be eligible for future studies. Based on this information, researchers are able to reach out to those listed in the registry to ask if they are interested in participating in relevant research opportunities.
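
The flow in this example can be sketched in code to show how the warehouse dataset, consent, participant responses, and chart abstraction relate to a single registry record. This is an illustrative sketch only: the class names, field names, and sample values are hypothetical, it does not represent REDCap or any data warehouse interface, and a real registry would store these data in a secure, approved system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WarehousePatient:
    """One row of the dataset provided by the research data warehouse."""
    mrn: str
    name: str
    phone: str
    demographics: dict[str, str]


@dataclass
class RegistryRecord:
    """A registry entry, created only after the patient consents."""
    mrn: str
    name: str
    survey_responses: dict[str, str] = field(default_factory=dict)  # primary data
    chart_abstraction: dict[str, str] = field(default_factory=dict)  # secondary data


class AllergyRegistry:
    def __init__(self) -> None:
        self._records: dict[str, RegistryRecord] = {}

    def enroll(self, patient: WarehousePatient, consented: bool) -> Optional[RegistryRecord]:
        """Add a patient to the registry only if they consented."""
        if not consented:
            return None
        record = RegistryRecord(mrn=patient.mrn, name=patient.name)
        self._records[patient.mrn] = record
        return record

    def add_survey_responses(self, mrn: str, responses: dict[str, str]) -> None:
        self._records[mrn].survey_responses.update(responses)

    def add_chart_data(self, mrn: str, abstracted: dict[str, str]) -> None:
        self._records[mrn].chart_abstraction.update(abstracted)

    def find_candidates(self, field_name: str, value: str) -> list[str]:
        """Return MRNs whose chart abstraction matches a future study's criterion."""
        return [
            record.mrn
            for record in self._records.values()
            if record.chart_abstraction.get(field_name) == value
        ]


# Hypothetical patients from the warehouse dataset.
patient_a = WarehousePatient("000001", "Patient A", "555-0100", {"age_group": "30-39"})
patient_b = WarehousePatient("000002", "Patient B", "555-0101", {"age_group": "40-49"})

registry = AllergyRegistry()
registry.enroll(patient_a, consented=True)
declined = registry.enroll(patient_b, consented=False)  # never stored in the registry

registry.add_survey_responses("000001", {"worst_season": "spring"})
registry.add_chart_data("000001", {"confirmed_allergen": "peanut"})

candidates = registry.find_candidates("confirmed_allergen", "peanut")
print(candidates)
```

Keeping participant-reported responses and chart-abstracted data in separate fields makes the source of each value explicit, which can help when the two sources disagree.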

Data Collection Tools

A variety of data collection and management tools are available for research registries, including paper and pencil surveys, spreadsheets, Access databases, online survey platforms like Qualtrics, and electronic data capture and management tools like REDCap. Different tools are appropriate for different needs.

In this Research Registry Toolkit, we highlight REDCap, a secure web application for building and managing online surveys and databases that is designed to support data capture for research studies and operations. Over 2000 institutions use REDCap, including the CTSA programs at many academic medical centers.

REDCap at UNC – Chapel Hill

REDCap is available to UNC-Chapel Hill investigators. Learn more: research.unc.edu/systems/redcap