Designing for Data Quality
Data Collection Tool Design
Electronic data capture tools can be designed in a way that encourages accurate data collection. Elements of data collection tools that promote quality are discussed in this section.
Required Fields and Reminders
Many electronic data collection tools allow you to designate required fields, which users must complete before they can submit the instrument. Use required fields to capture information essential to the project. They can be included both in surveys completed by participants and in forms that study staff use to enter data.
Reminders and Prompts
Reminders and prompts alert users to incomplete fields and can be used as an alternative to required fields. The user has the opportunity to review and complete the field but can advance without doing so.
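The difference between a required field and a reminder comes down to whether a blank value blocks submission or only produces a message. The sketch below assumes hypothetical field names (`participant_id`, `consent_date`, `last_blood_pressure`) and a hypothetical `check_submission` helper; it does not reflect how any particular tool implements this.

```python
from dataclasses import dataclass, field

# Hypothetical field definitions: "required" fields block submission,
# "reminder" fields only produce a warning the user can dismiss.
REQUIRED = {"participant_id", "consent_date"}
REMINDER = {"last_blood_pressure"}

@dataclass
class CheckResult:
    can_submit: bool
    errors: list = field(default_factory=list)    # blocking problems
    warnings: list = field(default_factory=list)  # non-blocking prompts

def check_submission(record: dict) -> CheckResult:
    """Return whether a form can be submitted and any messages to show."""
    def is_blank(name):
        value = record.get(name)
        return value is None or str(value).strip() == ""

    errors = [f"{name} is required" for name in sorted(REQUIRED) if is_blank(name)]
    warnings = [f"{name} is empty; please review" for name in sorted(REMINDER) if is_blank(name)]
    # Only missing required fields stop submission; reminders never do.
    return CheckResult(can_submit=not errors, errors=errors, warnings=warnings)
```

For example, a record with an ID and consent date but no blood pressure can be submitted, and the user simply sees a prompt about the empty field.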
Required Fields Gone Wrong
Required fields are a useful and necessary feature, but it is important to test your instruments and make sure required fields are appropriate, as demonstrated in this scenario:
Your registry data collection tools are complete, and you have decided to make every field required. What could go wrong?
In your participant-facing survey, you ask participants for their last recorded blood pressure. One participant does not know his blood pressure. He is now left with two options: (1) stop taking the survey without finishing it, or (2) enter a value that may be incorrect. Either way, the data are incomplete or incorrect.
There are two ways to avoid this issue:
- Remove the field requirement.
- Keep the field requirement and add an “I don’t know” option.
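The second option works because “I don’t know” is stored as its own coded answer rather than as a guessed number. This is a minimal sketch assuming a hypothetical code `dont_know` and, for illustration only, a question that asks for systolic pressure as a whole number.

```python
# Hypothetical coded value for the "I don't know" answer option.
DONT_KNOW = "dont_know"

def validate_systolic_bp(answer):
    """Check a required blood pressure answer that offers "I don't know".

    Assumes (for illustration) the question asks for systolic pressure
    as a whole number in mmHg. Returns (is_valid, value); value is None
    for "I don't know" so analyses never mistake it for a measurement.
    """
    if answer == DONT_KNOW:
        return True, None
    text = "" if answer is None else str(answer).strip()
    if not text:
        return False, None  # still required: the participant must pick some answer
    if not text.isdecimal():
        return False, None
    return True, int(text)
```

Every participant can now finish the survey, and the resulting data stay both complete (each record has an answer) and correct (no invented values).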
Free Text Fields
Free text fields are text fields that allow users to enter any value. Answers in free text fields can be very heterogeneous and difficult to analyze, so consider carefully whether to use them.
To Free Text or Not to Free Text…That is the Question
Let’s say you want to survey participants about symptoms they experience.
To collect this information, you could include a survey question like this: “Please list the symptoms you experienced in the past week” and allow them to answer in a free text field. Your results will likely be quite varied, with some participants writing full paragraphs describing their symptoms and others writing a very brief list. There may also be typos or misspellings that are difficult to understand.
Alternatively, you could include this survey question: “Please select the symptoms you experienced in the past week (check all that apply),” followed by a list of the most common symptoms. With this approach, you receive data that can more easily be compared, but if a participant experienced symptoms not on the list, they will not be able to report them.
Both of these approaches could be appropriate depending on the needs of your registry and overall data collection plan.
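Checklist answers are easier to compare because each listed option becomes a yes/no indicator that can be counted across participants. The sketch below uses a hypothetical four-symptom list and hypothetical helper names; it only illustrates the idea and is not a specific tool’s export format.

```python
# Hypothetical symptom list for a "check all that apply" question.
SYMPTOMS = ["fatigue", "headache", "nausea", "fever"]

def to_indicators(selected):
    """Convert one participant's checked options into 0/1 indicator columns."""
    unknown = set(selected) - set(SYMPTOMS)
    if unknown:
        raise ValueError(f"not an answer option: {sorted(unknown)}")
    return {symptom: int(symptom in selected) for symptom in SYMPTOMS}

def count_by_symptom(responses):
    """Count how many participants reported each listed symptom."""
    totals = dict.fromkeys(SYMPTOMS, 0)
    for selected in responses:
        for symptom, flag in to_indicators(selected).items():
            totals[symptom] += flag
    return totals
```

Free text answers such as “really tired all week” would first need to be read and coded by hand before they could be counted this way.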
Many data collection tools allow users to specify the format or content type for a text field. This feature can help promote data accuracy by reducing unintentional errors. Below are some examples of field validation:
- Emails and Phone Numbers – Text fields can require that data be entered in a specified format, such as an email address or phone number.
- Dates – When collecting dates, consider using a “date picker,” which requires the user to select an exact date on a calendar to complete the field. A date picker is useful in many cases, especially for variables like birth date. However, participants may not be able to identify the exact date of some events (e.g., when they started smoking). In this case, you could ask “What year did you start smoking?” and use a text field with validation that requires the user to enter an integer.
- Value Ranges – In many cases, you will know a realistic range of values for a field. For example, if your study enrolls people ages 20 to 35, an age of 40 would indicate invalid data. Setting a range constraint on the field prevents entry of values outside the specified range.
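The validation rules above can be expressed as simple checks. This is a minimal sketch with hypothetical function names; the email pattern is deliberately simplified (real tools accept a wider range of addresses), and the phone pattern assumes 10-digit US numbers.

```python
import re
from datetime import date

# Simplified patterns for illustration only; real tools accept a wider range
# of email addresses, and phone formats vary by country.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$")  # assumes US 10-digit numbers

def valid_email(text: str) -> bool:
    return bool(EMAIL_RE.match(text.strip()))

def valid_phone(text: str) -> bool:
    return bool(PHONE_RE.match(text.strip()))

def valid_year(text: str, earliest: int = 1900) -> bool:
    """'What year did you start smoking?': an integer year, not in the future."""
    text = text.strip()
    if not text.isdecimal():
        return False
    return earliest <= int(text) <= date.today().year

def in_range(value: float, low: float, high: float) -> bool:
    """Range constraint, e.g. low=20, high=35 for a study enrolling ages 20 to 35."""
    return low <= value <= high
```

With these checks, “late 90s” is rejected as a smoking start year and an age of 40 is rejected for a study of people ages 20 to 35.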
You can say that again! Using duplicate data entry
Let’s say you have crucial information, such as a medical record number, that your team must enter correctly for every participant. No matter how great your team is, typos happen, so consider using duplicate data entry. This means that study staff enter crucial information twice, and the two entries are compared so that any mismatch can be caught and corrected.
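The comparison step is what makes duplicate entry useful. This is a minimal sketch assuming each entry is a dictionary of field values and a hypothetical helper named `compare_double_entry`; it ignores surrounding whitespace so that only real differences, such as transposed digits, are flagged.

```python
def compare_double_entry(first: dict, second: dict, fields: list) -> list:
    """Return the fields whose two independent entries disagree."""
    def normalize(value):
        # Treat missing values as blank and ignore surrounding whitespace.
        return "" if value is None else str(value).strip()

    return [name for name in fields
            if normalize(first.get(name)) != normalize(second.get(name))]
```

For example, entries of “00123456” and “00123465” for a medical record number would be flagged for review against the source document.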
Selecting Appropriate Field Types
Many concepts can be collected using multiple field types. For example, information about age can be collected by asking for birthdate (using a “date picker”), age (using free text), or age range (using radio buttons). See Figure 1 for examples of all three options.
The most appropriate field type depends on the needs of your study. In this example, birthdate is the most specific option, but it is also personally identifiable information and, depending on the source, protected health information. Age gathered through a free text field may be appropriate when deidentified data are required. Age range is likely less helpful for analysis but can be useful for describing the general demographics of participants.
Figure 1: Three possible ways to collect information about age. The best method depends on the needs of your study.
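One way to see why birthdate is the most specific option: it can always be reduced to an age or an age range, but an age range can never be turned back into a birthdate. The sketch below assumes hypothetical helper names and 10-year bands chosen only for illustration.

```python
from datetime import date

def age_on(birthdate: date, on: date) -> int:
    """Completed years of age on a given date."""
    had_birthday = (on.month, on.day) >= (birthdate.month, birthdate.day)
    return on.year - birthdate.year - (0 if had_birthday else 1)

def age_range(age: int, width: int = 10) -> str:
    """Bucket an age into a band such as '30-39' (hypothetical 10-year bands)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"
```

For instance, a birthdate of June 15, 1990 gives an age of 33 on June 14, 2024 and the range “30-39”, while knowing only “30-39” says nothing about the exact age.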
The use of validated and standardized instruments is considered best practice when relevant tools are available, because they support high-quality data collection and allow results to be compared across studies. Examples of validated instruments include the Short Form Health Survey (SF-36), the Patient Health Questionnaire 9 (PHQ-9), and a variety of Patient-Reported Outcomes Measurement Information System (PROMIS) measures. If you create a new scale or survey instrument and resources allow, validate it according to the standard process in your field.
Can I use REDCap for that?
The REDCap Shared Library contains existing, validated instruments you can use for your registry. Look for the red star, which denotes a curated instrument approved by the REDCap Library Oversight Committee.
Testing Data Collection Tools
After deciding what data to collect and how to collect those data, it is important to test the tools you created. Below are some suggestions for testing your data collection tools:
- Test data validation in electronic tools by entering data incorrectly. For example, if you have email address validation, try typing in “datawizard@emailcom” and confirm that it is rejected.
- Test the branching logic in your electronic tools using multiple data scenarios.
- If you have created participant surveys, ask study team members to complete surveys as if they were a participant. These study team members should report on any areas of confusion.
- If you have created a tool that research coordinators will use to enter data while talking to a participant, ask study team members to act out the participant-coordinator interaction and see if the fields make sense.
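The first two suggestions can also be written down as repeatable checks that are rerun whenever the tool changes. This is a minimal sketch: the simplified email pattern, the `smoking_status` field, and the branching rule are all hypothetical examples, not features of any particular tool.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simplified, illustrative pattern

def email_is_valid(text: str) -> bool:
    return bool(EMAIL_RE.match(text.strip()))

def show_smoking_questions(record: dict) -> bool:
    """Hypothetical branching rule: show smoking follow-ups only to current or former smokers."""
    return record.get("smoking_status") in {"current", "former"}

def run_tool_tests() -> None:
    # Deliberately incorrect entries must be rejected...
    for bad in ["datawizard@emailcom", "datawizard.email.com", "data wizard@email.com", ""]:
        assert not email_is_valid(bad), bad
    # ...and a correct entry accepted.
    assert email_is_valid("datawizard@email.com")

    # Exercise the branching rule with every answer, including a skipped question.
    scenarios = {"current": True, "former": True, "never": False, None: False}
    for status, expected in scenarios.items():
        assert show_smoking_questions({"smoking_status": status}) is expected, status

run_tool_tests()
```

Automated checks like these complement, rather than replace, having team members complete the surveys and act out data entry sessions.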
Planning for Data Maintenance
Carefully considering data needs, data sources, data collection processes, and tools is an important part of planning for data quality, and doing so can reduce the long-term burden of maintaining and cleaning data in your registry. However, even with the best planning, unexpected errors may occur. Strategies for identifying and addressing these errors include:
- Writing data collection protocols and standard operating procedures (SOPs) and reviewing them with study team members
- Developing a plan to regularly review the quality and accuracy of the data being collected
- Consulting an expert about data cleaning. Carefully track any data manipulations, and weigh the costs and benefits of each change before making it.
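A regular quality review and a record of every correction can both be kept simple. This is a minimal sketch assuming records are dictionaries with a hypothetical `record_id` key and numeric values for range-checked fields; `quality_report` and `apply_correction` are illustrative names, not part of any specific tool.

```python
from datetime import datetime, timezone

def quality_report(records, required, ranges):
    """Count blank required values and list out-of-range values.

    ranges maps a field to plausible (low, high) limits, e.g. {"age": (20, 35)}.
    """
    missing = {f: 0 for f in required}
    out_of_range = []
    for rec in records:
        for f in required:
            if rec.get(f) in (None, ""):
                missing[f] += 1
        for f, (low, high) in ranges.items():
            value = rec.get(f)
            if value not in (None, "") and not (low <= value <= high):
                out_of_range.append((rec.get("record_id"), f, value))
    return {"missing": missing, "out_of_range": out_of_range}

def apply_correction(record, field, new_value, reason, log, user):
    """Change one value and append an audit entry instead of overwriting silently."""
    log.append({
        "record_id": record.get("record_id"),
        "field": field,
        "old": record.get(field),
        "new": new_value,
        "reason": reason,
        "user": user,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    record[field] = new_value
```

Because each correction keeps the old value, the reason, and who made it, any change can later be reviewed or reversed.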
Wait, what about alternate contact information?
If your registry requires participant follow-up, it is especially important to have accurate and complete contact information. Strategies include:
- Collecting multiple types of contact information for each participant (e.g., email AND phone number)
- Using validation when collecting phone number or email in an online tool
- Collecting contact information for a secondary contact person
For other suggestions on supporting registry retention and follow-up, see the section on participant retention.
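The contact strategies above can be checked for each participant before follow-up is needed. This is a minimal sketch with hypothetical field names (`email`, `phone`, `secondary_contact_name`, `secondary_contact_phone`), a simplified email pattern, and a phone pattern that assumes 10-digit US numbers.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simplified pattern
PHONE_RE = re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$")  # assumes US 10-digit numbers

def contact_gaps(participant: dict) -> list:
    """List follow-up risks in one participant's contact information."""
    email = (participant.get("email") or "").strip()
    phone = (participant.get("phone") or "").strip()
    gaps = []
    if not EMAIL_RE.match(email):
        gaps.append("no valid email")
    if not PHONE_RE.match(phone):
        gaps.append("no valid phone")
    if not (participant.get("secondary_contact_name") and participant.get("secondary_contact_phone")):
        gaps.append("no secondary contact")
    return gaps
```

Running a check like this at enrollment gives staff a chance to ask for missing details while the participant is still engaged.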