Sensitive information in research data about people isn't limited to medical studies or projects requiring IRB oversight. Any data that could lead to someone being identified carries risk.
These risks might include identity theft, legal issues, financial hardship, or damage to a person’s reputation or well-being.
In human-subject research, sensitive data refers to any piece of information that could be used to trace a person’s identity. The potential for harm doesn’t need to be proven for data to be considered sensitive. Even if a researcher believes the risk is low, that perception doesn’t change the nature of the data or the need to protect it.
To respect participants' autonomy and privacy, identifying information should never be disclosed or made discoverable unless the individual has explicitly agreed to it.
Direct identifiers are details that clearly point to a specific person or a very small group. These typically include things like names, full addresses, social security numbers, phone numbers, email addresses, and biometric data like facial images, voiceprints, or fingerprints. Before any dataset is shared, these identifiers usually need to be removed.
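As a rough illustration of that removal step, the sketch below drops direct-identifier fields from a list of records before sharing. The field names, the sample records, and the helper drop_direct_identifiers are all hypothetical; a real dataset will need its own list of identifying fields, and free-text fields may contain identifiers that this kind of column-level filter will not catch.

```python
# Hypothetical field names for direct identifiers; adjust to the dataset at hand.
DIRECT_IDENTIFIERS = {"name", "address", "ssn", "phone", "email"}


def drop_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without the listed direct-identifier fields."""
    return [
        {key: value for key, value in record.items() if key.lower() not in identifiers}
        for record in records
    ]


# Made-up sample records, for illustration only.
participants = [
    {"name": "Participant A", "email": "a@example.org", "zip_code": "02139",
     "occupation": "nurse", "score": 7},
    {"name": "Participant B", "email": "b@example.org", "zip_code": "02139",
     "occupation": "teacher", "score": 4},
]

shareable = drop_direct_identifiers(participants)
```

The function builds new dictionaries rather than editing the originals, so the identified data stays intact in its secured location while only the stripped copy is prepared for sharing.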
Indirect identifiers may not point to someone on their own, but they can be combined with other publicly available information, such as content from social media or government databases, to figure out who someone is. Examples of indirect identifiers include zip code, date of birth, education level, occupation, race or ethnicity, medical conditions, and more.
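One way to see how indirect identifiers combine is to count how many records share each combination of values. The sketch below is only an illustration under assumed field names (zip_code, birth_year, occupation) and made-up records; the helpers group_sizes and unique_records are hypothetical. A record whose combination appears only once is the easiest to single out, which is the intuition behind the k-anonymity measure.

```python
from collections import Counter

# Hypothetical indirect-identifier fields; choose the ones present in your data.
QUASI_IDENTIFIERS = ("zip_code", "birth_year", "occupation")


def group_sizes(records, fields=QUASI_IDENTIFIERS):
    """Count how many records share each combination of indirect identifiers."""
    return Counter(tuple(record.get(field) for field in fields) for record in records)


def unique_records(records, fields=QUASI_IDENTIFIERS):
    """Return the records whose combination of indirect identifiers occurs only once."""
    sizes = group_sizes(records, fields)
    return [
        record for record in records
        if sizes[tuple(record.get(field) for field in fields)] == 1
    ]


# Made-up sample records, for illustration only.
sample = [
    {"zip_code": "02139", "birth_year": 1985, "occupation": "nurse"},
    {"zip_code": "02139", "birth_year": 1985, "occupation": "nurse"},
    {"zip_code": "02139", "birth_year": 1990, "occupation": "teacher"},
]

at_risk = unique_records(sample)
smallest_group = min(group_sizes(sample).values())
```

Here the third record is the only one with its combination of values, so at_risk contains that record and smallest_group is 1. A check like this only measures uniqueness within the dataset itself; it does not account for what outside sources might reveal.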
Before sharing any research data, it’s necessary to remove both direct and indirect identifiers to protect participant confidentiality and reduce the risk of re-identification.
