July 10, 2023 · 6 min read

Deidentification of PHI (Protected Health Information): A Tokenization & Redaction Approach

TL;DR

  • Protected Health Information (PHI) is sensitive data that requires careful handling, similar to personally identifiable information (PII) or credit card data.
  • Tokenization, redaction, and masking are three techniques used for PHI deidentification.
  • Tokenization replaces sensitive PHI with unique identifiers, redaction removes or obscures individual identifiers, and masking conceals a portion of the data element.
  • These techniques are used in various scenarios such as collecting PHI, displaying PHI to authorized users, sending PHI to third-party partners, performing analytics on PHI, and redacting PHI for document sharing.
  • Combining tokenization, redaction, and masking helps secure PHI, reduces the risk of data breaches, and supports compliance with privacy regulations in the healthcare industry.

Much like personally identifiable information (PII) or credit card data, protected health information (PHI) is a category of sensitive data that requires careful handling. Tokenization offers an effective way to safeguard this data, which has driven growing interest in understanding and implementing it. Here, we explore how PHI can be deidentified through tokenization, together with redaction and masking, and why doing so is essential for any healthcare entity handling PHI.

What is PHI?

Protected Health Information, or PHI, is any information in a medical record that can be used to identify an individual and that was created, used, or disclosed while providing a health care service, such as a diagnosis or treatment. In the United States, PHI is protected under the Health Insurance Portability and Accountability Act (HIPAA) and includes demographic information, medical histories, test results, and insurance information.

The PHI Deidentification Challenge

The challenge with PHI lies in the need to balance privacy and utility. While it is critical to protect patient privacy by deidentifying their health information, the deidentified data must still retain enough utility for essential tasks such as health research, quality assurance, and clinical studies. This challenge is amplified by the increasing volume and complexity of health data being collected, and the growing number of services that touch this data.

[Figure: Total amount of global healthcare data generated in 2013, with a projection for 2020]

Three Pillars of PHI Deidentification: Tokenization, Redaction, and Masking

To address this challenge effectively, three techniques come into play: tokenization, redaction, and masking.

  1. Tokenization: Tokenization replaces sensitive PHI with unique identifiers, or tokens, that hold no exploitable meaning or value outside the system that generated them. These tokens can be safely handled, stored, and transmitted without the risk of a data breach revealing sensitive PHI.
  2. Redaction: Redaction is a technique for removing sensitive data elements from a dataset or document. It eliminates or obscures individual identifiers so that the PHI can no longer be traced to the individual, while the rest of the data keeps its utility. Redaction is particularly useful for reports or documents that need to be published or shared.
  3. Masking: Masking involves concealing a portion of the data element to prevent it from being fully recognized or used. Masking is often used when some part of the data can be exposed without risk, like showing only the last four digits of a Social Security Number or a birth year without the specific date. Masking is typically used for display purposes, where authorized users can access the full data if necessary.
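
The three techniques can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Strac's implementation: the `TokenVault` class, the `redact` and `mask` helpers, and the sample patient record are all hypothetical, and a production vault would be a separate, access-controlled service rather than an in-memory dictionary.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping tokens to original values."""

    def __init__(self):
        self._store = {}

    def tokenize(self, value: str) -> str:
        # A random token carries no information about the value it stands for.
        token = "tok_" + secrets.token_hex(8)
        self._store[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._store[token]


def redact(record: dict, fields: set) -> dict:
    """Return a copy of the record with the named fields removed."""
    return {k: v for k, v in record.items() if k not in fields}


def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Hide every letter and digit except those in the last `visible` characters."""
    cutoff = len(value) - visible
    return "".join(
        fill if i < cutoff and ch.isalnum() else ch
        for i, ch in enumerate(value)
    )


# Sample record (fabricated for illustration only).
vault = TokenVault()
patient = {"name": "Jane Doe", "ssn": "123-45-6789", "diagnosis": "J45.909"}

ssn_token = vault.tokenize(patient["ssn"])     # tokenization
shareable = redact(patient, {"name", "ssn"})   # redaction
masked_ssn = mask(patient["ssn"])              # masking
```

Here `masked_ssn` is `***-**-6789`, `shareable` keeps only the diagnosis code, and only code with access to the vault can turn `ssn_token` back into the original number.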

Let's explore how these techniques transform PHI handling across five broad use cases:

  1. Collecting PHI from Patients
  2. Displaying PHI to Patients or Authorized Users
  3. Sending PHI to Third-Party Partners
  4. Performing Analytics on PHI
  5. Redacting PHI for Document Sharing

Collecting PHI from Patients

Sensitive PHI is collected through input fields inside an iframe hosted by the tokenization provider. The provider tokenizes the PHI and returns the tokens to the user interface application. Sensitive PHI never touches the business's servers, significantly reducing the risk of exposure.

[Figure: Collect PHI (Protected Health Information) data: form for entering a patient name and uploading a lab test result]

Strac exposes tokenization APIs and UX widget components for collecting patient data. For example, you can use Strac's createToken API to create tokens.

[Figure: Strac Tokenization]

Displaying PHI to Patients or Authorized Users

Authorized users often need to view specific pieces of PHI. By combining tokenization and masking, these users can see the relevant information without the entire PHI being exposed. For instance, a provider such as Strac can use its UI components to show only a masked view of tokenized PHI, keeping the full values secure.
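
As a rough sketch of how tokenization and masking can combine at display time, the function below resolves a token and returns the full value only to roles that are allowed to see it. Everything here is an assumption made for illustration: the `VAULT` contents, the role names, and `display_value` are hypothetical and do not represent Strac's API.

```python
# Hypothetical vault contents: token -> original value.
VAULT = {"tok_a1b2": "123-45-6789"}

# Hypothetical roles that are allowed to see the full value.
FULL_ACCESS_ROLES = {"physician", "billing_admin"}


def display_value(token: str, role: str) -> str:
    """Resolve a token for display, masking it unless the role has full access."""
    value = VAULT[token]
    if role in FULL_ACCESS_ROLES:
        return value
    if len(value) <= 4:
        # Too short to reveal a suffix safely: hide everything.
        return "*" * len(value)
    # Keep only the last four characters readable.
    return "*" * (len(value) - 4) + value[-4:]
```

With this sketch, `display_value("tok_a1b2", "receptionist")` yields `*******6789`, while a `physician` sees the full number. Keeping the masking decision next to the detokenization step means the unmasked value never needs to reach a client that is not entitled to it.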

To display patient data, use Strac's detokenizeTokens API or its UX widget component for displaying patient data.

[Figure: Strac Detokenization API]

Sending PHI to Third-Party Partners

When sending PHI to third-party partners, the application transmits tokens that represent the PHI, using APIs such as the Strac Interceptor API. Because tokens rather than actual PHI are transmitted, the data remains protected in transit.
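
One common way to realize this is an outbound proxy: the application builds its request using tokens only, and a proxy operated alongside the vault swaps the tokens for real values just before the request leaves for the partner. The sketch below assumes that pattern; `VAULT`, `resolve_tokens`, and `forward_to_partner` are hypothetical names, the payload is fabricated, and the `send` callable stands in for a real HTTPS client.

```python
import json

# Hypothetical vault contents: token -> original value.
VAULT = {"tok_name_01": "Jane Doe", "tok_dob_01": "1984-03-12"}


def resolve_tokens(node):
    """Recursively replace any string that is a known token with its original value."""
    if isinstance(node, dict):
        return {k: resolve_tokens(v) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_tokens(v) for v in node]
    if isinstance(node, str):
        return VAULT.get(node, node)
    return node


def forward_to_partner(payload: dict, send) -> None:
    """Proxy step: detokenize the payload, then hand the serialized body to `send`."""
    send(json.dumps(resolve_tokens(payload), sort_keys=True))


# The application only ever sees and builds the tokenized payload.
app_payload = {"patient": {"name": "tok_name_01", "dob": "tok_dob_01"}, "test": "CBC"}

sent = []  # stand-in for the partner's endpoint
forward_to_partner(app_payload, sent.append)
```

After this runs, `app_payload` still contains only tokens, while the body captured in `sent` carries the real name and date of birth. The application's own servers and logs therefore never hold the sensitive values.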

[Figure: Strac Outbound Proxy: send sensitive data without ever touching it]

Performing Analytics on PHI

Analytics on PHI can be performed using tokens, enabling insights to be drawn from patient data without risking PHI exposure. Tokenization supports queries against sensitive data, such as equality checks on dates of birth or gender, without revealing the underlying values, provided the same input always maps to the same token.
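
Equality queries only work if tokenization is deterministic. A minimal sketch of that idea, assuming a keyed hash (HMAC-SHA256) as the token function: the key, the `deterministic_token` helper, and the sample rows are hypothetical, and this is not necessarily how any particular provider generates tokens.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key; in practice it would live in a key management system.
SECRET_KEY = b"demo-key-not-for-production"


def deterministic_token(value: str) -> str:
    """Map equal inputs to equal tokens using a keyed hash."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


# Fabricated sample rows.
rows = [
    {"patient": "p1", "dob": "1984-03-12"},
    {"patient": "p2", "dob": "1990-07-01"},
    {"patient": "p3", "dob": "1984-03-12"},
]

# The analytics store holds tokens, never the raw dates of birth.
tokenized = [{"patient": r["patient"], "dob": deterministic_token(r["dob"])} for r in rows]

# Group-by and equality filtering operate on tokens alone.
counts = Counter(r["dob"] for r in tokenized)
target = deterministic_token("1984-03-12")
matches = [r["patient"] for r in tokenized if r["dob"] == target]
```

Here `matches` is `["p1", "p3"]` without the analytics code ever seeing a date. The trade-off is that deterministic tokens reveal how often each value occurs, so fields with few possible values, such as gender, are easier to infer from frequency alone.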

Redacting PHI for Document Sharing

Redaction is critical when healthcare entities need to share or publish documents containing PHI. Specific identifiers within the PHI are removed using redaction, ensuring the document no longer contains information traceable to the individual. Redacted PHI documents allow for safe sharing while preserving the document's overall utility.
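
For free text, a simple form of redaction is pattern matching. The sketch below is illustrative only: the `PATTERNS` table, the `redact_text` helper, and the sample note are assumptions, and the regular expressions cover just a few US-style formats.

```python
import re

# Hypothetical pattern table: label -> compiled regular expression.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact_text(text: str) -> str:
    """Replace every match of each pattern with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text


# Fabricated sample note.
note = "Patient SSN 123-45-6789, phone 555-867-5309, email jane.doe@example.com."
redacted = redact_text(note)
```

The result reads `Patient SSN [SSN REDACTED], phone [PHONE REDACTED], email [EMAIL REDACTED].` Patterns like these catch only identifiers with a predictable shape. Names, addresses, and identifiers embedded in scanned images need other detection methods.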

Use Strac's redact APIs to redact any sensitive data from text or documents.

[Figure: Redacted document via Strac APIs (W2 tax form)]

De-Identify PHI Data in Databases

Strac works with any database, relational or NoSQL, and can mask, tokenize, or redact sensitive data stored in it. Learn more here: https://www.strac.io/integrations/postgres-data-masking
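
To make the idea concrete, here is a rough sketch of tokenizing a column in place. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in. The table layout, the `vault` table, and the sample rows are hypothetical, and nothing here reflects how Strac's integration works internally.

```python
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT)")
conn.executemany(
    "INSERT INTO patients (name, ssn) VALUES (?, ?)",
    [("Jane Doe", "123-45-6789"), ("John Roe", "987-65-4321")],  # fabricated rows
)

# Hypothetical vault table; in practice it would live in a separate,
# access-controlled system rather than next to the application data.
conn.execute("CREATE TABLE vault (token TEXT PRIMARY KEY, value TEXT)")

# Replace each SSN with a random token and record the mapping in the vault.
rows = conn.execute("SELECT id, ssn FROM patients").fetchall()
for row_id, ssn in rows:
    token = "tok_" + secrets.token_hex(8)
    conn.execute("INSERT INTO vault (token, value) VALUES (?, ?)", (token, ssn))
    conn.execute("UPDATE patients SET ssn = ? WHERE id = ?", (token, row_id))
conn.commit()

# The application table now holds only tokens in the ssn column.
stored = [r[0] for r in conn.execute("SELECT ssn FROM patients ORDER BY id")]
```

After the update, a dump of the `patients` table exposes no Social Security Numbers. Recovering them requires a join against the vault, which is where access control would be enforced.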

[Figure: Postgres database table after Strac redaction]

Strac API

Check out our API docs at https://docs.strac.io to learn how you can protect sensitive PHI data.

Real World Scenarios

  1. Collecting PHI for Health History: Healthcare entities can collect and store PHI securely using tokenization, reducing the risk of data breaches while building health histories, making diagnoses, and providing treatments.
  2. Performing Patient Identification and Verification: Patient identification and verification often involve sensitive PHI handling. Using tokenization and masking, healthcare entities can perform these tasks securely without exposing the sensitive data.
  3. Analyzing PHI for Research and Quality Improvement: Healthcare entities often need to analyze PHI to improve service quality, perform research, or make strategic decisions. Tokenization, redaction, and masking allow these entities to handle PHI for such analyses without accessing sensitive information.
  4. Sharing Documents Containing PHI: When sharing documents containing PHI, redaction comes into play, removing specific identifiers and thereby ensuring safe sharing or publishing.

In summary, combining tokenization, redaction, and masking offers a comprehensive approach to PHI deidentification. With these methods, healthcare entities can handle PHI securely, reduce the risk of data breaches, support compliance with privacy regulations, and still perform the essential tasks that depend on PHI. Together, these techniques are an important development in the secure and compliant handling of PHI in the healthcare industry.

To learn more about Strac's tokenization, redaction, and masking solutions, book a demo.
