May 16, 2023
3 min read

How to remove PII from a CSV file?

Learn why and how to remove sensitive data from .csv, .xlsx, or Google Sheets files

TL;DR

CSV, Excel, and Google Sheets files are used daily across business applications. They are routinely shared over email, Slack, or support tickets (Zendesk, Intercom, Salesforce, ServiceNow, SAP). These files may contain sensitive PII, PHI, or other confidential data that must be handled securely. Let's learn a) why you should remove sensitive data from CSV or Excel files and b) how to do it.

Why remove PII or PHI from .csv or .xlsx?

Personally Identifiable Information (PII) refers to any data that could potentially identify a specific individual. This can include data such as name, social security number, date and place of birth, mother's maiden name, or biometric records.

There are several reasons why one might want to remove PII from a CSV (Comma-Separated Values) file or an XLSX (Microsoft Excel Open XML Spreadsheet) file:

  1. Legal Reasons: Laws like GDPR in the EU, CCPA in California, and other privacy laws worldwide require companies to protect PII. Non-compliance with these regulations can result in hefty fines.
  2. Privacy Concerns: By removing PII, you safeguard individuals' privacy. Even with no malicious intent, mishandling PII can lead to unintentional consequences like identity theft.
  3. Data Breaches: If a CSV file exposed in a data breach contains no PII, the impact is significantly reduced. There's less risk for the individuals involved and potentially less legal and reputational damage for the company.
  4. Ethical Reasons: It's often considered ethical best practice to minimize the use of PII whenever possible and only use it when absolutely necessary for the task at hand.
  5. Data Minimization: A key principle in many privacy regulations, data minimization means collecting and storing only the minimum amount of data necessary for your purposes. If the PII is not needed, it's best to remove it. When the data must be kept, related techniques such as tokenization or pseudonymization replace the identifying values with substitutes.

How to remove PII or PHI from .csv or .xlsx?

If you know which columns in an Excel sheet or CSV file contain PII or PHI, it is straightforward to delete those columns. Most of the time, however, the schema of the file is unknown, so it is not clear which values are PII or otherwise sensitive.
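
For the simple case where the PII columns are already known, a minimal sketch using only Python's standard library might look like the following. The column names and the `drop_columns` helper are assumptions made up for illustration; they are not part of any Strac API.

```python
import csv
import io
from typing import Iterable

# Hypothetical column names, chosen only for illustration.
PII_COLUMNS = {"Gender", "Birthdate", "Last Name", "First Name", "Address"}

def drop_columns(csv_text: str, columns_to_drop: Iterable[str]) -> str:
    """Return csv_text without the named columns (header match is case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    drop = {name.strip().lower() for name in columns_to_drop}
    # Keep every header that is not in the drop set, preserving column order.
    kept = [name for name in (reader.fieldnames or [])
            if name.strip().lower() not in drop]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # dropped columns are ignored by the writer
    return out.getvalue()

sample = (
    "Id,Gender,Birthdate,Last Name,First Name,Address,Policy Creation Date\n"
    '1,F,1990-01-31,Doe,Jane,"1 Main St, Springfield",2023-05-16\n'
)
cleaned = drop_columns(sample, PII_COLUMNS)
print(cleaned)
```

This only works when the header names are known and reliable. It does nothing for PII that sits inside free-text cells or in columns with unexpected names.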

This is where Strac's machine-learning technology and its deep coverage of PII data elements come in. Strac automatically removes, masks, or redacts sensitive data from many types of documents, including .pdf, .jpeg, .docx, and .csv.

Let's take an example. Below is a screenshot of an Excel file with the columns Id, Gender, Birthdate, Last Name, First Name, Address, Policy Creation Date, Utm Campaign, and Utm Content. We want to remove all PII automatically.

Example 1

Before (CSV with PII)

CSV that has PII data

After (CSV without PII)

With Strac's automatic redaction, masking, or removal of data, powered by its proprietary machine-learning technology, the gender, birthdate, last name, first name, and address values are removed automatically. The result looks like this:

CSV file without PII data

The example above removes PII from CSV, Excel, or Google Sheets files. You can also configure Strac to mask or tokenize the data within the file instead of removing it. You can learn more about the different masking techniques here: https://www.strac.io/integrations/postgres
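
To make the difference between masking and tokenizing concrete, here is a rough sketch in plain Python. It is not how Strac implements either technique: the `mask` and `tokenize` helpers, the `tok_` prefix, and the hard-coded key are all assumptions for illustration, and the keyed hash stands in for a real tokenization service, which would typically keep a protected mapping back to the original value.

```python
import hashlib
import hmac

# Assumption: a real key would come from a secrets manager, never from source code.
SECRET_KEY = b"example-key-do-not-use"

def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Hide everything except the last `visible` characters."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]

def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a value with a deterministic keyed token (truncated HMAC-SHA256)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]

print(mask("123-45-6789"))  # *******6789
print(tokenize("jane.doe@example.com") == tokenize("jane.doe@example.com"))  # True
```

Masking keeps part of the value readable for a human, while the token is opaque but stable, so the same input always maps to the same token and rows can still be joined or counted.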

Example 2

Before (CSV with PII in Columns)

CSV that has PII embedded within the column

After (CSV with PII removed from Columns)

Redacted CSV: All PII are removed
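
As a much simpler stand-in for machine-learning detection, the idea of redacting PII embedded inside cell text can be sketched with a few regular expressions. The patterns, labels, and helper names below are illustrative assumptions; they cover only a handful of US-style formats and will miss names, addresses, and most other PII.

```python
import csv
import io
import re

# Illustrative patterns only: real PII detection needs far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact_cell(text: str) -> str:
    """Replace every pattern match in one cell with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def redact_csv(csv_text: str) -> str:
    """Apply redact_cell to every cell, since the schema is treated as unknown."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([redact_cell(cell) for cell in row])
    return out.getvalue()

sample = (
    "Id,Notes\n"
    "1,Customer jane.doe@example.com called from 555-123-4567\n"
    "2,SSN on file is 123-45-6789\n"
)
redacted = redact_csv(sample)
print(redacted)
```

Because every cell is scanned, this approach does not depend on column names, but its accuracy is limited to whatever the patterns can express.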

Any questions?

If you have any questions, want to learn how to remove PII from CSV files (whether in an email, Zendesk ticket, Slack message, or any SaaS app), or want API access, please book a meeting with us.

