This repo contains a reference implementation of an end-to-end data tokenization solution designed to migrate sensitive data into BigQuery. Please check out the links below for reference guides:
Run the following commands to trigger an automated deployment in your GCP project. The script handles the following steps:

- Creates a bucket ({project-id}-demo-data) in us-central1 and uploads a sample dataset with mock PII data.
- Creates a BigQuery dataset in the US (demo_dataset) to store the tokenized data.
- Creates a KMS-wrapped key (KEK) by creating an automatic TEK (Token Encryption Key).
- Creates DLP inspect and re-identification templates with the KEK and the crypto-based transformations identified in this section of the guide.
- Triggers an automated Dataflow pipeline, passing all the required parameters, e.g. data, configuration and dataset name.

Please allow 5-10 minutes for the deployment to complete.
```sh
gcloud config set project <project_id>
sh deploy-data-tokeninzation-solution.sh
```
You can run some quick validations on the BigQuery table to check the tokenized data.

For re-identification (getting the original data back in a Pub/Sub topic), please follow the instructions here.
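A quick residual-PII check you can run over rows exported from the tokenized demo_dataset table, added here as an example that is not part of the repo's scripts: the sample rows and PII patterns are constructed, and the table name passed to the SQL builder is a placeholder.

```python
import re

# Patterns for raw PII that must not survive tokenization. Constructed for this
# check; adjust them to the columns in your own mock dataset.
PII_PATTERNS = {
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "credit_card": re.compile(r"^(?:\d[ -]?){13,16}$"),
}

# Constructed sample of what the tokenized table might contain: the ssn and
# card columns hold opaque tokens, but one email value was left in clear.
SAMPLE_ROWS = [
    {"id": "1", "ssn": "AZC5Qk3tR9", "email": "j8Gh2Lm0pQxy", "card": "Zt7kL02nBq4"},
    {"id": "2", "ssn": "Pq81xLmN02", "email": "jane.doe@example.com", "card": "Kd9fS31mWx7"},
]


def find_residual_pii(rows):
    """Return {column: [pattern names]} for every column where at least one
    value still looks like raw PII. An empty dict means no residual cleartext
    was found for the patterns above."""
    leaks = {}
    for row in rows:
        for column, value in row.items():
            if value is None:
                continue
            text = str(value).strip()
            for name, pattern in PII_PATTERNS.items():
                if pattern.match(text):
                    leaks.setdefault(column, set()).add(name)
    return {column: sorted(names) for column, names in leaks.items()}


def residual_pii_sql(table, column, pattern_name):
    """Build the same check as a BigQuery query (RE2 syntax, which the patterns
    above respect) so it can be run with `bq query` against the real table.
    `table` is a placeholder such as my-project.demo_dataset.my_table."""
    regex = PII_PATTERNS[pattern_name].pattern
    return (f"SELECT COUNT(*) AS leaked_rows FROM `{table}` "
            f"WHERE REGEXP_CONTAINS(CAST({column} AS STRING), r'{regex}')")


if __name__ == "__main__":
    print(find_residual_pii(SAMPLE_ROWS))  # {'email': ['email']}
    print(residual_pii_sql("my-project.demo_dataset.my_table", "email", "email"))
```

A non-empty result means the Dataflow job did not tokenize that column; check the field list in the de-identification template before re-running.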
This is a hybrid solution for customers who would like to use Cloud DLP to scan for PII data stored in an S3 bucket. The solution stores the inspection result in a BigQuery table.
```sh
gcloud config set project <project_id>
sh deploy-s3-inspect-solution.sh
```