Migrate Sensitive Data in BigQuery Using Dataflow & Cloud DLP

This repo contains a reference implementation of an end-to-end data tokenization solution designed to migrate sensitive data in BigQuery. Please check out the links below for the reference guides:

  1. Concept & Overview.
  2. Create & Manage Cloud DLP Configurations.
  3. Automated Dataflow Pipeline to De-identify PII Dataset.
  4. Validate Dataset in BigQuery and Re-identify using Dataflow.

Table of Contents

  1. Reference Architecture
  2. Quick Start
  3. Quick Start for the S3 Inspection PoC
  4. To Do

Reference Architecture

[Reference architecture diagram: de-identification of BigQuery data using Dataflow and Cloud DLP]

Quick Start


Run the following commands to trigger an automated deployment in your GCP project. The script automates the end-to-end setup of the tokenization solution:

gcloud config set project <project_id>
sh deploy-data-tokeninzation-solution.sh
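
As a quick sanity check (this step is not part of the script itself), you can confirm that the de-identification pipeline launched by listing active Dataflow jobs; replace <region> with the region used by the deployment:

gcloud dataflow jobs list --region=<region> --status=active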

You can run some quick validations on the BigQuery table to check the tokenized data.
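
For example, a minimal spot check with the bq CLI; the dataset and table names below are placeholders, so substitute the ones created by the deployment script:

bq query --use_legacy_sql=false \
  'SELECT * FROM `<project_id>.<dataset>.<tokenized_table>` LIMIT 10'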

For re-identification (getting the original data back in a Pub/Sub topic), please follow the instructions here.
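
Once the re-identification pipeline is running, you can inspect the re-identified records by pulling from a subscription attached to the output topic. The subscription name below is a placeholder:

gcloud pubsub subscriptions pull <subscription_id> --auto-ack --limit=5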

Quick Start for the S3 Inspection PoC

This is a hybrid solution for customers who would like to use Cloud DLP to scan PII data stored in an S3 bucket. The solution stores the inspection results in a BigQuery table.


gcloud config set project <project_id>
sh deploy-s3-inspect-solution.sh
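
After the pipeline finishes, you can summarize the findings in the results table. This query is a sketch: the dataset and table names are placeholders, and it assumes the results include an info_type column, which is typical of DLP inspection output:

bq query --use_legacy_sql=false \
  'SELECT info_type, COUNT(*) AS findings
   FROM `<project_id>.<dataset>.<inspection_results_table>`
   GROUP BY info_type ORDER BY findings DESC'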

To Do