PIICatcher is a data catalog and scanner for PII and PHI. It finds PII in your databases and file systems and tracks critical data. The data catalog can serve as a foundation for building governance, compliance, and security applications.
Check out AWS Glue & Lake Formation Privilege Analyzer for an example of how piicatcher is used in production.
PIICatcher is available as a command-line application.
To install, use pip inside a virtual environment:
python3 -m venv .env
source .env/bin/activate
pip install piicatcher
# Install the spaCy English model
python -m spacy download en_core_web_sm
# Run piicatcher on a SQLite database and print the report to the console
piicatcher db -c '/db/sqlqb'
╭─────────────┬─────────────┬─────────────┬─────────────╮
│ schema      │ table       │ column      │ has_pii     │
├─────────────┼─────────────┼─────────────┼─────────────┤
│ main        │ full_pii    │ a           │ 1           │
│ main        │ full_pii    │ b           │ 1           │
│ main        │ no_pii      │ a           │ 0           │
│ main        │ no_pii      │ b           │ 0           │
│ main        │ partial_pii │ a           │ 1           │
│ main        │ partial_pii │ b           │ 0           │
╰─────────────┴─────────────┴─────────────┴─────────────╯
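To try the scanner without pointing it at real data, you can build a small SQLite file with the same table and column names as the report above. The following is a minimal sketch using only the Python standard library. The function name `build_sample_db`, the file name, and every sample value are assumptions made up for illustration; whether a given column is flagged depends on the scanner's own detectors, so your report may differ from the one shown.

```python
import os
import sqlite3
import tempfile

# Hypothetical sample rows: every name, email and phone number is made up.
FULL_PII = [
    ("Jane Doe", "jane.doe@example.com"),
    ("John Smith", "john.smith@example.com"),
]
NO_PII = [("abc", "def"), ("ghi", "jkl")]
PARTIAL_PII = [("555-0100", "abc"), ("555-0101", "def")]

TABLES = (
    ("full_pii", FULL_PII),
    ("no_pii", NO_PII),
    ("partial_pii", PARTIAL_PII),
)


def build_sample_db(path):
    """Create a SQLite file with three two-column tables (columns a and b)."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commit on success, roll back on error
            for table, rows in TABLES:
                # Table names come from the fixed tuple above, not user input.
                conn.execute(f"DROP TABLE IF EXISTS {table}")
                conn.execute(f"CREATE TABLE {table} (a TEXT, b TEXT)")
                conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    finally:
        conn.close()


if __name__ == "__main__":
    # Write into a fresh temporary directory so no existing file is touched.
    db_path = os.path.join(tempfile.mkdtemp(), "pii_sample.db")
    build_sample_db(db_path)
    print(db_path)
```

Running the script prints the path of the generated file, which can then be passed to `piicatcher db -c` in place of the path used above.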
PIICatcher supports the following filesystems:
PIICatcher supports the following databases:
For advanced usage, refer to the PIICatcher Documentation.
Please take this survey if you are a user or considering using PIICatcher. The responses will help to prioritize improvements to the project.
For contribution guidelines, see the PIICatcher Developer documentation.