GAE python based app which regularly collects metadata about BigQuery tables and stores it in BigQuery.
GCP Census was created to answer the following questions:
Now every question above can be easily answered by querying metadata tables in BigQuery or looking at our dashboard created in Google Data Studio.
Count all data to which GCP Census has access
SELECT sum(numBytes) FROM `YOUR-PROJECT-ID.bigquery_views.table_metadata_v1_0`
Count all tables and partitions
SELECT count(*) FROM `YOUR-PROJECT-ID.bigquery_views.table_metadata_v1_0`
SELECT count(*) FROM `YOUR-PROJECT-ID.bigquery_views.partition_metadata_v1_0`
Select top 100 largest datasets
SELECT projectId, datasetId, sum(numBytes) as totalNumBytes
FROM `YOUR-PROJECT-ID.bigquery_views.table_metadata_v1_0`
GROUP BY projectId, datasetId ORDER BY totalNumBytes DESC LIMIT 100
Select top 100 largest tables
SELECT projectId, datasetId, tableId, numBytes
FROM `YOUR-PROJECT-ID.bigquery_views.table_metadata_v1_0`
ORDER BY numBytes DESC LIMIT 100
Select top 100 largest partitions
SELECT projectId, datasetId, tableId, partitionId, numBytes
FROM `YOUR-PROJECT-ID.bigquery_views.partition_metadata_v1_0`
ORDER BY numBytes DESC LIMIT 100
Provide cost estimation for the top 100 largest datasets
SELECT * FROM `YOUR-PROJECT-ID.bigquery_views.datasets_cost_estimation_v1_0`
ORDER BY estimatedMonthlyCostInUSD DESC LIMIT 100
Views with costs are only estimation because:
GCP Census retrieves BigQuery metadata using REST API:
GCP Census will retrieve metadata of tables it has read access to, which means the scope is derived from GCP IAM settings.
pip install -r requirements.txt
pip install -t lib -r requirements.txt
gcloud app deploy --project YOUR-PROJECT-ID -v v1 app.yaml config/cron.yaml config/queue.yaml
/createModels
to create BigQuery dataset and table/bigQuery
to start collecting BigQuery metadataGCP Census endpoints are accessible only for GAE Administrators, i.e. all endpoints are secured with login: admin in app.yaml. Still, anyone may attempt to access your app and will be redirected to Google Account login page.
That's why we strongly recommend enabling GAE Firewall on your project. You can enable it with three simple gcloud commands:
gcloud app firewall-rules create 500 --action allow --source-range 0.1.0.1 --description "Allow GAE cron" --project YOUR-PROJECT-ID
gcloud app firewall-rules create 510 --action allow --source-range 0.1.0.2 --description "Allow GAE tasks" --project YOUR-PROJECT-ID
gcloud app firewall-rules update default --action deny --project YOUR-PROJECT-ID
You can find all development setup/steps in .travis.yml
Subscribe or post to gcp-census-discuss@googlegroups.com to follow along and ask questions about the GCP Census.
Feel free to submit feature requests and bug reports under Issues.
If you would like to contribute to our development efforts, please review our contributing guidelines and submit a pull request.