Google Cloud Setup Cheatsheet¶
This document captures the steps taken to set up Validibot on Google Cloud Platform.
Before You Start¶
Set these environment variables before running the commands below:
# Source your config file (copy from .envs.example/.production/.google-cloud/.just first)
source .envs/.production/.google-cloud/.just
# Or set manually:
export GCP_PROJECT_ID="your-project-id"
export GCP_REGION="us-central1"
export GCP_APP_NAME="validibot" # Resource name prefix used for all GCP resources
# Get your project number (used for some IAM bindings):
export GCP_PROJECT_NUMBER=$(gcloud projects describe $GCP_PROJECT_ID --format="value(projectNumber)")
Commands in this guide use $GCP_PROJECT_ID, $GCP_REGION, $GCP_PROJECT_NUMBER, and $GCP_APP_NAME as placeholders.
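Every later command depends on these variables, so a quick guard can catch an unset value before gcloud receives an empty argument. This is a minimal sketch; require_gcp_env is a hypothetical helper name and is not part of the Validibot repository.

```shell
# Hypothetical helper: fail early if any variable this guide relies on is unset or empty.
require_gcp_env() {
  missing=0
  for name in GCP_PROJECT_ID GCP_REGION GCP_APP_NAME GCP_PROJECT_NUMBER; do
    # Indirect lookup of the variable named by $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run require_gcp_env after sourcing the config file; it returns non-zero and names each missing variable on stderr.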
Prerequisites¶
Install gcloud CLI¶
# Install via official installer (recommended over Homebrew)
curl https://sdk.cloud.google.com | bash -s -- --disable-prompts --install-dir=$HOME
# Add to your shell profile (~/.zshrc)
source ~/google-cloud-sdk/path.zsh.inc
source ~/google-cloud-sdk/completion.zsh.inc
Initial Setup¶
1. Authenticate with Google Cloud¶
# Log in (opens browser for OAuth)
gcloud auth login
# To switch accounts, revoke and re-login
gcloud auth revoke --all
gcloud auth login
# Check current authenticated accounts
gcloud auth list
2. List and Select Project¶
# List available projects
gcloud projects list
# Set the active project
gcloud config set project $GCP_PROJECT_ID
3. Rename Project (Display Name Only)¶
Note: Project IDs cannot be changed after creation, only the display name.
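The command for this step is not shown here. A likely form, assuming the rename is done with gcloud projects update, is sketched below; rename_project is a hypothetical wrapper and "Validibot" is only a placeholder display name.

```shell
# Sketch: change only the display name; the project ID itself stays the same.
rename_project() {
  # $1 = new display name, for example "Validibot"
  gcloud projects update "$GCP_PROJECT_ID" --name="$1"
}
```

Usage: rename_project "Validibot"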
4. Set Default Region¶
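The command for this step is not shown here. One plausible version sets the gcloud default region properties; which properties the original set is an assumption, and set_default_region is a hypothetical wrapper.

```shell
# Sketch: give gcloud a default region to fall back on when --region is omitted.
set_default_region() {
  gcloud config set run/region "$GCP_REGION"      # default for Cloud Run commands
  gcloud config set compute/region "$GCP_REGION"  # default for Compute-scoped commands
}
```

The commands in this guide still pass --region explicitly, so these defaults are a convenience rather than a requirement.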
5. Verify Configuration¶
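The command for this step is not shown here. A reasonable check, offered as an assumption rather than the original text, prints the active configuration and the authenticated accounts; show_gcloud_config is a hypothetical wrapper.

```shell
# Sketch: confirm the active project, region defaults, and account.
show_gcloud_config() {
  gcloud config list   # active project, account, and any default regions
  gcloud auth list     # which account is currently active
}
```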
Enable Required APIs¶
Enable all the APIs needed for a Django app on Cloud Run:
gcloud services enable \
run.googleapis.com \
sqladmin.googleapis.com \
secretmanager.googleapis.com \
artifactregistry.googleapis.com \
cloudbuild.googleapis.com \
cloudtasks.googleapis.com \
storage.googleapis.com
| API | Purpose |
|---|---|
| run.googleapis.com | Cloud Run (serverless containers) |
| sqladmin.googleapis.com | Cloud SQL (PostgreSQL database) |
| secretmanager.googleapis.com | Secret Manager (credentials storage) |
| artifactregistry.googleapis.com | Artifact Registry (Docker images) |
| cloudbuild.googleapis.com | Cloud Build (CI/CD) |
| cloudtasks.googleapis.com | Cloud Tasks (async task queue) |
| storage.googleapis.com | Cloud Storage (media files) |
Create Cloud Tasks Queue¶
Cloud Tasks is available for async orchestration and retries (for example, moving web→worker work off-request). Validator Cloud Run Jobs are triggered directly via the Jobs API today, but we still provision the queue so we can adopt Cloud Tasks where it adds reliability.
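The create command is not shown here. A minimal sketch, assuming the queue is named $GCP_APP_NAME-tasks (the name used by the IAM binding and the configuration table elsewhere in this guide); create_tasks_queue is a hypothetical wrapper.

```shell
# Sketch: create the queue in the same region as the Cloud Run services.
create_tasks_queue() {
  gcloud tasks queues create "$GCP_APP_NAME-tasks" --location="$GCP_REGION"
}
```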
Verify the queue was created:
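The verification command is not shown here. Describing the queue is one way to confirm it exists; this is a sketch with a hypothetical wrapper name, assuming the queue name $GCP_APP_NAME-tasks.

```shell
# Sketch: print the queue's state and rate limits; fails if the queue does not exist.
describe_tasks_queue() {
  gcloud tasks queues describe "$GCP_APP_NAME-tasks" --location="$GCP_REGION"
}
```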
Grant permissions to create tasks¶
The Cloud Run service account needs permission to add tasks to the queue:
gcloud tasks queues add-iam-policy-binding $GCP_APP_NAME-tasks \
--location=$GCP_REGION \
--member="serviceAccount:$GCP_APP_NAME-cloudrun-prod@$GCP_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/cloudtasks.enqueuer"
Next Steps¶
After completing the above:
- Provision Cloud SQL - Create PostgreSQL instance
- Set up Secret Manager - Store database credentials
- Create Artifact Registry - Docker image repository
- Build and Deploy - Push Docker image and deploy to Cloud Run
Provision Cloud SQL¶
Create a PostgreSQL 17 instance (the latest stable version as of Dec 2024):
gcloud sql instances create $GCP_APP_NAME-db \
--database-version=POSTGRES_17 \
--edition=ENTERPRISE \
--tier=db-f1-micro \
--region=$GCP_REGION \
--storage-type=SSD \
--storage-size=10GB \
--availability-type=zonal \
--backup \
--backup-start-time=03:00
| Option | Value | Notes |
|---|---|---|
| --database-version | POSTGRES_17 | Latest stable PostgreSQL (as of Dec 2024) |
| --edition | ENTERPRISE | Required for smaller tiers; ENTERPRISE_PLUS requires larger tiers |
| --tier | db-f1-micro | Smallest/cheapest tier for dev; use db-g1-small or larger for production |
| --region | $GCP_REGION | Region set in "Before You Start" (us-central1 in the example) |
| --storage-type | SSD | Better performance than HDD |
| --storage-size | 10GB | Minimum; can auto-grow |
| --availability-type | zonal | Single zone; use regional for HA |
| --backup | enabled | Daily backups |
| --backup-start-time | 03:00 | UTC time for backup window |
After creation, create the database and user:
# Create database
gcloud sql databases create validibot --instance=$GCP_APP_NAME-db
# Generate a strong password
DB_PASSWORD=$(openssl rand -base64 32)
echo "Save this password: $DB_PASSWORD"
# Create user
gcloud sql users create validibot_user \
--instance=$GCP_APP_NAME-db \
--password="$DB_PASSWORD"
# Store password in Secret Manager
echo -n "$DB_PASSWORD" | gcloud secrets create db-password --data-file=-
Change Database Password¶
To change the database password later:
# Generate new password and store in Secret Manager
NEW_DB_PASSWORD=$(openssl rand -base64 32)
echo -n "$NEW_DB_PASSWORD" | gcloud secrets versions add db-password --data-file=-
# Apply to the database user
gcloud sql users set-password validibot_user \
--instance=$GCP_APP_NAME-db \
--password="$(gcloud secrets versions access latest --secret=db-password)"
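# The deployed service reads the password from DATABASE_URL in the django-env secret,
# so update that value and add a new django-env version (see "Update a secret").
# Then redeploy Cloud Run services to pick up the new secret (after deployment)
# gcloud run services update $GCP_APP_NAME-web --region=$GCP_REGION --update-secrets=/secrets/.env=django-env:latest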
Create Artifact Registry¶
Create a Docker repository for storing container images:
gcloud artifacts repositories create $GCP_APP_NAME \
--repository-format=docker \
--location=$GCP_REGION \
--description="Validibot Docker images"
Configure Docker to authenticate with Artifact Registry:
gcloud auth configure-docker $GCP_REGION-docker.pkg.dev
The image URL format is:
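The format line is not shown here, but the build and push commands later in this guide follow the pattern REGION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG. As a sketch, a small function can compose it from the guide's variables; image_url is a hypothetical helper, and the default tag of latest is an assumption.

```shell
# Sketch: compose an Artifact Registry image URL from the guide's variables.
# Pattern: REGION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG
image_url() {
  image_name=$1
  image_tag=${2:-latest}   # assumed default when no tag is given
  echo "${GCP_REGION}-docker.pkg.dev/${GCP_PROJECT_ID}/${GCP_APP_NAME}/${image_name}:${image_tag}"
}
```

Usage: image_url "$GCP_APP_NAME-web" v1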
Set Up Secrets¶
The production environment variables are stored in Secret Manager as a single secret file.
Why a single .env file instead of per-key secrets?¶
Cloud Run supports two approaches for secrets:
- Per-key secrets - Each environment variable is a separate secret, injected via --set-secrets=VAR=secret:version
- File-mounted secret - A single .env file mounted as a volume, sourced by the start script
We use the file-mounted approach because:
- Simpler management - One secret to create/update instead of 20+
- Matches local development - Same .env file format used locally
- Easier migration - Can copy the local .envs/.production/.google-cloud/.django file directly
- Atomic updates - All variables update together when you add a new secret version
The tradeoff is less granular access control (all-or-nothing), but for a single-developer project this is acceptable. The start script (compose/production/django/start.sh) sources /secrets/.env before starting Django.
Create the django-env secret¶
Important: GCP environment files are in .envs/.production/.google-cloud/. Cloud deployments and Docker Compose use .envs/, and source set-env.sh loads local env vars for host-run commands.
First, update .envs/.production/.google-cloud/.django with production values:
- DJANGO_SECRET_KEY - Generate with python3 -c "import secrets; print(secrets.token_urlsafe(50))"
- DJANGO_ALLOWED_HOSTS - .run.app,.validibot.com
- SITE_URL - Public base URL (typically https://validibot.com once the load balancer + DNS is set up)
- WORKER_URL - Worker service *.run.app URL (used for validator callbacks and scheduled tasks)
- DATABASE_URL - Cloud SQL Unix socket format (see below)
The DATABASE_URL format for Cloud SQL:
Note: URL-encode special characters in the password (e.g., / becomes %2F, = becomes %3D).
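The format line itself is not shown here. The sketch below assumes the django-environ style URL commonly used for Cloud SQL Unix sockets (postgres://USER:PASSWORD@//cloudsql/CONNECTION_NAME/DATABASE); check it against how your settings parse DATABASE_URL. The helpers urlencode and build_database_url are hypothetical; the user and database names are the ones created earlier in this guide.

```shell
# Percent-encode every character outside the RFC 3986 unreserved set.
# Intended for ASCII input such as base64 passwords.
# The function body runs in a subshell so helper variables do not leak.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}       # everything after the first character
    c=${s%"$rest"}    # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Assumed URL shape: postgres://USER:PASSWORD@//cloudsql/CONNECTION_NAME/DATABASE
build_database_url() {
  # $1 = raw password, $2 = connection name (PROJECT:REGION:INSTANCE)
  echo "postgres://validibot_user:$(urlencode "$1")@//cloudsql/$2/validibot"
}
```

Usage: build_database_url "$DB_PASSWORD" "$GCP_PROJECT_ID:$GCP_REGION:$GCP_APP_NAME-db", then paste the result into the DATABASE_URL line of the env file.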
Get the connection name:
gcloud sql instances describe $GCP_APP_NAME-db --format="value(connectionName)"
# Returns: $GCP_PROJECT_ID:$GCP_REGION:$GCP_APP_NAME-db
Then upload the env file as a secret:
gcloud secrets create django-env \
--data-file=.envs/.production/.google-cloud/.django \
--replication-policy=user-managed \
--locations=$GCP_REGION
Grant Cloud Run access to secrets¶
PROJECT_NUMBER=$(gcloud projects describe $GCP_PROJECT_ID --format="value(projectNumber)")
gcloud secrets add-iam-policy-binding django-env \
--member="serviceAccount:${PROJECT_NUMBER}-compute@developer.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"
Grant Cloud Run access to Cloud SQL¶
The Cloud Run service account also needs permission to connect to Cloud SQL:
gcloud projects add-iam-policy-binding $GCP_PROJECT_ID \
--member="serviceAccount:$GCP_PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
--role="roles/cloudsql.client"
Note for dev environments: If you create a separate dev Cloud Run service with its own service account, you'll need to grant roles/cloudsql.client to that service account as well.
Update a secret¶
When you change .envs/.production/.google-cloud/.django, add a new version:
gcloud secrets versions add django-env --data-file=.envs/.production/.google-cloud/.django
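# Then redeploy Cloud Run to pick up changes. gcloud run services update expects at
# least one configuration flag, so re-state the secret mount to roll out a new revision.
gcloud run services update $GCP_APP_NAME-web --region=$GCP_REGION \
--update-secrets=/secrets/.env=django-env:latest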
List secrets¶
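The commands for this step are not shown here. A minimal sketch using the Secret Manager listing commands, with django-env as the secret created above; list_secrets is a hypothetical wrapper.

```shell
# Sketch: list all secrets in the project, then the versions of django-env.
list_secrets() {
  gcloud secrets list
  gcloud secrets versions list django-env
}
```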
Create Dedicated Service Account¶
By default, Cloud Run uses the Compute Engine default service account. For production, create a dedicated service account with only the permissions needed, following the principle of least privilege.
Why a dedicated service account?¶
- Isolation - Permissions are specific to Validibot, not shared with other GCP services
- Auditability - Logs clearly show which service performed actions
- Security - Blast radius is limited if credentials are compromised
- Environment separation - Production and staging can have different SAs with different access
Create the service account¶
gcloud iam service-accounts create $GCP_APP_NAME-cloudrun-prod \
--display-name="Validibot Cloud Run SA (Production)" \
--description="Service account for Validibot production Cloud Run services" \
--project $GCP_PROJECT_ID
Grant required roles¶
The service account needs these roles:
| Role | Purpose |
|---|---|
| roles/cloudsql.client | Connect to Cloud SQL |
| roles/secretmanager.secretAccessor | Access secrets mounted via --set-secrets |
| roles/storage.objectAdmin | Read/write media files (when GCS is configured) |
# Cloud SQL access
gcloud projects add-iam-policy-binding $GCP_PROJECT_ID \
--member="serviceAccount:$GCP_APP_NAME-cloudrun-prod@$GCP_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/cloudsql.client"
# Secret Manager access (required for custom service accounts with --set-secrets)
gcloud projects add-iam-policy-binding $GCP_PROJECT_ID \
--member="serviceAccount:$GCP_APP_NAME-cloudrun-prod@$GCP_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"
Note: When using a custom service account, Cloud Run requires the SA to have secretmanager.secretAccessor to access secrets via --set-secrets. The binding granted earlier applies only to the default compute SA, so the custom SA needs its own grant.
For staging environment (future)¶
Create a separate service account for staging:
gcloud iam service-accounts create $GCP_APP_NAME-cloudrun-staging \
--display-name="Validibot Cloud Run SA (Staging)" \
--project $GCP_PROJECT_ID
# Grant same roles (but could be more restrictive, e.g., read-only storage)
Create GCS Bucket for Storage¶
Create a Cloud Storage bucket with prefix-based access control:
# Production bucket
gcloud storage buckets create gs://$GCP_APP_NAME-storage \
--location=$GCP_REGION \
--default-storage-class=STANDARD \
--uniform-bucket-level-access \
--project $GCP_PROJECT_ID
# Development bucket
gcloud storage buckets create gs://$GCP_APP_NAME-storage-dev \
--location=$GCP_REGION \
--default-storage-class=STANDARD \
--uniform-bucket-level-access \
--project $GCP_PROJECT_ID
Grant bucket access to service accounts¶
# Production SA -> Production bucket (full access)
gcloud storage buckets add-iam-policy-binding gs://$GCP_APP_NAME-storage \
--member="serviceAccount:$GCP_APP_NAME-cloudrun-prod@$GCP_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/storage.objectAdmin"
# Make public/ prefix publicly readable (for avatars, workflow images)
gcloud storage buckets add-iam-policy-binding gs://$GCP_APP_NAME-storage \
--member="allUsers" \
--role="roles/storage.objectViewer" \
--condition="expression=resource.name.startsWith(\"projects/_/buckets/$GCP_APP_NAME-storage/objects/public/\"),title=public-prefix-only"
# Staging SA -> Dev bucket (when staging is set up)
# gcloud storage buckets add-iam-policy-binding gs://$GCP_APP_NAME-storage-dev \
# --member="serviceAccount:$GCP_APP_NAME-cloudrun-staging@$GCP_PROJECT_ID.iam.gserviceaccount.com" \
# --role="roles/storage.objectAdmin"
Bucket structure:
$GCP_APP_NAME-storage/
├── public/ # Publicly readable (avatars, workflow images)
└── private/ # Private (validation submissions, artifacts)
Bucket naming:
- $GCP_APP_NAME-storage - Production (public/ and private/ prefixes)
- $GCP_APP_NAME-storage-dev - Development/staging
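The STORAGE_BUCKET environment variable in .envs/.production/.google-cloud/.django should be set to the production bucket name, $GCP_APP_NAME-storage. Write the name out literally (for example, validibot-storage), since GCP_APP_NAME is not defined inside the container.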
Build and Push Docker Image¶
Build the production Docker image:
docker build --platform linux/amd64 -f compose/production/django/Dockerfile \
-t $GCP_REGION-docker.pkg.dev/$GCP_PROJECT_ID/$GCP_APP_NAME/$GCP_APP_NAME-web:v1 .
Push to Artifact Registry:
# Authenticate Docker (one-time setup)
gcloud auth configure-docker $GCP_REGION-docker.pkg.dev
# Push image
docker push $GCP_REGION-docker.pkg.dev/$GCP_PROJECT_ID/$GCP_APP_NAME/$GCP_APP_NAME-web:v1
Deploy to Cloud Run¶
Deploy the web service with the dedicated service account, secrets, and Cloud SQL connection:
gcloud run deploy $GCP_APP_NAME-web \
--image $GCP_REGION-docker.pkg.dev/$GCP_PROJECT_ID/$GCP_APP_NAME/$GCP_APP_NAME-web:v1 \
--region $GCP_REGION \
--service-account $GCP_APP_NAME-cloudrun-prod@$GCP_PROJECT_ID.iam.gserviceaccount.com \
--add-cloudsql-instances $GCP_PROJECT_ID:$GCP_REGION:$GCP_APP_NAME-db \
--set-secrets=/secrets/.env=django-env:latest \
--min-instances 0 \
--max-instances 4 \
--memory 1Gi \
--allow-unauthenticated \
--project $GCP_PROJECT_ID
| Option | Purpose |
|---|---|
| --service-account | Use dedicated SA instead of default compute SA |
| --add-cloudsql-instances | Enables Cloud SQL Auth Proxy sidecar |
| --set-secrets | Mounts secret as file at /secrets/.env (sourced by start.sh) |
| --min-instances 0 | Scale to zero when idle (cost savings) |
| --max-instances 4 | Limit max instances for cost control |
| --allow-unauthenticated | Public web access (remove for internal services) |
After deployment, get the service URL:
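The command is not shown here. A likely form reads the URL from the service description; this is a sketch with a hypothetical wrapper name, assuming the service is named $GCP_APP_NAME-web as in the deploy command.

```shell
# Sketch: print only the public URL of the deployed web service.
service_url() {
  gcloud run services describe "$GCP_APP_NAME-web" \
    --region "$GCP_REGION" \
    --format="value(status.url)"
}
```

The result is the *.run.app address to use for SITE_URL until a custom domain is set up.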
Running Management Commands¶
Since Cloud Run doesn't support exec into containers, use Cloud Run Jobs for one-off management commands.
Important: When using --command to override the container entrypoint, the entrypoint script (which loads secrets) is bypassed. You must explicitly source the secrets file in your command.
Using the justfile (recommended)¶
The justfile provides convenient commands for common operations:
# Run database migrations
just gcp migrate
# Run setup_validibot (configures site, seeds default data, creates superuser)
just gcp setup-data
# Verify setup is correct
just gcp run-command "check_validibot --verbose"
# View job logs
just gcp job-logs $GCP_APP_NAME-migrate
just gcp job-logs $GCP_APP_NAME-setup
Manual job creation¶
If you need to run a custom management command:
gcloud run jobs create $GCP_APP_NAME-manage \
--image $GCP_REGION-docker.pkg.dev/$GCP_PROJECT_ID/$GCP_APP_NAME/$GCP_APP_NAME-web:latest \
--region $GCP_REGION \
--service-account $GCP_APP_NAME-cloudrun-prod@$GCP_PROJECT_ID.iam.gserviceaccount.com \
--set-cloudsql-instances $GCP_PROJECT_ID:$GCP_REGION:$GCP_APP_NAME-db \
--set-secrets=/secrets/.env=django-env:latest \
--memory 1Gi \
--command "/bin/bash" \
--args "-c,set -a && source /secrets/.env && set +a && python manage.py YOUR_COMMAND" \
--project $GCP_PROJECT_ID
Key points:
- Use --set-cloudsql-instances (not --add-cloudsql-instances) for jobs
- Use --command "/bin/bash" with --args "-c,..." to run shell commands
- Must source /secrets/.env because --command bypasses the entrypoint
- set -a exports all variables, set +a stops exporting after sourcing
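The set -a / set +a pair matters because plain assignments in a sourced file are not exported, so a child process such as python manage.py would not see them. A self-contained demonstration, using a temporary file and a made-up variable name (DEMO_SETTING) in place of /secrets/.env:

```shell
# Write a throwaway env file with one plain (unexported) assignment.
envfile=$(mktemp)
echo 'DEMO_SETTING=hello' > "$envfile"

set -a          # from here on, every assigned variable is marked for export
. "$envfile"    # POSIX spelling of bash's "source"
set +a          # stop auto-exporting

# A child process can now read the variable; this prints "hello".
sh -c 'echo "$DEMO_SETTING"'

rm -f "$envfile"
```

Without set -a, the child shell on the last line would print an empty line instead.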
Execute the job¶
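The command is not shown here. A minimal sketch, assuming the job created above ($GCP_APP_NAME-manage); execute_manage_job is a hypothetical wrapper, and --wait is an optional choice that blocks until the execution finishes.

```shell
# Sketch: start one execution of the management job and wait for it to complete.
execute_manage_job() {
  gcloud run jobs execute "$GCP_APP_NAME-manage" \
    --region "$GCP_REGION" \
    --project "$GCP_PROJECT_ID" \
    --wait
}
```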
Check job logs¶
gcloud logging read "resource.type=cloud_run_job AND resource.labels.job_name=$GCP_APP_NAME-manage" \
--project $GCP_PROJECT_ID \
--limit 50 \
--format="table(timestamp,textPayload)"
Pausing and Resuming the Service¶
To temporarily block public access without deleting the service:
Pause (block public traffic)¶
gcloud run services update $GCP_APP_NAME-web \
--region $GCP_REGION \
--ingress internal \
--project $GCP_PROJECT_ID
This sets ingress to internal-only. The URL will return 403 Forbidden to public requests. The service can still scale to zero when idle, so you won't incur compute costs.
Resume (allow public traffic)¶
gcloud run services update $GCP_APP_NAME-web \
--region $GCP_REGION \
--ingress all \
--project $GCP_PROJECT_ID
Note: You cannot set --max-instances 0 on Cloud Run - it requires a positive integer. Using --ingress internal is the recommended way to pause a service.
Your Configuration¶
Store your project-specific configuration in .envs/.production/.google-cloud/.just (not committed to git). See .envs.example/.production/.google-cloud/.just for a template.
Key values you'll need:
| Setting | Example Value |
|---|---|
| Project ID | my-project-123456 |
| Region | us-central1 |
| Cloud SQL Instance | $GCP_APP_NAME-db |
| Cloud SQL Connection | my-project-123456:us-central1:$GCP_APP_NAME-db |
| Artifact Registry | us-central1-docker.pkg.dev/my-project-123456/$GCP_APP_NAME/ |
| Service Account (prod) | $GCP_APP_NAME-cloudrun-prod@my-project-123456.iam.gserviceaccount.com |
| Secrets | django-env, db-password |
| GCS Bucket (prod) | $GCP_APP_NAME-storage (with public/ and private/ prefixes) |
| Cloud Tasks Queue | $GCP_APP_NAME-tasks |