Post-Deployment Verification (PDV)

After deploying to any environment, run the PDV smoke tests to verify that critical functionality is working correctly.

Quick Start

# After deploying to dev
just gcp deploy-all dev
just verify-deployment dev

# After deploying to production
just gcp deploy-all prod
just verify-deployment prod

Application Health Check

For Docker Compose deployments or to verify application-level configuration, use the built-in health check command:

# Basic health check
python manage.py check_validibot

# Verbose output with details
python manage.py check_validibot --verbose

# Attempt to auto-fix common issues
python manage.py check_validibot --fix

# JSON output for scripting/monitoring
python manage.py check_validibot --json

The check_validibot command verifies:

Check	What it verifies
Database	Connection, PostgreSQL version
Migrations	All migrations applied
Cache	Redis/cache connectivity
Storage	File storage read/write access
Site	Django Sites configuration
Roles & Permissions	Required roles and permissions exist
Validators	System validators configured
Background Tasks	Celery broker connectivity, schedules
Docker	Docker availability, validator images
Email	SMTP server reachability
Security	DEBUG mode, SECRET_KEY, ALLOWED_HOSTS, HTTPS settings

This is similar to GitLab's gitlab:check rake task or Zulip's health check plugins.

When to Run

After the initial setup_validibot run, to verify everything is configured
After upgrades to catch configuration drift
When troubleshooting issues
As part of monitoring/alerting (use --json output)
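
For the monitoring case, a small wrapper can run the command and turn its JSON into an exit status. The sketch below is a minimal example under stated assumptions: the JSON schema (a top-level "checks" list whose entries carry "name" and "status" keys, with "ok" meaning healthy) is hypothetical and not taken from the command's real output, and the function names are made up for illustration. Adjust the keys to match what check_validibot --json actually emits.

```python
import json
import subprocess
import sys


def failed_checks(report: dict) -> list[str]:
    """Return the names of checks whose status is not "ok".

    ASSUMPTION: the report looks like
    {"checks": [{"name": "Database", "status": "ok"}, ...]}.
    This schema is hypothetical, not the documented output format.
    """
    return [
        check.get("name", "<unnamed>")
        for check in report.get("checks", [])
        if check.get("status") != "ok"
    ]


def run_health_check() -> int:
    """Run the health check and return 0 (healthy), 1 (failures) or 2 (bad output)."""
    result = subprocess.run(
        [sys.executable, "manage.py", "check_validibot", "--json"],
        capture_output=True,
        text=True,
        check=False,
    )
    try:
        report = json.loads(result.stdout)
    except json.JSONDecodeError:
        print("check_validibot did not return valid JSON", file=sys.stderr)
        return 2
    failures = failed_checks(report)
    for name in failures:
        print(f"FAILED: {name}", file=sys.stderr)
    return 1 if failures else 0
```

A cron job or monitoring agent could call run_health_check() and alert on any non-zero return value.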

What Gets Tested

The PDV suite verifies:

Web Service

Homepage is accessible (returns 200)
Static files are served (robots.txt)
API documentation is accessible (/api/v1/docs/)
API endpoints require authentication
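
The four web checks above can be pictured as a small table of paths and expected status codes. This is an illustrative sketch, not the contents of test_web_service.py: the protected path /api/v1/workflows/ and the accepted 401/403 codes are assumptions, and the helper name failed_web_checks is hypothetical. The web_url and http_session arguments correspond to the fixtures described under Adding New Tests.

```python
# (path, acceptable status codes) for each web-service check listed above.
WEB_CHECKS = [
    ("/", {200}),                        # homepage
    ("/robots.txt", {200}),              # static file
    ("/api/v1/docs/", {200}),            # API documentation
    ("/api/v1/workflows/", {401, 403}),  # ASSUMED protected endpoint and codes
]


def failed_web_checks(web_url, http_session):
    """Return a description of every check whose status code is unexpected."""
    failures = []
    for path, expected in WEB_CHECKS:
        response = http_session.get(f"{web_url}{path}", timeout=30)
        if response.status_code not in expected:
            failures.append(f"{path} returned {response.status_code}")
    return failures
```

Inside a smoke test, asserting that failed_web_checks(web_url, http_session) is an empty list would report every failing path at once.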

Worker Service Security

IAM protection: Unauthenticated requests are rejected with 403
Callback endpoint: Cannot be spoofed by external attackers
Scheduled task endpoints: Protected from external access
Authenticated requests: Reach Django when properly authenticated

The worker security tests are particularly important because the callback endpoint is how validator jobs report their results. If this endpoint were exposed, attackers could spoof validation results.
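
To make the distinction concrete: an anonymous request should be stopped by IAM with a 403, while a request carrying an identity token should get past IAM and reach Django. A minimal sketch of both checks follows. The callback path, the payload, and the helper names are hypothetical, and treating "any status other than 403" as "reached Django" is an assumption of this example, not necessarily how test_worker_security.py decides.

```python
# Hypothetical route; substitute the worker's real callback path.
CALLBACK_PATH = "/api/v1/validation-callbacks/"


def anonymous_callback_is_blocked(worker_url, http_session):
    """True if an unauthenticated callback POST is rejected with 403."""
    response = http_session.post(
        f"{worker_url}{CALLBACK_PATH}",
        json={"status": "success"},  # made-up payload an attacker might send
        timeout=30,
    )
    return response.status_code == 403


def authenticated_request_passes_iam(worker_url, authenticated_http_session):
    """True if a request with an identity token is not stopped by IAM.

    ASSUMPTION: any status other than 403 means the request got past IAM.
    Django itself may still answer with 404 or 405 for this URL.
    """
    response = authenticated_http_session.get(f"{worker_url}/", timeout=30)
    return response.status_code != 403
```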

Commands

Full Verification

Runs the complete pytest smoke test suite:

just verify-deployment <stage>

Options:
- just verify-deployment dev: Test dev environment
- just verify-deployment staging: Test staging environment
- just verify-deployment prod: Test production

You can pass pytest arguments:

# Run only callback tests
just verify-deployment prod -k "callback"

# Show more detail
just verify-deployment prod -vv

# Stop on first failure
just verify-deployment prod -x

Quick Verification

A faster check that just verifies services are up and IAM is working:

just verify-deployment-quick <stage>

This uses curl to check:
1. Web service returns 200
2. Worker service returns 403 (IAM protected)

This is good for a quick sanity check, but it is less thorough than the full suite.
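
The same two status checks can be reproduced without curl. The following is a rough standard-library equivalent, not the recipe's actual implementation; the function names are made up for illustration, and the URLs you pass in are whatever gcloud reports for your services.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 30.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the status code is still what we want.
        return error.code


def quick_verify(web_url: str, worker_url: str, fetch=fetch_status) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    web_status = fetch(web_url)
    if web_status != 200:
        problems.append(f"web returned {web_status}, expected 200")
    worker_status = fetch(worker_url)
    if worker_status != 403:
        problems.append(f"worker returned {worker_status}, expected 403")
    return problems
```

The fetch parameter exists so the logic can be exercised without network access. Connection failures (urllib.error.URLError) are deliberately left unhandled here so that an unreachable service fails loudly.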

Prerequisites

Deployed services: Both web and worker services must be deployed to the target stage
gcloud CLI: Must be installed and in your PATH
Valid credentials: Must be logged in with gcloud auth login
Cloud Run Invoker role: Your account needs permission to invoke the worker service (for authenticated tests)

How It Works

All smoke tests run locally on your machine and make HTTP requests to the remote deployed services. No code runs on the server as part of PDV.

Your laptop                          GCP Cloud Run
─────────────────                    ─────────────────
just verify-deployment prod
    │
    ├─► gcloud: resolve service URLs
    │
    ├─► pytest tests/smoke/
    │       │
    │       ├─► HTTP requests ──────────────────► Web Service
    │       │
    │       ├─► HTTP requests ──────────────────► Worker Service
    │       │
    │       └─► Verify responses
    │
    └─► Results displayed locally

This approach:
- Tests the real deployed infrastructure (Cloud Run, IAM, load balancers, DNS)
- Requires no additional deployment or management commands on the server
- Is simple to run; it just needs gcloud credentials locally

The tests use the SMOKE_TEST_STAGE environment variable to know which stage to test. The just verify-deployment command sets this automatically, so you don't need to add it to any secrets or environment files.
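
The "gcloud: resolve service URLs" step in the diagram can be sketched as follows. This is an assumption about how the lookup might be done, not a copy of conftest.py: the service naming pattern (<app name>-<service>-<stage>) and the us-west1 default region are inferred from the commands in the Troubleshooting section, and the helper names are hypothetical.

```python
import subprocess


def describe_command(app_name: str, service: str, stage: str,
                     region: str = "us-west1") -> list[str]:
    """Build the gcloud command that prints a Cloud Run service URL."""
    return [
        "gcloud", "run", "services", "describe",
        f"{app_name}-{service}-{stage}",  # ASSUMED naming pattern
        f"--region={region}",
        "--format=value(status.url)",
    ]


def resolve_service_url(app_name: str, service: str, stage: str) -> str:
    """Run gcloud locally and return the deployed service URL."""
    result = subprocess.run(
        describe_command(app_name, service, stage),
        capture_output=True,
        text=True,
        check=False,
    )
    url = result.stdout.strip()
    if result.returncode != 0 or not url:
        raise RuntimeError(f"Failed to get service URL: {result.stderr.strip()}")
    return url
```

Keeping the command construction separate from the subprocess call makes the lookup easy to inspect or log before anything is executed.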

Test Structure

tests/smoke/
├── __init__.py           # Module docstring
├── conftest.py           # Fixtures (URLs, HTTP sessions)
├── test_web_service.py   # Web service health tests
└── test_worker_security.py   # Worker IAM/security tests

Adding New Tests

To add a new smoke test:

1. Create a new test file in tests/smoke/ or add to an existing file
2. Use the provided fixtures:
   - web_url: The deployed web service URL
   - worker_url: The deployed worker service URL
   - http_session: Unauthenticated requests session
   - authenticated_http_session: Session with gcloud identity token
   - stage: The current stage (dev/staging/prod)

Example:

def test_my_new_endpoint(web_url: str, http_session):
    """Verify my new endpoint works."""
    response = http_session.get(f"{web_url}/api/v1/my-endpoint/", timeout=30)
    assert response.status_code == 200
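
For orientation when writing tests against the worker, the authenticated_http_session fixture presumably attaches a gcloud identity token as a bearer header. A minimal sketch of that idea, assuming the token comes from gcloud auth print-identity-token; the helper names are hypothetical, and the real fixture in conftest.py may obtain or scope its token differently.

```python
import subprocess


def fetch_identity_token() -> str:
    """Ask the local gcloud CLI for an identity token for the active account."""
    result = subprocess.run(
        ["gcloud", "auth", "print-identity-token"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if gcloud is not logged in
    )
    return result.stdout.strip()


def bearer_headers(token: str) -> dict[str, str]:
    """Build the Authorization header that carries the identity token."""
    return {"Authorization": f"Bearer {token}"}


# Inside a fixture, the headers could be applied to a requests session:
#     session = requests.Session()
#     session.headers.update(bearer_headers(fetch_identity_token()))
```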

Troubleshooting

"SMOKE_TEST_STAGE must be set"

Run via just verify-deployment <stage> instead of running pytest directly, or set the environment variable:

SMOKE_TEST_STAGE=dev pytest tests/smoke/ -v
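
The error presumably comes from a guard similar to the one below. This is a sketch only: the function name, the exception types, and the set of valid stages (taken from the stages listed under Full Verification) are assumptions, not the project's actual code.

```python
import os

# ASSUMED set of stages, based on the options documented for verify-deployment.
VALID_STAGES = ("dev", "staging", "prod")


def get_stage(environ=os.environ) -> str:
    """Read and validate the stage that the smoke tests should target."""
    stage = environ.get("SMOKE_TEST_STAGE", "").strip()
    if not stage:
        raise RuntimeError("SMOKE_TEST_STAGE must be set")
    if stage not in VALID_STAGES:
        raise ValueError(f"Unknown stage {stage!r}; expected one of {VALID_STAGES}")
    return stage
```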

"Failed to get service URL"

The service isn't deployed or you don't have permission to describe it:

# Check if service exists
gcloud run services describe $GCP_APP_NAME-web-dev --region=us-west1

"Authenticated request was rejected by IAM"

Your gcloud account doesn't have the Cloud Run Invoker role on the worker service:

# Grant yourself invoker access (if you're an admin)
gcloud run services add-iam-policy-binding $GCP_APP_NAME-worker-dev \
  --region=us-west1 \
  --member="user:you@example.com" \
  --role="roles/run.invoker"

Worker returns 200 instead of 403

The worker service may have been deployed with --allow-unauthenticated. This is a security issue. Redeploy with:

just gcp deploy-worker <stage>

The deploy script uses --no-allow-unauthenticated by default.
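
To confirm whether the worker is open to anonymous callers, you can inspect its IAM policy. The sketch below parses the JSON printed by gcloud run services get-iam-policy <service> --region=<region> --format=json and looks for allUsers bound to roles/run.invoker. The function names are illustrative and not part of the project.

```python
import json


def is_publicly_invokable(policy: dict) -> bool:
    """True if the policy grants roles/run.invoker to allUsers."""
    for binding in policy.get("bindings", []):
        if binding.get("role") != "roles/run.invoker":
            continue
        if "allUsers" in binding.get("members", []):
            return True
    return False


def is_publicly_invokable_json(policy_json: str) -> bool:
    """Same check, starting from the raw JSON text printed by gcloud."""
    return is_publicly_invokable(json.loads(policy_json))
```

If the check returns True after a redeploy, the allUsers binding is still present on the service and needs to be removed before the 403 test can pass.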