Configuring Storage

Validibot uses a single storage location with prefix-based separation for public and private files:

storage/                        # Local: ./storage/  |  GCS: gs://bucket-name/
├── public/                     # Publicly accessible files
│   ├── avatars/                # User profile pictures
│   └── workflow_images/        # Workflow featured images
└── private/                    # Private files (authenticated access only)
    └── runs/                   # Validation run data
        └── {run_id}/           # Each run gets its own directory
            ├── input/          # Written by web app before validation
            │   ├── envelope.json
            │   └── submission.idf
            └── output/         # Written by validator container
                ├── envelope.json
                ├── findings.json
                └── artifacts/
                    └── report.html

This guide explains how to configure storage for different deployment scenarios.

Storage Systems Overview

Public Files (Django STORAGES "default")

The default Django storage handles publicly accessible files:

  • User profile pictures (avatars)
  • Workflow featured images
  • Organization logos

These files are served directly via URL. In production, GCS serves them from the public/ prefix, which is granted public read access via an IAM condition.
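
For illustration, here is a minimal sketch of how a model field maps onto this layout (the Profile model and its field are hypothetical, not actual Validibot models):

# Hypothetical model for illustration: files uploaded through Django's
# default storage land under the public/ prefix.
from django.db import models

class Profile(models.Model):
    avatar = models.ImageField(upload_to="avatars/")

# Locally: ./storage/public/avatars/<filename>
# On GCS:  gs://bucket-name/public/avatars/<filename>, served via avatar.url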

Private Files (Data Storage)

The validibot.core.storage module handles private validation pipeline files:

  • User-submitted files for validation (IDF, FMU, etc.)
  • Input envelopes (JSON configuration for validators)
  • Output envelopes (JSON results from validators)
  • Generated artifacts (reports, transformed files)

These files are stored under the private/ prefix and require authenticated access. Users download their files via time-limited signed URLs.

Validation Run Storage Structure

Each validation run gets its own directory under private/runs/{run_id}/. This structure is standardized across all deployment platforms (Docker, Kubernetes, GCS, etc.):

private/runs/{run_id}/
├── input/                      # Written by web app
│   ├── envelope.json           # Validation configuration
│   └── {submission_files}      # User-uploaded files (e.g., model.idf)
└── output/                     # Written by validator container
    ├── envelope.json           # Validation results
    ├── findings.json           # Detailed findings
    └── artifacts/              # Generated files
        ├── report.html
        └── transformed.idf

Ownership and Access

Directory   Written By            Read By
input/      Web app               Validator container
output/     Validator container   Web app (worker)

Container Access

Validator containers receive their run path via environment variable:

# Container receives:
RUN_PATH=runs/{run_id}

# Container reads from:
${STORAGE_ROOT}/${RUN_PATH}/input/

# Container writes to:
${STORAGE_ROOT}/${RUN_PATH}/output/
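
As a minimal sketch, assuming a filesystem backend mounted at STORAGE_ROOT, a container entrypoint can resolve its directories like this:

# Sketch of a validator entrypoint resolving its run directories from the
# environment variables shown above (filesystem backend assumed).
import os
from pathlib import Path

storage_root = Path(os.environ["STORAGE_ROOT"])
run_path = os.environ["RUN_PATH"]  # e.g. "runs/run-123"

input_dir = storage_root / run_path / "input"
output_dir = storage_root / run_path / "output"
output_dir.mkdir(parents=True, exist_ok=True)

envelope = (input_dir / "envelope.json").read_text()
# ... run validation, then write results under output_dir ...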

This standardized structure allows containers to work identically across platforms:

Platform               How Container Accesses Storage
Docker Compose         Shared volume mount at STORAGE_ROOT
Kubernetes             Shared PVC or object storage (MinIO/S3)
Cloud Run Jobs (GCP)   GCS bucket via service account

Configuration by Environment

Local Development (Default)

When running without cloud storage, files are stored locally:

./storage/
├── public/      # MEDIA_ROOT - avatars, workflow images
└── private/     # DATA_STORAGE_ROOT - validation files

No configuration needed - this is the default behavior.

Docker Compose Deployments

For Docker deployments, use a shared volume so both web app and validator containers can access the same storage:

# docker-compose.yml
services:
  web:
    volumes:
      - storage_data:/app/storage
    environment:
      - STORAGE_ROOT=/app/storage

  validator-energyplus:
    volumes:
      - storage_data:/app/storage # Same volume!
    environment:
      - STORAGE_ROOT=/app/storage
      - RUN_PATH=runs/${RUN_ID} # Set per-invocation

volumes:
  storage_data:
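
How the web app sets RUN_PATH per invocation is deployment-specific; one hedged sketch of dispatching a one-off validator container (the command shape is assumed, not Validibot's actual dispatch code):

# Hedged sketch: launch a one-off validator container for a single run.
# The service name matches the compose file above.
import subprocess

run_id = "run-123"
subprocess.run(
    [
        "docker", "compose", "run", "--rm",
        "-e", f"RUN_PATH=runs/{run_id}",
        "validator-energyplus",
    ],
    check=True,
)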

Local Development with GCS

To test GCS integration locally:

  1. Authenticate with Google Cloud:
     gcloud auth application-default login
  2. Set the environment variable:
     export STORAGE_BUCKET=your-dev-bucket
  3. The storage will then use GCS with:
     • public/ prefix for media files
     • private/ prefix for validation files
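
To sanity-check that your Application Default Credentials can reach the bucket, a short script like this helps (bucket and object names are placeholders):

# Quick ADC sanity check against the dev bucket (names are placeholders).
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("your-dev-bucket")
blob = bucket.blob("private/healthcheck.txt")
blob.upload_from_string("ok")
print(f"wrote gs://{bucket.name}/{blob.name}")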

Production (Google Cloud Storage)

Production uses a single GCS bucket with prefix-based access control:

# config/settings/production.py

STORAGE_BUCKET = env("STORAGE_BUCKET")  # e.g., "myapp-storage"

STORAGES = {
    "default": {
        "BACKEND": "storages.backends.gcloud.GoogleCloudStorage",
        "OPTIONS": {
            "bucket_name": STORAGE_BUCKET,
            "location": "public",  # Files stored under public/ prefix
        },
    },
}

DATA_STORAGE_BACKEND = "gcs"
DATA_STORAGE_BUCKET = STORAGE_BUCKET
DATA_STORAGE_PREFIX = "private"
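
With this configuration, both storage systems write to the same bucket under different prefixes. A quick illustration (paths are hypothetical):

# Illustration: same bucket, different prefixes (paths are hypothetical).
from django.core.files.base import ContentFile
from django.core.files.storage import default_storage

from validibot.core.storage import get_data_storage

default_storage.save("avatars/u1.png", ContentFile(b"..."))
# -> gs://myapp-storage/public/avatars/u1.png

get_data_storage().write("runs/run-123/input/envelope.json", "{}")
# -> gs://myapp-storage/private/runs/run-123/input/envelope.json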

GCS Bucket Setup

Creating the Bucket

# Create bucket (choose your region)
gcloud storage buckets create gs://your-bucket-name \
    --location=us-west1 \
    --uniform-bucket-level-access

Configuring IAM for Public/Private Separation

The key to this architecture is using IAM Conditions to make only the public/ prefix publicly readable while keeping private/ restricted:

# Make public/ prefix publicly readable
gcloud storage buckets add-iam-policy-binding gs://your-bucket-name \
    --member="allUsers" \
    --role="roles/storage.objectViewer" \
    --condition='expression=resource.name.startsWith("projects/_/buckets/your-bucket-name/objects/public/"),title=public-prefix-only'

This grants allUsers read access only to objects under the public/ prefix. Objects under private/ remain accessible only to authenticated service accounts.

Service Account Permissions

Your Cloud Run service account needs full access to the bucket:

gcloud storage buckets add-iam-policy-binding gs://your-bucket-name \
    --member="serviceAccount:YOUR_SERVICE_ACCOUNT@PROJECT.iam.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"

Verifying Access Control

Test that the configuration is correct:

# This should work (public file)
curl -I https://storage.googleapis.com/your-bucket-name/public/test.txt

# This should return 403 Forbidden (private file)
curl -I https://storage.googleapis.com/your-bucket-name/private/test.txt
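
The same check can be scripted from Python with an anonymous client (a hedged alternative to curl; assumes a test object exists under each prefix):

# Verify public/private separation with an unauthenticated GCS client.
from google.api_core import exceptions
from google.cloud import storage

client = storage.Client.create_anonymous_client()
bucket = client.bucket("your-bucket-name")

print(bucket.blob("public/test.txt").download_as_text())  # should succeed

try:
    bucket.blob("private/test.txt").download_as_text()
except exceptions.Forbidden:
    print("private/ correctly denied for anonymous access")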

Environment Variables Reference

Variable               Default     Description
STORAGE_BUCKET         (none)      GCS bucket name (required in production)
STORAGE_ROOT           ./storage   Local filesystem root for storage
DATA_STORAGE_BACKEND   local       Backend type: local or gcs
DATA_STORAGE_PREFIX    private     Prefix for private files in bucket

Using the Data Storage API

from validibot.core.storage import get_data_storage

storage = get_data_storage()

# Write input files (before validation)
storage.write("runs/run-123/input/envelope.json", json_content)
storage.write_file("runs/run-123/input/model.idf", local_path)

# Read output files (after validation)
content = storage.read("runs/run-123/output/envelope.json")

# Write/read Pydantic envelopes
from validibot_shared.energyplus.envelopes import (
    EnergyPlusInputEnvelope,
    EnergyPlusOutputEnvelope,
)
storage.write_envelope("runs/run-123/input/envelope.json", envelope)

output = storage.read_envelope(
    "runs/run-123/output/envelope.json",
    EnergyPlusOutputEnvelope,
)

# Generate signed download URL (for user downloads)
url = storage.get_download_url(
    "runs/run-123/output/artifacts/report.pdf",
    expires_in=3600,  # 1 hour
    filename="validation-report.pdf",
)

Platform-Specific Implementations

The standardized run directory structure (runs/{run_id}/input/ and output/) works across platforms, but each platform may require specific implementation details:

Docker/Kubernetes (Shared Filesystem)

  • Uses shared volume mounts
  • Both web app and containers access the same filesystem path
  • Simple and reliable for Docker Compose deployments

Google Cloud Storage

  • Web app and Cloud Run Jobs both access the same GCS bucket
  • Authentication via service accounts (no credentials needed in code)
  • Requires IAM configuration for public/private separation

Future Platforms (S3, Azure Blob, etc.)

When implementing new storage backends:

  1. Follow the standard structure - use runs/{run_id}/ with input/ and output/ subdirectories.
  2. Implement the DataStorage interface - inherit from validibot.core.storage.base.DataStorage.
  3. Handle platform-specific auth - each platform has its own credential mechanism.
  4. Document IAM/access setup - public/private separation varies by platform.

See validibot/core/storage/gcs.py for an example implementation.
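
As a starting point, here is a hedged sketch of an S3 backend. The method names (write, read, get_download_url) are inferred from the usage examples earlier in this guide; check validibot/core/storage/base.py for the actual interface:

# Hedged sketch of an S3 DataStorage backend. Method names are inferred
# from the usage examples above and may not match the real base class.
import boto3

from validibot.core.storage.base import DataStorage


class S3DataStorage(DataStorage):
    def __init__(self, bucket: str, prefix: str = "private"):
        self.client = boto3.client("s3")
        self.bucket = bucket
        self.prefix = prefix

    def _key(self, path: str) -> str:
        # Prepend the private prefix, mirroring DATA_STORAGE_PREFIX.
        return f"{self.prefix}/{path}"

    def write(self, path: str, content: str) -> None:
        self.client.put_object(
            Bucket=self.bucket, Key=self._key(path), Body=content.encode()
        )

    def read(self, path: str) -> str:
        obj = self.client.get_object(Bucket=self.bucket, Key=self._key(path))
        return obj["Body"].read().decode()

    def get_download_url(
        self, path: str, expires_in: int = 3600, filename: str | None = None
    ) -> str:
        # Presigned URL, the S3 equivalent of a GCS signed URL.
        params = {"Bucket": self.bucket, "Key": self._key(path)}
        if filename:
            params["ResponseContentDisposition"] = f'attachment; filename="{filename}"'
        return self.client.generate_presigned_url(
            "get_object", Params=params, ExpiresIn=expires_in
        )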

Security Model

How It Works

Validibot separates public and private files using prefix-based access control:

For GCS deployments:

  1. Bucket uses uniform bucket-level access (no per-object ACLs)
  2. An IAM Condition grants allUsers read access only to the public/ prefix
  3. The private/ prefix is only accessible to authenticated service accounts
  4. Users download their files via time-limited signed URLs

For local/Docker deployments:

  1. Public files are served directly via Django's media handling
  2. Private files are served through authenticated Django views
  3. Signed URLs use HMAC signatures with Django's SECRET_KEY
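
A minimal sketch of that HMAC scheme (helper names are hypothetical; the actual local backend may structure this differently):

# HMAC-signed download URLs keyed on Django's SECRET_KEY.
# Helper names are hypothetical, not Validibot's actual implementation.
import hashlib
import hmac
import time
from urllib.parse import urlencode

from django.conf import settings


def make_signed_url(path: str, expires_in: int = 3600) -> str:
    expires = int(time.time()) + expires_in
    message = f"{path}:{expires}".encode()
    sig = hmac.new(settings.SECRET_KEY.encode(), message, hashlib.sha256).hexdigest()
    return f"/downloads/{path}?" + urlencode({"expires": expires, "sig": sig})


def verify_signature(path: str, expires: int, sig: str) -> bool:
    if time.time() > expires:
        return False  # link expired
    message = f"{path}:{expires}".encode()
    expected = hmac.new(settings.SECRET_KEY.encode(), message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)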

Security Checklist (GCS)

  • [ ] Bucket has uniform bucket-level access enabled
  • [ ] IAM condition restricts allUsers to public/ prefix only
  • [ ] Service account has storage.objectAdmin role
  • [ ] Application never stores sensitive data under public/ prefix
  • [ ] Signed URLs have reasonable expiration times (1 hour default)

Security Checklist (Local/Docker)

  • [ ] Storage volume is not exposed outside the Docker network
  • [ ] SECRET_KEY is set consistently across app instances
  • [ ] Download endpoints require authentication

Troubleshooting

"Permission denied" when uploading

Check that:

  1. Service account has storage.objectAdmin role on the bucket
  2. Cloud Run is using the correct service account
  3. For local development, you've run gcloud auth application-default login

Public files returning 403

Check that:

  1. IAM condition is correctly configured (verify the bucket name in the condition)
  2. Files are being stored under the public/ prefix
  3. Run gcloud storage buckets get-iam-policy gs://your-bucket-name to verify

Signed URL errors

Check that:

  1. The file exists in storage
  2. Service account can sign blobs (needs iam.serviceAccounts.signBlob permission)
  3. For local storage, SECRET_KEY is set consistently

Container can't access files

Check that:

  1. Volume is mounted at the correct path in both containers
  2. STORAGE_ROOT environment variable matches the mount path
  3. RUN_PATH is set correctly for the validation run