Configuring Storage¶
Validibot uses a single storage location with prefix-based separation for public and private files:
storage/                          # Local: ./storage/ | GCS: gs://bucket-name/
├── public/                       # Publicly accessible files
│   ├── avatars/                  # User profile pictures
│   └── workflow_images/          # Workflow featured images
└── private/                      # Private files (authenticated access only)
    └── runs/                     # Validation run data
        └── {run_id}/             # Each run gets its own directory
            ├── input/            # Written by web app before validation
            │   ├── envelope.json
            │   └── submission.idf
            └── output/           # Written by validator container
                ├── envelope.json
                ├── findings.json
                └── artifacts/
                    └── report.html
This guide explains how to configure storage for different deployment scenarios.
Storage Systems Overview¶
Public Files (Django STORAGES "default")¶
The default Django storage handles publicly accessible files:
- User profile pictures (avatars)
- Workflow featured images
- Organization logos
These files are served directly by URL. In production, GCS serves them from the public/ prefix, which is publicly readable through an IAM condition.
Private Files (Data Storage)¶
The validibot.core.storage module handles private validation pipeline files:
- User-submitted files for validation (IDF, FMU, etc.)
- Input envelopes (JSON configuration for validators)
- Output envelopes (JSON results from validators)
- Generated artifacts (reports, transformed files)
These files are stored under the private/ prefix and require authenticated access. Users download their files via time-limited signed URLs.
Validation Run Storage Structure¶
Each validation run gets its own directory under private/runs/{run_id}/. This structure is standardized across all deployment platforms (Docker, Kubernetes, GCS, etc.):
private/runs/{run_id}/
├── input/                    # Written by web app
│   ├── envelope.json         # Validation configuration
│   └── {submission_files}    # User-uploaded files (e.g., model.idf)
└── output/                   # Written by validator container
    ├── envelope.json         # Validation results
    ├── findings.json         # Detailed findings
    └── artifacts/            # Generated files
        ├── report.html
        └── transformed.idf
Ownership and Access¶
| Directory | Written By | Read By |
|---|---|---|
| input/ | Web app | Validator container |
| output/ | Validator container | Web app (worker) |
Container Access¶
Validator containers receive their run path via an environment variable:
# Container receives:
RUN_PATH=runs/{run_id}
# Container reads from:
${STORAGE_ROOT}/${RUN_PATH}/input/
# Container writes to:
${STORAGE_ROOT}/${RUN_PATH}/output/
This standardized structure allows containers to work identically across platforms:
| Platform | How Container Accesses Storage |
|---|---|
| Docker Compose | Shared volume mount at STORAGE_ROOT |
| Kubernetes | Shared PVC or object storage (MinIO/S3) |
| Cloud Run Jobs (GCP) | GCS bucket via service account |
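On a shared-filesystem platform, the path convention above could be resolved inside a validator container roughly as follows. This is a minimal sketch rather than Validibot's actual container code: the helper name `resolve_run_dirs` is hypothetical, and it assumes `STORAGE_ROOT` is a mounted filesystem path and `RUN_PATH` has the form `runs/{run_id}`.

```python
import os
from pathlib import Path


def resolve_run_dirs(environ=os.environ):
    """Return (input_dir, output_dir) for the current validation run.

    Hypothetical helper: assumes STORAGE_ROOT is a mounted filesystem path
    and RUN_PATH looks like runs/{run_id}.
    """
    storage_root = Path(environ["STORAGE_ROOT"])
    run_dir = storage_root / environ["RUN_PATH"]
    input_dir = run_dir / "input"
    output_dir = run_dir / "output"

    # The web app writes input/ before the container starts, so a missing
    # directory usually points to a wrong mount path or a wrong RUN_PATH.
    if not input_dir.is_dir():
        raise FileNotFoundError(f"Input directory not found: {input_dir}")

    # The container owns output/, so it can create it on demand.
    output_dir.mkdir(parents=True, exist_ok=True)
    return input_dir, output_dir
```

Failing early on a missing `input/` directory makes mount and `RUN_PATH` mistakes visible before any validation work starts.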
Configuration by Environment¶
Local Development (Default)¶
When running without cloud storage, files are stored locally:
./storage/
├── public/ # MEDIA_ROOT - avatars, workflow images
└── private/ # DATA_STORAGE_ROOT - validation files
No configuration needed - this is the default behavior.
Docker Compose Deployments¶
For Docker deployments, use a shared volume so both web app and validator containers can access the same storage:
# docker-compose.yml
services:
  web:
    volumes:
      - storage_data:/app/storage
    environment:
      - STORAGE_ROOT=/app/storage
  validator-energyplus:
    volumes:
      - storage_data:/app/storage  # Same volume!
    environment:
      - STORAGE_ROOT=/app/storage
      - RUN_PATH=runs/${RUN_ID}  # Set per-invocation
volumes:
  storage_data:
Local Development with GCS¶
To test GCS integration locally:
- Authenticate with Google Cloud:
- Set the environment variable:
- The storage will use GCS with:
  - public/ prefix for media files
  - private/ prefix for validation files
Production (Google Cloud Storage)¶
Production uses a single GCS bucket with prefix-based access control:
# config/settings/production.py
STORAGE_BUCKET = env("STORAGE_BUCKET")  # e.g., "myapp-storage"

STORAGES = {
    "default": {
        "BACKEND": "storages.backends.gcloud.GoogleCloudStorage",
        "OPTIONS": {
            "bucket_name": STORAGE_BUCKET,
            "location": "public",  # Files stored under public/ prefix
        },
    },
}

DATA_STORAGE_BACKEND = "gcs"
DATA_STORAGE_BUCKET = STORAGE_BUCKET
DATA_STORAGE_PREFIX = "private"
GCS Bucket Setup¶
Creating the Bucket¶
# Create bucket (choose your region)
gcloud storage buckets create gs://your-bucket-name \
--location=us-west1 \
--uniform-bucket-level-access
Configuring IAM for Public/Private Separation¶
The key to this architecture is using IAM Conditions to make only the public/ prefix readable while keeping private/ restricted:
# Make public/ prefix publicly readable
gcloud storage buckets add-iam-policy-binding gs://your-bucket-name \
--member="allUsers" \
--role="roles/storage.objectViewer" \
--condition='expression=resource.name.startsWith("projects/_/buckets/your-bucket-name/objects/public/"),title=public-prefix-only'
This grants allUsers read access only to objects under the public/ prefix. Objects under private/ remain accessible only to authenticated service accounts.
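To make the prefix match concrete, the snippet below mimics the `startsWith` expression in plain Python. It is an illustration only: IAM evaluates the real condition on Google's side, and the function names here are hypothetical.

```python
def object_resource_name(bucket, object_name):
    """Build the resource name that the IAM condition is matched against."""
    return f"projects/_/buckets/{bucket}/objects/{object_name}"


def is_publicly_readable(bucket, object_name):
    """Mimic the startsWith() condition from the binding above."""
    public_prefix = f"projects/_/buckets/{bucket}/objects/public/"
    return object_resource_name(bucket, object_name).startswith(public_prefix)


print(is_publicly_readable("your-bucket-name", "public/avatars/a.png"))     # True
print(is_publicly_readable("your-bucket-name", "private/runs/run-123/x"))   # False
print(is_publicly_readable("your-bucket-name", "public-notes/readme.txt"))  # False
```

The last case shows why the trailing slash in the condition matters: without it, any object whose name merely begins with `public` would match.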
Service Account Permissions¶
Your Cloud Run service account needs full access to the bucket:
gcloud storage buckets add-iam-policy-binding gs://your-bucket-name \
--member="serviceAccount:YOUR_SERVICE_ACCOUNT@PROJECT.iam.gserviceaccount.com" \
--role="roles/storage.objectAdmin"
Verifying Access Control¶
Test that the configuration is correct:
# This should work (public file)
curl -I https://storage.googleapis.com/your-bucket-name/public/test.txt
# This should return 403 Forbidden (private file)
curl -I https://storage.googleapis.com/your-bucket-name/private/test.txt
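The same check can be scripted, for example as part of a deployment smoke test. The sketch below uses only the Python standard library. The function names are hypothetical, and it assumes a `test.txt` object exists under each prefix, as in the curl commands above.

```python
import urllib.error
import urllib.request


def head_status(url, opener=urllib.request.urlopen):
    """Send an unauthenticated HEAD request and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # urllib raises for 4xx/5xx responses; the status is on the exception.
        return exc.code


def check_prefix_separation(bucket, opener=urllib.request.urlopen):
    """Return the status codes for the public and private test objects."""
    base = f"https://storage.googleapis.com/{bucket}"
    return {
        "public": head_status(f"{base}/public/test.txt", opener),
        "private": head_status(f"{base}/private/test.txt", opener),
    }
```

With a correctly configured bucket, `check_prefix_separation("your-bucket-name")` would be expected to report 200 for the public object and 403 for the private one. The `opener` parameter exists only so the network call can be replaced in tests.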
Environment Variables Reference¶
| Variable | Default | Description |
|---|---|---|
| STORAGE_BUCKET | (none) | GCS bucket name (required in production) |
| STORAGE_ROOT | ./storage | Local filesystem root for storage |
| DATA_STORAGE_BACKEND | local | Backend type: local or gcs |
| DATA_STORAGE_PREFIX | private | Prefix for private files in bucket |
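The defaults in the table can be read as the following resolution logic. This is a sketch under stated assumptions, not Validibot's settings code: `StorageConfig` and `load_storage_config` are hypothetical names, and treating `STORAGE_BUCKET` as mandatory whenever the backend is `gcs` is an interpretation of "required in production".

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageConfig:
    backend: str
    root: str
    bucket: Optional[str]
    prefix: str


def load_storage_config(environ=os.environ):
    """Resolve storage settings from environment variables with documented defaults."""
    backend = environ.get("DATA_STORAGE_BACKEND", "local")
    if backend not in {"local", "gcs"}:
        raise ValueError(f"Unsupported DATA_STORAGE_BACKEND: {backend!r}")

    bucket = environ.get("STORAGE_BUCKET") or None
    # Assumption: the GCS backend cannot work without a bucket name.
    if backend == "gcs" and bucket is None:
        raise ValueError("STORAGE_BUCKET is required when DATA_STORAGE_BACKEND=gcs")

    return StorageConfig(
        backend=backend,
        root=environ.get("STORAGE_ROOT", "./storage"),
        bucket=bucket,
        prefix=environ.get("DATA_STORAGE_PREFIX", "private"),
    )
```

Calling `load_storage_config({})` yields the local defaults, which matches the "no configuration needed" behavior described for local development.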
Using the Data Storage API¶
from validibot.core.storage import get_data_storage

# EnergyPlusOutputEnvelope is assumed to live alongside EnergyPlusInputEnvelope
from validibot_shared.energyplus.envelopes import (
    EnergyPlusInputEnvelope,
    EnergyPlusOutputEnvelope,
)

storage = get_data_storage()

# Write input files (before validation)
# json_content, local_path, and envelope are supplied by the calling code
storage.write("runs/run-123/input/envelope.json", json_content)
storage.write_file("runs/run-123/input/model.idf", local_path)

# Read output files (after validation)
content = storage.read("runs/run-123/output/envelope.json")

# Write/read Pydantic envelopes
storage.write_envelope("runs/run-123/input/envelope.json", envelope)
output = storage.read_envelope(
    "runs/run-123/output/envelope.json",
    EnergyPlusOutputEnvelope,
)

# Generate signed download URL (for user downloads)
url = storage.get_download_url(
    "runs/run-123/output/artifacts/report.pdf",
    expires_in=3600,  # 1 hour
    filename="validation-report.pdf",
)
Platform-Specific Implementations¶
The standardized run directory structure (runs/{run_id}/input/ and output/) works across platforms, but each platform may require specific implementation details:
Docker/Kubernetes (Shared Filesystem)¶
- Uses shared volume mounts
- Both web app and containers access the same filesystem path
- Simple and reliable for Docker Compose deployments
Google Cloud Storage¶
- Web app and Cloud Run Jobs both access the same GCS bucket
- Authentication via service accounts (no credentials needed in code)
- Requires IAM configuration for public/private separation
Future Platforms (S3, Azure Blob, etc.)¶
When implementing new storage backends:
- Follow the standard structure: Use runs/{run_id}/ with input/ and output/ subdirectories.
- Implement the DataStorage interface: Inherit from validibot.core.storage.base.DataStorage
- Handle platform-specific auth: Each platform has its own credential mechanism
- Document IAM/access setup: Public/private separation varies by platform
See validibot/core/storage/gcs.py for an example implementation.
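To give a feel for the shape of a new backend, here is a toy in-memory implementation. It is a sketch only: `DataStorageSketch` is a hypothetical stand-in for `validibot.core.storage.base.DataStorage`, whose real abstract methods and signatures may differ, and only `write` and `read` are shown.

```python
from abc import ABC, abstractmethod


class DataStorageSketch(ABC):
    """Hypothetical stand-in for the real DataStorage base class."""

    @abstractmethod
    def write(self, path, content):
        """Store bytes at a path relative to the private prefix."""

    @abstractmethod
    def read(self, path):
        """Return the bytes stored at a path relative to the private prefix."""


class InMemoryDataStorage(DataStorageSketch):
    """Toy backend that keeps objects in a dict, keyed like bucket objects."""

    def __init__(self, prefix="private"):
        self._prefix = prefix.strip("/")
        self._objects = {}

    def _key(self, path):
        # Every object lands under the private prefix, as in the GCS layout.
        return f"{self._prefix}/{path.lstrip('/')}"

    def write(self, path, content):
        self._objects[self._key(path)] = content

    def read(self, path):
        try:
            return self._objects[self._key(path)]
        except KeyError:
            raise FileNotFoundError(path) from None


storage = InMemoryDataStorage()
storage.write("runs/run-123/input/envelope.json", b"{}")
print(storage.read("runs/run-123/input/envelope.json"))  # b'{}'
```

A real backend would replace the dict with calls to the platform's SDK and add the remaining methods, such as signed download URLs.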
Security Model¶
How It Works¶
Validibot separates public and private files using prefix-based access control:
For GCS deployments:
- Bucket uses uniform bucket-level access (no per-object ACLs)
- An IAM Condition grants allUsers read access only to the public/ prefix
- The private/ prefix is only accessible to authenticated service accounts
- Users download their files via time-limited signed URLs
For local/Docker deployments:
- Public files are served directly via Django's media handling
- Private files are served through authenticated Django views
- Signed URLs use HMAC signatures with Django's SECRET_KEY
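The idea behind HMAC-signed, time-limited links for local storage can be sketched as follows. This is a generic illustration, not Validibot's actual signing scheme: the function names, the `path:expiry` message format, and the query parameter names are all assumptions.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode


def sign_download_path(path, secret_key, expires_in=3600, now=None):
    """Return a query string with an expiry timestamp and an HMAC signature."""
    issued_at = int(now if now is not None else time.time())
    expires_at = issued_at + expires_in
    message = f"{path}:{expires_at}".encode()
    signature = hmac.new(secret_key.encode(), message, hashlib.sha256).hexdigest()
    return urlencode({"expires": expires_at, "signature": signature})


def verify_download_path(path, expires_at, signature, secret_key, now=None):
    """Return True if the signature is valid and the link has not expired."""
    current = int(now if now is not None else time.time())
    if current > int(expires_at):
        return False
    message = f"{path}:{int(expires_at)}".encode()
    expected = hmac.new(secret_key.encode(), message, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)
```

Because the signature covers both the path and the expiry, changing either invalidates the link. It also shows why `SECRET_KEY` must be identical on every app instance: a link signed by one instance cannot be verified by another that holds a different key.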
Security Checklist (GCS)¶
- [ ] Bucket has uniform bucket-level access enabled
- [ ] IAM condition restricts allUsers to public/ prefix only
- [ ] Service account has storage.objectAdmin role
- [ ] Application never stores sensitive data under public/ prefix
- [ ] Signed URLs have reasonable expiration times (1 hour default)
Security Checklist (Local/Docker)¶
- [ ] Storage volume is not exposed outside the Docker network
- [ ] SECRET_KEY is set consistently across app instances
- [ ] Download endpoints require authentication
Troubleshooting¶
"Permission denied" when uploading¶
Check that:
- Service account has storage.objectAdmin role on the bucket
- Cloud Run is using the correct service account
- For local development, you've run gcloud auth application-default login
Public files returning 403¶
Check that:
- IAM condition is correctly configured (verify the bucket name in the condition)
- Files are being stored under the public/ prefix
- Run gcloud storage buckets get-iam-policy gs://bucket to verify
Signed URL errors¶
Check that:
- The file exists in storage
- Service account can sign blobs (needs iam.serviceAccounts.signBlob permission)
- For local storage, SECRET_KEY is set consistently
Container can't access files¶
Check that:
- Volume is mounted at the correct path in both containers
- STORAGE_ROOT environment variable matches the mount path
- RUN_PATH is set correctly for the validation run