Submissions¶

A Submission is the entry point into the system.
It represents:

The file being validated (JSON, XML, EnergyPlus IDF, etc.).
The workflow version to run.
The organization, project, and user context.
Metadata such as content type, size, and SHA-256 checksum.

Submissions can have multiple Validation Runs over time, but typically point to the latest run.

User Context¶

Submission.user captures the human (or service) that supplied the payload. We store it even though each ValidationRun also has a user field because a single submission can be re-run many times by different operators:

A data engineer uploads a file, but an admin later replays the same submission to verify fixes.
An API integration pushes content using an org-level API token where no Django User instance exists.
Background processes can enqueue submissions on behalf of a workflow (for example, nightly batch imports) without an authenticated user object.

Those flows mean Submission.user is nullable. When it is NULL we rely on the organization/project ForeignKeys and metadata provided in the payload (API key, signed request, etc.) to decide who owns the submission.

Relationship to Validation Runs¶

A ValidationRun references the submission that triggered it, but it records the user who executed the run. Keeping both fields lets the audit trail answer two questions:

Who provided the content? → submission.user
Who triggered this execution? → validation_run.user

When you launch a run via the UI, both values usually match. When executions are scheduled, retried by Celery, or invoked via an API key, the run user may be NULL or different from the submission user. Treat the submission record as the ownership anchor for the payload itself, and the run record as the executor context for a single processing attempt.

Data Retention¶

Submissions support configurable retention policies that control how long the actual content (file or inline text) is stored. This supports compliance requirements and reduces storage costs for workflows that don't need to retain user data.

Retention Policies¶

Policy	Behavior
`DO_NOT_STORE`	Content deleted after validation completes
`STORE_1_DAY`	Content retained for 1 day
`STORE_7_DAYS`	Content retained for 7 days
`STORE_30_DAYS`	Content retained for 30 days
`STORE_PERMANENTLY`	Content retained indefinitely

Key Fields¶

retention_policy: Snapshot of the workflow's retention setting at submission time
expires_at: When content should be purged (null for DO_NOT_STORE or already purged)
content_purged_at: Timestamp when content was purged (audit trail)

Content Purge vs Record Deletion¶

When a submission's retention expires, we purge the content but preserve the record. This means:

The Submission row remains in the database with its metadata intact
content is cleared to empty string
input_file is deleted from storage
checksum_sha256, original_filename, size_bytes are preserved for audit
Associated GCS execution bundles (gs://bucket/runs/{org}/{run}/) are deleted

This approach preserves the audit trail while removing the actual user data.

Defensive FK: ValidationRun.submission¶

ValidationRun.submission uses SET_NULL instead of CASCADE. This means:

If a Submission record is accidentally deleted, the ValidationRun survives
Code accessing run.submission must handle None
API responses show "submission": null when unavailable

Management Commands¶

Two commands handle retention:

# Purge submissions past their expires_at date (run hourly)
python manage.py purge_expired_submissions --batch-size 100

# Process failed purge attempts (run every 5 minutes)
python manage.py process_purge_retries --batch-size 50

PurgeRetry Model¶

When a purge fails (e.g., GCS unavailable), a PurgeRetry record is created for automatic retry with exponential backoff. After 5 failed attempts, manual intervention is required.