How Validibot Works

This document provides a detailed technical walkthrough of how Validibot executes validation workflows, from initial API request to final results.

System Overview

Validibot operates as an orchestration layer that coordinates validators to process submitted content according to predefined workflows. The system is designed around an asynchronous, event-driven architecture that can handle both quick validations and long-running processes.

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Client API    │────│  Django Views   │────│    Services     │
│   Request       │    │   & ViewSets    │    │     Layer       │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                       │
                                ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Data Models   │    │   Submissions   │    │ Worker / Jobs   │
│   (Workflows,   │◄───│   & Validation  │◄───│ (Cloud Run)     │
│   Runs, etc.)   │    │      Runs       │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                              ┌─────────────────┐
                                              │   Validation    │
                                              │   Validators    │
                                              │   (JSON, XML,   │
                                              │    Custom)      │
                                              └─────────────────┘

The Complete Validation Lifecycle

Phase 1: Workflow Preparation

Before any validation can occur, workflows must be configured:

  1. Workflow Creation: Administrators create workflows through the UI or API
     • Define workflow metadata (name, version, organization)
     • Configure validation steps in the desired order
     • Assign validators and rulesets to each step

  2. Validator Registration: The system maintains a registry of available validators
     • Built-in validators (JSON Schema, XML Schema)
     • Custom validators specific to the organization
     • Each validator defines its capabilities and input requirements

  3. Ruleset Management: Validation rules are managed separately from workflows
     • JSON Schema documents for structure validation
     • XML Schema (XSD) files for XML validation
     • Custom rule files for specialized validators
     • Rulesets can be versioned and shared across workflows

Phase 2: Validation Initiation

When a client wants to validate content, they trigger the process through the API:

2.1 Request Processing

# Example API call to start validation (org-scoped per ADR-2026-01-06)
POST /api/v1/orgs/{org_slug}/workflows/{workflow_identifier}/runs/
Content-Type: application/json

{
  "name": "user-upload.json",
  "content": "{ \"user\": { \"name\": \"John\" } }"
}

The workflow_identifier can be either the workflow's slug (preferred) or its numeric database ID.
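
As a rough illustration, the request above could be assembled from Python with only the standard library. This is a sketch under assumptions: the host name, the org and workflow slugs, and the bearer-token header are placeholders that do not come from this document, and the helper name build_run_request is hypothetical.

```python
import json
import urllib.request


def build_run_request(base_url, org_slug, workflow_identifier, name, content, token):
    """Build (but do not send) a JSON-envelope request that starts a validation run."""
    url = f"{base_url}/api/v1/orgs/{org_slug}/workflows/{workflow_identifier}/runs/"
    body = json.dumps({"name": name, "content": content}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; use the scheme your deployment requires.
            "Authorization": f"Bearer {token}",
        },
    )


request = build_run_request(
    "https://validibot.example.com",  # placeholder host
    "acme",                           # placeholder org slug
    "product-validation",             # workflow slug (preferred over the numeric ID)
    "user-upload.json",
    json.dumps({"user": {"name": "John"}}),
    "example-token",
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen() would send it; building it separately keeps the URL and payload easy to inspect first.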

The system supports multiple submission modes:

  • Raw Body Mode: Content sent directly in request body
  • JSON Envelope Mode: Content wrapped in JSON with metadata
  • Multipart Mode: File uploads with additional form data

2.2 Submission Creation

The WorkflowViewSet.start_validation() method processes the request:

  1. Content Ingestion: The system extracts and normalizes the submitted content
     • Determines content type (JSON, XML, plain text, etc.)
     • Calculates SHA256 checksum for deduplication
     • Validates content size and format

  2. Submission Persistence: A Submission record is created
     • Links to the target workflow and organization
     • Stores content either inline (for small text) or as file reference
     • Captures metadata about the submission source

  3. Validation Run Creation: A ValidationRun record is created to track execution
     • Links the submission to the specific workflow version
     • Initializes with PENDING status
     • Records the user who triggered the validation
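
The ingestion checks can be sketched in a few lines. This is an illustration, not Validibot's code: the size limit, the inline-storage threshold, and the names ingest and IngestedContent are assumptions chosen for the example.

```python
import hashlib
from dataclasses import dataclass

# Assumption: illustrative threshold; the real inline/file cutoff is not documented here.
INLINE_LIMIT_BYTES = 64 * 1024


@dataclass
class IngestedContent:
    checksum_sha256: str
    size_bytes: int
    store_inline: bool


def ingest(raw: bytes, max_bytes: int = 10 * 1024 * 1024) -> IngestedContent:
    """Checksum the payload, enforce a size limit, and pick a storage mode."""
    if len(raw) > max_bytes:
        raise ValueError(f"submission is {len(raw)} bytes; limit is {max_bytes}")
    return IngestedContent(
        checksum_sha256=hashlib.sha256(raw).hexdigest(),
        size_bytes=len(raw),
        store_inline=len(raw) <= INLINE_LIMIT_BYTES,
    )
```

Because the checksum is computed over the raw bytes, two submissions with identical content yield the same digest regardless of file name.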

2.3 Execution Dispatch

The validation work executes inline for simple validators or triggers a Cloud Run Job (GCP) or Docker container (Docker Compose) for heavier workloads. The API returns:

  • 201 Created: If validation completes quickly
  • 202 Accepted: If a Cloud Run Job was launched or execution is still running; clients poll for status

Phase 3: Validation Execution

The ValidationRunService facade delegates execution to the StepOrchestrator, which handles the actual validation work (either inline or coordinating Cloud Run Jobs). See Service Layer Architecture for the full decomposition.

3.1 Execution Setup

  1. Run Initialization: The orchestrator loads the ValidationRun and associated data
     • Validates the run is in PENDING state
     • Loads the workflow and its configured steps
     • Marks the run as RUNNING with start timestamp

  2. Step Sequencing: The orchestrator executes workflow steps in order
     • Each step is processed sequentially (parallel execution is planned for future versions)
     • Step execution is isolated: failures in one step don't prevent others from running

3.2 Step Routing: Validators vs Actions

The StepOrchestrator routes each step to the appropriate handler based on its type:

Validator Steps use the ValidationStepProcessor abstraction:

if step.validator:
    from validibot.validations.services.step_processor import get_step_processor

    processor = get_step_processor(validation_run, step_run)
    result = processor.execute()

Action Steps use the StepHandler protocol:

else:
    handler = get_action_handler(step.action.action_type)
    result = handler.execute(run_context)

For detailed documentation on the processor pattern, see Validation Step Processor Architecture.

3.3 Validator Step Execution

For validator steps, the processor pattern provides a clean separation of concerns:

  1. Processor Selection: The factory chooses the appropriate processor
     • SimpleValidationProcessor: For inline validators (JSON Schema, XML Schema, Basic, THERM)
     • AdvancedValidationProcessor: For validators requiring dedicated compute (container-based: EnergyPlus, FMU; compute-intensive: AI)

  2. Validator Dispatch: The processor calls the validator

engine = get_validator(validator.validation_type)
result = engine.validate(
    validator=validator,
    submission=submission,
    ruleset=ruleset,
    run_context=run_context,
)

  3. Result Processing: The validator returns a ValidationResult object
     • passed: Boolean (True/False) or None for async
     • issues: List of ValidationIssue objects with details
     • assertion_stats: Structured counts of evaluated assertions
     • signals: Extracted metrics for downstream steps (advanced validators)
     • output_envelope: Container output (advanced validators, sync mode only)
     • For advanced validators, a container ValidationStatus.SUCCESS is treated as a pass even if ERROR messages are present; the processor emits a warning and logs the discrepancy. Output-stage assertion failures still fail the step.

  4. Finding Persistence: The processor saves findings to the database
     • Creates ValidationFinding records for each issue
     • Stores assertion counts for run summary
     • For async validators, preserves input-stage findings when the callback arrives

How Findings Are Persisted

Every issue emitted by a validator becomes a ValidationFinding row. The model links to both the ValidationStepRun that produced the issue and the parent ValidationRun. The direct run foreign key is intentionally denormalized so dashboards and APIs can aggregate findings by run or organization without an extra join through step runs. To keep the duplication safe, the model copies the run from the step run during save and raises a ValidationError if someone attempts to associate a finding with a different run. This keeps the ORM focused on read performance while guaranteeing relational integrity. After the run completes we roll these rows up into the ValidationRunSummary/ValidationStepRunSummary tables so long-term reporting remains possible even if old findings are purged.
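
The copy-and-check rule for the denormalized run link can be illustrated with plain dataclasses. This is a simplified sketch, not the Django model: the class names are stand-ins, and ValueError takes the place of Django's ValidationError so the example runs without Django installed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    id: int


@dataclass
class StepRun:
    id: int
    run: Run


@dataclass
class Finding:
    step_run: StepRun
    message: str
    run: Optional[Run] = None

    def save(self) -> None:
        if self.run is None:
            # Copy the run from the step run when the caller did not set it.
            self.run = self.step_run.run
        elif self.run.id != self.step_run.run.id:
            # Refuse a finding whose run disagrees with its step run's run.
            raise ValueError("finding.run must match finding.step_run.run")


finding = Finding(step_run=StepRun(id=1, run=Run(id=10)), message="missing field")
finding.save()
print(finding.run.id)
```

With the check in place, a query that filters findings by run can trust the direct link without joining through step runs.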

How Processors Handle Step Lifecycle

The ValidationStepProcessor abstraction consolidates step lifecycle management:

  1. Processor creation -- get_step_processor() routes to the appropriate processor class based on validator type.

  2. Validator dispatch -- The processor calls engine.validate() (and for advanced validators, engine.post_execute_validate()).

  3. Finding persistence -- persist_findings() creates ValidationFinding records:
     • For sync validators: all findings are persisted at once
     • For async validators: input-stage findings are persisted on launch; output-stage findings are appended when the callback arrives

  4. Signal storage -- For advanced validators, metrics extracted from container output are stored in run.summary for downstream step assertions.

  5. Step finalization -- finalize_step() sets ended_at, duration_ms, status, and output fields.

What happens inside the processor

The processor pattern provides clean separation of concerns:

  1. Step metadata is pulled from the database -- the WorkflowStep instance supplies the linked Validator, optional Ruleset, and the JSON config column.

  2. Action steps use handlers instead -- when workflow_step.action is set, the service uses the StepHandler protocol instead of processors. Handlers like SlackMessageActionHandler or SignedCertificateActionHandler expose strongly typed fields.

  3. Submission content is materialized once -- the active Submission is hydrated from the ValidationRun so every validator works with the same snapshot of data.

  4. Assertion evaluation happens in validators -- Validators evaluate CEL assertions during validate() (input-stage) and post_execute_validate() (output-stage for advanced validators). The validator merges assertions from both the validator's default_ruleset (evaluated first) and the step-level ruleset (evaluated second) into a single pass. Processors just persist the results.

  5. Error handling is centralized -- Any exception raised by the validator is caught by the processor, which creates an error finding and finalizes the step as FAILED.

This design keeps the processor focused on lifecycle orchestration: the step definition owns the configuration, the validator owns domain logic and assertion evaluation, and the processor handles persistence and state transitions.
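
The centralized error handling described above might look roughly like this. It is a sketch under assumptions: StepOutcome and run_step are hypothetical names, and the validator is modeled as a plain callable returning (passed, issues) rather than the real validator interface.

```python
from dataclasses import dataclass, field


@dataclass
class StepOutcome:
    status: str
    findings: list = field(default_factory=list)


def run_step(validate, submission) -> StepOutcome:
    """Call a validator and turn any exception into an error finding."""
    try:
        passed, issues = validate(submission)
    except Exception as exc:  # deliberately broad: any validator crash fails the step
        return StepOutcome("FAILED", [{"severity": "ERROR", "message": str(exc)}])
    return StepOutcome("PASSED" if passed else "FAILED", list(issues))


def crashing_validator(submission):
    raise RuntimeError("schema could not be loaded")


print(run_step(crashing_validator, "{}").status)
```

Catching the exception in one place means individual validators do not each need their own try/except to keep a run from aborting without a recorded finding.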

Authoring walkthrough: see How to Author Workflow Steps for the complete UI flow.

3.4 Validator Implementation

Each validator implements the BaseValidator interface:

import json

import jsonschema

# BaseValidator, ValidationIssue, ValidationResult, Severity, and load_schema
# are Validibot internals; their import paths are omitted here.


class JsonSchemaValidator(BaseValidator):
    def validate(self, submission, ruleset=None, config=None):
        # Load the JSON Schema from the ruleset
        schema = load_schema(ruleset)

        # Parse the submission content as JSON
        data = json.loads(submission.get_content())

        # Collect every schema violation instead of stopping at the first
        validator = jsonschema.Draft7Validator(schema)
        errors = list(validator.iter_errors(data))

        # Convert jsonschema errors to Validibot issues
        issues = [
            ValidationIssue(
                path=error.absolute_path,
                message=error.message,
                severity=Severity.ERROR,
            )
            for error in errors
        ]

        return ValidationResult(
            passed=len(issues) == 0,
            issues=issues,
        )

Sequence Diagram: Basic Validation Run

sequenceDiagram
    actor Client
    participant API as WorkflowViewSet.start_validation()
    participant Facade as ValidationRunService (facade)
    participant Orch as StepOrchestrator
    participant Registry as Validator Registry
    participant Val as BaseValidator

    Client->>API: POST /orgs/{org}/workflows/{id}/runs/
    API->>Facade: launch(request, workflow, submission)
    Facade->>Facade: ValidationRun.objects.create(...)
    Facade->>Orch: execute_workflow_steps(run_id, user_id)
    API-->>Client: 201 Created or 202 Accepted (if still running)

    Orch->>Orch: mark run RUNNING\nlog start event
    Orch->>Orch: load ordered workflow steps

    loop For each workflow step
        Orch->>Orch: resolve validator, ruleset, config
        Orch->>Registry: get(validation_type)
        Registry-->>Orch: validator class
        Orch->>Val: validate(submission, ruleset, config)
        Val-->>Orch: ValidationResult (passed, issues, stats)
        Orch->>Orch: append step summary\nstop loop on first failure
    end

    Orch->>Orch: aggregate summary\nupdate ValidationRun status
    Orch-->>Facade: ValidationRunTaskResult
    note over Client,Orch: Client polls run detail\nendpoint until status terminal

Phase 4: Result Aggregation

After all workflow steps complete we capture both the detailed findings and a durable summary:

4.1 Summary Generation

Each ValidationIssue emitted by a validator becomes a ValidationFinding row tied to the current ValidationStepRun and ValidationRun. Once all steps finish, build_run_summary_record() aggregates those rows into two lightweight tables:

  • ValidationRunSummary -- run-level severity counts, assertion totals, and status.
  • ValidationStepRunSummary -- per-step severity counts and status.

The summary builder queries ValidationFinding and ValidationStepRun rows directly from the database rather than relying on in-memory metrics. This makes it safe to call in resume scenarios (async callbacks, retries) where earlier steps' findings are already persisted but not in the current process's memory.

These summary tables keep severity totals, assertion hit rates, and per-step health available even after old ValidationFinding rows are purged for retention.
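
The roll-up itself amounts to a grouped count. The sketch below shows the idea on plain dictionaries; it is not build_run_summary_record(), and the field names severity and step_run_id are assumptions about the shape of a finding row.

```python
from collections import Counter


def summarize(findings):
    """Roll finding rows up into run-level and per-step severity counts."""
    run_counts = Counter()
    step_counts = {}
    for finding in findings:
        severity = finding["severity"]
        run_counts[severity] += 1
        step_counts.setdefault(finding["step_run_id"], Counter())[severity] += 1
    return dict(run_counts), {step: dict(counts) for step, counts in step_counts.items()}


rows = [
    {"step_run_id": 1, "severity": "ERROR"},
    {"step_run_id": 1, "severity": "WARNING"},
    {"step_run_id": 2, "severity": "ERROR"},
]
run_summary, step_summary = summarize(rows)
print(run_summary, step_summary)
```

Because the function takes rows as input rather than reading in-memory state, it gives the same answer whether it runs at the end of a single process or after a resumed run.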

4.2 Run Status Updates

The ValidationRun is updated with final results:

  • Status: One of PENDING, RUNNING, SUCCEEDED, FAILED, CANCELED, TIMED_OUT
  • State: A simplified lifecycle state (PENDING, RUNNING, COMPLETED) derived from Status
  • Result: A stable automation-friendly conclusion (PASS, FAIL, ERROR, CANCELED, TIMED_OUT, UNKNOWN) derived from Status, findings, and error_category
  • End Timestamp: When execution completed
  • Duration: Total execution time in milliseconds
  • Summary Record: One-to-one link to ValidationRunSummary (accessed via run.summary_record)
  • Error: Empty string for success, error message for terminal failures
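
One plausible reading of how State follows from Status is sketched below. The mapping is an assumption inferred from the value names listed above (every terminal status collapses to COMPLETED); the derivation of Result depends on findings and error_category and is not attempted here.

```python
# Assumption: every status other than PENDING and RUNNING is terminal.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELED", "TIMED_OUT"}


def derive_state(status: str) -> str:
    """Map a detailed run status to the simplified lifecycle state."""
    if status in ("PENDING", "RUNNING"):
        return status
    if status in TERMINAL_STATUSES:
        return "COMPLETED"
    raise ValueError(f"unknown status: {status}")


print(derive_state("TIMED_OUT"))
```

Clients that only need to know whether a run has finished can check State and ignore the finer-grained Status values.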

4.3 Artifact Storage

Any files generated during validation are stored as Artifact records:

  • Validation reports in various formats
  • Transformed or processed versions of input data
  • Debug logs from validators
  • Performance profiling data

Phase 5: Result Delivery

5.1 Synchronous Response (Quick Validations)

If validation completes within the timeout window, the API returns immediately:

{
  "id": "2dd379f6-2425-4bae-8c23-61ed05ff1ebf",
  "status": "SUCCEEDED",
  "state": "COMPLETED",
  "result": "PASS",
  "workflow": {
    "id": 42,
    "name": "JSON Product Validation",
    "version": "1.0"
  },
  "submission": {
    "name": "product.json",
    "checksum_sha256": "abc123..."
  },
  "steps": [
    {
      "step_id": 1,
      "name": "Schema Validation",
      "status": "PASSED",
      "issues": []
    }
  ],
  "started_at": "2023-10-05T14:30:00Z",
  "ended_at": "2023-10-05T14:30:02Z",
  "duration_ms": 2000
}

5.2 Asynchronous Response (Long-running Validations)

For longer validations, the client receives a polling URL:

{
  "id": "2dd379f6-2425-4bae-8c23-61ed05ff1ebf",
  "status": "RUNNING",
  "state": "RUNNING",
  "result": "UNKNOWN",
  "poll_url": "/api/v1/orgs/{org_slug}/runs/2dd379f6-2425-4bae-8c23-61ed05ff1ebf/",
  "started_at": "2023-10-05T14:30:00Z"
}

Clients can poll this URL to check status and retrieve results when complete.
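
A client-side polling loop could look like the following sketch. The function wait_for_run and its parameters are hypothetical; fetch_run stands for whatever callable your client uses to GET the poll URL and return the parsed JSON body, and the interval and timeout defaults are arbitrary.

```python
import time


def wait_for_run(fetch_run, poll_url, interval_s=2.0, timeout_s=300.0, sleep=time.sleep):
    """Poll until the run's state is COMPLETED or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        run = fetch_run(poll_url)
        if run["state"] == "COMPLETED":
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run did not complete within {timeout_s} seconds")
        sleep(interval_s)
```

Checking state rather than status keeps the loop short, since every terminal status is expected to report the same COMPLETED state; the caller can then inspect result on the returned run.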

Advanced Features

Content Type Detection

The system automatically detects content types using multiple strategies:

  1. HTTP Headers: Content-Type header from the request
  2. File Extensions: For uploaded files, extension-based detection
  3. Content Sniffing: Analyzing content structure for JSON, XML, CSV, etc.
  4. Magic Bytes: Binary file type detection for non-text formats
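
The content-sniffing strategy can be illustrated for the two most common text formats. This is a minimal sketch, not Validibot's detector: the function name, the order of the checks, and the fallback to TEXT are assumptions, and it covers only JSON and XML.

```python
import json
import xml.etree.ElementTree as ET


def sniff_file_type(text: str) -> str:
    """Guess JSON, XML, or TEXT by attempting to parse the content."""
    stripped = text.lstrip()
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "JSON"
        except json.JSONDecodeError:
            pass
    if stripped.startswith("<"):
        try:
            # Note: the stdlib parser is not hardened against malicious XML;
            # use a hardened parser for untrusted input.
            ET.fromstring(stripped)
            return "XML"
        except ET.ParseError:
            pass
    return "TEXT"


print(sniff_file_type('{"user": {"name": "John"}}'))
```

Parsing rather than pattern-matching avoids misclassifying text that merely begins with a brace or an angle bracket.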

Submission File Types

Deterministic MIME headers are not enough for authoring decisions, so we classify every submission into a small set of logical SubmissionFileType values (JSON, XML, TEXT, YAML, BINARY, etc.). Those values power three complementary contracts:

  • Workflows store an allowed_file_types array. Authors decide whether a workflow accepts a single format or multiple formats (for example, an EnergyPlus workflow can allow both TEXT/IDF and JSON/epJSON inputs). The workflow builder surfaces only validators that intersect with the selected file types.
  • Validators declare supported_file_types. System validators receive defaults (JSON Schema → JSON, XML Schema → XML, EnergyPlus → TEXT + JSON, and so on) and custom validators must be explicit.
  • Launch-time enforcement verifies that the selected payload type is included in the workflow allow-list and that every validator in the run can process it. When something doesn’t align, the UI form surfaces a validation error and the API returns FILE_TYPE_UNSUPPORTED along with the offending step name.
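
The launch-time check reduces to two membership tests. The sketch below is illustrative only: the function name and the shape of the returned error are assumptions, with FILE_TYPE_UNSUPPORTED being the one identifier taken from this document.

```python
def check_file_type(payload_type, allowed_file_types, steps):
    """Return None when the payload is acceptable, else an error dict.

    steps is a list of (step_name, supported_file_types) pairs.
    """
    if payload_type not in allowed_file_types:
        # Rejected by the workflow allow-list; no single step is at fault.
        return {"code": "FILE_TYPE_UNSUPPORTED", "step": None}
    for step_name, supported in steps:
        if payload_type not in supported:
            return {"code": "FILE_TYPE_UNSUPPORTED", "step": step_name}
    return None


steps = [("Schema Validation", {"JSON"}), ("Simulation", {"TEXT", "JSON"})]
print(check_file_type("TEXT", {"TEXT", "JSON"}, steps))
```

In this example the workflow allows TEXT, but the first step only supports JSON, so the error names that step.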

Incoming requests still provide concrete MIME types (Content-Type headers, multipart metadata, etc.) so storage helpers can pick safe extensions. After we ingest the payload we re-run lightweight detection; if the actual content clearly differs from the transport hint, we update the stored SubmissionFileType so downstream automation, reporting, and billing all see the canonical format.

Error Handling and Recovery

The system implements robust error handling:

  • Graceful Degradation: Individual step failures don't crash the entire run
  • Timeout Management: Long-running validations are killed after configurable timeouts
  • Retry Logic: Transient failures trigger automatic retries with exponential backoff
  • Circuit Breakers: Repeated failures temporarily disable problematic validators
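
Retry with exponential backoff follows a standard pattern, sketched here in general form. TransientError, the attempt count, and the base delay are hypothetical; this document does not specify which failures Validibot treats as transient or what delays it uses.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(operation, attempts=4, base_delay_s=0.5, sleep=time.sleep):
    """Call operation, doubling the wait after each transient failure."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last failure
            sleep(base_delay_s * (2 ** attempt))
```

Only TransientError is retried; any other exception propagates immediately, so permanent failures are not delayed by pointless retries.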

Performance Optimization

Several optimizations ensure good performance:

  • Content Deduplication: Identical submissions (by SHA256) reuse previous results
  • Lazy Loading: Large files are streamed rather than loaded entirely into memory
  • Connection Pooling: Database connections are pooled for efficiency
  • Result Caching: Validation results are cached for repeated access

Security and Access Control

Security is enforced at multiple levels:

  • Authentication: All API endpoints require valid authentication
  • Organization Isolation: Users can only access resources within their organizations
  • Role-Based Access: Different roles have different permissions (viewer, executor, admin)
  • Content Validation: Uploaded content is scanned for malicious patterns
  • Audit Logging: All actions are logged for security auditing

Monitoring and Observability

Application Metrics

The system exposes rich metrics for monitoring:

  • Validation Throughput: Runs per minute/hour/day
  • Success Rates: Percentage of validations that pass/fail
  • Performance Metrics: Average validation time by workflow and step
  • Error Rates: Frequency of different types of validation failures
  • Resource Usage: CPU, memory, and storage consumption

Event Streaming

Key events are published for external consumption:

# Events published during validation lifecycle
AppEventType.VALIDATION_RUN_CREATED
AppEventType.VALIDATION_RUN_STARTED
AppEventType.VALIDATION_RUN_SUCCEEDED
AppEventType.VALIDATION_RUN_FAILED
AppEventType.VALIDATION_RUN_STEP_STARTED
AppEventType.VALIDATION_RUN_STEP_PASSED
AppEventType.VALIDATION_RUN_STEP_FAILED

These events enable integration with monitoring systems, alerting platforms, and analytics tools.

Debugging Support

When validations fail, the system provides rich debugging information:

  • Step-by-step Execution Logs: Detailed logs from each validation step
  • Content Snapshots: Preserved copies of submitted content for reproduction
  • Configuration Snapshots: The exact configuration used for the validation
  • Timing Information: Performance profiling to identify bottlenecks
  • Stack Traces: Full error details when validators throw exceptions

Future Enhancements

Several enhancements are planned for future versions:

Parallel Step Execution

Currently, workflow steps execute sequentially. Future versions will support:

  • Parallel execution of independent steps
  • Dependency graphs to control execution order
  • Resource allocation and scheduling optimization

Advanced Workflow Features

  • Conditional Steps: Execute steps based on previous results
  • Dynamic Workflows: Generate workflow steps programmatically
  • Workflow Templates: Reusable workflow patterns
  • Step Libraries: Shared marketplace of validation steps

Enhanced Reporting

  • Real-time Dashboards: Live updating validation metrics
  • Trend Analysis: Historical analysis of validation patterns
  • Custom Reports: User-defined reports and visualizations
  • Export Capabilities: Export results in multiple formats

Integration Enhancements

  • Webhook Improvements: Rich webhook payloads with filtering
  • GraphQL API: Alternative API interface for complex queries
  • CLI Tools: Command-line interface for automation
  • IDE Plugins: Integration with development environments

This architecture provides a solid foundation for scalable, reliable data validation while remaining flexible enough to adapt to evolving requirements.