Real-time Data Validation Platform

A full-stack platform providing real-time data quality feedback across connected systems — built to surface synchronization errors, business rule violations, and data drift as they happen, not when a frustrated user files a ticket.

Java, Spring Boot, React, Astro, TypeScript, SQL Server, WebSockets

The most expensive data quality problems are the ones nobody notices for days.

A student’s grade level changes incorrectly in the SIS. The provisioning system creates an account with the wrong policy group. An address field fails a format check but gets written anyway because the receiving system had lenient validation. Each of these is small in isolation; each compounds if it flows downstream into five other systems before anyone catches it.

The data validation platform was built to make the invisible visible — to provide real-time feedback on the quality of data as it moves through the district’s connected systems, so problems are caught at the source rather than discovered in a report a week later.

The design challenge: feedback for whom?

The first and hardest question was not technical. It was: who needs to know about a data quality problem, and in what form?

A data engineer wants structured logs and queryable violation records. A school registrar wants a plain-language summary of which student records have issues. An IT director wants a dashboard showing system-wide data health trends, not individual record details. The platform needs to serve all three without becoming a compromise that serves none of them well.

The solution was a tiered output model. The validation engine produces structured violation records — normalized, machine-readable, queryable. Downstream presenters transform those records into the right format for each audience: a React dashboard for the IT team, targeted digest emails for data owners at each school, and a REST API for other systems that want to consume quality signals.
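The tiered model can be sketched as one record type with several presenters over it. This is a minimal illustration, not the platform's actual code: `ViolationRecord`, `ViolationPresenter`, `DigestPresenter`, and `SeverityCountPresenter` are hypothetical names, and the fields are assumed for the example.

```java
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical shape of a structured violation record; field names are illustrative.
record ViolationRecord(String ruleId, String severity, String recordId, String message) {}

// Each audience gets its own presenter over the same list of records.
interface ViolationPresenter {
    String present(List<ViolationRecord> violations);
}

// One plain-language line per record, roughly what a digest email might contain.
class DigestPresenter implements ViolationPresenter {
    @Override
    public String present(List<ViolationRecord> violations) {
        return violations.stream()
            .map(v -> "Record " + v.recordId() + ": " + v.message())
            .collect(Collectors.joining("\n"));
    }
}

// Counts by severity (sorted by name), closer to what a health dashboard would chart.
class SeverityCountPresenter implements ViolationPresenter {
    @Override
    public String present(List<ViolationRecord> violations) {
        return violations.stream()
            .collect(Collectors.groupingBy(
                ViolationRecord::severity, TreeMap::new, Collectors.counting()))
            .entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining(", "));
    }
}
```

The point of the split is that the engine never needs to know who is reading: adding a new audience means adding a presenter, not changing the rules.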

The validation engine

The engine is a Spring Boot service that sits in the event stream alongside the data pipelines. Every data record that flows through the integration layer is validated asynchronously — it does not block the main data flow, but it evaluates each record against a set of configured rules and publishes any violations.
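One way to picture the non-blocking hand-off is a validator that accepts a record, returns immediately, and evaluates rules on its own thread pool. The sketch below uses plain `java.util.concurrent` rather than any Spring facility; `AsyncValidator` is a hypothetical name, and the in-memory `published` list stands in for whatever topic or queue the violations would really be published to.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical side-channel validator: the pipeline hands a record over and moves on.
class AsyncValidator<R> {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final List<Function<R, List<String>>> rules;
    private final List<String> published = new CopyOnWriteArrayList<>();

    AsyncValidator(List<Function<R, List<String>>> rules) {
        this.rules = List.copyOf(rules);
    }

    // Returns immediately; rule evaluation runs on the validator's own threads.
    CompletableFuture<Void> submit(R record) {
        return CompletableFuture.runAsync(() -> {
            for (Function<R, List<String>> rule : rules) {
                // Stand-in for publishing violations to a topic or queue.
                published.addAll(rule.apply(record));
            }
        }, pool);
    }

    // Snapshot of everything published so far.
    List<String> published() {
        return List.copyOf(published);
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The caller only waits on the returned future if it chooses to (for example in a test), so a slow rule delays the violation, not the record.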

Rules are organized into four categories:

Format rules — field-level checks: does this look like a date? Is this phone number the right length? Is this email address syntactically valid? These are the cheapest checks and catch the most common data entry errors.

Business rules — domain-specific invariants: a student cannot have a graduation year more than 8 years in the future. A staff member cannot be assigned to a building that doesn’t exist in the district’s building registry. A course enrollment cannot reference a section with zero capacity.

Cross-system consistency rules — checks that compare the same conceptual field across multiple systems: does the grade level in the SIS match the LMS? Does the email address in Active Directory match the address in Google Workspace?

Drift rules — temporal checks: has this record been updated within the expected window? A student’s emergency contact that hasn’t been verified in 18 months is flagged for review rather than raised as an error.

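import java.util.List;
import org.springframework.stereotype.Component;

// Cross-system consistency rule: the SIS and LMS grade levels must agree.
@Component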
public class GradeLevelConsistencyRule implements CrossSystemRule {

    @Override
    public List<Violation> evaluate(CrossSystemRecord record) {
        String sisGrade  = record.get(SIS, "gradeLevel");
        String lmsGrade  = record.get(LMS, "gradeLevel");

        if (sisGrade == null || lmsGrade == null) return List.of(); // absence handled separately

        if (!sisGrade.equals(lmsGrade)) {
            return List.of(Violation.builder()
                .ruleId("GRADE_CONSISTENCY")
                .severity(Severity.HIGH)
                .message("Grade level mismatch: SIS=%s, LMS=%s".formatted(sisGrade, lmsGrade))
                .authoritySystem(SIS) // SIS is authoritative; LMS should be corrected
                .affectedRecord(record.getId())
                .build());
        }
        return List.of();
    }
}
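The other three categories are simpler to express. The sketch below shows a format check, a business rule, and a drift check as plain static methods, using only the JDK. `RuleSketches` and its method names are hypothetical; the 10-digit phone length and the ISO date format are assumptions, while the 8-year and 18-month thresholds come from the examples above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.regex.Pattern;

// Illustrative checks only; not the platform's actual rule classes.
final class RuleSketches {
    private static final Pattern NON_DIGITS = Pattern.compile("\\D");

    private RuleSketches() {}

    // Format rule: does this look like an ISO date (yyyy-MM-dd)?
    static boolean looksLikeDate(String value) {
        if (value == null) return false;
        try {
            LocalDate.parse(value); // strict ISO parsing rejects dates like Feb 30
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Format rule: assumes 10 digits once punctuation and spaces are stripped.
    static boolean hasValidPhoneLength(String value) {
        if (value == null) return false;
        return NON_DIGITS.matcher(value).replaceAll("").length() == 10;
    }

    // Business rule: graduation year no more than 8 years in the future.
    static boolean graduationYearPlausible(int graduationYear, int currentYear) {
        return graduationYear <= currentYear + 8;
    }

    // Drift rule: flag for review when the last verification is older than 18 months.
    static boolean needsReview(LocalDate lastVerified, LocalDate today) {
        return lastVerified.isBefore(today.minusMonths(18));
    }
}
```

Passing `today` and `currentYear` in as arguments, rather than reading the clock inside the rule, keeps the temporal checks deterministic and easy to test.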

Real-time feedback in the UI

The React dashboard updates in real time via WebSocket. When the validation engine publishes a new violation, the dashboard reflects it within seconds — not on the next page refresh, not in the next morning’s report.

This matters most for the data entry workflows: staff entering student records in the SIS get near-immediate feedback if a record they just created has failed a validation rule. The round-trip — submit record, validation fires, violation appears in dashboard, data owner sees alert — is typically under 10 seconds.
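On the server side, the push amounts to fanning each new violation out to every connected dashboard. The sketch below models that fan-out in plain Java, with each `Consumer<String>` standing in for one open WebSocket session. `ViolationFeed` is a hypothetical name, and the actual WebSocket wiring (Spring or otherwise) is deliberately left out.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical in-memory feed; each Consumer stands in for one open WebSocket session.
class ViolationFeed {
    private final Set<Consumer<String>> sessions = ConcurrentHashMap.newKeySet();

    // Called when a dashboard connects; the returned Runnable disconnects it.
    Runnable connect(Consumer<String> session) {
        sessions.add(session);
        return () -> sessions.remove(session);
    }

    // Called for each new violation; pushes the payload to every open session.
    void publish(String violationJson) {
        for (Consumer<String> session : sessions) {
            session.accept(violationJson);
        }
    }
}
```

A dashboard that disconnects simply stops receiving events; nothing is buffered for it in this sketch, so a real implementation would need to decide how a reconnecting client catches up.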

The Astro framework handles the page shell and navigation. React component islands handle the interactive validation dashboard and the real-time violation feed. Static routes — rule documentation, audit history, data owner directories — ship as prerendered HTML. This split was intentional: the interactive surfaces need reactivity; the documentation surfaces need fast loads and no JavaScript overhead.

What it caught

In the first quarter of operation, the platform identified:

  • 847 grade level discrepancies between the SIS and the LMS, concentrated in two schools that had performed manual roster adjustments outside the standard workflow
  • 1,200+ student records with phone number format violations, traced to a data entry field in the legacy SIS that accepted free text
  • 43 staff accounts where the Active Directory email and Google Workspace email had diverged — in all cases, an alias had been added in one system but not replicated to the other

None of these were new problems. All of them were invisible before the platform existed.