Back to work
Case study 01 Health Equity · Operational Analytics · Platform Delivery

Access‑IQ

An NHS Trust access & inequality analytics platform — a simulated consultancy engagement, built end to end.

Role
Solo — Data & Analytics Engineer
Engagement
Simulated NHS consultancy
Client
"Northshire Trust"
Status
Delivered · 2025–2026

Tech stack

Infrastructure
AWS CDKS3Redshift ServerlessECS FargateRDS PostgresTransfer FamilyGlue CatalogLambdaVPC peeringKMS · Secrets · IAMCloudWatchECR
Transform & orchestration
dbtPrefect 3Great Expectationsdbt-expectationsSpectrumDuckDBParquetPython
Dashboard
StreamlitPlotly
Dev tooling
uvMakeRuffMypyPytestpre-commitGitHub Actionsstructlog
01

Executive summary

Access-IQ is a data engineering and analytics platform built for Northshire NHS Trust — a simulated engagement, scoped and delivered as a real consultancy project. Northshire Trust generates thousands of operational data points a day across electronic health records, appointments, urgent care and diagnostics — yet leadership can’t answer the questions that matter most:

Where are our A&E pressures concentrated, and which populations are most affected?

Are we meeting our referral-to-treatment targets equitably across all patient groups?

Which communities are experiencing the worst access outcomes — and are we closing the gap?

Is our performance on health inequality improving or deteriorating over time?

Access-IQ is the platform that answers them: a production-grade pipeline spanning two AWS accounts that ingests three sources into a governed Bronze / Silver / Gold lakehouse, models inequality metrics to public-health standard, and surfaces them in a dashboard a Trust director can use — defined entirely as code that tears itself down to $0 when idle.

Business outcomes

Single source of truth

One governed, auditable dataset behind every board report — down from 4+ conflicting extracts.

Inequality made visible

Disparities measurable across IMD, ethnicity, age and sex — for the first time, simultaneously.

Operational action

Managers can see where inequity emerges, trace the cause, and track interventions over time.

Sustainable by design

$0 idle, ~90 min cold deploy to live dashboard, auto-protected against budget overruns.

02

Discovery

The engagement began with a 2.5-hour discovery workshop with six stakeholders, spanning operational, clinical, strategic, and EDI leadership. The goal was to understand the current reporting landscape and surface the questions the Trust needed to answer.

We report RTT performance to the board monthly — but it's a single Trust-wide number. When NHSE ask whether the 18-week target is equitable across population groups, we can't answer.
Dr Sarah Mitchell Chief Operating Officer
Are DNA rates higher in certain ethnic groups? Is that language, transport, digital access? I can't even get the first number reliably — let alone the "why".
Rachel Dunmore Equality, Diversity & Inclusion Lead
My A&E breach rates look fine Trust-wide, but certain postcodes present later, with higher acuity, and wait longer in department.
Dr Amir Patel Clinical Lead, Urgent & Emergency Care
We know the data exists — three systems, five extracts, none of them talking to each other.
James Okafor Director of Strategy & Transformation

The workshop included a dedicated data landscape session, mapping every source system the Trust relied on for operational reporting. Five data sources across three systems — an EHR (Postgres), a daily SFTP appointment feed, and a diagnostics S3 export — each with its own identifiers, refresh cadence, and quality characteristics. Ethnicity coding was present but incomplete (~85% coverage). IMD was derivable from postcode but not stored. Community care records existed in a separate system with no patient identifier linkage, and were descoped to Phase 2. Across all conversations, five themes emerged consistently:

FragmentationInconsistent definitionsRetrospective-only reportingEquity blind spotsManual toil
The group converged on one definition of success: tell a clear story of where inequities exist, and how interventions are moving them — backed by governed data.
03

Requirements

The discovery workshop produced a signed-off Business Requirements Document covering 39 requirements across three tiers: functional, non-functional, and compliance. In NHS-adjacent work, compliance requirements carry equal weight to functional ones — they shape architecture from day one, not as an afterthought. MoSCoW prioritisation was used to make explicit scope decisions for Phase 1.

23 Must
10 Should
5 Won't
Must - the core platform Should - enhanced capability Won't (Phase 1)

Deliberately out of Phase 1: community-care linkage, predictive models, real-time streaming, multi-Trust federation, and clinical write-back — descoped to keep the timeline credible.

Functional

25
  • Idempotent ingestion of three heterogeneous sources
  • RTT (>18wk) & DM01 (>6wk) breach detection
  • Wait times by IMD, ethnicity, age & sex
  • Slope & Relative Index of Inequality
  • Small-cell suppression for counts < 5
  • Self-serve dashboard for executive and clinical users

Non-functional

6
  • Under $500/month, alerting at 80% of ceiling
  • Fully defined as code — no console drift
  • CI quality gates on every PR
  • Operable by a 3-person analytics team

Compliance

8
  • Non-reversible, keyed pseudonymisation (DSPT / Caldicott)
  • Encryption at rest & in transit
  • Least-privilege, role-based access
  • Full source-to-dashboard audit trail
  • Network isolation — no public path to source
04

Solution design

The architecture spans two AWS accounts separated by a VPC peering connection. The Trust account holds source systems — RDS Postgres, Transfer Family SFTP, and S3 exports. The Platform account runs the analytics stack with controlled, auditable cross-account IAM access. Every data path is explicit — no direct connections between environments.

Trust account · source
RDS PostgresEHR + urgent care
Transfer FamilySFTP appointment drops
S3diagnostics & provider exports
VPC peering
Platform account · analytics
ECS Fargateparallel ingestion + Prefect
S3 data lakeBronze · KMS-encrypted
Redshift ServerlessSilver + Gold · Spectrum
LambdaHMAC UDF · budget teardown
Bronzeraw append-only parquet
Silverconformed · pseudonymised
Golddimensional marts · SII/RII
Streamlitstatic Parquet via DuckDB

The entire platform is defined in AWS CDK across 10 single-responsibility stacks. Splitting them lets the budget Lambda drop compute, warehouse and network without ever touching the lake, secrets or catalog.

The decision ledger

Each architectural decision was evaluated against alternatives. Each row documents the choice, the trade-off, and the mitigation.

Because

AWS-native, Spectrum reads Bronze Parquet directly, scales to $0 idle, and dbt-redshift is mature.

Trade-off

Cold-start latency of 30–90 seconds on the first query after a deploy.

Mitigation

A pre-warm SELECT 1 baked into make up.

Because

$0 idle, Python-native flows the team already writes, and it reuses the existing Fargate cluster.

Trade-off

Flow history is lost between sessions.

Mitigation

Append-only Bronze S3 manifests are the durable audit log.

Because

Bare SHA-256 is rainbow-trivial on 10-digit NHS numbers; HMAC needs key compromise to reverse, yet stays deterministic so records still join.

Trade-off

One-way — the original NHS number can't be recovered.

Mitigation

By design, per Caldicott principles — analytics never needs it.

Because

$0/month, available 24/7, with no Redshift dependency for the dashboard to stay up.

Trade-off

Data is only as fresh as the last pipeline run.

Mitigation

Export date shown in the sidebar; re-run make pipeline to refresh.

Because

The controller holds the key policy — DSPT-aligned, supports rotation and per-role decryption.

Trade-off

KMS API charges (~$0.003 / 10K requests).

Mitigation

S3 Bucket Key enabled to amortise the calls.

Because

$0 idle versus roughly $32/month for an always-on NAT Gateway.

Trade-off

3–5 minute recreation on every make up.

Mitigation

Parallel CDK stack deployment absorbs the wait.

Because

The source imd_decile column is unreliable; the LSOA-to-IMD lookup is the canonical ONS methodology.

Trade-off

Requires seed-data maintenance.

Mitigation

The lsoa_imd_lookup seed is versioned in dbt.

Because

Decouples dashboard availability from warehouse uptime — and enables $0 idle.

Trade-off

Adds an export step to the pipeline.

Mitigation

Automated inside the daily Prefect flow.

05

Engineering deep dive

Access-IQ models a consultancy engagement across two AWS accounts: a Trust account standing in for the NHS client (operational source systems — Postgres EHR, SFTP appointment drops, S3 diagnostic exports) and a Platform account owned by the ‘vendor’ (ingestion, warehouse, analytics). The two are VPC-peered, and the boundary is deliberate — it forces every cross-org data movement to be explicit.

Data moves through a Bronze → Silver → Gold medallion architecture. Bronze lands in an S3 data lake as Parquet; Silver and Gold are modelled in dbt on Redshift. Bronze is exposed to Redshift through a Glue Data Catalog and Redshift Spectrum external tables, so dbt can query raw data in place without first loading it into the warehouse. Environments are split dev/prod as separate accounts, with secrets namespaced per environment.

Two operational facts shape the runtime design:

The infrastructure is ephemeral. Compute, warehouse, and networking are deployed at the start of a working session and destroyed at the end to avoid idle cost. Between sessions, the only durable state is what lives in S3 — there is no always-on database or scheduler holding pipeline state. The dataset is a 12-month backfill that then runs live. History is generated to look as though the platform has been ingesting daily for a year; a Trust-side Lambda then releases one new business day of data every 30 minutes, which the Platform pipeline ingests on a schedule. Consequently the pipeline processes one business day at a time, and ingest_date throughout the system denotes the business date of the data, not the wall-clock time of ingestion — a deliberate choice so backfilled history partitions by the day each record actually belongs to.

daily-ingest flow · ECS Fargate
PostgresEHR · PyArrow
SFTPappts · SHA-256
Trust S3copy_object
↓ all three complete
Spectrum refresh
dbt Silver
GE gate
dbt Gold
Gold export

The Great Expectations gate is blocking — Gold is never built on unvalidated person-level data.

By concern:

Idempotent Ingestion

Every ingestion run is governed by a manifest: a small JSON record written to S3 at the end of the run that serves as both the audit record and the control signal. Four properties fall out of that single primitive.

Idempotency. Before opening a database connection, an SFTP transport, or listing a single S3 object, each ingestion path checks whether the latest manifest for that source and date already succeeded, and returns early if so (Snippet 1, Snippet 2). The check fails open: a corrupt or unreadable manifest re-ingests rather than skipping, so malformed state is never mistaken for success. Because the skip is keyed on business date, a scheduler that fires repeatedly only does work when a date hasn't yet been ingested successfully.

Append-only auditability. Append-only is structural, not enforced by a policy (Snippet 3). Every run mints a fresh uuid4 run_id and writes its manifest to a key namespaced by that id, so two runs can never collide, and nothing in the codebase deletes or overwrites a manifest. A failed run and its successful re-run coexist as a complete audit trail; "latest wins" is simply max(LastModified) over the prefix — exactly what the idempotency check reads back.

Source-matched integrity. Each manifest records source, environment, run ID, business date, status, start/end timing, an error list, and a per-entity outputs block. Integrity controls are matched to the source rather than applied uniformly (Snippet 4): SFTP is an opaque third-party file drop with no native integrity metadata, so every file is SHA-256'd on read to fingerprint exactly what landed; S3 objects already carry an ETag content hash; and the Postgres path reads a live query result, where a byte digest of the derived Parquet would be meaningless. An integrity control for the Postgres path would capture row count, though that had not yet been implemented and is in the backlog.

Incremental resume. Because the pipeline processes one business day at a time and there is no state between sessions, every run must answer "which day do I process next?" with no database and no cursor file. The manifests are the cursor (Snippet 5). discover_latest_successful_date scans the durable manifest history for a source, newest-first, and returns the most recent date with a successful run; discover_next_business_date then advances all sources together as min(latest_successful_date) + 1 day. The min is deliberate — if one source fails or lags, taking the minimum guarantees none races ahead and leaves a gap, so dbt and the Gold layer always see complete days. The mechanism is stateless and crash-safe: an interrupted run leaves no successful manifest, so the next run rediscovers the same target and continues, letting a freshly redeployed pipeline pick up exactly where the previous session stopped.

Taken together, the manifest is a single primitive behind four properties — idempotency, append-only auditability, source-matched integrity, and incremental resume — rather than four mechanisms bolted on.

idempotency.py
# src/access_iq/ingestion/idempotency.py:25-42
def should_skip_if_already_successful(*, s3: Any, bucket: str, manifest_prefix: str) -> bool:
    manifest_prefix = normalize_manifest_prefix(manifest_prefix)
    key = _latest_manifest_key(s3=s3, bucket=bucket, prefix=manifest_prefix)
    if not key:
        return False

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    try:
        manifest = json.loads(body)
    except (TypeError, json.JSONDecodeError):
        log.warning("manifest_decode_failed", bucket=bucket, key=key)  #fail open: re-ingest, don't skip on bad data
        return False

    if not isinstance(manifest, dict):
        log.warning("manifest_not_dict", bucket=bucket, key=key)  #fail open: never treat malformed manifest as success
        return False

    return bool(manifest.get("status") == "success")
postgres.py
# src/access_iq/ingestion/postgres.py:104-115
if should_skip_if_already_successful(
    s3=s3, bucket=platform_bucket, manifest_prefix=manifest_prefix
):
    bound_log.info("ingest_skipped", reason="latest_manifest_success")
    return {
        "source": db,
        "run_id": run_id,
        "env": env,
        "ingest_date": ingest_date.isoformat(),
        "status": "skipped",
        "reason": "latest_manifest_success",
    }
manifests.py
# src/access_iq/ingestion/manifests.py:48-49, 58-80

# run_id is defined as str(uuid.uuid4())  #fresh per run — guarantees a unique manifest key

def build_manifest_key(*, source: str, ingest_date: str, run_id: str) -> str:
    #run_id namespaces the key, so two runs can never target the same object
    return f"_manifests/source={source}/ingest_date={ingest_date}/run_id={run_id}.json"


def write_manifest(
    *, s3: Any, bucket: str, manifest: Manifest, kms_key_arn: str | None = None
) -> str:
    key = build_manifest_key(
        source=manifest.source,
        ingest_date=manifest.ingest_date,
        run_id=manifest.run_id,
    )
    body = json.dumps(manifest.model_dump(), indent=2, default=str).encode("utf-8")
    #only ever put_object to a fresh run_id key — no delete, no overwrite anywhere
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ContentType="application/json",
        **s3_kms_args(kms_key_arn),
    )
    return key
sftp.py
# src/access_iq/ingestion/sftp.py:31-32, 134-137
def sha256_bytes(b: bytes) -> str:
    return hashlib.sha256(b).hexdigest()

# ... within ingest_sftp_directory_to_bronze:
with sftp.open(remote_path, "rb") as f:
    data = f.read()

digest = sha256_bytes(data)  #fingerprint exactly what landed — SFTP gives no integrity metadata
manifests.py
# src/access_iq/ingestion/manifests.py:83-119
def discover_latest_successful_date(*, s3: Any, bucket: str, source: str) -> date | None:
    """Find the latest ingest_date with a successful manifest for a source.

    Scans manifest prefixes by ingest_date, checks from newest to oldest,
    and returns the first date whose latest manifest has status=success.
    """
    prefix = normalize_manifest_prefix(f"_manifests/source={source}/")
    paginator = s3.get_paginator("list_objects_v2")

    dates: set[date] = set()
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        for cp in page.get("CommonPrefixes", []):
            match = _re.search(r"ingest_date=(\d{4}-\d{2}-\d{2})", cp["Prefix"])
            if match:
                dates.add(date.fromisoformat(match.group(1)))

    if not dates:
        return None

    for d in sorted(dates, reverse=True):  #newest first — stop at the first proven-good date
        mp = build_manifest_prefix(source=source, ingest_date=d.isoformat())
        latest: dict[str, Any] | None = None
        for page in paginator.paginate(Bucket=bucket, Prefix=mp):
            for obj in page.get("Contents", []):
                if latest is None or obj["LastModified"] > latest["LastModified"]:
                    latest = obj
        if not latest:
            continue
        body = s3.get_object(Bucket=bucket, Key=latest["Key"])["Body"].read()
        try:
            manifest_data = json.loads(body)
            if isinstance(manifest_data, dict) and manifest_data.get("status") == "success":
                return d
        except (TypeError, json.JSONDecodeError):
            continue

    return None
06

Showcase

Three dashboard pages, each structured around questions Trust leadership needs to answer. The dashboard reads static Gold Parquet exports via DuckDB — no live warehouse connection required, available 24/7 at $0/month hosting cost.

Wait Times dashboard
Business question
Are our referral-to-treatment times within acceptable limits across the Trust?
Insight
Trust-wide KPIs suggest moderate pressure — but provider-level breakdown reveals two sites with P90 waits above 250 days, more than double the Trust average. The aggregate number was masking a concentrated problem.
Decision enabled
Commission a targeted capacity review at the two outlier sites. The provider-level breakdown provides the evidence base — without it, the issue would remain invisible in Trust-wide reporting.
Inequality — Median wait by IMD
Business question
Are we providing equitable access to care across all socioeconomic groups?
Insight
Patients in the most deprived areas wait nearly three times longer than those in the least deprived — a 91-day gap between Decile 1 and Decile 10. The gradient is consistent across all deprivation levels, not an isolated outlier at either extreme. The SII of 147.32 quantifies the full slope for board reporting.
Decision enabled
Target pathway redesign and outreach at the highest-deprivation deciles. The SII gives leadership a single trackable figure for quarterly inequality-reduction commitments — and a baseline to measure whether interventions are closing the gap.
Urgent Care dashboard
Business question
When does A&E come under the most pressure — and are we staffed to match it?
Insight
Demand concentrates heavily on Saturday and Sunday evenings between 17:00 and 21:00 — a pattern invisible in weekly or monthly aggregate reporting. Weekday staffing patterns don't reflect where the actual pressure falls.
Decision enabled
Redesign weekend triage staffing rotas to match observed demand peaks. The heatmap provides the evidence base for a commissioner-facing business case — visual, specific, and impossible to dismiss with aggregate data.
07

Outcomes

The platform runs end to end from a cold deploy — three parallel ingests, 20 dbt models, automated quality gates and dashboard export — in about 90 minutes, and back to $0 in 40.

Must requirements 22 delivered · 1 partial 23
Should requirements 9 delivered · 1 gap 10
0+
lines of Python across the platform
0+
lines of infrastructure as code
0
CDK stacks · full deploy/destroy lifecycle
0
dbt models · 10 Silver + 10 Gold
0
Architecture Decision Records
$0
idle cost · from $2 a working session
What worked
  • The two-account boundary forced every access path through explicit IAM and VPC peering — a more secure design than a single account.
  • Manifest-based idempotency handled every re-run and backfill without special cases.
  • Ephemeral infrastructure made configuration drift impossible — nothing survives a teardown unless it's in code.
What I'd change
  • Build one vertical slice end-to-end before scaling horizontally — building all of each layer first meant cross-layer issues surfaced late, as expensive rework.
  • Wire live-stack integration tests from week one — several Spectrum and cross-account IAM bugs only surfaced during manual deploys.

What's next

From moving data to learning from it.

The successor turns the Gold marts into features: demand forecasting to predict capacity pressure weeks ahead, and a DNA-risk model that flags at-risk patients for proactive outreach — measured back through the same inequality dashboard, closing the loop between prediction and outcome. That's the next case study.