Architecture Overview

The NESO Data Portal is built on a layered, decoupled architecture: a thin presentation layer, a pair of API services that mediate every request, and a shared persistence and ingestion backbone.

The datastore is variant-aware. BigQuery is the immediately available option for production deployments; DuckLake is the future-proof choice targeted at smaller, self-contained installations.

```mermaid
flowchart TB
    subgraph PL [Presentation Layer]
        Admin[PortalJS Admin]
        DXP[DXP Frontend]
    end

    subgraph API [API & Core Layer]
        CKAN[CKAN Service]
        DataAPI[Data API Service]
    end

    subgraph DI [Data & Infrastructure]
        Storage[(Object Storage)]
        Solr[(Solr)]
        Postgres[(Postgres)]
        Redis[(Redis)]
        Datastore[(Datastore<br/>BigQuery or DuckLake)]
    end

    subgraph Ing [Ingestion]
        Airflow[Airflow ETL]
        Ext[External Data Source]
    end

    subgraph Mon [Monitoring]
        Prom[Prometheus]
        Graf[Grafana]
    end

    PL --> API
    API --> DI
    Ing --> DI
    API -.events.-> Mon
```

Layers

Presentation Layer

The user-facing surface, served by two distinct applications:

  • PortalJS Admin — used by Admin and Publisher users to manage datasets, organizations, groups, and users.
  • DXP Frontend — used by general users to browse, search, preview, and download data.

API & Core Layer

Two services that together expose the portal's functionality:

  • CKAN Service — source of truth for users, organizations, dataset metadata, and resource records. Manages search index updates.
  • Data API Service — sits between the frontend and the datastore. Enforces authorization (delegating to CKAN) and exposes the Datastore API surface for data preview, query building, and bulk consumption.
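As a minimal sketch of the delegation described above: the Data API asks CKAN whether the caller may see a resource before touching the datastore. Function names and signatures here are assumptions for illustration, not the portal's actual interface; the CKAN check and the datastore fetch are injected as callables so the flow is visible without a live deployment.

```python
from typing import Callable

# Hypothetical sketch of the Data API's request flow.
# `check_access` stands in for a call to CKAN's authorization layer;
# `fetch_rows` stands in for a datastore query (BigQuery or DuckLake).

def serve_preview(resource_id: str,
                  api_key: str,
                  check_access: Callable[[str, str], bool],
                  fetch_rows: Callable[[str, int], list]) -> list:
    """Return the first rows of a resource, but only after CKAN
    has confirmed the caller is authorized to see it."""
    if not check_access(resource_id, api_key):
        raise PermissionError(f"CKAN denied access to resource {resource_id}")
    return fetch_rows(resource_id, 100)  # preview = first 100 rows here

# Usage with stub dependencies:
allowed = lambda rid, key: key == "valid-key"
rows = lambda rid, n: [{"id": 1}, {"id": 2}][:n]
print(serve_preview("abc", "valid-key", allowed, rows))
```

The point of the indirection is that the Data API never stores permissions itself; CKAN remains the single source of truth for authorization decisions.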

Data & Infrastructure

The persistence and search backbone:

  • Object Storage — resource files (CSV, Parquet, etc.) uploaded by publishers.
  • Solr — CKAN's search index for metadata.
  • Postgres — CKAN's relational database (users, organizations, dataset records).
  • Redis — CKAN cache and background job queue.
  • Datastore — the actual rows of tabular data; this is where the variants diverge (BigQuery or DuckLake).
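The "variant-aware" datastore can be pictured as a single interface with two backends selected by configuration. The class and method names below are assumptions for illustration, not the portal's actual code; the real adapters would wrap the BigQuery client and a DuckLake catalog respectively.

```python
from abc import ABC, abstractmethod

# Illustrative sketch only: names are assumptions, not the portal's interface.

class Datastore(ABC):
    """Minimal surface the Data API needs from either variant."""
    @abstractmethod
    def query(self, sql: str) -> list:
        ...

class BigQueryStore(Datastore):
    def query(self, sql: str) -> list:
        # Would delegate to the BigQuery client in a production deployment.
        raise NotImplementedError("requires a GCP project")

class DuckLakeStore(Datastore):
    def query(self, sql: str) -> list:
        # Would delegate to a local DuckLake catalog in a self-contained install.
        raise NotImplementedError("requires a local DuckLake catalog")

def make_datastore(variant: str) -> Datastore:
    """Pick the backend from configuration; the rest of the API layer
    only ever sees the Datastore interface."""
    stores = {"bigquery": BigQueryStore, "ducklake": DuckLakeStore}
    if variant not in stores:
        raise ValueError(f"unknown datastore variant: {variant}")
    return stores[variant]()
```

Because the API layer depends only on the abstract interface, switching variants is a configuration change rather than a code change.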

Ingestion

  • Airflow runs scheduled ETL DAGs that pull from external data sources and load into the datastore.
  • ETL emits resource events that downstream systems consume.
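The extract-transform-load step and the resource event it emits can be sketched in plain Python; in the real portal, logic like this would live inside an Airflow DAG task. Function names and the event shape below are assumptions for illustration.

```python
import csv
import io

# Hedged sketch of one ETL cycle: names and event shape are assumptions.

def extract_transform(raw_csv: str) -> list:
    """Parse an external CSV pull into rows ready for the datastore."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def load(rows: list, sink: list) -> dict:
    """Load rows into the datastore (a plain list here) and emit a
    resource event describing the change, for downstream consumers."""
    sink.extend(rows)
    return {"event": "resource.updated", "rows_loaded": len(rows)}

# Usage: one pull from a (hypothetical) external source.
sink = []
event = load(extract_transform("id,mw\n1,420\n2,390\n"), sink)
print(event)  # → {'event': 'resource.updated', 'rows_loaded': 2}
```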

Monitoring

  • Prometheus scrapes metrics from CKAN and the Data API.
  • Grafana serves dashboards consumed by Admins.
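A scrape setup for the two services could look like the fragment below. This is an illustrative `prometheus.yml` excerpt only; the job names, ports, and `/metrics` paths are assumptions, not the portal's actual configuration.

```yaml
# Illustrative fragment; job names and targets are assumptions.
scrape_configs:
  - job_name: ckan
    metrics_path: /metrics
    static_configs:
      - targets: ["ckan:5000"]
  - job_name: data-api
    metrics_path: /metrics
    static_configs:
      - targets: ["data-api:8000"]
```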

Last reviewed: 2026-05-04
