Architecture Overview

The NESO Data Portal is built on a layered, decoupled architecture: a thin presentation layer, a pair of API services that mediate every request, and a shared persistence and ingestion backbone.

The datastore is variant-aware. BigQuery is the immediately available option for production deployments; DuckLake is the future-proof choice targeted at smaller, self-contained installations.

```mermaid
flowchart TB
    subgraph PL [Presentation Layer]
        Admin[PortalJS Admin]
        DXP[DXP Frontend]
    end

    subgraph API [API & Core Layer]
        CKAN[CKAN Service]
        DataAPI[Data API Service]
    end

    subgraph DI [Data & Infrastructure]
        Storage[(Object Storage)]
        Solr[(Solr)]
        Postgres[(Postgres)]
        Redis[(Redis)]
        Datastore[(Datastore<br/>BigQuery or DuckLake)]
    end

    subgraph Ing [Ingestion]
        Airflow[Airflow ETL]
        Ext[External Data Source]
    end

    subgraph Mon [Monitoring]
        Prom[Prometheus]
        Graf[Grafana]
    end

    PL --> API
    API --> DI
    Ing --> DI
    API -.events.-> Mon
```

Layers

Presentation Layer

The user-facing surface, served by two distinct applications:

  • PortalJS Admin — used by Admin and Publisher users to manage datasets, organizations, groups, and users.
  • DXP Frontend — used by general users to browse, search, preview, and download data.

API & Core Layer

Two services that together expose the portal's functionality:

  • CKAN Service — source of truth for users, organizations, dataset metadata, and resource records. Manages search index updates.
  • Data API Service — sits between the frontend and the datastore. Enforces authorization (delegating to CKAN) and exposes the Datastore API surface for data preview, query building, and bulk consumption.
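As a minimal sketch of the delegation described above: the Data API asks CKAN whether the caller may see a resource before touching the datastore. Function names and signatures here are assumptions for illustration, not the portal's actual interface; the CKAN check and the datastore fetch are injected as callables so the flow is visible without a live deployment.

```python
from typing import Callable

# Hypothetical sketch of the Data API's request flow.
# `check_access` stands in for a call to CKAN's authorization layer;
# `fetch_rows` stands in for a datastore query (BigQuery or DuckLake).

def serve_preview(resource_id: str,
                  api_key: str,
                  check_access: Callable[[str, str], bool],
                  fetch_rows: Callable[[str, int], list]) -> list:
    """Return the first rows of a resource, but only after CKAN
    has confirmed the caller is authorized to see it."""
    if not check_access(resource_id, api_key):
        raise PermissionError(f"CKAN denied access to resource {resource_id}")
    return fetch_rows(resource_id, 100)  # preview = first 100 rows here

# Usage with stub dependencies:
allowed = lambda rid, key: key == "valid-key"
rows = lambda rid, n: [{"id": 1}, {"id": 2}][:n]
print(serve_preview("abc", "valid-key", allowed, rows))
```

The point of the indirection is that the Data API never stores permissions itself; CKAN remains the single source of truth for authorization decisions.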

Data & Infrastructure

The persistence and search backbone:

  • Object Storage — resource files (CSV, Parquet, etc.) uploaded by publishers.
  • Solr — CKAN's search index for metadata.
  • Postgres — CKAN's relational database (users, organizations, dataset records).
  • Redis — CKAN cache and background job queue.
  • Datastore — the actual rows of tabular data; this is where the variants diverge (BigQuery or DuckLake).
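The "variant-aware" datastore can be pictured as a single interface with two backends selected by configuration. The class and method names below are assumptions for illustration, not the portal's actual code; the real adapters would wrap the BigQuery client and a DuckLake catalog respectively.

```python
from abc import ABC, abstractmethod

# Illustrative sketch only: names are assumptions, not the portal's interface.

class Datastore(ABC):
    """Minimal surface the Data API needs from either variant."""
    @abstractmethod
    def query(self, sql: str) -> list:
        ...

class BigQueryStore(Datastore):
    def query(self, sql: str) -> list:
        # Would delegate to the BigQuery client in a production deployment.
        raise NotImplementedError("requires a GCP project")

class DuckLakeStore(Datastore):
    def query(self, sql: str) -> list:
        # Would delegate to a local DuckLake catalog in a self-contained install.
        raise NotImplementedError("requires a local DuckLake catalog")

def make_datastore(variant: str) -> Datastore:
    """Pick the backend from configuration; the rest of the API layer
    only ever sees the Datastore interface."""
    stores = {"bigquery": BigQueryStore, "ducklake": DuckLakeStore}
    if variant not in stores:
        raise ValueError(f"unknown datastore variant: {variant}")
    return stores[variant]()
```

Because the API layer depends only on the abstract interface, switching variants is a configuration change rather than a code change.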

Ingestion

  • Airflow runs scheduled ETL DAGs that pull from external data sources and load into the datastore.
  • ETL emits resource events that downstream systems consume.
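The extract-transform-load step and the resource event it emits can be sketched in plain Python; in the real portal, logic like this would live inside an Airflow DAG task. Function names and the event shape below are assumptions for illustration.

```python
import csv
import io

# Hedged sketch of one ETL cycle: names and event shape are assumptions.

def extract_transform(raw_csv: str) -> list:
    """Parse an external CSV pull into rows ready for the datastore."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def load(rows: list, sink: list) -> dict:
    """Load rows into the datastore (a plain list here) and emit a
    resource event describing the change, for downstream consumers."""
    sink.extend(rows)
    return {"event": "resource.updated", "rows_loaded": len(rows)}

# Usage: one pull from a (hypothetical) external source.
sink = []
event = load(extract_transform("id,mw\n1,420\n2,390\n"), sink)
print(event)  # → {'event': 'resource.updated', 'rows_loaded': 2}
```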

Monitoring

  • Prometheus scrapes metrics from CKAN and the Data API.
  • Grafana serves dashboards consumed by Admins.
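A scrape setup for the two services could look like the fragment below. This is an illustrative `prometheus.yml` excerpt only; the job names, ports, and `/metrics` paths are assumptions, not the portal's actual configuration.

```yaml
# Illustrative fragment; job names and targets are assumptions.
scrape_configs:
  - job_name: ckan
    metrics_path: /metrics
    static_configs:
      - targets: ["ckan:5000"]
  - job_name: data-api
    metrics_path: /metrics
    static_configs:
      - targets: ["data-api:8000"]
```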

Last reviewed: 2026-05-04
