Architecture Overview
NESO Data Portal system architecture
The NESO Data Portal is built on a layered, decoupled architecture: a thin presentation layer, a pair of API services that mediate every request, and a shared persistence and ingestion backbone.
The datastore is variant-aware. BigQuery is the option available today for production deployments; DuckLake is the forward-looking choice aimed at smaller, self-contained installations.
```mermaid
flowchart TB
    subgraph PL [Presentation Layer]
        Admin[PortalJS Admin]
        DXP[DXP Frontend]
    end
    subgraph API [API & Core Layer]
        CKAN[CKAN Service]
        DataAPI[Data API Service]
    end
    subgraph DI [Data & Infrastructure]
        Storage[(Object Storage)]
        Solr[(Solr)]
        Postgres[(Postgres)]
        Redis[(Redis)]
        Datastore[(Datastore<br/>BigQuery or DuckLake)]
    end
    subgraph Ing [Ingestion]
        Airflow[Airflow ETL]
        Ext[External Data Source]
    end
    subgraph Mon [Monitoring]
        Prom[Prometheus]
        Graf[Grafana]
    end
    PL --> API
    API --> DI
    Ing --> DI
    API -.events.-> Mon
```
Layers
Presentation Layer
The user-facing surface, served by two distinct applications:
- PortalJS Admin — used by Admin and Publisher users to manage datasets, organizations, groups, and users.
- DXP Frontend — used by general users to browse, search, preview, and download data.
API & Core Layer
Two services that together expose the portal's functionality:
- CKAN Service — source of truth for users, organizations, dataset metadata, and resource records. Manages search index updates.
- Data API Service — sits between the frontend and the datastore. Enforces authorization (delegating to CKAN) and exposes the Datastore API surface for data preview, query building, and bulk consumption.
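The authorization hand-off from the Data API to CKAN can be sketched as below. This is a minimal illustration, not the actual NESO implementation: the `CkanAuthClient` interface and the in-memory permission set are assumptions standing in for a real HTTP call to CKAN's authorization API.

```python
from dataclasses import dataclass


class CkanAuthClient:
    """Stand-in for a client that asks CKAN whether a request is allowed."""

    def __init__(self, allowed: set[tuple[str, str]]):
        # A real deployment would issue an HTTP call to CKAN; here we back
        # the check with an in-memory set of (user, resource_id) pairs.
        self._allowed = allowed

    def is_authorized(self, user: str, resource_id: str) -> bool:
        return (user, resource_id) in self._allowed


@dataclass
class DataApi:
    ckan: CkanAuthClient

    def preview(self, user: str, resource_id: str) -> list[dict]:
        """Return preview rows only if CKAN authorizes the request."""
        if not self.ckan.is_authorized(user, resource_id):
            raise PermissionError(f"{user} may not read {resource_id}")
        # Placeholder for the actual datastore query (BigQuery or DuckLake).
        return [{"resource": resource_id, "row": 1}]
```

The point of the split is that the Data API never stores permissions itself; CKAN remains the single source of truth for who may read which resource.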
Data & Infrastructure
The persistence and search backbone:
- Object Storage — resource files (CSV, Parquet, etc.) uploaded by publishers.
- Solr — CKAN's search index for metadata.
- Postgres — CKAN's relational database (users, organizations, dataset records).
- Redis — CKAN cache and background job queue.
- Datastore — the actual rows of tabular data; this is where the variants diverge (BigQuery or DuckLake).
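The variant-aware boundary can be expressed as a small interface with one implementation per backend, selected by deployment configuration. The class and method names below are illustrative assumptions; real backends would wrap the BigQuery client or a DuckLake catalog respectively.

```python
from abc import ABC, abstractmethod


class Datastore(ABC):
    @abstractmethod
    def search(self, resource_id: str, limit: int) -> list[dict]:
        """Return up to `limit` rows of a tabular resource."""


class BigQueryDatastore(Datastore):
    def search(self, resource_id: str, limit: int) -> list[dict]:
        # Real code would run a BigQuery job against the resource's table.
        raise NotImplementedError("requires a BigQuery deployment")


class DuckLakeDatastore(Datastore):
    def __init__(self):
        # In-memory stand-in for a local DuckLake catalog.
        self._tables: dict[str, list[dict]] = {}

    def load(self, resource_id: str, rows: list[dict]) -> None:
        self._tables[resource_id] = rows

    def search(self, resource_id: str, limit: int) -> list[dict]:
        return self._tables.get(resource_id, [])[:limit]


def make_datastore(variant: str) -> Datastore:
    """Pick the backend from deployment configuration."""
    return {"bigquery": BigQueryDatastore, "ducklake": DuckLakeDatastore}[variant]()
```

Because the Data API only depends on the `Datastore` interface, switching variants is a configuration change rather than a code change.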
Ingestion
- Airflow runs scheduled ETL DAGs that pull from external data sources and load into the datastore.
- ETL emits resource events that downstream systems consume.
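The shape of a single ETL step — pull, load, emit — can be sketched as a plain function of the kind an Airflow task would call. The event payload and function names are illustrative assumptions, not the portal's actual event schema.

```python
from typing import Callable


def run_etl(
    extract: Callable[[], list[dict]],
    load: Callable[[str, list[dict]], None],
    emit_event: Callable[[dict], None],
    resource_id: str,
) -> int:
    """One ETL pass: pull from the external source, load the datastore,
    then emit a resource event for downstream consumers."""
    rows = extract()            # pull from the external data source
    load(resource_id, rows)     # write into the datastore (BigQuery or DuckLake)
    emit_event({                # notify downstream systems
        "type": "resource.updated",
        "resource_id": resource_id,
        "row_count": len(rows),
    })
    return len(rows)
```

In Airflow terms, `extract`, `load`, and `emit_event` would each map onto a task in the scheduled DAG, with the event emission keeping consumers decoupled from the pipeline itself.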
Monitoring
- Prometheus scrapes metrics from CKAN and the Data API.
- Grafana serves dashboards consumed by Admins.
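A minimal `prometheus.yml` fragment for scraping the two services might look like the following; the job names, hostnames, and ports are assumptions for illustration, not the portal's actual deployment values.

```yaml
# Illustrative scrape configuration; targets are assumed, not actual.
scrape_configs:
  - job_name: ckan
    static_configs:
      - targets: ["ckan:5000"]
  - job_name: data-api
    static_configs:
      - targets: ["data-api:8080"]
```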
Components
- Servers
- PortalJS Admin
- DXP Frontend
- CKAN
- Data API
- Datastore — variant-aware
- Ingestion
- Monitoring
Last reviewed: 2026-05-04