
Datastore

Purpose

Holds the actual rows of tabular data behind every resource in the catalogue. CKAN owns the metadata; the datastore owns the data.

  • Store every row of every published tabular resource.
  • Serve fast point queries and previews to the Data API.
  • Support bulk export for download/streaming endpoints.
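
The three access patterns above can be sketched as follows. This is a minimal illustration, not NESO's implementation: Python's built-in sqlite3 stands in for the real backend so the example runs anywhere, and the table name `example_resource`, its columns, and the function names are all hypothetical.

```python
import sqlite3
from typing import Any, Iterator, List, Optional, Tuple

Row = Tuple[Any, ...]

# Stand-in backend (assumption): in-memory SQLite, used only so the sketch is runnable.
# The real datastore is BigQuery or DuckLake, not SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example_resource (id INTEGER PRIMARY KEY, day TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO example_resource (id, day, value) VALUES (?, ?, ?)",
    [(i, f"2024-01-{i:02d}", 1000.0 + i) for i in range(1, 11)],
)


def point_query(row_id: int) -> Optional[Row]:
    """Point query: fetch one row by key."""
    cur = conn.execute(
        "SELECT id, day, value FROM example_resource WHERE id = ?", (row_id,)
    )
    return cur.fetchone()


def preview(limit: int = 5) -> List[Row]:
    """Preview: the first few rows in a stable order."""
    cur = conn.execute(
        "SELECT id, day, value FROM example_resource ORDER BY id LIMIT ?", (limit,)
    )
    return cur.fetchall()


def export_chunks(chunk_size: int = 4) -> Iterator[List[Row]]:
    """Bulk export: stream the whole table in fixed-size chunks."""
    cur = conn.execute("SELECT id, day, value FROM example_resource ORDER BY id")
    while True:
        chunk = cur.fetchmany(chunk_size)
        if not chunk:
            break
        yield chunk
```

Streaming the export in chunks keeps memory use bounded regardless of table size, which is the property a download or streaming endpoint would need from whichever backend is in use.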

NESO replaces CKAN's built-in PostgreSQL datastore extension with a dedicated datastore. PostgreSQL still backs CKAN's metadata, but tabular resource rows live here, not in CKAN's database.

The Data API is the only service that reads from it; ingestion (Airflow) is the only path that writes to it.

Approach

Instead of CKAN's datastore extension (rows in PostgreSQL, queried via datastore_search / datastore_search_sql), NESO uses a separate query backend sized for the workload. The datastore comes in two variants; which one to deploy depends on deployment scale and longevity goals.

| Variant | When to use | Status |
|---|---|---|
| BigQuery | Production deployments, large or growing data volumes, need for managed scaling and SQL analytics. | Immediately available; primary target for NESO. |
| DuckLake | Smaller, self-contained deployments where running BigQuery is overkill or undesirable. | Future-proof option, planned. |

Both variants expose the same query surface to the Data API, so consumers do not need to know which is in use. Neither variant needs PostgreSQL on the read path.
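
One way to picture a shared query surface is a small interface that each variant implements, with the Data API coded only against the interface. The sketch below is an assumption about shape, not the project's actual code: the `Datastore` protocol, the class names, `make_datastore`, and `data_api_preview` are hypothetical, and both variant classes wrap an in-memory SQLite stand-in where a real implementation would call BigQuery or DuckDB.

```python
import sqlite3
from typing import Any, Dict, List, Protocol, Sequence, Tuple, Type

Row = Tuple[Any, ...]


class Datastore(Protocol):
    """Hypothetical query surface the Data API depends on."""

    def query(self, sql: str, params: Sequence[Any] = ()) -> List[Row]: ...


class _SqliteStandIn:
    """Stand-in engine (assumption) so the sketch runs without cloud credentials."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE example_resource (id INTEGER PRIMARY KEY, value TEXT)"
        )
        self._conn.executemany(
            "INSERT INTO example_resource (id, value) VALUES (?, ?)",
            [(1, "a"), (2, "b"), (3, "c")],
        )

    def query(self, sql: str, params: Sequence[Any] = ()) -> List[Row]:
        return self._conn.execute(sql, tuple(params)).fetchall()


class BigQueryDatastore(_SqliteStandIn):
    """Placeholder for the BigQuery variant; a real one would use a BigQuery client."""


class DuckLakeDatastore(_SqliteStandIn):
    """Placeholder for the DuckLake variant; a real one would use DuckDB."""


VARIANTS: Dict[str, Type[_SqliteStandIn]] = {
    "bigquery": BigQueryDatastore,
    "ducklake": DuckLakeDatastore,
}


def make_datastore(variant: str) -> Datastore:
    """Select the variant from deployment configuration."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown datastore variant: {variant!r}")
    return VARIANTS[variant]()


def data_api_preview(store: Datastore, limit: int = 2) -> List[Row]:
    """Data API code path: identical regardless of which variant backs it."""
    return store.query(
        "SELECT id, value FROM example_resource ORDER BY id LIMIT ?", (limit,)
    )
```

Because `data_api_preview` only sees the protocol, swapping `make_datastore("bigquery")` for `make_datastore("ducklake")` changes nothing for the caller, which is the property the paragraph above describes.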

Tech stack

| Layer | Tech |
|---|---|
| Production datastore | BigQuery (GCP) |
| Alternative datastore | DuckLake (DuckDB-backed lakehouse) |
| Query interface | Standard SQL via the Data API |
| Storage format (DuckLake) | Parquet on object storage |

Last reviewed: 2026-05-04
