Datastore

Purpose

The datastore holds the actual rows of tabular data behind every resource in the catalogue. CKAN owns the metadata; the datastore owns the data.

- Store every row of every published tabular resource.
- Serve fast point queries and previews to the Data API.
- Support bulk export for download/streaming endpoints.
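
The three access patterns above (point query, preview, bulk export) can be sketched as follows. This is a minimal illustration, not NESO's implementation: an in-memory SQLite database stands in for the real backend, and the `demand` table, its columns, and the function names are hypothetical.

```python
import csv
import io
import sqlite3
from typing import Iterator, List, Optional

# Stand-in backend: in-memory SQLite, used only so the sketch is runnable.
# The table name and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demand (settlement_date TEXT, period INTEGER, mw REAL)")
conn.executemany(
    "INSERT INTO demand VALUES (?, ?, ?)",
    [("2024-01-01", p, 20000.0 + p) for p in range(1, 49)],
)


def point_query(date: str, period: int) -> Optional[tuple]:
    """Fetch a single row by key, or None if it does not exist."""
    return conn.execute(
        "SELECT settlement_date, period, mw FROM demand "
        "WHERE settlement_date = ? AND period = ?",
        (date, period),
    ).fetchone()


def preview(limit: int = 5) -> List[tuple]:
    """Return the first few rows, e.g. for a resource preview."""
    return conn.execute(
        "SELECT settlement_date, period, mw FROM demand ORDER BY period LIMIT ?",
        (limit,),
    ).fetchall()


def _to_csv(rows) -> str:
    """Render a batch of rows as CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def export_csv(batch_size: int = 10) -> Iterator[str]:
    """Stream the whole table as CSV chunks, one batch at a time."""
    cursor = conn.execute(
        "SELECT settlement_date, period, mw FROM demand ORDER BY period"
    )
    # Header chunk first, so an empty table still yields valid CSV.
    yield _to_csv([[col[0] for col in cursor.description]])
    while batch := cursor.fetchmany(batch_size):
        yield _to_csv(batch)
```

The export is written as a generator so that a download endpoint could forward each chunk as it is produced instead of holding the full table in memory.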

NESO replaces CKAN's built-in PostgreSQL datastore extension with a dedicated datastore. PostgreSQL still backs CKAN's metadata, but tabular resource rows live here, not in CKAN's database.

The Data API is the only service that reads from it; ingestion (Airflow) is the only path that writes to it.

Approach

Instead of CKAN's datastore extension (rows in PostgreSQL, queried via datastore_search / datastore_search_sql), NESO uses a separate query backend sized for the workload. The datastore comes in two variants; which one to deploy depends on deployment scale and longevity goals.

Variant	When to use	Status
BigQuery	Production deployments, large or growing data volumes, need for managed scaling and SQL analytics.	Available now; primary target for NESO.
DuckLake	Smaller, self-contained deployments where running BigQuery is overkill or undesirable.	Planned; future-proof option.

Both variants expose the same query surface to the Data API, so consumers do not need to know which one is in use. Neither variant needs PostgreSQL on the read path.
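
One way to picture the shared query surface is as a small interface that each variant implements. The sketch below rests on assumptions throughout: the `DatastoreBackend` protocol, its `query` method, the `readings` table, and the SQLite stand-in are all hypothetical and not taken from NESO. A BigQuery-backed or DuckLake-backed class would sit behind the same method.

```python
import sqlite3
from typing import Any, Dict, List, Protocol, Sequence


class DatastoreBackend(Protocol):
    """Hypothetical query surface that each datastore variant would implement."""

    def query(self, sql: str, params: Sequence[Any] = ()) -> List[Dict[str, Any]]:
        ...


class SQLiteStandIn:
    """Stand-in backend for illustration only; not one of the real variants."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.row_factory = sqlite3.Row  # rows become addressable by column name
        self._conn.execute(
            "CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)"
        )
        self._conn.executemany(
            "INSERT INTO readings VALUES (?, ?)",
            [(1, 1.5), (2, 2.5), (3, 3.5)],
        )

    def query(self, sql: str, params: Sequence[Any] = ()) -> List[Dict[str, Any]]:
        return [dict(row) for row in self._conn.execute(sql, params)]


def preview_rows(backend: DatastoreBackend, limit: int) -> List[Dict[str, Any]]:
    """Caller code depends only on the protocol, never on a concrete backend."""
    # The "?" placeholder style is an assumption of this sketch.
    return backend.query(
        "SELECT id, value FROM readings ORDER BY id LIMIT ?", (limit,)
    )
```

Because `preview_rows` only names the protocol, swapping the backend would not require changes in the calling code, which is the property the paragraph above describes.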

Tech stack

Layer	Tech
Production datastore	BigQuery (GCP)
Alternative datastore	DuckLake (DuckDB-backed lakehouse)
Query interface	Standard SQL via the Data API
Storage format (DuckLake)	Parquet on object storage
