Datastore
The structured data store behind the Data API — NESO's replacement for CKAN's PostgreSQL datastore extension.
Purpose
Holds the actual rows of tabular data behind every resource in the catalogue. CKAN owns the metadata; the datastore owns the data.
- Store every row of every published tabular resource.
- Serve fast point queries and previews to the Data API.
- Support bulk export for download/streaming endpoints.
NESO replaces CKAN's built-in PostgreSQL datastore extension with a dedicated datastore. PostgreSQL still backs CKAN's metadata, but tabular resource rows live here, not in CKAN's database.
The Data API is the only service that reads from it; ingestion (Airflow) is the only path that writes to it.
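The single-reader, single-writer rule can be pictured as a small allow-list. This is only a sketch: the service names `data-api` and `airflow-ingestion` and the helper `is_allowed` are hypothetical, and a real deployment would more likely enforce the rule through the backend's own permissions than in application code.

```python
# Hypothetical service identifiers. The source only states that the Data API
# reads and that Airflow ingestion writes; the names here are illustrative.
ACCESS_POLICY = {
    "data-api": frozenset({"read"}),
    "airflow-ingestion": frozenset({"write"}),
}


def is_allowed(service: str, operation: str) -> bool:
    """Return True only if the service is explicitly granted the operation."""
    return operation in ACCESS_POLICY.get(service, frozenset())


if __name__ == "__main__":
    for service in ("data-api", "airflow-ingestion", "ckan"):
        for operation in ("read", "write"):
            print(service, operation, is_allowed(service, operation))
```

Anything not listed, including CKAN itself, is denied by default, which matches the split where CKAN owns metadata and never touches the rows.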
Approach
Instead of CKAN's datastore extension (rows in PostgreSQL, queried via `datastore_search` / `datastore_search_sql`), NESO uses a separate query backend sized for the workload. The datastore comes in two variants; which one a deployment uses depends on its scale and longevity goals.
| Variant | When to use | Status |
|---|---|---|
| BigQuery | Production deployments, large or growing data volumes, need for managed scaling and SQL analytics. | Available now; primary target for NESO. |
| DuckLake | Smaller, self-contained deployments where running BigQuery is overkill or undesirable. | Planned; intended as the future-proof option. |
Both variants expose the same query surface to the Data API, so consumers do not need to know which is in use — and neither needs PostgreSQL on the read path.
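One way to picture a shared query surface is a small protocol that the Data API codes against. The sketch below is an assumption, not the actual NESO interface: `DatastoreBackend`, `run_sql`, and `preview` are hypothetical names, and `SqliteStandIn` is a local stand-in (not one of the real variants) used only so the example runs without BigQuery or DuckLake.

```python
import sqlite3
from typing import Any, Dict, List, Protocol, Sequence


class DatastoreBackend(Protocol):
    """Hypothetical query surface that every variant would implement."""

    def run_sql(self, sql: str, params: Sequence[Any] = ()) -> List[Dict[str, Any]]:
        ...


class SqliteStandIn:
    """Local stand-in backend. It is NOT BigQuery or DuckLake; it only
    plays the role of "some variant" so the sketch is runnable."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.row_factory = sqlite3.Row

    def load(self, table: str, rows: List[Dict[str, Any]]) -> None:
        # Demo helper only; in the real system ingestion is the sole writer.
        columns = list(rows[0])
        col_sql = ", ".join(f'"{c}"' for c in columns)
        marks = ", ".join("?" for _ in columns)
        self._conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
        self._conn.executemany(
            f'INSERT INTO "{table}" VALUES ({marks})',
            [tuple(row[c] for c in columns) for row in rows],
        )

    def run_sql(self, sql: str, params: Sequence[Any] = ()) -> List[Dict[str, Any]]:
        cursor = self._conn.execute(sql, tuple(params))
        return [dict(row) for row in cursor.fetchall()]


def preview(backend: DatastoreBackend, table: str, limit: int = 5) -> List[Dict[str, Any]]:
    """Data API-side helper that depends only on the protocol."""
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    return backend.run_sql(f'SELECT * FROM "{table}" LIMIT ?', (limit,))


if __name__ == "__main__":
    backend = SqliteStandIn()
    backend.load("demand", [{"ts": "2026-01-01", "mw": 100}, {"ts": "2026-01-02", "mw": 120}])
    print(preview(backend, "demand", limit=1))
```

Because `preview` only sees the protocol, swapping in a different backend object would not change the calling code, which is the property the paragraph above describes.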
Tech stack
| Layer | Tech |
|---|---|
| Production datastore | BigQuery (GCP) |
| Alternative datastore | DuckLake (DuckDB-backed lakehouse) |
| Query interface | Standard SQL via the Data API |
| Storage format (DuckLake) | Parquet on object storage |
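With SQL as the query interface, a `datastore_search`-style request (resource, equality filters, limit, offset) could be turned into a parameterised statement roughly as follows. This is a minimal sketch under assumptions: `build_search_sql` is a hypothetical helper, the positional `?` placeholder style is assumed, and whether the Data API accepts requests in this shape is not stated in the source.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Conservative identifier check; real table and column naming rules may differ.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _checked(name: str) -> str:
    """Reject anything that is not a plain identifier before interpolating it."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_search_sql(
    table: str,
    filters: Optional[Dict[str, Any]] = None,
    limit: int = 100,
    offset: int = 0,
) -> Tuple[str, List[Any]]:
    """Build a SELECT with equality filters; values travel as parameters."""
    params: List[Any] = []
    sql = f"SELECT * FROM {_checked(table)}"
    if filters:
        clauses = []
        for column, value in filters.items():
            clauses.append(f"{_checked(column)} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    sql += " LIMIT ? OFFSET ?"
    params += [int(limit), int(offset)]
    return sql, params


if __name__ == "__main__":
    print(build_search_sql("demand", {"region": "north"}, limit=10))
```

Identifiers are validated and interpolated, while filter values are always passed as parameters, so user-supplied values never become part of the SQL text.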
Last reviewed: 2026-05-04