Ingestion
Pipelines that load data into the datastore — triggered by CKAN events or run on a schedule.
Purpose
Brings data into the portal. Pipelines write to the datastore in two modes:
- Event-driven — when a dataset or resource is created or updated in CKAN, ckanext-aircan triggers an Airflow DAG to load it.
- Scheduled — for external sources that publish on their own cadence, DAGs run on a fixed schedule and pull new data on each tick.
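Conceptually, the event-driven mode boils down to calling Airflow's stable REST API to start a DAG run when CKAN fires its event hook. Below is a minimal, stdlib-only sketch of that request — not ckanext-aircan's actual code; the base URL, DAG id, and conf fields are illustrative assumptions.

```python
import json
from urllib import request


def build_trigger_request(airflow_base_url: str, dag_id: str, conf: dict) -> request.Request:
    """Build a POST to Airflow's stable REST API that triggers one DAG run.

    Endpoint shape: POST {base}/api/v1/dags/{dag_id}/dagRuns
    `conf` is passed through to the DAG as its run configuration.
    """
    url = f"{airflow_base_url}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": conf}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage: trigger a load when CKAN reports a resource update.
req = build_trigger_request(
    "http://airflow.example.com",  # assumed Airflow webserver URL
    "ckan_resource_load",          # hypothetical DAG id
    {"resource_id": "abc-123", "ckan_url": "https://portal.example.com"},
)
# request.urlopen(req) would fire the DAG run (a real deployment also needs auth headers).
```

In practice the extension would add authentication and error handling; the sketch only shows the shape of the trigger call.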
Tech stack
| Layer | Tech |
|---|---|
| Orchestrator | Apache Airflow |
| DAG language | Python |
| Target | Datastore (BigQuery or DuckLake) |
| CKAN integration | ckanext-aircan |
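To show how the pieces in the table fit together, here is a sketch of a scheduled ingestion DAG. It is illustrative only — the real DAGs live in the datopian/aircan repo; the DAG id, schedule, and loader stub are assumptions, and the `schedule=` argument assumes Airflow 2.4+ (older versions use `schedule_interval=`).

```python
# Hypothetical scheduled ingestion DAG -- a sketch, not the project's actual code.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def pull_external_source():
    """Fetch the external source and load it into the datastore (stub)."""
    ...


with DAG(
    dag_id="external_source_ingest",  # hypothetical DAG id
    start_date=datetime(2026, 1, 1),
    schedule="@daily",                # one pull per tick, matching the source's cadence
    catchup=False,                    # skip backfill runs for missed ticks
) as dag:
    PythonOperator(task_id="pull_and_load", python_callable=pull_external_source)
```

The event-driven DAGs look similar, except they have no schedule and are started by the CKAN extension instead.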
Repo locations
- CKAN extension — https://github.com/datopian/ckanext-aircan
- Aircan DAGs — https://github.com/datopian/aircan
See also
Last reviewed: 2026-05-04