Onboarding a Database

Snowpack discovers Iceberg tables through a PyIceberg catalog and runs maintenance for tables that have explicitly opted in. Onboarding a new database is a two-step process: opt tables in at the catalog level, then register the database in the Helm chart so the health-sync and orchestrator CronJobs know about it.

Step 1 — Opt tables in via Spark SQL

Each table must declare that it wants Snowpack maintenance by setting the snowpack.maintenance_enabled table property. Connect to Spark (or Kyuubi) and run:

ALTER TABLE lakehouse_dev.<database>.<table>
SET TBLPROPERTIES ('snowpack.maintenance_enabled' = 'true');

Replace <database> and <table> with the actual database and table names. Repeat for every table in the database that should receive automated maintenance.

Per-table cadence override. By default the orchestrator respects the cluster-wide cadenceHours value (6 hours in dev). To override the cadence for a specific table, set the snowpack.maintenance_cadence_hours property at the same time:

ALTER TABLE lakehouse_dev.<database>.<table>
SET TBLPROPERTIES (
'snowpack.maintenance_enabled' = 'true',
'snowpack.maintenance_cadence_hours' = '12'
);

Tables without the snowpack.maintenance_enabled property, or with it set to any value other than 'true', are ignored by the orchestrator.

Step 2 — Add the database to Helm values

Open charts/snowpack/values-dev.yaml and add the database name to both healthSync.databases and orchestrator.includeDatabases. These are comma-separated strings:

healthSync:
  databases: "offer_service,points_service,<new_database>"
orchestrator:
  includeDatabases: "offer_service,points_service,<new_database>"
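Since both settings are plain comma-separated strings, the services presumably split them on commas at startup. A sketch of the expected parsing (an assumption — the actual parsing code is not shown here):

```python
def parse_databases(value: str) -> list[str]:
    # Split a comma-separated Helm value into database names,
    # tolerating stray whitespace and empty entries.
    return [name.strip() for name in value.split(",") if name.strip()]
```
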

Step 3 — Deploy via Terraform

All Snowpack infrastructure changes are deployed through Terraform. Never run helm install or helm upgrade directly — Terraform owns the Helm release and direct Helm commands cause state drift.

terraform apply

If you modified any files under charts/snowpack/templates/, remember to bump the version field in charts/snowpack/Chart.yaml as well. Terraform detects chart changes by comparing the chart version; template-only edits without a version bump are invisible to the plan.
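For example, a template-only change would pair with a version bump like this in charts/snowpack/Chart.yaml (field values illustrative, not taken from the actual chart):

```yaml
# charts/snowpack/Chart.yaml — bump version whenever templates/ changes
apiVersion: v2
name: snowpack
version: 1.2.4   # was 1.2.3; without this bump Terraform plans no change
```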

Step 4 — Verify

After Terraform applies successfully, wait for the next orchestrator CronJob run. In the dev environment the orchestrator CronJob fires hourly, at 30 minutes past the hour.

Check recent orchestrator runs to confirm the new database’s tables were assessed:

curl -s https://<snowpack-host>/orchestrator/runs | jq '.[0]'

A successful run includes tables_assessed, jobs_submitted, and jobs_completed counts. If the new tables do not appear, verify that:

  1. The table property snowpack.maintenance_enabled is set to true in the catalog.
  2. The database is listed in both healthSync.databases and orchestrator.includeDatabases in the deployed values.
  3. The health-sync CronJob has completed at least one cycle since the deploy (runs every 15 minutes).
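A quick way to sanity-check the run payload returned by /orchestrator/runs — the field names come from this doc, but the helper itself is a hypothetical sketch:

```python
def run_looks_healthy(run: dict) -> bool:
    """Return True when an orchestrator run assessed at least one table
    and every submitted job completed, using the counts described above."""
    return (
        run.get("tables_assessed", 0) > 0
        and run.get("jobs_submitted") == run.get("jobs_completed")
    )
```

You could feed it the output of the curl command above via json.loads.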

You can also confirm a specific table is visible in the cache:

curl -s https://<snowpack-host>/tables?database=<new_database>

This returns the list of tables Snowpack knows about for that database. If the list is empty, health-sync has not yet populated the cache for the new database.