Drain Mode
Drain mode is a cluster-wide pause switch that quiesces Snowpack for planned maintenance windows. When enabled, the API rejects new maintenance submissions and the internal stale-lock sweeper is paused.
What it does
When drainMode is set to "on":
- `POST /tables/{db}/{table}/maintenance` returns `503` with `{"error": "drain_mode"}`. All read endpoints (`GET /tables`, `GET /jobs`, `/healthz`, `/readyz`) continue to work normally.
- The `reclaim_stale` sweeper pauses. This background thread normally reclaims locks from workers that died without releasing them (SIGKILL, OOM, node crash). During drain mode the sweeper is paused so that it does not reclaim locks held by workers that are still finishing their final actions as the cluster quiesces.
Existing running jobs are not cancelled. They are allowed to finish naturally. Drain mode only prevents new work from being submitted.
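To make the two effects concrete, here is a minimal Python sketch of how such a gate could behave. It is not Snowpack's implementation: the function names (`drain_mode_enabled`, `drain_gate`, `sweeper_tick`) and the `reclaim_stale` callback are hypothetical. It assumes only what this page states: the flag reaches the API as the `SNOWPACK_DRAIN_MODE` environment variable, maintenance submissions get `503` with `{"error": "drain_mode"}`, reads are unaffected, and the sweeper skips its work while draining.

```python
import os
import re
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical route pattern for maintenance submissions; the real router is not shown here.
MAINTENANCE_ROUTE = re.compile(r"^/tables/[^/]+/[^/]+/maintenance$")


def drain_mode_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when SNOWPACK_DRAIN_MODE is exactly "on"; unset is treated as "off"."""
    env = os.environ if env is None else env
    return env.get("SNOWPACK_DRAIN_MODE", "off") == "on"


def drain_gate(method: str, path: str, draining: bool) -> Optional[Tuple[int, dict]]:
    """Return (status, body) for a blocked request, or None to let it through."""
    if draining and method.upper() == "POST" and MAINTENANCE_ROUTE.match(path):
        return 503, {"error": "drain_mode"}
    # Reads (GET /tables, GET /jobs, /healthz, /readyz) are never blocked.
    return None


def sweeper_tick(draining: bool, reclaim_stale: Callable[[], int]) -> int:
    """One pass of a hypothetical stale-lock sweeper; returns the number of locks reclaimed."""
    if draining:
        # Paused: leave locks alone so workers that are still finishing keep them.
        return 0
    return reclaim_stale()
```

Note that running jobs are untouched in this sketch: the gate only inspects incoming submissions, which matches the "new work only" behaviour described above.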
When to use
Enable drain mode before any operation that could disrupt in-flight maintenance work:
- Postgres maintenance or upgrades (schema migrations, version bumps, PVC resize)
- Spark Thrift Server / Kyuubi maintenance windows
- Snowpack version upgrades that require a clean queue
How to enable
Set drainMode to "on" in the Helm values and apply via Terraform:
```yaml
# values-dev.yaml (or the environment-specific values file)
drainMode: "on"
```

```bash
cd terraform/snowpack-api/env/dev
terraform apply
```

The API pods will restart with `SNOWPACK_DRAIN_MODE=on` and immediately begin rejecting new submissions.
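Once the new pods are up, you can confirm that the gate is active by submitting a dry-run and checking for the drain-mode rejection. The sketch below uses only the Python standard library; `probe_submission` and `is_drain_rejection` are hypothetical helper names, and it assumes the endpoint accepts the same dry-run payload used in the post-disable verification steps. A dry run is used so that, if drain mode is not actually on, the probe presumably does not trigger real maintenance.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple


def is_drain_rejection(status: int, body: bytes) -> bool:
    """True if a response matches the documented rejection: 503 with {"error": "drain_mode"}."""
    if status != 503:
        return False
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return isinstance(payload, dict) and payload.get("error") == "drain_mode"


def probe_submission(base_url: str, database: str, table: str, timeout: float = 10.0) -> Tuple[int, bytes]:
    """POST a dry-run maintenance request and return (status, body) without raising on 4xx/5xx."""
    url = f"{base_url.rstrip('/')}/tables/{database}/{table}/maintenance"
    data = json.dumps({"actions": ["expire_snapshots"], "dry_run": True}).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, method="POST", headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urlopen raises on a 503; the status code and body are still available on the error.
        return err.code, err.read()
```

While drain mode is on, `is_drain_rejection(*probe_submission("https://<snowpack-host>", "<database>", "<table>"))` should evaluate to `True`.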
Pre-enable checklist
Before enabling drain mode, make sure in-flight work has completed:
- Check for running jobs:

  ```bash
  curl -s "https://<snowpack-host>/jobs?status=running" | jq .
  ```

  If any jobs are still running, wait for them to complete or cancel them explicitly before proceeding.

- Verify the job queue is empty:

  ```bash
  curl -s "https://<snowpack-host>/jobs?status=pending" | jq .
  ```

  Pending jobs will not be picked up while drain mode is active, but they will resume when drain mode is disabled. If you want a clean slate, cancel pending jobs before enabling drain mode.
How to disable
Set drainMode back to "off" and apply:
```yaml
drainMode: "off"
```

```bash
cd terraform/snowpack-api/env/dev
terraform apply
```

Post-disable verification
After disabling drain mode, confirm that Snowpack is fully operational:
- Check readiness:

  ```bash
  curl -s -o /dev/null -w '%{http_code}\n' "https://<snowpack-host>/readyz"
  ```

  Expect a `200` response. If the health-sync CronJob has not run since the maintenance window, `/readyz` may return `503` until the next sync cycle populates the table cache.

- Submit a test dry-run job:

  ```bash
  curl -i -X POST "https://<snowpack-host>/tables/<database>/<table>/maintenance" \
    -H "Content-Type: application/json" \
    -d '{"actions": ["expire_snapshots"], "dry_run": true}'
  ```

  A `202 Accepted` response confirms that the submission path is working end-to-end.
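Because `/readyz` can legitimately return `503` until the next health-sync run, it can help to poll rather than check once. The sketch below is a hypothetical standard-library helper, not part of Snowpack; the default deadline and polling interval are placeholders, so pick a deadline longer than your health-sync CronJob's schedule interval. Clock and sleep are injectable so the loop can be exercised without waiting.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def readyz_status(base_url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of GET /readyz, or 0 if the server could not be reached."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/readyz", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 503; the status code is on the error object.
        return err.code
    except OSError:
        # Connection refused, DNS failure, or timeout: treat as not ready yet.
        return 0


def wait_until_ready(
    get_status: Callable[[], int],
    deadline_s: float = 900.0,  # placeholder: longer than the health-sync interval
    interval_s: float = 30.0,   # placeholder polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until get_status() returns 200 or the deadline passes; True means ready."""
    start = clock()
    while True:
        if get_status() == 200:
            return True
        if clock() - start >= deadline_s:
            return False
        sleep(interval_s)
```

A typical call is `wait_until_ready(lambda: readyz_status("https://<snowpack-host>"))`, followed by the dry-run submission above once it returns `True`.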