# Configuration
Snowpack is configured through environment variables. Each component reads its own set of variables at startup. Variables without a default are required.
## API
The API server connects to Spark Thrift Server for query execution and to Postgres for job state and cached health data.
| Variable | Default | Description |
|---|---|---|
| SNOWPACK_SPARK_HOST | localhost | Spark Thrift Server hostname. |
| SNOWPACK_SPARK_PORT | 10000 | Spark Thrift Server port. |
| SNOWPACK_CATALOG | glue_catalog | Iceberg catalog name used in Spark SQL statements. |
| SNOWPACK_POSTGRES_HOST | localhost | PostgreSQL hostname. |
| SNOWPACK_POSTGRES_PORT | 5432 | PostgreSQL port. |
| SNOWPACK_POSTGRES_DATABASE | snowpack | PostgreSQL database name. |
| SNOWPACK_POSTGRES_USER | snowpack | PostgreSQL username. |
| SNOWPACK_POSTGRES_PASSWORD | — | PostgreSQL password. No default; must be provided. |
| SNOWPACK_DRAIN_MODE | off | Set to on to reject new maintenance submissions. Existing running jobs are unaffected. Useful during planned Spark downtime. |
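To illustrate the "variables without a default are required" rule, a component can resolve its settings at startup along these lines. This is a minimal sketch, not Snowpack's actual code; the `ApiConfig` dataclass and `load_api_config` helper are illustrative names.

```python
import os
from dataclasses import dataclass

@dataclass
class ApiConfig:
    spark_host: str
    spark_port: int
    postgres_password: str
    drain_mode: bool

def load_api_config(env=os.environ) -> ApiConfig:
    # Variables without a default are required: fail fast at startup
    # instead of failing on the first Postgres connection attempt.
    password = env.get("SNOWPACK_POSTGRES_PASSWORD")
    if not password:
        raise RuntimeError("SNOWPACK_POSTGRES_PASSWORD must be set")
    return ApiConfig(
        spark_host=env.get("SNOWPACK_SPARK_HOST", "localhost"),
        spark_port=int(env.get("SNOWPACK_SPARK_PORT", "10000")),
        postgres_password=password,
        drain_mode=env.get("SNOWPACK_DRAIN_MODE", "off") == "on",
    )
```

Passing a plain dict for `env` makes the loader easy to unit-test without touching the process environment.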
## Health Sync Worker
The health sync worker periodically loads table metadata from the PyIceberg catalog and writes health snapshots to Postgres. It also optionally pushes metrics to Mimir via OTLP.
| Variable | Default | Description |
|---|---|---|
| SNOWPACK_HEALTH_SYNC_INTERVAL_SECONDS | 900 | Health sync cadence in seconds (15 min). Set to 0 to disable the sync loop entirely. |
| SNOWPACK_HEALTH_SYNC_DATABASES | (all) | Comma-separated list of databases to sync. When unset, all databases in the catalog are synced. |
| SNOWPACK_HEALTH_SYNC_CONCURRENCY | 10 | Max concurrent PyIceberg table loads. Use ~2 on memory-constrained pods to avoid OOM kills. |
| SNOWPACK_MIMIR_ENDPOINT | (unset) | OTLP gRPC endpoint for Mimir metrics push. Leave empty to disable metrics push. |
| SNOWPACK_GLUE_CATALOG | lakehouse_dev | Glue catalog name used by PyIceberg for direct metadata access. |
| AWS_REGION | us-east-1 | AWS region for Glue and S3 API calls. |
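The worker's knobs compose as follows: an interval of 0 disables the loop, an unset database list means "everything in the catalog", and an empty Mimir endpoint disables the metrics push. A sketch of that resolution logic, with an illustrative `health_sync_settings` helper (not Snowpack's real function):

```python
import os

def health_sync_settings(env=os.environ):
    interval = int(env.get("SNOWPACK_HEALTH_SYNC_INTERVAL_SECONDS", "900"))
    raw = env.get("SNOWPACK_HEALTH_SYNC_DATABASES", "")
    # Unset or empty means "sync every database in the catalog".
    databases = [d.strip() for d in raw.split(",") if d.strip()] or None
    return {
        "enabled": interval > 0,  # 0 disables the sync loop entirely
        "interval_seconds": interval,
        "databases": databases,   # None -> all databases
        "concurrency": int(env.get("SNOWPACK_HEALTH_SYNC_CONCURRENCY", "10")),
        # Empty endpoint -> metrics push disabled.
        "mimir_endpoint": env.get("SNOWPACK_MIMIR_ENDPOINT") or None,
    }
```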
## Orchestrator
The orchestrator is a CronJob that queries the API for table health, decides which tables need maintenance, and submits jobs. It does not connect to Spark directly.
| Variable | Default | Description |
|---|---|---|
| SNOWPACK_API_URL | http://snowpack-api.snowpack.svc.cluster.local | Snowpack API base URL. The orchestrator calls this for health checks and job submissions. |
| SNOWPACK_MAINTENANCE_CADENCE_HOURS | 6 | Global minimum hours between maintenance runs for a given table. Individual tables can override this via the snowpack.maintenance_cadence_hours table property. |
| SNOWPACK_HEALTH_CONCURRENCY | 10 | Max concurrent health check requests to the API during the discovery phase. |
| SNOWPACK_MAX_SUBMIT | 3 | Max jobs the orchestrator will queue in a single run. Prevents overloading Spark when many tables need maintenance simultaneously. |
| SNOWPACK_POLL_INTERVAL | 30 | Seconds between job status polls while waiting for submitted jobs to complete. |
| SNOWPACK_OPT_IN_MODE | true | When true, only tables with snowpack.maintenance_enabled = true are considered. When false, all tables are eligible unless explicitly excluded via compaction_skip. |
| SNOWPACK_INCLUDE_DATABASES | (unset) | Comma-separated database allowlist. When set, only tables in these databases are considered. Corresponds to the Helm value orchestrator.allowedDatabases. |
| SNOWPACK_DRY_RUN | false | When true, the orchestrator logs all decisions but does not submit any maintenance jobs. Useful for validating configuration changes. |
| SNOWPACK_SLACK_WEBHOOK_URL | (unset) | Slack incoming webhook URL. When set, the orchestrator posts a summary after each run. Optional. |
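The eligibility settings above interact: the database allowlist narrows candidates first, the opt-in flag then decides which table properties gate inclusion, and SNOWPACK_MAX_SUBMIT caps how many jobs one run may queue. A hypothetical sketch of that filtering order (the `select_tables` helper and table dict shape are illustrative; the real orchestrator's decision logic also factors in cadence and health):

```python
def select_tables(tables, env):
    # tables: dicts with a "database" name and a "properties" dict,
    # mirroring Iceberg table properties. Illustrative shape only.
    allow = env.get("SNOWPACK_INCLUDE_DATABASES")
    allowed_dbs = set(d.strip() for d in allow.split(",")) if allow else None
    opt_in = env.get("SNOWPACK_OPT_IN_MODE", "true") == "true"
    max_submit = int(env.get("SNOWPACK_MAX_SUBMIT", "3"))

    eligible = []
    for t in tables:
        props = t.get("properties", {})
        # Allowlist narrows the candidate set first.
        if allowed_dbs is not None and t["database"] not in allowed_dbs:
            continue
        if opt_in:
            # Opt-in mode: only explicitly enabled tables qualify.
            if props.get("snowpack.maintenance_enabled") != "true":
                continue
        elif props.get("compaction_skip") == "true":
            # Opt-out mode: everything qualifies unless explicitly excluded.
            continue
        eligible.append(t)
    # Cap submissions per run to avoid overloading Spark.
    return eligible[:max_submit]
```

SNOWPACK_DRY_RUN would then govern whether the selected tables are actually submitted or only logged.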