What is Snowpack?

Snowpack is a control plane for Apache Iceberg table maintenance. It keeps Iceberg tables healthy by automatically discovering tables through a PyIceberg catalog, analyzing their health using Iceberg metadata, and running maintenance operations through Spark via Kyuubi.

Why it exists. Iceberg tables accumulate small files, stale snapshots, and redundant manifests over time. Left unchecked, this metadata and file bloat degrades query performance and inflates storage costs. Snowpack automates the maintenance that would otherwise require manual, per-table intervention.

Who uses it. The Data Platform team operates Snowpack. Data engineers across Fetch opt their tables in by setting a single table property, and Snowpack handles the rest.

How it works. Snowpack follows a three-stage loop: discover tables from the catalog, analyze their health against configurable thresholds, and run the appropriate maintenance actions through Spark. Jobs run asynchronously, and all state is persisted in Postgres. You can interact with Snowpack through the REST API or the CLI, or let the automated orchestrator CronJob handle everything on a schedule.
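To make the discover, analyze, run loop concrete, here is a minimal sketch in plain Python. It is not Snowpack's code: the `TableHealth` and `Thresholds` classes, the threshold values, and the in-memory `catalog` dictionary are all hypothetical stand-ins. The action labels borrow the names of Iceberg's Spark maintenance procedures, but whether Snowpack maps its checks to those procedures in this way is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

# Everything here is illustrative: class names, fields, and threshold
# values are assumptions, not Snowpack's actual data model or defaults.


@dataclass(frozen=True)
class TableHealth:
    """Health metrics for one table, as might be derived from Iceberg metadata."""
    name: str
    small_file_count: int
    snapshot_count: int
    manifest_count: int


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical configurable limits; a metric above its limit triggers an action."""
    max_small_files: int = 100
    max_snapshots: int = 50
    max_manifests: int = 20


def discover(catalog: dict[str, TableHealth]) -> list[TableHealth]:
    """Stage 1: list candidate tables (a stand-in for a real catalog scan)."""
    return list(catalog.values())


def analyze(health: TableHealth, limits: Thresholds) -> list[str]:
    """Stage 2: compare metrics against thresholds and pick maintenance actions."""
    actions: list[str] = []
    if health.small_file_count > limits.max_small_files:
        actions.append("rewrite_data_files")
    if health.snapshot_count > limits.max_snapshots:
        actions.append("expire_snapshots")
    if health.manifest_count > limits.max_manifests:
        actions.append("rewrite_manifests")
    return actions


def run_cycle(catalog: dict[str, TableHealth], limits: Thresholds) -> dict[str, list[str]]:
    """Stage 3 (planning only): collect the actions that would be submitted.

    A real system would submit these as asynchronous Spark jobs and persist
    their state; this sketch just returns the plan.
    """
    plan: dict[str, list[str]] = {}
    for health in discover(catalog):
        actions = analyze(health, limits)
        if actions:
            plan[health.name] = actions
    return plan


# Example: one table with too many small files, one healthy table.
catalog = {
    "db.events": TableHealth("db.events", small_file_count=250, snapshot_count=10, manifest_count=5),
    "db.users": TableHealth("db.users", small_file_count=3, snapshot_count=2, manifest_count=1),
}
plan = run_cycle(catalog, Thresholds())
print(plan)
```

In this sketch only `db.events` exceeds a threshold, so it is the only table that ends up in the plan. Keeping analysis separate from execution means the same health check could back a dry-run report as well as a scheduled run.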

Snowpack architecture