May 29, 2026

Hermes: Policy-Driven Lazy Loading for Container Images in Go

Large ML images are brutal on cold starts. A vLLM image can be many gigabytes, and a Kubernetes node may spend minutes pulling and unpacking data before the first process starts. If that node has a GPU attached, those minutes are not just annoying. They are expensive.

Hermes is a Go project that makes SOCI-based lazy loading policy-driven. SOCI, short for Seekable OCI, builds an index for compressed image layers so containerd can fetch the parts a process actually reads instead of downloading the entire layer upfront.

The interesting bit is where Hermes puts the decision. Instead of requiring every application team to annotate workloads or pre-build SOCI artifacts in CI, Hermes introduces a HermesPolicy custom resource. Platform teams declare which images should be optimized, and Hermes handles the controller and node-side plumbing.

The split: controller and daemon

Hermes has two main binaries:

cmd/controller/main.go
cmd/daemon/main.go

That split maps cleanly onto Kubernetes responsibilities.

The controller watches HermesPolicy objects and Pods. When a Pod image matches a policy, the controller enqueues work, resolves or pulls the image through containerd, builds SOCI v1 metadata, and stores the result in its own artifact cache.

The daemon runs on worker nodes as a containerd snapshotter plugin. When containerd asks the soci snapshotter to mount a layer, the daemon can ask the controller for the SOCI index and zTOC blobs. If Hermes has metadata ready, the daemon uses it for lazy loading. If not, it can fall back to the normal container runtime path.

That fallback matters. A policy miss should not stop a Pod from starting.
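A minimal sketch of that fallback decision, under stated assumptions: ArtifactSource, ErrNoArtifact, MountMode, and chooseMount are hypothetical names for illustration, not identifiers from the Hermes source. The point is only that every failure to obtain metadata degrades to the normal path instead of surfacing as a Pod start error.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoArtifact is a hypothetical sentinel meaning "no SOCI metadata for this image".
var ErrNoArtifact = errors.New("no SOCI artifact available")

// ArtifactSource is a hypothetical stand-in for the daemon's view of the controller.
type ArtifactSource interface {
	Index(imageDigest string) ([]byte, error)
}

// MountMode records which path was chosen for a layer.
type MountMode string

const (
	ModeLazy     MountMode = "lazy"
	ModeFallback MountMode = "fallback"
)

// chooseMount selects lazy loading only when index bytes were actually returned.
// Any error, or an empty index, results in the fallback path.
func chooseMount(src ArtifactSource, imageDigest string) MountMode {
	index, err := src.Index(imageDigest)
	if err != nil || len(index) == 0 {
		return ModeFallback
	}
	return ModeLazy
}

// mapSource is an in-memory ArtifactSource used only for this demo.
type mapSource map[string][]byte

func (m mapSource) Index(imageDigest string) ([]byte, error) {
	b, ok := m[imageDigest]
	if !ok {
		return nil, ErrNoArtifact
	}
	return b, nil
}

func main() {
	src := mapSource{"sha256:aaa": []byte("index-bytes")}
	fmt.Println(chooseMount(src, "sha256:aaa")) // metadata present
	fmt.Println(chooseMount(src, "sha256:bbb")) // metadata missing
}
```

Treating "missing" and "broken" the same way keeps the decision simple. A real daemon would likely also log the reason so operators can tell a policy miss from a controller outage.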

HermesPolicy is small on purpose

The CRD type lives in pkg/apis/v1alpha1/hermes_policy.go. The part worth noticing is the selector shape:

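// HermesPolicySpec lists the image patterns a policy applies to.
type HermesPolicySpec struct {
	ImageSelectors []HermesImageSelector `json:"imageSelectors,omitempty"`
}

// HermesImageSelector selects images by regular expression.
type HermesImageSelector struct {
	// ImageRegex is matched against raw image references from Pod specs.
	ImageRegex string `json:"imageRegex"`
}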

pkg/controller/policy.go matches those regexes against raw image references from Pod specs. That keeps the user-facing API simple: define the image patterns that deserve lazy loading, then let the controller observe the cluster.

An example policy can target something broad like .*vllm.*, or something tighter for a particular registry and tag family. The repo’s Kubernetes example shows the CRD as the operator-facing control point, not a per-Deployment code change.
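Selector matching of this kind can be sketched with the standard regexp package. This is an illustration, not the code in pkg/controller/policy.go: compileSelectors and matchesAny are hypothetical helpers, and the image references are made up. Note that Go's MatchString is unanchored, so whether a pattern must cover the whole reference depends on how the pattern is written. How Hermes itself treats anchoring is not stated here.

```go
package main

import (
	"fmt"
	"regexp"
)

// HermesImageSelector mirrors the selector shape shown above.
type HermesImageSelector struct {
	ImageRegex string `json:"imageRegex"`
}

// compileSelectors (hypothetical) compiles each pattern once, so matching a
// Pod image later does not recompile the regex on every check.
func compileSelectors(selectors []HermesImageSelector) ([]*regexp.Regexp, error) {
	compiled := make([]*regexp.Regexp, 0, len(selectors))
	for _, s := range selectors {
		re, err := regexp.Compile(s.ImageRegex)
		if err != nil {
			return nil, fmt.Errorf("invalid imageRegex %q: %w", s.ImageRegex, err)
		}
		compiled = append(compiled, re)
	}
	return compiled, nil
}

// matchesAny reports whether the raw image reference matches any pattern.
func matchesAny(patterns []*regexp.Regexp, image string) bool {
	for _, re := range patterns {
		if re.MatchString(image) {
			return true
		}
	}
	return false
}

func main() {
	patterns, err := compileSelectors([]HermesImageSelector{{ImageRegex: ".*vllm.*"}})
	if err != nil {
		panic(err)
	}
	// Illustrative image references only.
	images := []string{
		"registry.example.com/ml/vllm:0.1",
		"registry.example.com/web/nginx:1.27",
	}
	for _, image := range images {
		fmt.Printf("%s -> %v\n", image, matchesAny(patterns, image))
	}
}
```

Compiling at policy-admission time also gives a natural place to reject a malformed pattern before it reaches the matching path.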

What the controller builds

The controller path is where most of the “policy-driven” value shows up. The README describes this flow:

A HermesPolicy is created.
A Pod appears with an image reference.
The controller checks the image against the in-memory policy store.
Matching images are queued for SOCI artifact generation.
The controller builds a SOCI v1 index and zTOCs.
Policy status is updated to Building, Ready, or an error state.

The source backs that up: pkg/controller/policy.go owns the watching and matching path, while pkg/controller/builder.go does the build work. The SOCI logic lives under pkg/common/soci/, including index building in soci_index.go and zTOC support under pkg/common/soci/ztoc/.
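The flow listed above can be sketched with an in-memory store, a buffered queue, and a status map. Everything here is a simplified assumption rather than the Hermes implementation: the controller type, observePod, drain, and the "Failed" phase name are hypothetical, and the build step is an injected stub instead of real SOCI index generation.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// Phase mirrors the states named in the flow. "Failed" is an assumed name
// for the error state, which the flow does not spell out.
type Phase string

const (
	PhaseBuilding Phase = "Building"
	PhaseReady    Phase = "Ready"
	PhaseFailed   Phase = "Failed"
)

type buildRequest struct{ policy, image string }

// controller is a hypothetical, heavily simplified policy store plus work queue.
type controller struct {
	mu       sync.Mutex
	policies map[string]*regexp.Regexp // policy name -> compiled image pattern
	status   map[string]Phase          // policy name -> last reported phase
	queued   map[string]bool           // images already enqueued
	queue    chan buildRequest
}

func newController(queueSize int) *controller {
	return &controller{
		policies: make(map[string]*regexp.Regexp),
		status:   make(map[string]Phase),
		queued:   make(map[string]bool),
		queue:    make(chan buildRequest, queueSize),
	}
}

// addPolicy compiles the pattern and records it in the in-memory store.
func (c *controller) addPolicy(name, pattern string) error {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return err
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.policies[name] = re
	return nil
}

// observePod enqueues a build for the first matching policy and reports
// whether work was queued. Images that are already queued are skipped.
func (c *controller) observePod(image string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.queued[image] {
		return false
	}
	for name, re := range c.policies {
		if !re.MatchString(image) {
			continue
		}
		select {
		case c.queue <- buildRequest{policy: name, image: image}:
			c.queued[image] = true
			c.status[name] = PhaseBuilding
			return true
		default:
			return false // queue full; a real controller would retry later
		}
	}
	return false
}

// drain runs the injected build function for every queued request and
// records Ready or Failed on the owning policy.
func (c *controller) drain(build func(image string) error) {
	for {
		select {
		case req := <-c.queue:
			phase := PhaseReady
			if err := build(req.image); err != nil {
				phase = PhaseFailed
			}
			c.mu.Lock()
			c.status[req.policy] = phase
			c.mu.Unlock()
		default:
			return
		}
	}
}

func main() {
	c := newController(8)
	if err := c.addPolicy("vllm", ".*vllm.*"); err != nil {
		panic(err)
	}
	c.observePod("registry.example.com/ml/vllm:0.1")    // matches, queued
	c.observePod("registry.example.com/web/nginx:1.27") // no policy match
	c.drain(func(image string) error { return nil })    // stub build step
	fmt.Println(c.status["vllm"])
}
```

Deduplicating on the image reference matters in practice: many Pods using the same image should trigger one build, not one per Pod.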

The key distinction here is that Hermes does not simply run the upstream snapshotter. It keeps the snapshotter's runtime model, then adds a controller-managed artifact service and policy layer around it.

What the daemon does

The daemon side is under cmd/daemon/app/ and pkg/daemon/. pkg/daemon/grpc/service.go wires up the snapshotter service, and pkg/daemon/fs/fs.go contains the lazy filesystem path.

When a lazy-mounted layer is accessed, the daemon uses SOCI metadata to map file reads to compressed byte ranges. That is the whole trick: with a zTOC, the runtime can fetch the span it needs instead of pulling the whole layer before start.
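The mapping from a file read to a compressed byte range can be sketched as a lookup over sorted checkpoints. This is a deliberately simplified model, not the zTOC format: the checkpoint struct and compressedRange function are hypothetical, and real gzip checkpoints also carry decompressor state that is omitted here.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// checkpoint is a simplified stand-in for an index entry: decompression can
// resume at CompressedOffset and produces data starting at UncompressedOffset.
type checkpoint struct {
	UncompressedOffset int64
	CompressedOffset   int64
}

// compressedRange returns the compressed byte range [start, end) needed to
// serve an uncompressed read of n bytes at offset off. The checkpoints must be
// sorted by UncompressedOffset; compressedSize is the size of the whole layer.
func compressedRange(cps []checkpoint, compressedSize, off, n int64) (start, end int64, err error) {
	if len(cps) == 0 || off < 0 || n <= 0 {
		return 0, 0, errors.New("invalid request")
	}
	// Last checkpoint at or before the first requested byte.
	i := sort.Search(len(cps), func(k int) bool { return cps[k].UncompressedOffset > off }) - 1
	if i < 0 {
		return 0, 0, errors.New("no checkpoint covers the requested offset")
	}
	// First checkpoint at or after the end of the read bounds the fetch.
	// If there is none, the range runs to the end of the compressed layer.
	j := sort.Search(len(cps), func(k int) bool { return cps[k].UncompressedOffset >= off+n })
	end = compressedSize
	if j < len(cps) {
		end = cps[j].CompressedOffset
	}
	return cps[i].CompressedOffset, end, nil
}

func main() {
	// Illustrative numbers only.
	cps := []checkpoint{
		{UncompressedOffset: 0, CompressedOffset: 0},
		{UncompressedOffset: 1000, CompressedOffset: 400},
		{UncompressedOffset: 2000, CompressedOffset: 750},
		{UncompressedOffset: 3000, CompressedOffset: 1100},
	}
	start, end, err := compressedRange(cps, 1500, 1200, 300)
	if err != nil {
		panic(err)
	}
	fmt.Printf("fetch compressed bytes [%d, %d)\n", start, end)
}
```

The result is what a registry range request would ask for. The daemon then decompresses from the starting checkpoint and discards bytes that precede the requested offset.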

Hermes also carries practical node-side pieces:

resolver and keychain code for registry authentication
local SOCI store handling
filesystem span management
fallback paths when no valid index exists

This is work that suits a Go daemon: a long-running process, local filesystem work, network calls to the registry or controller, and tight integration with containerd APIs.

The benchmark is impressive, with a caveat

The README reports a vLLM benchmark using a 10.8 GB image:

normal overlayfs Pod Ready: about 5 minutes 34 seconds
Hermes lazy loading Pod Ready: about 15 seconds
reported speedup: 22.2x

The caveat is important: Hermes does not remove the first SOCI build cost. One of the repo’s reports shows the index build itself taking several minutes for the large vLLM image. The win appears after the artifact is ready, when nodes can start matching Pods without pulling and unpacking everything first.

That still fits the target workload. If you repeatedly scale the same large model image, paying the index build once and getting much faster Pod startup later is a reasonable trade.
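The trade can be made concrete with a little arithmetic. The sketch below uses the rounded README timings, which give roughly 22.3x and land close to the reported 22.2x. The build cost is a hypothetical placeholder, since the text only says "several minutes", and speedup and breakEvenStarts are illustrative helpers.

```go
package main

import (
	"fmt"
	"time"
)

// Rounded timings quoted from the README benchmark.
const (
	baselineReady = 5*time.Minute + 34*time.Second
	lazyReady     = 15 * time.Second
)

// speedup returns how many times faster the optimized path is.
func speedup(baseline, optimized time.Duration) float64 {
	return float64(baseline) / float64(optimized)
}

// breakEvenStarts returns the smallest number of Pod starts after which the
// per-start savings cover a one-time build cost. It returns -1 when the
// optimized path is not faster than the baseline.
func breakEvenStarts(buildCost, baseline, optimized time.Duration) int {
	saved := baseline - optimized
	if saved <= 0 {
		return -1
	}
	return int((buildCost + saved - 1) / saved) // ceiling division
}

func main() {
	fmt.Printf("speedup from rounded timings: %.1fx\n", speedup(baselineReady, lazyReady))

	// Hypothetical build cost; the source only says "several minutes".
	buildCost := 4 * time.Minute
	fmt.Printf("break-even after %d Pod start(s)\n",
		breakEvenStarts(buildCost, baselineReady, lazyReady))
}
```

This model charges the build cost against Pod starts as if the build blocked them. If the controller builds the index ahead of demand, the effective cost to the first Pod is lower.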

Why this is a useful Go codebase

Hermes is worth reading because it combines three patterns that show up in serious Kubernetes tooling:

a CRD-backed controller using Kubernetes informers
a per-node daemon that integrates with containerd
a shared internal package tree for registry, SOCI, cache, compression, and ID mapping utilities

The repo is not a toy. It has CRD types, generated deep-copy code, deployment manifests, benchmark reports, systemd units for the daemon, and a real controller/daemon split. If you are building an operator that needs node-local behavior, this is the shape to study.

The broader idea is also good: make optimization a platform policy. Application teams should not have to understand SOCI indexes just to get better cold starts. They should ship images. The platform should decide which ones need lazy loading.