Prometheus is written in Go. Here's how it uses Go's concurrency, interfaces, and standard library to power monitoring at scale.

How Prometheus uses Go to build a time-series database


Prometheus is one of the most widely deployed monitoring systems in the world. It scrapes metrics from your services, stores them in a custom time-series database, evaluates alerting rules, and exposes a powerful query language called PromQL. It’s also written entirely in Go.

But I don’t want to just tell you what Prometheus does. You can read the docs for that. Instead, let’s look at how it uses Go — the patterns, interfaces, concurrency model, and design decisions that make it work. If you write Go professionally, there’s a lot to learn from this codebase.

The Client Library: How Go Services Expose Metrics

Before Prometheus can scrape anything, your service needs to expose metrics. The client_golang library is how you do that.

At its core, the library revolves around a Collector interface:

type Collector interface {
    Describe(chan<- *Desc)
    Collect(chan<- Metric)
}

This is a clean Go pattern. Collectors push metric descriptors and metric values into channels, and the registry drains those channels from every registered collector when a scrape happens (fanning collectors out across goroutines under the hood). From a collector author's point of view, the channel is essentially an iterator rather than a concurrency primitive, which is an interesting choice.
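
To see what implementing that interface looks like, here's a minimal custom collector that computes its value at scrape time. The names queueDepthCollector and currentQueueDepth are made up for illustration; the client_golang calls are real:

// A sketch of a custom collector; queueDepthCollector and
// currentQueueDepth are hypothetical names, not part of client_golang.
type queueDepthCollector struct {
	desc *prometheus.Desc
}

func newQueueDepthCollector() *queueDepthCollector {
	return &queueDepthCollector{
		desc: prometheus.NewDesc(
			"myapp_queue_depth",
			"Current number of items in the work queue.",
			nil, nil,
		),
	}
}

// Describe pushes the metric descriptor into the channel.
func (c *queueDepthCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.desc
}

// Collect is called on every scrape and pushes a freshly computed value.
func (c *queueDepthCollector) Collect(ch chan<- prometheus.Metric) {
	ch <- prometheus.MustNewConstMetric(c.desc, prometheus.GaugeValue, currentQueueDepth())
}

Register it with prometheus.MustRegister(newQueueDepthCollector()) and it shows up on /metrics like any other metric.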

Here’s how you’d register a custom counter and expose it:

package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var requestsTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "myapp_http_requests_total",
		Help: "Total number of HTTP requests.",
	},
	[]string{"method", "path"},
)

func init() {
	prometheus.MustRegister(requestsTotal)
}

func main() {
	http.HandleFunc("/api/data", func(w http.ResponseWriter, r *http.Request) {
		requestsTotal.WithLabelValues(r.Method, "/api/data").Inc()
		w.Write([]byte("ok"))
	})

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}

Hit /metrics and you’ll see plain text in the Prometheus exposition format (OpenMetrics is also supported via content negotiation). The promhttp.Handler() function calls Gather() on the default registry, which in turn calls Collect() on every registered collector, serializes the results, and writes them to the response.
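
For the counter above, the relevant lines of that output look roughly like this after a single GET request has been served:

# HELP myapp_http_requests_total Total number of HTTP requests.
# TYPE myapp_http_requests_total counter
myapp_http_requests_total{method="GET",path="/api/data"} 1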

The counter itself uses atomic operations internally: atomic.AddUint64 for integer increments like Inc(), and a compare-and-swap loop for fractional Add() calls. There are no mutexes on the hot path, which matters when you’re incrementing counters on every HTTP request. If you’re interested in how Go handles concurrent data access patterns, I wrote about context and cancellation, which touches on similar concerns.
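
The hot-path idea, boiled down to a sketch (not client_golang's actual code), is just an atomic counter:

import "sync/atomic"

// hitCounter is a sketch of the technique: an atomic add on the
// hot path instead of a mutex.
type hitCounter struct {
	n atomic.Uint64
}

func (c *hitCounter) Inc()          { c.n.Add(1) }
func (c *hitCounter) Value() uint64 { return c.n.Load() }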

Scraping: Goroutines and the scrape Package

Prometheus pulls metrics from targets on a schedule. The scraping logic lives in the scrape package within prometheus/prometheus.

Each scrape target gets its own goroutine managed by a scrapeLoop. The scrapeManager coordinates everything. When the target list changes (via service discovery), it spins up or tears down scrape loops.

The simplified flow looks like this:

// Pseudocode based on the actual scrape package
func (sl *scrapeLoop) run(ctx context.Context) {
	ticker := time.NewTicker(sl.interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			sl.scrapeAndReport(ctx)
		}
	}
}

func (sl *scrapeLoop) scrapeAndReport(ctx context.Context) {
	// Create a new HTTP request with timeout
	scrapeCtx, cancel := context.WithTimeout(ctx, sl.timeout)
	defer cancel()

	// Perform the actual HTTP GET to /metrics
	resp, err := sl.scraper.scrape(scrapeCtx)
	if err != nil {
		sl.reportError(err)
		return
	}

	// Parse the response and append samples to TSDB
	sl.append(resp)
}

This is textbook Go concurrency. Each scrape loop is a goroutine with a ticker and a select on context.Done(). When you cancel the parent context, all scrape loops shut down cleanly. No thread pool management, no executor service — just goroutines, channels, and context propagation.

The actual scraper uses Go’s net/http client with configurable TLS, timeouts, and connection pooling. The scrape package also avoids allocating a new byte slice on every scrape: response bodies are read into buffers drawn from a pool and reused across scrapes, which keeps GC pressure low when you’re scraping thousands of targets every 15 seconds.
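
You can get the same effect in your own code with a sync.Pool. Here's a sketch of the idea; the real scrape package uses its own size-bucketed pool, and readBody below is a hypothetical stand-in for reading the response:

import (
	"bytes"
	"sync"
)

// bufPool hands out reusable buffers so each scrape doesn't
// allocate a fresh byte slice.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

func readScrape(readBody func(*bytes.Buffer) error) error {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)

	// Read the /metrics response into the pooled buffer,
	// then parse samples out of buf.Bytes().
	return readBody(buf)
}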

TSDB: A Custom Time-Series Database in Go

This is where things get really interesting. Prometheus doesn’t use an external database. It ships its own time-series database (TSDB) built from scratch in Go. The code lives at prometheus/prometheus/tsdb.

The Write Path

Incoming samples first go into a “head” block, an in-memory structure. Rather than guarding everything with one big sync.RWMutex, the head locks at a much finer granularity, sharding its series map with per-stripe locks.

Each time series is identified by a set of labels. The head maintains a hash map from label fingerprints to series objects:

// Simplified from tsdb/head.go
type Head struct {
	series     *stripeSeries
	chunkRange atomic.Int64
	// ...
}

// stripeSeries uses sharding to reduce lock contention
type stripeSeries struct {
	size    atomic.Uint64
	series  []map[chunks.HeadSeriesRef]*memSeries
	hashes  []seriesHashmap
	locks   []stripeLock
}

The stripeSeries struct shards the series map across multiple buckets, each with its own lock. This is a well-known pattern for reducing contention in concurrent maps. When a sample comes in, Prometheus hashes the series labels and routes to the correct shard. Only that shard’s lock is acquired.

This is worth studying. If you’ve ever built a concurrent cache in Go and used a single sync.Map or sync.RWMutex, you’ve likely hit contention issues at scale. Sharded locking is the answer, and Prometheus implements it cleanly.
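
Here's the pattern boiled down to a sketch: not Prometheus's code, just the shape of stripe-locked sharding with simplified types.

import (
	"hash/fnv"
	"sync"
)

const numStripes = 64 // power of two so we can mask instead of mod

// shardedMap spreads keys across independently locked stripes,
// so writers to different stripes never contend.
type shardedMap struct {
	locks  [numStripes]sync.RWMutex
	shards [numStripes]map[string]float64
}

func newShardedMap() *shardedMap {
	m := &shardedMap{}
	for i := range m.shards {
		m.shards[i] = make(map[string]float64)
	}
	return m
}

func (m *shardedMap) stripe(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64() & (numStripes - 1)
}

func (m *shardedMap) Set(key string, v float64) {
	i := m.stripe(key)
	m.locks[i].Lock()
	m.shards[i][key] = v
	m.locks[i].Unlock()
}

func (m *shardedMap) Get(key string) (float64, bool) {
	i := m.stripe(key)
	m.locks[i].RLock()
	v, ok := m.shards[i][key]
	m.locks[i].RUnlock()
	return v, ok
}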

Chunks and Compression

Samples within a series are stored in “chunks” — compressed blocks of timestamp-value pairs. Prometheus uses a variant of Gorilla compression (originally from Facebook’s time-series paper). The Go implementation encodes deltas of deltas for timestamps and XOR-based encoding for float values.

// Simplified from tsdb/chunkenc/xor.go
func (a *xorAppender) Append(t int64, v float64) {
	if a.numSamples == 0 {
		// First sample: write the raw timestamp and value
		// (the real code varint-encodes the timestamp).
		a.writeBits(uint64(t), 64)
		a.writeBits(math.Float64bits(v), 64)
		a.t = t
		a.v = v
		a.numSamples++
		return
	}

	tDelta := uint64(t - a.t)

	if a.numSamples == 1 {
		a.tDelta = tDelta
		a.writeBits(tDelta, 64)
	} else {
		// Encode delta-of-delta for timestamps
		dod := int64(tDelta - a.tDelta)
		a.writeDeltaOfDelta(dod)
		a.tDelta = tDelta
	}

	// XOR current value with previous
	// If identical, write a zero bit
	vDelta := math.Float64bits(v) ^ math.Float64bits(a.v)
	if vDelta == 0 {
		a.writeBit(zero)
	} else {
		a.writeXORValue(vDelta)
	}

	a.t = t
	a.v = v
	a.numSamples++
}

This compression is critical. A raw float64 + int64 pair takes 16 bytes. With Gorilla compression, the average sample drops to about 1.37 bytes. That’s more than a 10x reduction, and it’s all happening in pure Go with bit-level manipulation.

Compaction and the WAL

The head periodically compacts into persistent blocks on disk. Each block is a directory containing chunks, an index, and metadata. Prometheus also maintains a write-ahead log (WAL) for crash recovery. The WAL implementation uses os.File with explicit fsync calls to ensure durability.

The compaction process runs in its own goroutine and merges overlapping blocks. It’s careful about not blocking the write path — it creates new blocks in temporary directories and atomically renames them.
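
That write-then-rename trick is worth having in your toolbox. A sketch of the general technique (not Prometheus's actual compaction code):

import "os"

// writeFileAtomic sketches the write-then-rename pattern: data is never
// visible at its final path until it is complete and durably on disk.
func writeFileAtomic(path string, data []byte) error {
	tmp := path + ".tmp"

	f, err := os.Create(tmp)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	// Flush file contents to stable storage before the rename.
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	// Rename is atomic on POSIX filesystems: readers see either the
	// old file or the complete new one, never a partial write.
	return os.Rename(tmp, path)
}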

PromQL: A Query Engine in Go

PromQL is Prometheus’s query language. The engine parses queries into an AST, then evaluates them against the TSDB.

The lexer is hand-written, and the parser is generated from a goyacc grammar (earlier versions used a fully hand-written parser); together they give Prometheus fine-grained, position-aware error messages. The evaluator walks the AST and returns promql.Vector or promql.Matrix results.

What’s interesting from a Go perspective is how the engine handles concurrency. Range queries can be expensive, so the engine supports configurable concurrency limits:

// From promql/engine.go (simplified)
type EngineOpts struct {
	MaxSamples           int
	Timeout              time.Duration
	ActiveQueryTracker   *ActiveQueryTracker
	LookbackDelta        time.Duration
	EnableNegativeOffset bool
}

The ActiveQueryTracker limits concurrent queries using a semaphore pattern. It memory-maps a file that tracks active queries, so if Prometheus crashes during a query, you can see what was running. This is a practical debugging tool built directly into the engine.
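
The semaphore half of that is easy to reuse anywhere: a buffered channel whose capacity is the concurrency limit. A sketch (not the engine's actual code):

import "context"

// querySlots limits how many queries run at once.
// Acquiring a slot is a send; releasing is a receive.
type querySlots chan struct{}

func newQuerySlots(max int) querySlots {
	return make(querySlots, max)
}

func (s querySlots) acquire(ctx context.Context) error {
	select {
	case s <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err() // query timed out or was cancelled while waiting
	}
}

func (s querySlots) release() {
	<-s
}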

If you’ve worked with functional options in Go, you’ll notice Prometheus takes a simpler approach here with an options struct. Both patterns work fine — the options struct is easier to understand when the number of parameters is large and mostly set at initialization.

Service Discovery: Interfaces and Pluggability

Prometheus needs to know what to scrape. It supports dozens of service discovery mechanisms — Kubernetes, Consul, EC2, DNS, file-based, and more.

All of these implement a common Go interface:

// From discovery package
type Discoverer interface {
	Run(ctx context.Context, up chan<- []*targetgroup.Group)
}

Each discoverer runs in its own goroutine and pushes target groups into a channel when the target list changes. The discovery manager merges results from all active discoverers and passes the unified list to the scrape manager.

This is a great example of Go interfaces enabling pluggability without frameworks. Adding a new discovery mechanism means implementing one method. The rest of the system doesn’t change.
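
To make that concrete, here's roughly what a trivial discoverer could look like. staticDiscoverer is a made-up name, but Discoverer, targetgroup.Group, and the model label types are the real ones:

import (
	"context"

	"github.com/prometheus/common/model"
	"github.com/prometheus/prometheus/discovery/targetgroup"
)

// staticDiscoverer announces a fixed set of targets once and then waits
// for shutdown. Real discoverers re-send groups whenever the source changes.
type staticDiscoverer struct {
	addrs []string
}

func (d *staticDiscoverer) Run(ctx context.Context, up chan<- []*targetgroup.Group) {
	tg := &targetgroup.Group{Source: "static"}
	for _, addr := range d.addrs {
		tg.Targets = append(tg.Targets, model.LabelSet{
			model.AddressLabel: model.LabelValue(addr),
		})
	}

	select {
	case up <- []*targetgroup.Group{tg}:
	case <-ctx.Done():
		return
	}
	<-ctx.Done() // block until the manager cancels us
}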

The Kubernetes service discovery, for instance, uses the client-go library and watches Kubernetes API resources. When pods or services change, it sends updated target groups through the channel. Because context.Context is threaded through everything, shutdown is clean — cancel the context and all discoverers stop.

Alerting Rules and Evaluation

Prometheus evaluates alerting rules on a schedule. Each rule group runs in its own goroutine with a ticker, similar to the scrape loop pattern. When a rule fires, it sends alerts to Alertmanager.

The alert notification uses Go’s net/http client to POST JSON to Alertmanager’s API. The notifier maintains a queue and handles retries with backoff — all implemented with goroutines and channels.
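
The retry shape is straightforward to replicate. Here's a sketch, with sendAlerts standing in for the actual POST to Alertmanager:

import (
	"context"
	"time"
)

// postWithBackoff retries a send with exponential backoff until it
// succeeds, gives up after maxRetries, or the context is cancelled.
// sendAlerts is a hypothetical stand-in for the HTTP POST.
func postWithBackoff(ctx context.Context, sendAlerts func(context.Context) error) error {
	const maxRetries = 5
	backoff := 100 * time.Millisecond

	var err error
	for i := 0; i < maxRetries; i++ {
		if err = sendAlerts(ctx); err == nil {
			return nil
		}
		select {
		case <-time.After(backoff):
			backoff *= 2 // exponential backoff between attempts
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}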

If you’re building systems that need similar periodic evaluation loops, the Prometheus codebase is a good reference. The pattern of “goroutine + ticker + context cancellation” appears everywhere and is worth internalizing. For more on how Go handles these coordination patterns, check out how context works in Go.

What Go Developers Can Learn from Prometheus

A few patterns from the Prometheus codebase worth stealing:

  1. Sharded locking — When a single mutex becomes a bottleneck, shard your data and use per-shard locks. The stripeSeries pattern is a clean implementation of this.

  2. Interface-based pluggability — The Discoverer and Collector interfaces show how small interfaces keep systems composable without dependency injection frameworks.

  3. Buffer reuse — The scrape package reuses byte buffers aggressively. If you’re processing high-throughput data in Go, avoid allocating on every request.

  4. Goroutine-per-task with context — Instead of worker pools, Prometheus often runs one goroutine per logical unit (scrape target, rule group) and coordinates with context cancellation.

  5. Bit-level encoding in pure Go — The chunk encoding shows that Go is capable of the kind of low-level bit manipulation you’d normally associate with C. No unsafe needed.

Prometheus is a large codebase, but it’s well-organized and idiomatic. If you’re looking to level up your Go skills, reading through the tsdb and scrape packages is time well spent. The source is on GitHub — start with tsdb/head.go and follow the write path from there.