How to use eBPF uprobes to watch goroutine creation, scheduling, and blocking in live Go programs — no code changes required.

Tracing Goroutines in Realtime with eBPF


Goroutines are cheap to create. That’s the promise Go makes, and it delivers. But “cheap” doesn’t mean “free to reason about.” When your Go service is running thousands of goroutines and you’re trying to figure out which ones are blocked, which are leaking, and where the scheduling bottlenecks hide, pprof and runtime.NumGoroutine() only tell you so much.

eBPF lets you attach tiny programs to kernel and userspace probe points, including the Go runtime itself. You can trace goroutine creation, scheduling events, and channel operations in a running process with zero code changes and minimal overhead. This picked up attention on Hacker News recently, and I think the excitement is warranted: it gives you runtime visibility that Go’s standard tooling simply can’t.

How the Go runtime manages goroutines

Before we attach probes, we need to understand what we’re probing. The Go runtime uses an M:N scheduling model: goroutines (G) are multiplexed onto OS threads (M), and an M must hold a logical processor (P) to execute Go code. The runtime functions we care about:

  • runtime.newproc — called when you use the go keyword
  • runtime.gopark — parks (blocks) a goroutine
  • runtime.goready — marks a goroutine as runnable again
  • runtime.execute — begins executing a goroutine on a thread

None of these are exported. They’re internal runtime functions. But they have stable symbol names in the compiled binary, which means we can attach uprobes to them.

If you want to understand goroutine lifecycle management better, it’s worth reading about how context in Go interacts with goroutine cancellation. Context cancellation ultimately triggers gopark and goready calls internally.

Attaching eBPF uprobes to Go binaries

The Go library we’ll use is cilium/ebpf, a pure-Go library for loading and managing eBPF programs. It compiles eBPF C code into bytecode and loads it via the kernel’s BPF syscall, all from Go.

Here’s a minimal eBPF C program that attaches to runtime.newproc:

// trace_goroutine.c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct event {
    __u32 pid;
    __u64 timestamp;
    __u64 goroutine_id;
};

struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} events SEC(".maps");

SEC("uprobe/runtime_newproc")
int trace_newproc(struct pt_regs *ctx) {
    struct event e = {};
    e.pid = bpf_get_current_pid_tgid() >> 32;
    e.timestamp = bpf_ktime_get_ns();
    // The goroutine ID is in the goid field of the current G struct.
    // Its offset varies by Go version — we read it from the TLS slot.
    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &e, sizeof(e));
    return 0;
}

char LICENSE[] SEC("license") = "GPL";

This fires every time a new goroutine is created and emits an event to a perf buffer. Simple enough on the eBPF side.

The Go side: loading and reading events

On the Go side, we use cilium/ebpf to load the compiled eBPF object, attach the uprobe, and read events from the perf buffer:

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"log"
	"os"
	"os/signal"
	"syscall"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
	"github.com/cilium/ebpf/perf"
)

type Event struct {
	PID         uint32
	_           uint32 // padding: the C compiler 8-byte-aligns timestamp
	Timestamp   uint64
	GoroutineID uint64
}

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: tracer <path-to-go-binary>")
	}
	targetBinary := os.Args[1]

	// Load the compiled eBPF program
	spec, err := ebpf.LoadCollectionSpec("trace_goroutine.o")
	if err != nil {
		log.Fatalf("loading spec: %v", err)
	}

	coll, err := ebpf.NewCollection(spec)
	if err != nil {
		log.Fatalf("creating collection: %v", err)
	}
	defer coll.Close()

	// Open the target binary's executable for uprobe attachment
	ex, err := link.OpenExecutable(targetBinary)
	if err != nil {
		log.Fatalf("opening executable: %v", err)
	}

	// Attach uprobe to runtime.newproc
	up, err := ex.Uprobe("runtime.newproc", coll.Programs["trace_newproc"], nil)
	if err != nil {
		log.Fatalf("attaching uprobe: %v", err)
	}
	defer up.Close()

	// Read events from the perf buffer
	reader, err := perf.NewReader(coll.Maps["events"], os.Getpagesize()*8)
	if err != nil {
		log.Fatalf("creating perf reader: %v", err)
	}
	defer reader.Close()

	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)

	fmt.Println("Tracing goroutine creation... Ctrl+C to stop.")

	go func() {
		<-sig
		reader.Close()
	}()

	for {
		record, err := reader.Read()
		if err != nil {
			return
		}

		var event Event
		if err := binary.Read(bytes.NewReader(record.RawSample), binary.LittleEndian, &event); err != nil {
			log.Printf("parsing event: %v", err)
			continue
		}

		fmt.Printf("PID=%d goroutine created at t=%d\n", event.PID, event.Timestamp)
	}
}

A few things worth calling out. The link.OpenExecutable call parses the ELF binary to find symbol addresses. Go binaries are statically linked by default, so all runtime symbols are present. The Uprobe method resolves runtime.newproc by name and inserts a breakpoint at its entry.

Dealing with Go’s ABI and stack layout

Here’s where it gets annoying. Go doesn’t follow the standard C calling convention. Before Go 1.17, all arguments were passed on the stack. Since Go 1.17, the register-based ABI (ABIInternal) passes arguments in registers.

This matters because when your eBPF probe fires, you need to know where to find function arguments. For runtime.newproc, the interesting argument is the function pointer being scheduled as a new goroutine. On ABIInternal, it’s in a register (typically RAX). On the stack-based ABI, it’s at a stack offset.

You can check which ABI your binary uses:

package main

import (
	"debug/elf"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	f, err := elf.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	symbols, err := f.Symbols()
	if err != nil {
		log.Fatal(err)
	}

	for _, s := range symbols {
		if strings.Contains(s.Name, "runtime.newproc") {
			fmt.Printf("Symbol: %s, Value: 0x%x, Size: %d\n", s.Name, s.Value, s.Size)
		}
	}
}

If you see runtime.newproc.abi0 alongside runtime.newproc, the binary uses the register ABI. You’ll want to attach to runtime.newproc (the ABIInternal variant) and read arguments from registers in your eBPF program. One trap: PT_REGS_PARM1(ctx) follows the C calling convention (RDI on x86-64), but Go’s ABIInternal passes the first argument in RAX, so read ctx->ax instead.

Tracing goroutine blocking with gopark

Knowing when goroutines are created is interesting. Knowing when and why they block is where the real debugging power lives. runtime.gopark is called whenever a goroutine blocks — on a channel receive, a mutex, a select, a timer, a network operation. The reason parameter tells you why.

You can attach a second uprobe to runtime.gopark and extract the parking reason, which is gopark’s third argument (under the register ABI on amd64, that means it arrives in CX). The reason is an enum (waitReason) defined in runtime/runtime2.go:

// From Go source: src/runtime/runtime2.go
const (
	waitReasonZero                  waitReason = iota
	waitReasonGCAssistMarking
	waitReasonIOWait
	waitReasonChanReceive
	waitReasonChanSend
	waitReasonFinalizerWait
	waitReasonForceGCIdle
	waitReasonSemacquire
	waitReasonSleep
	waitReasonSyncCondWait
	waitReasonSyncMutexLock
	waitReasonTimerGoroutineIdle
	// ... more reasons
)

By reading this parameter from the uprobe context, you can build a real-time histogram of why your goroutines are blocked. Channel operations, mutex contention, network I/O — all visible without touching the target process.

If you’re debugging goroutine leaks specifically, this pairs well with patterns like singleflight that reduce redundant goroutine creation in the first place.

Extracting the goroutine ID

Go deliberately doesn’t expose goroutine IDs in its public API. The runtime has them internally though — every g struct has a goid field. To read it from eBPF, you need two things: the offset of goid within the g struct, and the current g pointer.

On amd64 Linux, the runtime keeps the current goroutine pointer in the R14 register under the register ABI (Go 1.17+), and also in thread-local storage, accessible via the FS segment register, for the benefit of cgo and signal handlers. The goid field offset changes between Go versions but is typically around 152 bytes (Go 1.21+). You can find it with:

$ go version -m ./your-binary | head -5
$ readelf -Wi ./your-binary | grep runtime.g

Or more reliably, parse the DWARF info:

package main

import (
	"debug/dwarf"
	"debug/elf"
	"fmt"
	"log"
	"os"
)

func main() {
	f, err := elf.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	d, err := f.DWARF()
	if err != nil {
		log.Fatal(err)
	}

	reader := d.Reader()
	for {
		entry, err := reader.Next()
		if err != nil || entry == nil {
			break
		}

		if entry.Tag == dwarf.TagStructType && entry.Children {
			nameField := entry.AttrField(dwarf.AttrName)
			if nameField != nil && nameField.Val.(string) == "runtime.g" {
				fmt.Println("Found runtime.g struct")
				// The struct's member entries follow immediately; a zero
				// Tag marks the end of the children.
				for {
					child, err := reader.Next()
					if err != nil || child == nil || child.Tag == 0 {
						break
					}
					name := child.AttrField(dwarf.AttrName)
					offset := child.AttrField(dwarf.AttrDataMemberLoc)
					if name != nil && offset != nil && name.Val.(string) == "goid" {
						fmt.Printf("goid offset: %d\n", offset.Val.(int64))
						return
					}
				}
			}
		}
	}
}

This gives you the exact byte offset for the Go version that compiled your binary. Hardcoding offsets is fragile. Parsing DWARF is the right approach for anything production-grade.

Practical considerations

Performance overhead. Uprobes use software breakpoints (INT3 on x86). Each goroutine creation hits one. For most services, this is negligible. If you’re creating millions of goroutines per second, you’ll feel it. Use BPF ring buffers instead of perf buffers for better throughput, and consider sampling. One related caution: stick to entry uprobes — uretprobes are unsafe on Go binaries, because the runtime grows and moves goroutine stacks, which breaks the uretprobe return trampoline and can crash the target.

Stripped binaries. If your Go binary is stripped (-ldflags="-s -w"), symbol names are gone. Uprobes need symbols to resolve addresses. Either don’t strip, or resolve addresses manually from DWARF data before stripping.

Go version sensitivity. The runtime’s internal struct layouts change between releases. A tool built for Go 1.21 may break with Go 1.23. Always parse DWARF info at runtime rather than hardcoding offsets. I cannot stress this enough if you want your tooling to survive upgrades.

Permissions. eBPF requires CAP_BPF and CAP_PERFMON (or root). In containerized environments, you may need privileged pods or specific security contexts.

Why not just use pprof?

pprof is sampling-based. It gives you statistical profiles — where CPU time is spent, how many goroutines exist, what they’re doing at sample time. Great for aggregate views.

eBPF tracing is event-based. You see every goroutine creation, every block, every unblock. You can correlate events across time. You can build real-time dashboards showing goroutine lifecycle events as they happen.

The tradeoff is straightforward: pprof is easier to set up and sufficient for most profiling work. eBPF tracing makes sense when you need event-level detail — debugging a specific scheduling anomaly, finding the exact goroutine that’s leaking, or understanding latency distributions in the scheduler. It’s a scalpel, not a hammer.

For a refresher on Go’s standard debugging tools, check out the post on profiling and optimizing Go programs.

Where to go from here

eBPF uprobes let you observe the Go runtime without modifying it. By attaching to runtime.newproc, runtime.gopark, and runtime.goready, you get a complete picture of goroutine lifecycle events in real time. The cilium/ebpf library makes it possible to build these tools entirely in Go.

The hard parts are Go-specific: dealing with the register-based ABI, finding struct field offsets via DWARF, and locating the current goroutine pointer through TLS. Once you’ve solved those, you have a tracing system that works on any Go binary without recompilation.

I’d love to see someone build this into a proper CLI tool with version-aware DWARF parsing and a TUI for live goroutine state. The pieces are all here. The annoying plumbing is documented above. What’s missing is someone packaging it up and making it easy to reach for when pprof isn’t cutting it.