Tracing Goroutines in Realtime with eBPF
Goroutines are cheap to create. That’s the promise Go makes, and it delivers. But “cheap” doesn’t mean “free to reason about.” When your Go service is running thousands of goroutines and you’re trying to figure out which ones are blocked, which are leaking, and where the scheduling bottlenecks hide, pprof and runtime.NumGoroutine() only tell you so much.
eBPF lets you attach tiny programs to kernel and userspace probe points, including the Go runtime itself. You can trace goroutine creation, scheduling events, and channel operations in a running process with zero code changes and minimal overhead. This picked up attention on Hacker News recently, and I think the excitement is warranted: it gives you runtime visibility that Go’s standard tooling simply can’t offer.
How the Go runtime manages goroutines
Before we attach probes, we need to understand what we’re probing. The Go runtime uses an M:N scheduling model. Goroutines (G) get multiplexed onto OS threads (M), which are bound to logical processors (P). The runtime functions we care about:
runtime.newproc — called when you use the go keyword
runtime.gopark — parks (blocks) a goroutine
runtime.goready — marks a goroutine as runnable again
runtime.execute — begins executing a goroutine on a thread
None of these are exported. They’re internal runtime functions. But they have stable symbol names in the compiled binary, which means we can attach uprobes to them.
If you want to understand goroutine lifecycle management better, it’s worth reading about how context in Go interacts with goroutine cancellation. Context cancellation ultimately triggers gopark and goready calls internally.
Attaching eBPF uprobes to Go binaries
The Go library we’ll use is cilium/ebpf, a pure-Go library for loading and managing eBPF programs. You compile the eBPF C code to bytecode with clang (the project’s bpf2go tool wraps that step), and cilium/ebpf loads the result into the kernel via the BPF syscall, attaches probes, and reads maps, all from Go.
Here’s a minimal eBPF C program that attaches to runtime.newproc:
// trace_goroutine.c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
struct event {
__u32 pid;
__u64 timestamp;
__u64 goroutine_id;
};
struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} events SEC(".maps");
SEC("uprobe/runtime_newproc")
int trace_newproc(struct pt_regs *ctx) {
struct event e = {};
e.pid = bpf_get_current_pid_tgid() >> 32;
e.timestamp = bpf_ktime_get_ns();
// The goroutine ID lives in the goid field of the current g struct. Its
// offset varies by Go version, so we leave goroutine_id at zero here and
// come back to it in the section on extracting the goroutine ID.
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &e, sizeof(e));
return 0;
}
char LICENSE[] SEC("license") = "GPL";
This fires every time a new goroutine is created and emits an event to a perf buffer. Simple enough on the eBPF side.
The Go side: loading and reading events
On the Go side, we use cilium/ebpf to load the compiled eBPF object, attach the uprobe, and read events from the perf buffer:
package main
import (
"bytes"
"encoding/binary"
"fmt"
"log"
"os"
"os/signal"
"syscall"
"github.com/cilium/ebpf"
"github.com/cilium/ebpf/link"
"github.com/cilium/ebpf/perf"
)
type Event struct {
	PID         uint32
	_           uint32 // padding: the C compiler aligns the 8-byte fields that follow
	Timestamp   uint64
	GoroutineID uint64
}
func main() {
if len(os.Args) < 2 {
log.Fatal("usage: tracer <path-to-go-binary>")
}
targetBinary := os.Args[1]
// Load the compiled eBPF program
spec, err := ebpf.LoadCollectionSpec("trace_goroutine.o")
if err != nil {
log.Fatalf("loading spec: %v", err)
}
coll, err := ebpf.NewCollection(spec)
if err != nil {
log.Fatalf("creating collection: %v", err)
}
defer coll.Close()
// Open the target binary's executable for uprobe attachment
ex, err := link.OpenExecutable(targetBinary)
if err != nil {
log.Fatalf("opening executable: %v", err)
}
// Attach uprobe to runtime.newproc
up, err := ex.Uprobe("runtime.newproc", coll.Programs["trace_newproc"], nil)
if err != nil {
log.Fatalf("attaching uprobe: %v", err)
}
defer up.Close()
// Read events from the perf buffer
reader, err := perf.NewReader(coll.Maps["events"], os.Getpagesize()*8)
if err != nil {
log.Fatalf("creating perf reader: %v", err)
}
defer reader.Close()
sig := make(chan os.Signal, 1)
signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
fmt.Println("Tracing goroutine creation... Ctrl+C to stop.")
go func() {
<-sig
reader.Close()
}()
for {
record, err := reader.Read()
if err != nil {
return
}
var event Event
if err := binary.Read(bytes.NewReader(record.RawSample), binary.LittleEndian, &event); err != nil {
log.Printf("parsing event: %v", err)
continue
}
fmt.Printf("PID=%d goroutine created at t=%d\n", event.PID, event.Timestamp)
}
}
A few things worth calling out. The link.OpenExecutable call parses the ELF binary to find symbol addresses. Go binaries are statically linked by default, so all runtime symbols are present. The Uprobe method resolves runtime.newproc by name and inserts a breakpoint at its entry.
Dealing with Go’s ABI and stack layout
Here’s where it gets annoying. Go doesn’t follow the standard C calling convention. Before Go 1.17, all arguments were passed on the stack. Since Go 1.17, the register-based ABI (ABIInternal) passes arguments in registers.
This matters because when your eBPF probe fires, you need to know where to find function arguments. For runtime.newproc, the interesting argument is the function pointer being scheduled as a new goroutine. On ABIInternal, it’s in a register (typically RAX). On the stack-based ABI, it’s at a stack offset.
You can check which ABI your binary uses:
package main
import (
"debug/elf"
"fmt"
"log"
"os"
"strings"
)
func main() {
f, err := elf.Open(os.Args[1])
if err != nil {
log.Fatal(err)
}
defer f.Close()
symbols, err := f.Symbols()
if err != nil {
log.Fatal(err)
}
for _, s := range symbols {
if strings.Contains(s.Name, "runtime.newproc") {
fmt.Printf("Symbol: %s, Value: 0x%x, Size: %d\n", s.Name, s.Value, s.Size)
}
}
}
If you see runtime.newproc.abi0 alongside runtime.newproc, the binary uses the register ABI. You’ll want to attach to runtime.newproc (the ABIInternal variant) and read arguments from registers in your eBPF program. One caveat: the familiar PT_REGS_PARM1(ctx) macro follows the C calling convention (RDI on x86-64), while Go’s ABIInternal passes the first argument in RAX, so you read ctx->ax instead.
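To make that concrete, here is a minimal probe that pulls the *funcval argument out of RAX. It’s a sketch under two assumptions you should verify against your toolchain: Go 1.18 or later, where runtime.newproc takes a single fn *funcval argument, and the amd64 register ABI. The file and program names are my own.
// newproc_arg.c: a sketch, assuming Go 1.18+ on linux/amd64 (ABIInternal)
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
SEC("uprobe/runtime_newproc")
int trace_newproc_fn(struct pt_regs *ctx) {
    // Go's first integer argument register on amd64 is RAX,
    // not RDI as PT_REGS_PARM1 would assume.
    void *funcval_ptr = (void *)ctx->ax;
    // A funcval's first word is the entry PC of the function
    // the new goroutine will run.
    __u64 fn_pc = 0;
    bpf_probe_read_user(&fn_pc, sizeof(fn_pc), funcval_ptr);
    bpf_printk("new goroutine will run fn at pc=0x%llx", fn_pc);
    return 0;
}
char LICENSE[] SEC("license") = "GPL";
Once you have that PC, you can symbolize it offline against the binary’s symbol table to see which function each new goroutine starts in.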
Tracing goroutine blocking with gopark
Knowing when goroutines are created is interesting. Knowing when and why they block is where the real debugging power lives. runtime.gopark is called whenever a goroutine blocks — on a channel receive, a mutex, a select, a timer, a network operation. The reason parameter tells you why.
You can attach a second uprobe to runtime.gopark and extract the parking reason. The reason is an enum (waitReason) defined in runtime/runtime2.go:
// From Go source: src/runtime/runtime2.go
const (
waitReasonZero waitReason = iota
waitReasonGCAssistMarking
waitReasonIOWait
waitReasonChanReceive
waitReasonChanSend
waitReasonFinalizerWait
waitReasonForceGCIdle
waitReasonSemacquire
waitReasonSleep
waitReasonSyncCondWait
waitReasonSyncMutexLock
waitReasonTimerGoroutineIdle
// ... more reasons
)
By reading this parameter from the uprobe context, you can build a real-time histogram of why your goroutines are blocked. Channel operations, mutex contention, network I/O — all visible without touching the target process.
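Here is a sketch of what the eBPF side of that histogram could look like. It assumes the reason parameter is gopark’s third argument and therefore lands in RCX under the amd64 register ABI (integer argument registers go AX, BX, CX, DI, SI, ...); the map and program names are mine, and the waitReason numbering must match the Go version you trace.
// gopark_reason.c: counts parks per waitReason, assuming Go 1.17+ on linux/amd64
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 64);
    __type(key, __u8);    // waitReason value
    __type(value, __u64); // number of parks observed for this reason
} park_reasons SEC(".maps");
SEC("uprobe/runtime_gopark")
int trace_gopark(struct pt_regs *ctx) {
    // reason is gopark's third argument; the third integer register is RCX.
    // waitReason is a uint8, so only the low byte matters.
    __u8 reason = (__u8)ctx->cx;
    __u64 one = 1;
    __u64 *count = bpf_map_lookup_elem(&park_reasons, &reason);
    if (count)
        __sync_fetch_and_add(count, 1);
    else
        bpf_map_update_elem(&park_reasons, &reason, &one, BPF_ANY);
    return 0;
}
char LICENSE[] SEC("license") = "GPL";
On the Go side you can walk the map with cilium/ebpf’s Map.Iterate and print the counts keyed by reason.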
If you’re debugging goroutine leaks specifically, this pairs well with patterns like singleflight that reduce redundant goroutine creation in the first place.
Extracting the goroutine ID
Go deliberately doesn’t expose goroutine IDs in its public API. The runtime has them internally though — every g struct has a goid field. To read it from eBPF, you need two things: the offset of goid within the g struct, and the current g pointer.
On amd64 Linux, the current goroutine pointer lives in thread-local storage, accessible via the FS segment register. The goid field offset changes between Go versions but is typically around 152 bytes (Go 1.21+). You can find it with:
$ go version -m ./your-binary | head -5
$ readelf -wi ./your-binary | grep runtime.g
Or more reliably, parse the DWARF info:
package main
import (
"debug/dwarf"
"debug/elf"
"fmt"
"log"
"os"
)
func main() {
f, err := elf.Open(os.Args[1])
if err != nil {
log.Fatal(err)
}
defer f.Close()
d, err := f.DWARF()
if err != nil {
log.Fatal(err)
}
reader := d.Reader()
for {
entry, err := reader.Next()
if err != nil || entry == nil {
break
}
if entry.Tag == dwarf.TagStructType {
nameField := entry.AttrField(dwarf.AttrName)
if nameField != nil && nameField.Val.(string) == "runtime.g" {
fmt.Println("Found runtime.g struct")
// Iterate children to find goid field and its offset
for {
child, err := reader.Next()
if err != nil || child == nil || child.Tag == 0 {
break
}
name := child.AttrField(dwarf.AttrName)
offset := child.AttrField(dwarf.AttrDataMemberLoc)
if name != nil && name.Val.(string) == "goid" {
fmt.Printf("goid offset: %d\n", offset.Val.(int64))
return
}
}
}
}
}
}
This gives you the exact byte offset for the Go version that compiled your binary. Hardcoding offsets is fragile. Parsing DWARF is the right approach for anything production-grade.
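For completeness, here is roughly what the eBPF side looks like once you have the offset. Treat it as a sketch: it assumes linux/amd64, where the runtime keeps the current g pointer one word below the FS base, and it assumes the DWARF-derived offset is patched into GOID_OFFSET before loading (cilium/ebpf’s CollectionSpec.RewriteConstants can do that). GOID_OFFSET and the program name are my own naming.
// read_goid.c: a sketch, assuming linux/amd64 and an offset patched in at load time
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
volatile const __u64 GOID_OFFSET = 152; // placeholder, overwritten before loading
SEC("uprobe/runtime_newproc")
int trace_newproc_goid(struct pt_regs *ctx) {
    struct task_struct *task = (struct task_struct *)bpf_get_current_task();
    // FS base of the thread that hit the probe; Go keeps its TLS block here.
    __u64 fsbase = BPF_CORE_READ(task, thread, fsbase);
    // The current g pointer sits at fsbase - 8 on linux/amd64.
    __u64 g = 0;
    bpf_probe_read_user(&g, sizeof(g), (void *)(fsbase - 8));
    if (!g)
        return 0;
    __u64 goid = 0;
    bpf_probe_read_user(&goid, sizeof(goid), (void *)(g + GOID_OFFSET));
    bpf_printk("goroutine %llu called runtime.newproc", goid);
    return 0;
}
char LICENSE[] SEC("license") = "GPL";
bpf_printk output lands in the kernel’s trace pipe, which is fine for a sketch; a real tool would put goid into the event struct from earlier and ship it through the perf or ring buffer.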
Practical considerations
Performance overhead. Uprobes use software breakpoints (INT3 on x86). Each goroutine creation hits one. For most services, this is negligible. If you’re creating millions of goroutines per second, you’ll feel it. Use BPF ring buffers instead of perf buffers for better throughput, and consider sampling.
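As a sketch of that swap, here is the earlier newproc probe rewritten against a ring buffer map (kernel 5.8+). On the Go side you would read it with ringbuf.NewReader from the same module instead of perf.NewReader.
// ringbuf_events.c: the perf buffer from earlier, replaced with a BPF ring buffer
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
struct event {
    __u32 pid;
    __u64 timestamp;
    __u64 goroutine_id;
};
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 24); // 16 MiB, shared across CPUs
} events SEC(".maps");
SEC("uprobe/runtime_newproc")
int trace_newproc(struct pt_regs *ctx) {
    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return 0; // buffer full: drop the event rather than block
    e->pid = bpf_get_current_pid_tgid() >> 32;
    e->timestamp = bpf_ktime_get_ns();
    e->goroutine_id = 0; // fill in via the DWARF/TLS technique above
    bpf_ringbuf_submit(e, 0);
    return 0;
}
char LICENSE[] SEC("license") = "GPL";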
Stripped binaries. If your Go binary is stripped (-ldflags="-s -w"), symbol names are gone. Uprobes need symbols to resolve addresses. Either don’t strip, or resolve addresses manually from DWARF data before stripping.
Go version sensitivity. The runtime’s internal struct layouts change between releases. A tool built for Go 1.21 may break with Go 1.23. Always parse DWARF info at runtime rather than hardcoding offsets. I cannot stress this enough if you want your tooling to survive upgrades.
Permissions. eBPF requires CAP_BPF and CAP_PERFMON (or root). In containerized environments, you may need privileged pods or specific security contexts.
Why not just use pprof?
pprof is sampling-based. It gives you statistical profiles — where CPU time is spent, how many goroutines exist, what they’re doing at sample time. Great for aggregate views.
eBPF tracing is event-based. You see every goroutine creation, every block, every unblock. You can correlate events across time. You can build real-time dashboards showing goroutine lifecycle events as they happen.
The tradeoff is straightforward: pprof is easier to set up and sufficient for most profiling work. eBPF tracing makes sense when you need event-level detail — debugging a specific scheduling anomaly, finding the exact goroutine that’s leaking, or understanding latency distributions in the scheduler. It’s a scalpel, not a hammer.
For a refresher on Go’s standard debugging tools, check out the post on profiling and optimizing Go programs.
Where to go from here
eBPF uprobes let you observe the Go runtime without modifying it. By attaching to runtime.newproc, runtime.gopark, and runtime.goready, you get a complete picture of goroutine lifecycle events in real time. The cilium/ebpf library makes it possible to build these tools entirely in Go.
The hard parts are Go-specific: dealing with the register-based ABI, finding struct field offsets via DWARF, and locating the current goroutine pointer through TLS. Once you’ve solved those, you have a tracing system that works on any Go binary without recompilation.
I’d love to see someone build this into a proper CLI tool with version-aware DWARF parsing and a TUI for live goroutine state. The pieces are all here. The annoying plumbing is documented above. What’s missing is someone packaging it up and making it easy to reach for when pprof isn’t cutting it.