Parsing Norton Ghost disk images in pure Go
If you’ve ever tried to open a .gho file and hit a wall of “proprietary format, no documentation,” you already know why this library exists.
Norton Ghost was everywhere in the late ’90s and 2000s. Sysadmins used it to clone drives, image machines, roll out OS deployments. Then it faded, but the .gho files didn’t. They’re still sitting on old backup drives, in forensic evidence lockers, buried in firmware packages. And the format is barely documented.
nyarime/gho is a Go library that parses these images. Pure Go. No CGo. No external C libraries. It reads the binary format, handles multiple compression algorithms, and gives you access to the partition and file data inside. I find it interesting as a case study in the kind of low-level binary work that people reflexively assume needs C.
Why Go for binary format parsing?
Go is a better fit for this than most people expect. encoding/binary gives you direct control over byte order and struct packing. io.Reader and io.SectionReader let you work with slices of files without pulling everything into memory. And you get a single static binary at the end. No shared library headaches.
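To make that concrete, here's a minimal sketch of reading a fixed-size header with encoding/binary and io.SectionReader. The field layout and the little-endian byte order are assumptions for illustration; the real definitions live in defines.go.

// Sketch: parse a fixed-size header from the front of an image file.
// exampleHeader is invented for illustration, not the GHO layout.
import (
	"encoding/binary"
	"io"
	"os"
)

type exampleHeader struct {
	Magic       uint32  // format magic number
	Compression uint8   // compression type enum
	_           [3]byte // explicit padding, skipped by binary.Read
}

func readHeader(path string) (exampleHeader, error) {
	var h exampleHeader
	f, err := os.Open(path)
	if err != nil {
		return h, err
	}
	defer f.Close()

	// Parse only the first 512 bytes without pulling the file into memory.
	sec := io.NewSectionReader(f, 0, 512)
	return h, binary.Read(sec, binary.LittleEndian, &h)
}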
If you’ve worked with Go’s context package for managing request lifetimes, you’ve seen how Go favors explicit control flow. Same philosophy here: the gho library reads bytes, interprets them according to the format spec, and hands you structured data. No magic.
How the project is organized
The repository splits the GHO format into focused files:
- defines.go holds constants and type definitions for the binary format: magic numbers, compression type enums, header structures.
- reader.go is the main parsing logic. It reads GHO images, interprets the header, and walks the partition table.
- writer.go handles writing GHO images back out, which matters if you need to modify and re-pack an image.
- span.go manages multi-span images (Norton Ghost could split a disk image across multiple files, think CDs or floppies).
- crypto.go deals with password-protected images.
- fastlz.go and zlib.go are two separate decompression implementations for the two algorithms the format supports.
- cmd/gho/main.go is a CLI tool built on top of the library.
This follows a common Go convention: library at the package root, runnable binary in cmd/. Clean separation between reusable code and the application that consumes it.
Compression without CGo
This is the part that impressed me most. GHO images can use zlib or FastLZ compression. Zlib is straightforward since Go’s standard library has compress/flate and compress/zlib built in. The zlib.go file handles that path.
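For reference, the whole standard-library path for the zlib case can be this small. This is a generic sketch of the technique, not the repo's zlib.go:

// Decompress a zlib stream using only the standard library.
import (
	"bytes"
	"compress/zlib"
	"io"
)

func inflate(compressed []byte) ([]byte, error) {
	zr, err := zlib.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	// Stream the decompressed bytes out; zlib.Reader validates the
	// trailing checksum as it goes.
	return io.ReadAll(zr)
}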
FastLZ is where things get interesting. It's a lightweight LZ-family algorithm optimized for speed over compression ratio, and there's no standard library package for it. So the project includes a pure Go implementation in fastlz.go (decompression) and fastlz_compress.go (compression). No CGo bindings to a C library.
Writing your own decompressor means getting intimate with byte slices and bitwise operations. The core loop of any LZ decompressor reads control bytes that say either “copy the next N literal bytes” or “go back M bytes in the output and copy N bytes from there.” In Go, this maps naturally to slice operations:
// Conceptual LZ decompression pattern in Go
// (illustrative of the approach, not copied from the repo)
func decompress(src []byte, dstLen int) []byte {
	dst := make([]byte, 0, dstLen)
	pos := 0
	for pos < len(src) {
		ctrl := src[pos]
		pos++
		if ctrl < 32 {
			// Literal run: copy ctrl+1 bytes
			length := int(ctrl) + 1
			dst = append(dst, src[pos:pos+length]...)
			pos += length
		} else {
			// Back-reference: read offset and length
			length := int(ctrl>>5) + 2
			offset := int(ctrl&0x1f)<<8 | int(src[pos])
			pos++
			ref := len(dst) - offset - 1
			for i := 0; i < length; i++ {
				dst = append(dst, dst[ref+i])
			}
		}
	}
	return dst
}
Notice the back-reference copy uses dst[ref+i] inside the loop rather than a slice copy. That’s deliberate. The source and destination can overlap when the offset is smaller than the length, so you have to copy byte-by-byte. Get this wrong and you get garbled output that’s miserable to debug. I’ve burned hours on exactly this kind of bug in LZ implementations.
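A tiny experiment, using the same indexing as the sketch above, shows why the self-referencing append is correct rather than merely cautious:

// Distance-1, length-4 back-reference: the output must replicate the
// previous byte four times (classic run-length expansion).
func overlapDemo() []byte {
	dst := []byte{'A'}
	ref := 0 // len(dst) - offset - 1 with offset 0
	for i := 0; i < 4; i++ {
		dst = append(dst, dst[ref+i]) // reads a byte this loop just wrote
	}
	return dst // "AAAAA"
}

A single bulk copy would see only the one byte that existed before the copy started; the byte-at-a-time loop re-reads its own output, which is exactly the run-expansion semantics LZ back-references rely on.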
Multi-span image support
Norton Ghost could split images across multiple files, a necessity when your storage medium was a stack of CD-Rs. The span.go file stitches these back into a single logical image.
The current implementation builds a multiReaderAt across the primary .gho and discovered .ghs span files, skipping the 512-byte header on continuation spans. That gives the library a virtual random-access view over several files. Because Image.parse still works with *os.File and ReadAt, openWithSpans streams that concatenated view into a temporary file before handing it back to Open.
That compromise is worth calling out. The span code keeps the boundary logic out of the parser, but it isn't a zero-copy streaming parser yet. The TODO in the file points at the natural next refactor: make Image parse from io.ReaderAt rather than *os.File, so the virtual multi-file reader can be used directly. If you're curious about other Go projects doing interesting file I/O work, the AList post covers a different angle on the same theme.
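To make the virtual-view idea concrete, here's a hypothetical concatenating io.ReaderAt. It's a sketch of the general technique, not the repo's multiReaderAt, and it assumes each segment's size already excludes the 512-byte continuation header:

import "io"

// segment is one span file plus the payload bytes it contributes.
type segment struct {
	r    io.ReaderAt
	size int64
}

// concatReaderAt presents several segments as one logical io.ReaderAt.
type concatReaderAt struct {
	segs []segment
}

func (c *concatReaderAt) ReadAt(p []byte, off int64) (int, error) {
	n := 0
	for _, s := range c.segs {
		if off >= s.size {
			off -= s.size // request starts past this segment entirely
			continue
		}
		// Read as much of p as this segment can satisfy.
		want := int64(len(p) - n)
		if avail := s.size - off; want > avail {
			want = avail
		}
		m, err := s.r.ReadAt(p[n:n+int(want)], off)
		n += m
		if err != nil && err != io.EOF {
			return n, err
		}
		if n == len(p) {
			return n, nil
		}
		off = 0 // later segments are read from their start
	}
	if n < len(p) {
		return n, io.EOF // ran out of segments before filling p
	}
	return n, nil
}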
The CLI tool
cmd/gho/main.go is a thin command-line interface. The library does the real work; the CLI parses arguments, opens the image, calls into the library, and prints results.
Worth copying this pattern in your own projects. Logic in the library package, CLI in cmd/. Your library stays testable and reusable. Your CLI stays minimal. go build ./cmd/gho gives you a single static binary. No runtime dependencies. No installer. Just a tool you can drop onto a forensics workstation and run.
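The skeleton of such a main.go is short. This version is hypothetical: the import path, flag name, and output are illustrative, and only Open is named by the repo as discussed above.

// A thin CLI over the library, in the cmd/ pattern.
package main

import (
	"flag"
	"fmt"
	"log"

	"github.com/nyarime/gho" // illustrative import path
)

func main() {
	path := flag.String("in", "", "path to a .gho image")
	flag.Parse()
	if *path == "" {
		log.Fatal("usage: gho -in image.gho")
	}

	img, err := gho.Open(*path)
	if err != nil {
		log.Fatalf("open %s: %v", *path, err)
	}
	// All real work happened inside the library; the CLI just reports.
	fmt.Printf("parsed %s: %+v\n", *path, img)
}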
Binary format constants and header parsing
defines.go is where all the format constants live: magic bytes, compression type identifiers, partition flags. If you’re doing any kind of reverse engineering on a binary format, this is the file you write first. Pin down every known constant and give it a name.
Go’s const and iota make this clean:
const (
	CompressionNone  = 0
	CompressionOld   = 1 // unsupported legacy mode
	CompressionFast  = 2 // FastLZ / Z1
	CompressionHigh3 = 3 // zlib-backed high compression
	CompressionHigh4 = 4
	CompressionHigh5 = 5
	CompressionHigh6 = 6
	CompressionHigh7 = 7
	CompressionHigh8 = 8
	CompressionHigh9 = 9
)
Naming your magic numbers is the bare minimum for a maintainable binary parser. When you're debugging why a particular GHO image won't parse, the difference between compressionType == 2 and compressionType == CompressionFast in your logs saves real time. I don't know why people resist this.
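One cheap follow-on in the same spirit (illustrative, not from the repo, and needing only fmt plus the constants above) is a name helper over the enum, so diagnostics print words instead of numbers:

// compressionName maps the enum to a log-friendly label.
func compressionName(c int) string {
	switch {
	case c == CompressionNone:
		return "none"
	case c == CompressionOld:
		return "legacy (unsupported)"
	case c == CompressionFast:
		return "FastLZ"
	case c >= CompressionHigh3 && c <= CompressionHigh9:
		return fmt.Sprintf("zlib high (level %d)", c)
	default:
		return fmt.Sprintf("unknown (%d)", c)
	}
}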
The fixup pass
There’s a fixup.go file that handles post-parse corrections. Binary formats in the wild are messy. Writers have bugs. Different versions of Norton Ghost wrote slightly different variants of the format. A fixup pass after initial parsing normalizes these inconsistencies without cluttering the main reader logic.
This is a practical architecture choice born from pain. If you’ve ever tried to parse real-world HTML, or JSON with trailing commas, or any format where producers don’t follow the spec perfectly, you know how fast your parser turns into spaghetti without a separate normalization step.
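The shape of the pattern is simple: parse first, then run a dedicated normalization pass. The sketch below invents both the fields and the inconsistency; fixup.go has its own rules for real Ghost variants, and only the Image type name comes from the repo.

// fixup normalizes known writer quirks after the initial parse.
func fixup(img *Image) {
	// Hypothetical example: some writers leave the total-size field
	// zero and expect readers to derive it from the partition table.
	if img.DiskSize == 0 {
		var total uint64
		for _, p := range img.Partitions {
			total += p.Size
		}
		img.DiskSize = total
	}
}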
When would you use this?
Digital forensics is the obvious answer. Analyzing old disk images as part of an investigation requires tools that run anywhere and produce reproducible results. A single Go binary fits perfectly. Firmware analysis is another: embedded systems sometimes ship with Ghost-format images.
Then there’s the archival angle. Plenty of organizations are sitting on old GHO backups from the Windows XP era. Extracting data from those shouldn’t require hunting down a 20-year-old copy of Norton Ghost and a machine that can run it.
If you’re interested in other Go projects that solve niche problems with unusual elegance, the terminal spreadsheet post covers another project that shows what Go can do as a tool-building language.
What to take from this
nyarime/gho does one thing well: parse and write Norton Ghost GHO disk images in pure Go. But the patterns inside it (interface-based I/O, pure Go compression, clean library/CLI separation) apply to any binary format you might need to tackle.
The file organization alone is worth studying for your next parser: defines.go for constants, reader.go for parsing, separate files per compression algorithm. I’ve started organizing my own binary parsing projects this way, and the clarity it brings when you’re staring at hex dumps at 2 AM is hard to overstate.