How Octo Matter structures a Go microservice around Gin, LLM clients, and clean layers
Octo Matter is the matter/task service behind OCTO’s workplace platform. Gin and MySQL are the familiar pieces. The interesting part is how much operational detail is packed into a small Go service without turning the codebase into a framework-heavy maze.
The repo has a conventional shape, but the implementation is concrete enough to learn from. HTTP concerns live under internal/handler/. Business rules sit in internal/service/. Persistence is isolated in internal/repository/. LLM access is a small OpenAI-compatible client in internal/llm/. Notifications are pushed through a bounded worker in internal/notification/.
That separation matters because Octo Matter is not just CRUD. It creates matters from chat messages, checks channel membership through OCTO IM, links source channels for later access checks, asks an LLM to return structured fields, writes timelines, records activity, and emits notifications. The useful lesson is how the project keeps those moving parts understandable.
Startup wiring stays explicit
The whole application is assembled in cmd/main.go. Configuration is loaded and validated first, then the service opens a MySQL session, runs embedded migrations, constructs repositories, creates clients for OCTO IM and the LLM gateway, builds services, then passes those services into Gin handlers.
There is no dependency injection container. The constructor calls are ordinary Go:
- repository.NewMatterRepo(sess) and the sibling repos wrap database access.
- service.NewMatterService(...) receives the repositories, transaction manager, activity store, and IM checker.
- service.NewTimelineService(...) receives the LLM client, timeline repositories, matter access checks, transaction adapter, and a limiter.
- handler.NewMatterHandler(...), handler.NewTimelineHandler(...), and friends expose those services to HTTP.
That makes cmd/main.go a useful map of the system. If you want to know what the timeline service can touch, you do not need to search reflection bindings or generated code. You read the constructor call.
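A minimal sketch of this wiring style follows. Every type and function in it (MatterRepo, IMChecker, staticChecker, wire) is a hypothetical stand-in chosen for illustration, not Octo Matter's real signatures.

```go
package main

// All names below are hypothetical stand-ins, not Octo Matter's real API.

// MatterRepo wraps database access for matters.
type MatterRepo struct{ dsn string }

func NewMatterRepo(dsn string) *MatterRepo { return &MatterRepo{dsn: dsn} }

// IMChecker answers channel-membership questions.
type IMChecker interface {
	IsMember(channelID, uid string) bool
}

// staticChecker is an in-memory IMChecker used only for this sketch.
type staticChecker map[string]bool

func (s staticChecker) IsMember(channelID, uid string) bool {
	return s[channelID+"/"+uid]
}

// MatterService receives everything it may touch through its constructor.
type MatterService struct {
	repo *MatterRepo
	im   IMChecker
}

func NewMatterService(repo *MatterRepo, im IMChecker) *MatterService {
	return &MatterService{repo: repo, im: im}
}

// CanSee uses only the dependencies the constructor handed over.
func (s *MatterService) CanSee(channelID, uid string) bool {
	return s.im.IsMember(channelID, uid)
}

// wire mirrors the shape of a main function: leaves first, then services.
func wire() *MatterService {
	repo := NewMatterRepo("mysql://example")
	im := staticChecker{"c1/u1": true}
	return NewMatterService(repo, im)
}
```

Reading wire top to bottom gives the full list of what the service can reach, which is the property the article credits to cmd/main.go.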
The startup path also shows a practical production habit: dependencies are made visible in logs. At startup the service logs the app environment, OCTO IM URL, LLM gateway/model/timeout, and whether required tokens are missing. In production, a missing NOTIFY_INTERNAL_TOKEN is fatal; in development it only produces a warning. That is a small detail, but it keeps local setup easy without quietly weakening production.
The Gin router has real edge-case handling
internal/handler/router.go builds a single Gin engine and registers the API under /api/v1. It applies a few cross-cutting guards before requests reach the handlers:
api := r.Group("/api/v1")
api.Use(RequestTimeout(30*time.Second), MaxBodySize(maxBodySize), authMW, spaceMW)
RequestTimeout wraps the request context with a 30-second deadline. MaxBodySize uses http.MaxBytesReader to cap request bodies at 1 MB. Then authentication and space membership middleware run before the handler sees the request.
One route-ordering comment jumps out:
// AI-powered: registered BEFORE /:id routes so "extract" does not match :id.
matters.POST("/extract", extractLimiter, extractH.Create)
matters.GET("/:id", matterH.Get)
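That is the kind of bug that shows up in real APIs. In a router that matches in registration order, a /matters/:id route registered first would swallow /matters/extract and treat "extract" as an ID. Gin’s radix-tree router is more forgiving: recent releases prefer the static segment regardless of order, while older releases refused the overlap at startup. Registering the static route first, with a comment, keeps the intended order explicit either way.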
The router also exposes two health endpoints. /health is a simple liveness check. /health/ready calls a readiness function, which is wired in cmd/main.go as conn.Ping(). Failed readiness returns a structured 503 with a NOT_READY code and dependency reason.
Services own the workflow, repositories own the SQL
The service layer is not a thin pass-through. MatterService.CreateMatterWithAssignees creates a matter, auto-links the originating channel, inserts initial assignees, and records activity. The database writes happen inside a transaction using repository.TxManager, while the activity recording is best-effort: failures are logged but do not fail the user request.
That distinction is sensible. Creating the matter and assignees is part of the primary operation. Recording the audit-style activity is useful, but not important enough to roll back the user’s work.
Access control also sits in the service layer rather than being scattered through handlers. MatterService.ListMatters treats source-channel filtering differently for user and bot callers. If a caller has a user token, the service checks channel membership through OCTO IM before honoring the channel filter. If the caller is on the bot path with an empty token, it strips the channel filter so a leaked bot token cannot enumerate matters through channel links.
That is a concrete security boundary, not just package organization. The handler accepts request data; the service decides which filters are safe to apply.
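One way to express that rule is a small function that decides which filter the repository is allowed to see. ListFilter, MembershipChecker, and safeFilter are hypothetical names, and rejecting a non-member outright is an assumption; the article does not say what the service returns in that case.

```go
package main

// ListFilter and MembershipChecker are hypothetical names for this sketch.
type ListFilter struct {
	ChannelID string
}

type MembershipChecker func(userToken, channelID string) bool

// safeFilter returns the filter that may be passed to the repository and
// whether the request may proceed at all.
func safeFilter(f ListFilter, userToken string, isMember MembershipChecker) (ListFilter, bool) {
	if f.ChannelID == "" {
		return f, true // no channel filter, nothing to check
	}
	if userToken == "" {
		// Bot path: drop the channel filter rather than trust it.
		f.ChannelID = ""
		return f, true
	}
	if !isMember(userToken, f.ChannelID) {
		// Assumption: a user outside the channel is rejected.
		return ListFilter{}, false
	}
	return f, true
}
```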
Cursor pagination is opaque and stable
internal/repository/cursor.go implements cursor-based pagination with a composite position:
type Cursor struct {
	CreatedAt time.Time
	ID        string
}
The ordering is (created_at DESC, id DESC). Including the ID as a tie-breaker avoids skipped or repeated records when multiple rows share the same timestamp at a page boundary.
The cursor is encoded as base64 of <unix-nano>|<uuid>. DecodeCursor rejects malformed cursors with ErrInvalidCursor instead of silently treating them as empty. That is the right default for API pagination: a corrupt cursor is a bad request, not an invitation to return the first page and hide the client bug.
The LLM client is deliberately narrow
internal/llm/client.go is an OpenAI-compatible chat completions client. It supports messages, tools, forced tool choice, temperature, max tokens, and a configurable model. It does not try to become a general-purpose SDK.
The important call path is ChatCompletion. It marshals the request, builds an HTTP request with http.NewRequestWithContext, adds Content-Type, conditionally adds a bearer token, and sends the request through an http.Client with a configured timeout.
Two implementation details are easy to miss:
- The response body is read through io.LimitReader with a 2 MB cap, so a broken or hostile gateway cannot make the service buffer an unbounded response.
- Non-2xx responses include a truncated body in the error, capped at 500 characters, which is enough to debug without dumping a huge upstream payload into logs.
CallTool sits one level higher. It builds a two-message prompt, attaches a single function tool, forces the model to call that tool, then returns the raw JSON argument string for the named function. If the model does not return a usable tool call, it returns the sentinel ErrEmptyToolCall.
That narrow API is a good fit for extraction work. The rest of the service does not need to know about OpenAI’s full response shape; it needs either structured tool arguments or a clear failure.
Extraction validates the model instead of trusting it
The extraction service in internal/service/extract_svc.go is where the LLM integration becomes product logic. It accepts selected chat messages and asks the model to return a matter title, description, deadline, source message IDs, and assignee UIDs.
The tool schema tells the model to return deadline as YYYY-MM-DD or null, and to choose source messages and assignees only from the supplied input. The server still validates that output afterwards. That is the important part.
The code includes several guardrails:
- Each request is capped at 200 messages.
- Each message body is truncated server-side to 500 characters.
- Input message_id values are capped at 255 characters because they are persisted into a JSON column.
- LLM title and description are clipped to DB-safe limits.
- Deadlines are bounded to a sane year window around the request date.
- Invalid or fabricated source message IDs are filtered back to known input IDs.
- Invalid assignee UIDs are filtered back to known message senders, with safe fallbacks.
There is also a custom flexibleDate decoder. If a model emits a number or object for deadline instead of a string or null, the service treats it as missing rather than failing the whole extraction. That is a pragmatic choice. A bad deadline should not throw away an otherwise useful title, description, and assignee list.
Rate limiting is keyed to the thing being protected
The rate limiter in internal/middleware/rate_limit.go is not a token bucket. It is a small in-memory cooldown limiter keyed by a caller-defined string. Internally it uses sync.Map, LoadOrStore, and CompareAndSwap so concurrent callers sharing a key do not both slip through after a cooldown expires.
In cmd/main.go, extraction is keyed by user ID because no matter exists yet:
uidKey := func(c *gin.Context) string { return c.GetString("uid") }
extractLimiter := middleware.NewRateLimiter(10 * time.Second).Middleware(uidKey)
Timeline LLM calls are different. They are keyed by (matter_id, uid) inside the service, after access checks have passed. A code comment explains why: a forbidden caller should not be able to burn the legitimate user’s cooldown for a matter they cannot access.
That is a subtle but useful design rule. Rate-limit keys are part of your authorization model. If you choose them too early or too broadly, attackers can create denial-of-service edges against real users.
The limiter also runs a background eviction loop so high-cardinality keys do not grow the map forever. Close stops that goroutine cleanly.
Notifications use bounded async work
internal/notification/worker.go provides a fixed-size worker pool for notification delivery. It takes a buffer size and worker count, starts that many goroutines, and accepts submitted functions through a channel.
If the buffer is full, Submit drops the task and logs a warning instead of blocking the request path indefinitely:
select {
case w.ch <- fn:
default:
	log.Printf("WARN: notification worker queue full, dropping task")
}
Each worker recovers from panics inside a submitted function, logs the panic, and keeps processing later tasks. Shutdown closes the channel and waits for workers to finish in-flight jobs.
This is not a queue you would use for guaranteed delivery. It is intentionally a bounded best-effort path for notifications. That matches the role notifications play in this service: useful side effects, but not worth taking down matter creation if OCTO IM is slow or a notification template panics.
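Putting the pieces of this section together, a bounded worker could be sketched as below. The type and method names are assumptions apart from Submit and Shutdown, and returning a bool from Submit is my addition to make the drop observable; the original may return nothing.

```go
package main

import (
	"log"
	"sync"
)

// Worker is a bounded, best-effort task runner.
type Worker struct {
	ch chan func()
	wg sync.WaitGroup
}

func NewWorker(buffer, workers int) *Worker {
	w := &Worker{ch: make(chan func(), buffer)}
	w.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go w.loop()
	}
	return w
}

func (w *Worker) loop() {
	defer w.wg.Done()
	for fn := range w.ch {
		w.run(fn)
	}
}

// run confines a panic to the task that caused it, so the worker
// goroutine survives and keeps draining the channel.
func (w *Worker) run(fn func()) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("WARN: notification task panicked: %v", r)
		}
	}()
	fn()
}

// Submit enqueues fn, or drops it when the buffer is full.
// It reports whether the task was accepted.
func (w *Worker) Submit(fn func()) bool {
	select {
	case w.ch <- fn:
		return true
	default:
		log.Printf("WARN: notification worker queue full, dropping task")
		return false
	}
}

// Shutdown stops intake and waits for queued and in-flight tasks.
// In this sketch, calling Submit after Shutdown would panic.
func (w *Worker) Shutdown() {
	close(w.ch)
	w.wg.Wait()
}
```

The recover sits in run rather than in loop so that a panic ends one task, not the goroutine's whole range loop.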
Error responses are typed without being noisy
internal/apperr/errors.go defines an AppError with code, message, details, HTTP status, and an optional wrapped error. Handlers can turn service errors into consistent JSON like:
{"error":{"code":"VALIDATION_ERROR","message":"...","details":{"field":"..."}}}
The package also defines sentinel errors such as ErrNotFound, ErrForbidden, and ErrInvalidInput, plus constructors for domain cases: MatterNotFound, AssigneeNotFound, SpaceForbidden, RateLimited, DuplicateAssignee, and Upstream.
This gives the code two useful properties. Service functions can return meaningful domain errors, and handlers can render stable client-facing codes without matching strings.
What I would copy from Octo Matter
The repo is a good example of boring Go used carefully:
- Keep cmd/main.go as the explicit dependency graph.
- Put workflow and security decisions in services, not in HTTP handlers.
- Make repositories responsible for persistence details, including cursor encoding.
- Keep the LLM client narrow, context-aware, timeout-bound, and response-size-bound.
- Treat model output as untrusted data and validate it like any other external input.
- Choose rate-limit keys after thinking about authorization, not just convenience.
- Use bounded async workers for non-critical side effects.
None of those choices require a large framework. They require code that is honest about where failures happen: databases can reject writes, LLMs can return malformed JSON, users can send oversized bodies, route ordering can bite you, and notification delivery can fail. Octo Matter handles those cases directly, and that makes the repo a useful one to study.