Building a Customer Service Quality Agent in Go with AI
If you run customer support on Facebook Messenger or Zalo, you already know the problem. Hundreds of conversations a day. No real way to measure quality. Did the agent actually help, or did they paste a canned response and move on? The Chat Quality Agent project attacks this with Go and an LLM. It pulls chat transcripts, runs them through an AI model, and spits out structured quality scores.
But the thing that grabbed me wasn’t the use case itself. It’s how the project structures an AI agent workflow in Go. There are patterns here I’d steal for any project that connects Go services to LLMs.
Go as the orchestration layer
The Chat Quality Agent follows a pattern I keep seeing in AI-powered apps: Go handles orchestration, HTTP transport, and data wrangling. The LLM (via API) handles the actual thinking. The project connects to messaging platforms, pulls conversation data, and sends it to an AI model for scoring.
The flow:
- Fetch chat conversations from platform APIs
- Normalize them into a common format
- Build a prompt with the conversation context
- Send to an LLM, parse the structured response
- Store and report results
This plays to what Go is good at: concurrent I/O, clean struct-based data modeling, and straightforward HTTP client work.
Normalizing chat data from multiple platforms
The first real Go problem is that Facebook Messenger and Zalo return completely different API shapes. The project defines a common conversation model that both platform adapters map into:
package model
import "time"
// Message represents a single message in a conversation,
// regardless of which platform it came from.
type Message struct {
ID string `json:"id"`
Sender string `json:"sender"`
Role string `json:"role"` // "agent" or "customer"
Content string `json:"content"`
Timestamp time.Time `json:"timestamp"`
}
// Conversation holds the full exchange between agent and customer.
type Conversation struct {
ID string `json:"id"`
Platform string `json:"platform"` // "facebook_messenger" or "zalo"
Messages []Message `json:"messages"`
AgentName string `json:"agent_name"`
StartedAt time.Time `json:"started_at"`
}
Each platform adapter implements a common interface:
package platform
import "context"
// ChatFetcher abstracts fetching conversations from any platform.
type ChatFetcher interface {
FetchConversations(ctx context.Context, since time.Time) ([]model.Conversation, error)
}
One method. That’s it. Each platform gets its own implementation, and adding another messaging platform later means implementing ChatFetcher. Done. If you’re not familiar with this style, I wrote about functional options as another way to keep Go APIs flexible.
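To make that concrete, here's what a Zalo adapter skeleton might look like. This is my sketch, not the project's code, and the real Zalo OA API will dictate the request details:
package zalo

import (
	"context"
	"time"

	"github.com/tanviet12/chat-quality-agent/model"
)

// Client sketches a Zalo adapter. Implementing ChatFetcher is the
// only contract; everything else is internal plumbing.
type Client struct {
	accessToken string
}

func NewClient(accessToken string) *Client {
	return &Client{accessToken: accessToken}
}

// FetchConversations satisfies platform.ChatFetcher. The body would
// call the Zalo OA API and map its response into the common model.
func (c *Client) FetchConversations(ctx context.Context, since time.Time) ([]model.Conversation, error) {
	return nil, nil // placeholder: the real implementation maps Zalo's API shape
}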
A Facebook Messenger fetcher might look like this:
package facebook
import (
"context"
"encoding/json"
"fmt"
"net/http"
"time"
"github.com/tanviet12/chat-quality-agent/model"
)
type Client struct {
httpClient *http.Client
accessToken string
pageID string
}
func NewClient(accessToken, pageID string) *Client {
return &Client{
httpClient: &http.Client{Timeout: 10 * time.Second},
accessToken: accessToken,
pageID: pageID,
}
}
func (c *Client) FetchConversations(ctx context.Context, since time.Time) ([]model.Conversation, error) {
url := fmt.Sprintf(
"https://graph.facebook.com/v18.0/%s/conversations?access_token=%s",
c.pageID, c.accessToken,
)
req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
if err != nil {
return nil, fmt.Errorf("creating request: %w", err)
}
resp, err := c.httpClient.Do(req)
if err != nil {
return nil, fmt.Errorf("fetching conversations: %w", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
return nil, fmt.Errorf("unexpected status: %d", resp.StatusCode)
}
var fbResp facebookConversationsResponse
if err := json.NewDecoder(resp.Body).Decode(&fbResp); err != nil {
return nil, fmt.Errorf("decoding response: %w", err)
}
return c.toConversations(fbResp, since), nil
}
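The snippet leans on two helpers it doesn't show: facebookConversationsResponse and toConversations. Here's a rough sketch of what they could look like, in the same facebook package. The struct shape is an assumption based on the Graph API's usual data envelope; the real fields depend on what you request:
// facebookConversationsResponse assumes the Graph API's usual envelope:
// a "data" array of conversation objects. Adjust to the fields you request.
type facebookConversationsResponse struct {
	Data []struct {
		ID string `json:"id"`
	} `json:"data"`
}

// toConversations maps the platform shape into the common model. Fetching
// each conversation's messages (and applying the since filter to them)
// would happen in follow-up API calls, omitted here for brevity.
func (c *Client) toConversations(resp facebookConversationsResponse, since time.Time) []model.Conversation {
	convs := make([]model.Conversation, 0, len(resp.Data))
	for _, d := range resp.Data {
		convs = append(convs, model.Conversation{
			ID:       d.ID,
			Platform: "facebook_messenger",
		})
	}
	return convs
}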
Notice http.NewRequestWithContext. This matters in an AI agent pipeline where you might need to cancel long-running fetches. The context package propagates cancellation through the whole call chain, from the HTTP handler that kicks off the analysis, through the platform fetch, into the LLM call. If you need a refresher on context, check out the ByteSizeGo post on context.
Prompt construction in Go
The interesting Go work is turning a Conversation struct into a well-formatted prompt that tells the model how to evaluate quality.
Go’s text/template package is a natural fit:
package agent
import (
"bytes"
"text/template"
"github.com/tanviet12/chat-quality-agent/model"
)
const qualityPromptTmpl = `You are a customer service quality analyst.
Analyze the following conversation and score it on these criteria:
- Responsiveness (1-10): How quickly and thoroughly did the agent respond?
- Empathy (1-10): Did the agent show understanding of the customer's issue?
- Resolution (1-10): Was the customer's problem resolved?
- Professionalism (1-10): Was the tone appropriate?
Respond in JSON format with fields: responsiveness, empathy, resolution, professionalism, summary.
Conversation (Platform: {{.Platform}}, Agent: {{.AgentName}}):
{{range .Messages}}
[{{.Role}}] {{.Content}}
{{end}}`
var promptTemplate = template.Must(template.New("quality").Parse(qualityPromptTmpl))
func BuildPrompt(conv model.Conversation) (string, error) {
var buf bytes.Buffer
if err := promptTemplate.Execute(&buf, conv); err != nil {
return "", err
}
return buf.String(), nil
}
template.Must at package init time means template syntax errors panic at startup instead of surfacing deep in a request path. I like this. You want loud failures for things that should never be wrong.
The bytes.Buffer usage is clean too. It implements io.Writer, which is what template.Execute expects. No string concatenation, no pile of intermediate strings.
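That testability isn't hypothetical. A unit test for the prompt builder needs nothing but a fixture Conversation and some substring checks:
package agent

import (
	"strings"
	"testing"
	"time"

	"github.com/tanviet12/chat-quality-agent/model"
)

func TestBuildPrompt(t *testing.T) {
	conv := model.Conversation{
		ID:        "c1",
		Platform:  "zalo",
		AgentName: "Lan",
		StartedAt: time.Now(),
		Messages: []model.Message{
			{Role: "customer", Content: "My order never arrived."},
			{Role: "agent", Content: "Let me check that for you."},
		},
	}
	prompt, err := BuildPrompt(conv)
	if err != nil {
		t.Fatalf("BuildPrompt: %v", err)
	}
	// The rendered prompt should carry the platform header and every message.
	for _, want := range []string{
		"Platform: zalo",
		"[customer] My order never arrived.",
		"[agent] Let me check that for you.",
	} {
		if !strings.Contains(prompt, want) {
			t.Errorf("prompt missing %q", want)
		}
	}
}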
Calling the LLM and parsing structured output
Once you have a prompt, you call an LLM and parse the response back into a Go struct. The project works with OpenAI-compatible APIs, so you could point it at Ollama for local testing or any hosted model.
Here’s the LLM client and response parsing:
package agent
import (
"bytes"
"context"
"encoding/json"
"fmt"
"net/http"
)
// QualityScore holds the parsed LLM evaluation.
type QualityScore struct {
Responsiveness int `json:"responsiveness"`
Empathy int `json:"empathy"`
Resolution int `json:"resolution"`
Professionalism int `json:"professionalism"`
Summary string `json:"summary"`
}
type LLMClient struct {
httpClient *http.Client
endpoint string
apiKey string
model string
}
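// NewLLMClient isn't shown in the project snippet, but main below calls it.
// A minimal assumed version; per-request deadlines come from ctx, so no
// client-level timeout is set here.
func NewLLMClient(endpoint, apiKey, model string) *LLMClient {
	return &LLMClient{
		httpClient: &http.Client{},
		endpoint:   endpoint,
		apiKey:     apiKey,
		model:      model,
	}
}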
type chatRequest struct {
Model string `json:"model"`
Messages []chatMessage `json:"messages"`
}
type chatMessage struct {
Role string `json:"role"`
Content string `json:"content"`
}
type chatResponse struct {
Choices []struct {
Message struct {
Content string `json:"content"`
} `json:"message"`
} `json:"choices"`
}
func (c *LLMClient) Analyze(ctx context.Context, prompt string) (*QualityScore, error) {
reqBody := chatRequest{
Model: c.model,
Messages: []chatMessage{
{Role: "user", Content: prompt},
},
}
bodyBytes, err := json.Marshal(reqBody)
if err != nil {
return nil, fmt.Errorf("marshalling request: %w", err)
}
req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.endpoint, bytes.NewReader(bodyBytes))
if err != nil {
return nil, fmt.Errorf("creating request: %w", err)
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set("Authorization", "Bearer "+c.apiKey)
resp, err := c.httpClient.Do(req)
if err != nil {
return nil, fmt.Errorf("calling LLM: %w", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
	return nil, fmt.Errorf("unexpected status from LLM: %d", resp.StatusCode)
}
var chatResp chatResponse
if err := json.NewDecoder(resp.Body).Decode(&chatResp); err != nil {
return nil, fmt.Errorf("decoding LLM response: %w", err)
}
if len(chatResp.Choices) == 0 {
return nil, fmt.Errorf("no choices in LLM response")
}
var score QualityScore
if err := json.Unmarshal([]byte(chatResp.Choices[0].Message.Content), &score); err != nil {
return nil, fmt.Errorf("parsing quality score: %w", err)
}
return &score, nil
}
There’s a gotcha here that will bite you in production: LLMs don’t always return valid JSON. Sometimes they wrap it in markdown code fences. Sometimes they add a chatty preamble before the JSON. You need to strip that junk before calling json.Unmarshal:
import "strings"
func extractJSON(raw string) string {
// Strip markdown code fences if present
raw = strings.TrimSpace(raw)
if strings.HasPrefix(raw, "```json") {
raw = strings.TrimPrefix(raw, "```json")
raw = strings.TrimSuffix(raw, "```")
} else if strings.HasPrefix(raw, "```") {
raw = strings.TrimPrefix(raw, "```")
raw = strings.TrimSuffix(raw, "```")
}
return strings.TrimSpace(raw)
}
Call extractJSON on chatResp.Choices[0].Message.Content before unmarshalling. I’ve been burned by this more than once.
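That handles fences, but not the chatty-preamble case. A blunt trick that covers both is slicing from the first { to the last }. This variant is my addition, not the project's code:
// extractJSONObject pulls the first top-level JSON object out of a response
// that may have prose before or after it. Crude, but it survives both
// markdown fences and conversational preambles.
func extractJSONObject(raw string) string {
	start := strings.Index(raw, "{")
	end := strings.LastIndex(raw, "}")
	if start == -1 || end < start {
		return strings.TrimSpace(raw) // no object found; let json.Unmarshal report it
	}
	return raw[start : end+1]
}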
Running analyses concurrently
Hundreds of conversations, one at a time? No. Go's goroutines make this almost too easy. Here's a worker pool built on errgroup from golang.org/x/sync:
package agent
import (
"context"
"fmt"
"golang.org/x/sync/errgroup"
"github.com/tanviet12/chat-quality-agent/model"
)
type Result struct {
ConversationID string
Score *QualityScore
}
func AnalyzeBatch(ctx context.Context, client *LLMClient, convs []model.Conversation, concurrency int) ([]Result, error) {
results := make([]Result, len(convs))
g, ctx := errgroup.WithContext(ctx)
g.SetLimit(concurrency)
for i, conv := range convs {
i, conv := i, conv // capture loop vars (unnecessary as of Go 1.22)
g.Go(func() error {
prompt, err := BuildPrompt(conv)
if err != nil {
return fmt.Errorf("building prompt for %s: %w", conv.ID, err)
}
score, err := client.Analyze(ctx, prompt)
if err != nil {
return fmt.Errorf("analyzing %s: %w", conv.ID, err)
}
results[i] = Result{
ConversationID: conv.ID,
Score: score,
}
return nil
})
}
if err := g.Wait(); err != nil {
return nil, err
}
return results, nil
}
errgroup.SetLimit caps concurrent goroutines so you don’t hammer the LLM API with hundreds of simultaneous requests. Each goroutine writes to its own index in the results slice, so no mutex needed. I use this pattern constantly.
One thing to watch: if any single analysis fails, errgroup cancels the context for all other goroutines. That’s fail-fast behavior. If you’d rather collect partial results and log failures, you’d handle errors per-conversation instead of returning early.
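Here's a sketch of that second approach, assuming it lives in the same agent package with a log import added. Per-conversation errors get logged and swallowed, so one bad conversation can't sink the batch:
// AnalyzeBatchPartial keeps going on per-conversation failures: errors are
// logged, failed slots are skipped, and only successful results come back.
func AnalyzeBatchPartial(ctx context.Context, client *LLMClient, convs []model.Conversation, concurrency int) []Result {
	slots := make([]*Result, len(convs))
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(concurrency)
	for i, conv := range convs {
		i, conv := i, conv
		g.Go(func() error {
			prompt, err := BuildPrompt(conv)
			if err != nil {
				log.Printf("building prompt for %s: %v", conv.ID, err)
				return nil // swallowed so the rest of the batch keeps running
			}
			score, err := client.Analyze(ctx, prompt)
			if err != nil {
				log.Printf("analyzing %s: %v", conv.ID, err)
				return nil
			}
			slots[i] = &Result{ConversationID: conv.ID, Score: score}
			return nil
		})
	}
	_ = g.Wait() // no goroutine returns an error, so there is nothing to check
	results := make([]Result, 0, len(convs))
	for _, r := range slots {
		if r != nil {
			results = append(results, *r)
		}
	}
	return results
}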
Putting it all together
The main entry point ties everything together:
package main
import (
"context"
"encoding/json"
"fmt"
"log"
"os"
"time"
"github.com/tanviet12/chat-quality-agent/agent"
"github.com/tanviet12/chat-quality-agent/platform/facebook"
)
func main() {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
fetcher := facebook.NewClient(
os.Getenv("FB_ACCESS_TOKEN"),
os.Getenv("FB_PAGE_ID"),
)
since := time.Now().Add(-24 * time.Hour)
convs, err := fetcher.FetchConversations(ctx, since)
if err != nil {
log.Fatalf("fetching conversations: %v", err)
}
llm := agent.NewLLMClient(
os.Getenv("LLM_ENDPOINT"),
os.Getenv("LLM_API_KEY"),
os.Getenv("LLM_MODEL"),
)
results, err := agent.AnalyzeBatch(ctx, llm, convs, 5)
if err != nil {
log.Fatalf("analyzing conversations: %v", err)
}
enc := json.NewEncoder(os.Stdout)
enc.SetIndent("", " ")
for _, r := range results {
fmt.Printf("Conversation %s:\n", r.ConversationID)
enc.Encode(r.Score)
}
}
The top-level context has a 5-minute timeout that propagates through every HTTP call, both to the messaging platform APIs and the LLM. If anything hangs, the whole pipeline shuts down cleanly.
Patterns worth stealing
Even if you never build a chat quality agent, there are pieces here I’d reuse:
The single-method ChatFetcher interface makes adding Zalo, WhatsApp, or anything else trivial. This is interface segregation done right in Go.
text/template for prompt construction gives you type-safe, testable prompt building. You can unit test your prompts by passing in a known Conversation and checking the output string. Way better than string concatenation scattered across your codebase.
errgroup with SetLimit for bounded concurrency keeps your LLM calls parallel without overwhelming rate limits. The index-based result writing avoids locks entirely.
And context propagation through the whole pipeline, from HTTP trigger to platform fetch to LLM call, means timeouts and cancellation work everywhere without extra plumbing.