Why we chose Grok as our fallback AI
We serve our primary AI model with Ollama on our own infrastructure. For resilience, we needed a fallback. Here is why we picked Grok from xAI.
xNord runs its own AI infrastructure. Our primary model is llama3.2:3b, served by Ollama on a dedicated Hetzner VPS. This gives us cost control, latency predictability, and no dependence on third-party API rate limits for the core product.
But self-hosted infrastructure fails. Servers go down, deployments break, network routes become unreliable. For an email agent that founders depend on every morning, "the AI is down" is not an acceptable answer.
We needed a cloud fallback — a provider that could take over seamlessly if our primary model was unavailable. We evaluated four options: OpenAI, Anthropic, Groq, and Grok (xAI). Here is how the decision went.
What we were optimising for
Latency: Email triage and draft generation need to be fast. A fallback that adds 30 seconds per email is not useful.
Quality for structured tasks: Our prompts require the model to follow strict JSON output formats, reason about email urgency, and match writing style. Not all models are equally good at this.
OpenAI-compatible API: We wanted a provider whose API was compatible with our existing Ollama client, so the fallback required minimal code changes.
Rate limits: We needed a provider that could handle bursts — a user connecting their inbox for the first time might trigger processing of 200 emails at once.
Cost: The fallback is for resilience, not the primary path. It needs to be cost-efficient for occasional use.
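The burst requirement above is the one that most shapes the code. A minimal sketch of how a first-sync burst can be kept within a provider's limits, using bounded concurrency (the function and constant names here are ours, not xNord's actual implementation):

```python
import asyncio

# Cap concurrent model calls so a first-time inbox sync of ~200 emails
# does not hit the provider all at once. MAX_CONCURRENT is an
# illustrative value, not a tuned production setting.
MAX_CONCURRENT = 10

async def process_inbox(emails, classify):
    """Triage a burst of emails with bounded concurrency.

    classify: any async callable (primary model or fallback) that takes
    one email and returns its triage result.
    """
    sem = asyncio.Semaphore(MAX_CONCURRENT)

    async def triage(email):
        async with sem:
            return await classify(email)

    # asyncio.gather preserves input order in the results.
    return await asyncio.gather(*(triage(e) for e in emails))
```

The semaphore means a burst queues internally instead of tripping the provider's rate limiter, which is exactly the failure mode described for high-volume fallback use.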
Why not OpenAI or Anthropic?
Both offer excellent models. But for our use case — structured JSON output, specific system prompts, tight latency — we found they were more expensive and slower than the alternatives for the task sizes we were running. Their real advantages (long-context reasoning, multimodal input) are not relevant for email triage.
Why not Groq?
We actually use Groq in production for a subset of our processing — it is extremely fast (low single-digit latency for short prompts) and the quality on structured tasks is good. But Groq's rate limits can be restrictive during high-volume burst processing, which is exactly when a fallback is most likely to be needed.
Why Grok?
Grok from xAI uses an OpenAI-compatible API, which meant our integration was minimal — a configuration change, not a rewrite. The quality on our benchmark tasks (urgency classification, draft generation, chat response) was competitive with the alternatives. The rate limits are generous. And the pricing is reasonable for occasional fallback use.
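The "configuration change, not a rewrite" point can be sketched as a provider table: every provider speaks the same OpenAI-compatible chat endpoint, so only the base URL, model name, and API key differ. The base URLs below are each provider's documented OpenAI-compatible endpoint; the Groq and Grok model names and the env-var names are illustrative assumptions, not xNord's actual config.

```python
import os

# Each entry is everything an OpenAI-compatible client needs to talk to
# that provider. Swapping providers is a dictionary lookup, not a rewrite.
PROVIDERS = {
    "ollama": {
        "base_url": "http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
        "model": "llama3.2:3b",
        "key_env": "OLLAMA_API_KEY",  # Ollama ignores the key; any value works
    },
    "groq": {
        "base_url": "https://api.groq.com/openai/v1",
        "model": "llama-3.1-8b-instant",  # assumed model choice
        "key_env": "GROQ_API_KEY",
    },
    "grok": {
        "base_url": "https://api.x.ai/v1",
        "model": "grok-beta",  # assumed model choice
        "key_env": "XAI_API_KEY",
    },
}

def client_settings(provider: str) -> dict:
    """Return kwargs for any OpenAI-compatible client constructor."""
    cfg = PROVIDERS[provider]
    return {
        "base_url": cfg["base_url"],
        "api_key": os.environ.get(cfg["key_env"], "unset"),
    }
```

Because the request and response shapes are identical across all three, the same prompt templates and JSON-parsing code run unchanged regardless of which provider answered.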
The other factor: xAI's infrastructure has shown good reliability in the periods we have been monitoring it. For a fallback, uptime matters more than feature set.
How the fallback works
The routing logic is simple: if Groq is available (GROQ_API_KEY is set), use it. If Groq fails, try Ollama. If Ollama fails, fall back to Grok. If all three fail, the agent run is aborted and the user is notified.
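The routing above can be written as an ordered chain of providers tried in turn. The order and the GROQ_API_KEY availability check come from the description above; the function names and error handling are our own sketch, not xNord's actual code.

```python
import os

def build_chain(groq, ollama, grok):
    """Assemble the provider order: Groq (if configured), then Ollama, then Grok.

    Each argument is a callable that runs the task against that provider.
    """
    chain = []
    if os.environ.get("GROQ_API_KEY"):
        chain.append(("groq", groq))
    chain.append(("ollama", ollama))
    chain.append(("grok", grok))
    return chain

def run_with_fallback(task, providers):
    """Try each provider in order; return (provider_name, result) from the
    first that succeeds. If all fail, raise so the agent run is aborted
    and the user can be notified."""
    errors = {}
    for name, call in providers:
        try:
            return name, call(task)
        except Exception as exc:
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")
```

Returning the provider name alongside the result is useful for internal logging even though, as noted below, the product itself never surfaces which provider handled an email.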
The fallback is transparent to the end user. There is no indication in the product which provider processed a given email — the output quality is consistent enough that the distinction does not matter in practice.
The status page at xnord.co.uk/status now shows the health of both the primary AI provider and the Grok fallback independently, so you can see at a glance whether the fallback is available if you ever need it.