The default advice for any business deploying AI is "use a managed service." OpenAI's API. Anthropic's Claude. Google's Vertex AI. And for many use cases, that's the right call. But when you're deploying autonomous agents that handle sensitive business data, interact with your customers, and make decisions on your behalf — the case for self-hosting becomes overwhelming. Here's how we do it, what it costs, and why it matters.
The Problem with Managed AI Services
Managed services are excellent for prototyping, development, and low-stakes applications. But they come with three fundamental constraints that become deal-breakers at production scale:
1. Data Sovereignty
When your agent processes a customer's email through the OpenAI API, that email transits through OpenAI's infrastructure. Yes, they promise not to use it for training. Yes, they have SOC 2 compliance. But under GDPR, the mere act of sending EU personal data to a US-based processor creates compliance obligations that most SMEs aren't equipped to manage.
Self-hosting changes this calculus. Your data stays on infrastructure you control, and your compliance surface area for the AI component shrinks dramatically.
2. Cost at Scale
API pricing is per-token. This is great when you're processing 100 requests a day. It becomes painful at 10,000. And it becomes prohibitive at 100,000.
A typical agent workflow processes 2,000-5,000 tokens per interaction. At OpenAI's GPT-4o pricing (~$2.50 per million input tokens, $10 per million output tokens), 1,000 interactions per day costs roughly £150-300/month. That sounds reasonable until volume grows: per-token pricing scales linearly, so ten times the traffic means ten times the bill, while the fixed-cost components of a self-hosted stack barely move.
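The arithmetic is worth checking for your own volumes. A few lines of Python make it concrete; the 2,500/500 input/output split and the exchange rate are illustrative assumptions, not measured figures:

```python
# Rough monthly cost of a per-token API at the volumes discussed above.
# Token split and FX rate are illustrative assumptions.
INPUT_PRICE_PER_M = 2.50    # USD per million input tokens (GPT-4o)
OUTPUT_PRICE_PER_M = 10.00  # USD per million output tokens
USD_TO_GBP = 0.80           # assumed exchange rate

def monthly_cost_gbp(interactions_per_day, input_tokens, output_tokens, days=30):
    """Monthly bill in GBP for a given daily interaction volume."""
    n = interactions_per_day * days
    usd = (n * input_tokens / 1e6) * INPUT_PRICE_PER_M \
        + (n * output_tokens / 1e6) * OUTPUT_PRICE_PER_M
    return usd * USD_TO_GBP

# 1,000 interactions/day at ~3,000 tokens each (2,500 in / 500 out)
print(f"£{monthly_cost_gbp(1000, 2500, 500):,.0f}/month")   # → £270/month
# The same workflow at 10x the volume scales linearly:
print(f"£{monthly_cost_gbp(10000, 2500, 500):,.0f}/month")  # → £2,700/month
```

Plug in your own token counts and volumes; the linear scaling is the point.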
3. Dependency Risk
Major API providers have all experienced outages. If your entire agent infrastructure runs through a single API, those outages shut down your AI workforce completely. Self-hosting with redundancy means you control your uptime.
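Redundancy at the orchestration layer can be as simple as trying backends in priority order. A minimal sketch — the backend functions here are stubs standing in for real inference endpoints:

```python
# Minimal failover: try each inference backend in order until one succeeds.
# The backend callables below are illustrative stubs, not real endpoints.
class AllBackendsFailed(Exception):
    pass

def complete_with_failover(prompt, backends):
    """backends: list of (name, callable) pairs, tried in priority order."""
    errors = {}
    for name, call in backends:
        try:
            return call(prompt)
        except Exception as exc:   # timeouts, 5xx responses, network errors...
            errors[name] = exc     # record the failure, fall through to the next
    raise AllBackendsFailed(errors)

def flaky_primary(prompt):
    raise TimeoutError("primary down")  # simulate an outage

def secondary(prompt):
    return f"echo: {prompt}"

print(complete_with_failover("hello", [("primary", flaky_primary),
                                       ("secondary", secondary)]))
# → echo: hello
```

In production the callables would wrap real model endpoints, but the control flow is the whole idea: an outage in one provider degrades you to a backup instead of taking you offline.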
The Google Cloud Self-Hosting Stack
Our self-hosting architecture is built entirely on Google Cloud Platform, and it's designed for reliability, cost-efficiency, and operational simplicity:
The Components
1. OpenClaw on Cloud Run
The agent orchestration layer runs on Cloud Run — Google's serverless container platform. This means:
- No servers to manage — Google handles scaling, patching, and infrastructure
- Scale to zero — if no one's talking to your agent at 3am, you're paying nothing for compute
- Scale to thousands — if a traffic spike hits, Cloud Run spins up new instances in seconds
- Pay per request — you're billed for actual compute, not idle capacity
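Cloud Run's container contract is minimal: listen for HTTP on whatever port the `PORT` environment variable says. A stdlib-only sketch of a health-check endpoint (the `/healthz` path is our convention, not a Cloud Run requirement):

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Cloud Run routes plain HTTP into the container; 200 means healthy.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep per-request noise out of the logs

def make_server():
    # Cloud Run injects PORT at runtime; default to 8080 for local runs.
    port = int(os.environ.get("PORT", "8080"))
    return HTTPServer(("0.0.0.0", port), HealthHandler)

# make_server().serve_forever()  # uncomment to run locally
```

Anything that honours this contract — an agent orchestrator included — deploys to Cloud Run unchanged.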
2. Gemini API via Vertex AI
While we self-host the orchestration layer, we use Google's Gemini models through Vertex AI for the LLM inference. Why? Because running your own LLM requires expensive GPU instances (A100s at minimum), and Google's Vertex AI pricing is competitive enough that self-hosting the model itself doesn't make financial sense for most workloads.
The key: your data stays within Google Cloud. It never leaves the GCP boundary. Vertex AI's data handling policies are GDPR-compliant, and with VPC Service Controls, you can ensure your agent's inference requests never traverse the public internet.
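The inference call itself is a few lines with the Vertex AI SDK. A sketch, assuming the `google-cloud-aiplatform` package, a GCP project, and a Gemini model name — adjust all three to your deployment (the London region and model shown are our illustrative defaults):

```python
def gemini_complete(prompt, project_id, location="europe-west2",
                    model_name="gemini-1.5-flash"):
    """Send one inference request to Gemini via Vertex AI.

    Requests go to the regional Vertex AI endpoint inside GCP; with
    VPC Service Controls configured, they stay within your perimeter.
    """
    # Imported here so the module loads even where the SDK isn't installed.
    import vertexai
    from vertexai.generative_models import GenerativeModel

    vertexai.init(project=project_id, location=location)
    model = GenerativeModel(model_name)
    response = model.generate_content(prompt)
    return response.text

# gemini_complete("Summarise this ticket...", project_id="my-project")
# (requires GCP credentials and the google-cloud-aiplatform package)
```

Pinning `location` to a UK/EU region is what keeps the data-residency story simple.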
3. Firestore for Memory
Agent memory — conversation history, learned preferences, accumulated knowledge — lives in Firestore. It's serverless, automatically scales, and costs fractions of a penny per operation. For a typical agent deployment handling 500 conversations per day, the Firestore bill is under £5/month.
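A sketch of how one conversation turn can map onto Firestore documents. The collection names and field layout are our own conventions, not a Firestore requirement, and the client import is deferred so the module loads without the `google-cloud-firestore` package:

```python
import datetime

def turn_to_doc(session_id, role, text):
    """Shape one conversation turn as a Firestore-ready document."""
    return {
        "session": session_id,
        "role": role,    # "user" or "agent"
        "text": text,
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

def save_turn(doc):
    # Deferred import: requires the google-cloud-firestore package.
    from google.cloud import firestore
    db = firestore.Client()
    # Our layout: one "messages" subcollection per session document.
    db.collection("sessions").document(doc["session"]) \
      .collection("messages").add(doc)

doc = turn_to_doc("sess-42", "user", "Where is my order?")
print(doc["role"], doc["text"])  # → user Where is my order?
```

Per-document writes like this are what keeps the bill at pennies: you pay per operation, not for a provisioned database.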
4. Secret Manager for Credentials
API keys, channel tokens (WhatsApp, Slack, Telegram), database credentials — all stored in Google Secret Manager. Never hardcoded, never in environment variables, never in config files. Rotated regularly. Audited continuously.
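Reading a secret at startup is one call with the Secret Manager client library. A sketch assuming the `google-cloud-secret-manager` package; the project and secret IDs shown are placeholders:

```python
def secret_version_name(project_id, secret_id, version="latest"):
    """Build the fully-qualified resource name Secret Manager expects."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"

def access_secret(project_id, secret_id):
    # Deferred import: requires the google-cloud-secret-manager package.
    from google.cloud import secretmanager
    client = secretmanager.SecretManagerServiceClient()
    name = secret_version_name(project_id, secret_id)
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("utf-8")

# token = access_secret("my-project", "whatsapp-channel-token")  # placeholder IDs
print(secret_version_name("my-project", "whatsapp-channel-token"))
# → projects/my-project/secrets/whatsapp-channel-token/versions/latest
```

Because the code only ever asks for `versions/latest`, rotating a credential is just adding a new version — no redeploy required.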
5. Cloud Monitoring + Logging
Every agent action is logged. Every decision is traceable. If an agent sends an unexpected message to a customer, we can reconstruct exactly what happened, what data informed the decision, and why. This observability isn't optional — it's essential for trust, debugging, and compliance.
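On Cloud Run, structured logging needs no agent or SDK: Cloud Logging parses JSON lines written to stdout, and fields like `severity` are recognised automatically. A minimal sketch — field names beyond `severity` and `message` are our own conventions:

```python
import datetime
import json
import sys

def log_agent_action(severity, message, **fields):
    """Emit one structured log line; Cloud Logging indexes the JSON fields."""
    entry = {
        "severity": severity,   # DEBUG, INFO, WARNING, ERROR, ...
        "message": message,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        **fields,               # e.g. session id, channel, tool invoked
    }
    json.dump(entry, sys.stdout)
    sys.stdout.write("\n")
    return entry

log_agent_action("INFO", "sent reply",
                 session="sess-42", channel="whatsapp", tool="send_message")
```

Attaching the session ID and tool name to every line is what makes "reconstruct exactly what happened" a log query rather than a forensic exercise.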
The Monthly Bill: Real Numbers
Here's what our typical SME deployment costs per month:
| Component | Monthly Cost | Notes |
|---|---|---|
| Cloud Run (OpenClaw) | £8-25 | Scales to zero overnight/weekends |
| Vertex AI (Gemini) | £15-40 | Depends on interaction volume |
| Firestore | £2-5 | Agent memory and sessions |
| Secret Manager | £0.50 | Near-zero cost |
| Cloud Monitoring | £0-5 | Free tier covers most SMEs |
| Total | £25-75 | For a 24/7 AI agent workforce |
Compare that to managed alternatives:
- ChatGPT Team: £20/user/month (and you still need to build the agent logic yourself)
- Custom OpenAI API integration: £150-300/month (for comparable interaction volume)
- Enterprise AI platforms: £500-2,000/month (Intercom Fin, Zendesk AI, etc.)
Self-hosting with OpenClaw on GCP can be significantly cheaper than managed alternatives at SME scale. And you get more control, better privacy, and no vendor lock-in.
The Operational Reality
Self-hosting sounds like it requires a dedicated DevOps team. With modern cloud infrastructure, it doesn't. Our deployments typically require:
- Initial setup: 1-2 days (including channel configuration and agent tuning)
- Ongoing maintenance: 1-2 hours per month (mostly reviewing logs and tuning agent behaviour)
- Updates: Pull the latest OpenClaw image, redeploy to Cloud Run. Five minutes.
The infrastructure is boring. That's the point. Boring infrastructure lets you focus on what matters: the agent intelligence, the business logic, the customer experience. Not the servers.
Self-hosting AI isn't about being a technology purist. It's about cost, control, and compliance. For UK SMEs processing customer data through AI agents, it's increasingly the only responsible choice. And with OpenClaw on Google Cloud, it's never been easier.