I pay $5 a month for a VPS that does nothing most of the time. One gigabyte of RAM, a single shared CPU, twenty gigabytes of SSD, a public IPv4 address. Every VPS provider sells roughly this machine, and if you have ever run a small personal project you probably already have one sitting around with spare capacity.
Last month I turned mine into a Hermes Agent gateway. It now replies to me in Telegram, runs scheduled cron jobs that post summaries to a Discord channel I share with friends, watches an IMAP inbox, and is currently using — as I type this — about 320 megabytes of RAM and under 2% of CPU. For the price of a coffee, I have an assistant that is always on.
This post is a practical guide to the setup, and to the handful of decisions that actually matter on a small machine.
What you actually need
For Hermes, a $5 VPS tier from any reputable provider (Hetzner, DigitalOcean, Vultr, Linode, Contabo, OVH — they all offer the same thing at roughly the same price) is enough. The numbers to look for are:
- At least 1 GB of RAM. Hermes' Python process settles around 200-300 MB after startup. The Telegram, Discord, and Slack gateway threads each add a small overhead. Leave headroom for the language model API library buffering responses, and for the occasional tool that loads larger data.
- At least 10 GB of disk. Hermes, all its dependencies, the session database, cron history, and log files fit comfortably in under 5 GB. The rest is margin.
- Outbound HTTPS. This is the only network requirement. Hermes does not need inbound ports opened unless you run the optional OpenAI-compatible API server or the Telegram adapter in webhook mode instead of polling.
- A modern Linux distribution with systemd. Ubuntu 22.04 or 24.04 is the no-drama default. Debian 12 works. The gateway service wizard uses systemd to register Hermes as a persistent system or user service.
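Before installing, a thirty-second sanity check with standard Linux tools (nothing here is Hermes-specific) confirms the box meets those numbers:

```shell
# Check total RAM, free disk on the root filesystem, and whether
# systemd is the running init. Standard Linux tooling only.
awk '/MemTotal/ {printf "%.0f MB RAM\n", $2/1024}' /proc/meminfo
df -BG / | awk 'NR==2 {print $4 " free on /"}'
[ -d /run/systemd/system ] && echo "systemd: yes" || echo "systemd: no"
```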
Notably missing from this list: a GPU, a particular CPU architecture (Hermes runs fine on AMD, Intel, and ARM64 VPSes), a domain name, a reverse proxy, or anything else. The gateway is outbound-only by default.
The install, and what it does
The first command is hermes setup. This is the wizard — it asks you which provider to use (OpenRouter, Nous Portal, Anthropic, OpenAI, Hugging Face, or a local/custom endpoint), helps you paste your API key, lets you pick a default model, and writes the result into ~/.hermes/config.yaml.
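The exact schema is whatever the wizard writes for your provider. As a rough illustration only (the field names and model id below are assumptions, not Hermes' documented format), the result is a small YAML file along these lines:

```yaml
# ~/.hermes/config.yaml — illustrative sketch, not the literal schema.
# Run `hermes setup` rather than writing this by hand.
provider: openrouter
api_key: sk-or-...                 # pasted during the wizard
default_model: anthropic/claude-sonnet   # placeholder model id
```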
The second step that matters on a small machine is hermes gateway install. This is the command that turns Hermes into a systemd service, so it survives reboots and restarts automatically on crashes. You can choose user scope (the service runs as your login user, no sudo required) or system scope (service starts before login, useful for a headless box). On a $5 VPS, user scope is usually what you want. On headless systems, Hermes automatically enables systemd linger so the service keeps running after you disconnect.
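For the curious, a user-scope service conventionally lives under ~/.config/systemd/user/. A unit in the spirit of what the wizard generates might look like this (the file Hermes actually writes may differ; the ExecStart path and subcommand here are placeholders):

```ini
# ~/.config/systemd/user/hermes-gateway.service — illustrative only,
# not the literal unit `hermes gateway install` produces.
[Unit]
Description=Hermes Agent gateway
After=network-online.target

[Service]
ExecStart=%h/.local/bin/hermes gateway run
Restart=on-failure
RestartSec=5

[Install]
WantedBy=default.target
```

The linger behavior the wizard enables automatically corresponds to the standard `loginctl enable-linger $USER`, which tells systemd to keep user services alive when no session is open.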
From there, hermes gateway enable telegram (or discord, slack, signal, matrix, etc.) adds a platform. Each adapter is a plugin — you can run one platform or all eight at once; the memory cost of each additional platform is small, a few MB of Python objects plus whatever buffering the platform's own SDK wants.
The decisions that actually matter on a small box
Three choices make or break the experience on a cheap VPS.
Model choice. The agent's memory footprint on the VPS does not depend on model size, because inference does not happen on the box. But the latency and cost of each response do. The sweet spot for a personal gateway is usually a medium-sized fast model (Claude Sonnet, GPT-4.1 mini, Gemini Flash, or the free MiMo v2 Pro on Nous Portal for aux tasks) for default use, with the /model command available to escalate to a bigger model on demand. Live model switching means you can do this from inside a conversation without restarting anything.
Context compression. The default is fine. Hermes proactively compresses conversation history when the context window fills up, and the compressed summary is cached. On a small VPS this matters because context compression runs locally and uses CPU — leaving compression on means long conversations stay fast and do not accidentally burn your entire token budget in a single turn.
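The mechanism is generic enough to sketch: once the estimated token count crosses a budget, the oldest turns are folded into a single summary entry and recent turns are kept verbatim. A hypothetical sketch, with a string stub standing in for the real summarization model call:

```python
# Sketch of proactive context compression: when history exceeds a token
# budget, replace the oldest turns with one summary message. The
# summarizer here is a stub; a real agent would call a model instead.
def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token.
    return sum(len(m["content"]) // 4 for m in messages)

def compress(messages, budget=1000, keep_recent=4):
    if estimate_tokens(messages) <= budget:
        return messages  # under budget: nothing to do
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = "Summary of earlier conversation: " + " ".join(
        m["content"][:40] for m in old
    )
    return [{"role": "system", "content": summary}] + recent
```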
Credential pooling. If you pay for multiple API keys (common if you share a provider account with friends or rotate between free tiers), Hermes has a same-provider credential pool feature that rotates keys automatically on rate limit or 401 errors. On a small VPS this effectively turns N free tiers into a single always-available key, which is exactly what you want for an always-on assistant.
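The idea is simple enough to sketch (this shows the concept, not Hermes' code): keep a list of keys for one provider and advance to the next whenever a request comes back rate-limited or unauthorized.

```python
# Sketch of a same-provider credential pool: rotate to the next key on a
# 429 (rate limit) or 401 (auth) response. Conceptual illustration only.
class CredentialPool:
    def __init__(self, keys):
        self.keys = list(keys)
        self.i = 0

    @property
    def current(self):
        return self.keys[self.i]

    def rotate(self):
        self.i = (self.i + 1) % len(self.keys)
        return self.current

def request_with_pool(pool, send):
    """`send(key)` returns (status_code, body); try each key at most once."""
    for _ in range(len(pool.keys)):
        status, body = send(pool.current)
        if status in (429, 401):
            pool.rotate()  # this key is throttled or dead; try the next
            continue
        return body
    raise RuntimeError("all keys in the pool are exhausted")
```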
Why this even works
The reason a $5 VPS can host a real AI assistant is not that Hermes has been heroically optimized. It is that the architecture delegates the hard part — the language model — to somebody else, and runs only the coordination, memory, and tool-execution logic locally. That split is what makes the per-month cost reasonable and what makes a tiny machine enough.
Self-hosting an assistant used to mean running a model. It does not anymore. It means running the thing that tells the model what to do.