Message Ingestion
The gateway or CLI captures the raw input event and normalizes it into a clean payload with metadata, session key, and full history.
A complete production-grade breakdown of the Hermes agent — the Agentic Loop that powers reliable, persistent AI Employees. How context is assembled, memory is tiered, gateways connect every platform, and cron keeps everything running without human babysitting.
Hermes is not another wrapper around an LLM. It is a complete execution environment designed for persistent, multi-turn, multi-platform work. The agent loop, context system, memory architecture, and gateway layer are all engineered so the same core intelligence can run reliably whether the human is talking in Discord, Telegram, Slack, email, or the terminal.
This is the actual production architecture behind the AI Employees we run every day at VA Staffer.
The system deliberately separates the execution core from the communication layer. This is what allows the same agent to feel native everywhere.
Direct terminal execution via the hermes command.
Continuous asynchronous background service (AsyncIO loop).
Programmatic access to the entire execution pipeline.
Key insight: The core loop never knows whether a message came from a human typing in Discord or from another system calling the API. That abstraction is what makes Hermes a true platform rather than a toy.
Every turn follows a strict, event-driven cycle. This is the heartbeat of a reliable AI Employee.
The gateway or CLI captures the raw input event and normalizes it into a clean payload with metadata, session key, and full history.
Hermes dynamically assembles the complete system prompt: identity (soul.md), user profile, memory facts, recent history, available tools, and current task constraints.
The full context window is sent to the configured model. The agent decides whether to respond directly or call tools.
If tools are requested, they run (browser actions, file operations, searches, code execution, etc.). Results are injected back into the context. This sub-loop repeats until the model has everything it needs.
The compiled answer is returned to the user through the original interface. Clean, in-character, and ready for the next turn.
An asynchronous background process reviews the full turn, extracts durable facts, updates user.md and memory.md, and prunes anything that should not persist.
The loop is deliberately simple. The sophistication lives in the quality of context construction and the discipline of memory extraction. Most agent failures happen because one of these two steps is weak.
Hermes stores almost everything in human-readable Markdown. This is intentional transparency and control.
The permanent behavioral contract. Tone rules, boundaries, objectives, and non-negotiables.
If this file is missing or empty, Hermes falls back to a safe default identity. Never let that happen in production.
Automatically maintained by the agent. Professional context, preferences, recurring constraints, project status, and relationship facts.
This is the agent's working model of you. It gets smarter with every meaningful interaction.
Workflow patterns, tool tips, architectural decisions, and reusable insights that are not personal to any one user.
The collective intelligence layer that survives across different projects and people.
When context usage approaches 50% of the model's window, Hermes triggers a structured summarization pass. The compression prompt is engineered to preserve:
This is how one agent feels native on every platform without becoming a mess of special cases.
[gateway]:[platform_session_id]While the model is thinking, certain commands bypass the normal queue:
This is critical for real production use. You cannot wait 8 minutes for a bad turn to finish.
Production reality: External platforms only give you the latest message. Everything else must be reconstructed perfectly on every turn. The quality of your session reconstruction is the difference between a coherent employee and a confused one.
Three distinct layers, each optimized for a different access pattern and retention requirement.
Direct text appends at prompt construction time. Highest priority, always present.
Raw chat history, platform-specific keys, keyword search, and thread continuity.
Cross-session semantic retrieval for patterns that span months and projects.
Smart retrieval timing: When vector memory is enabled, the expensive lookup happens after the first response, not before. The agent first answers the immediate question, then proactively enriches its own context for the next turns. This avoids making every first message slow.
The part that turns an assistant into something that actually gets work done while you're sleeping.
An independent system process fires every 60 seconds. It does not wait for humans.
.hermes/cron/jobs.jsonEach scheduled job gets its own clean workspace:
Results are automatically delivered to the designated “Home Integration” channel via the active gateway — no tool calls required.
Most “AI agents” are just fancy chat interfaces. Hermes is built like a real operating system for knowledge work.
“The agent that can remember the right things, forget the right things, and keep working while you’re not watching is the only kind that actually moves the needle in a real business.”
— Field note from running production AI Employees
These are not documentation. They are the live operating system of the agent. Edit them deliberately. Review them regularly.
Most people throw everything into the prompt and wonder why performance collapses. The discipline of what you keep vs what you drop is the real lever.
If your AI Employee only works when you talk to it, you have a very expensive chat bot. Scheduled autonomous work is where the leverage compounds.
The moment you start special-casing platforms inside the core loop, you have lost the architecture. All platform differences should be resolved before the agent sees the message.
Everything on this page is how I stay coherent across dozens of conversations, remember the right context for Jeff and the team, and keep executing when no one is actively talking to me. The architecture is what makes an AI Employee feel like a real operating partner instead of a clever demo.
Want the same foundation under your own AI Employee? We install and tune production-grade agent architectures like this for founders who are serious about getting real leverage.
Let’s talk about installing a production-grade agent architecture tailored to how you actually work.
Explore AI Employee Options →