# Engine
The Engine is the core agent runtime. It runs a LangGraph workflow, orchestrates LLM calls via LiteLLM, executes MCP tools with approval policy, and streams responses to the Command Center over SSE.
## LangGraph Workflow
The graph has four nodes: entry → model → gate → final, with conditional routing between them.
- Entry: Initializes the workflow for the user request.
- Model: Runs LLM turns via LiteLLM. May produce tool calls. Feeds back into the graph for multi-turn reasoning.
- Gate: Executes MCP tools. Applies approval policy per tool (read-only vs mutating). Feeds results back to the model node.
- Final: Evaluates outcomes, summarizes, decides whether to continue or stop.
Auto-continue is applied conservatively based on a "next speaker" decision. The model can request another tool round or stop.
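The routing above can be sketched as a plain-Python stand-in for the LangGraph graph. Node names come from this document; the state fields, stub nodes, and loop cap are illustrative assumptions, not the real API.

```python
# Simplified stand-in for the Engine's entry -> model -> gate -> final routing.
# State fields ("request", "llm", "tools", "next_speaker") are hypothetical.

def run_workflow(state, max_rounds=5):
    """Route a request through the four workflow nodes."""
    node = "entry"
    for _ in range(max_rounds * 4):           # hard cap against infinite loops
        if node == "entry":
            state["messages"] = [{"role": "user", "content": state["request"]}]
            node = "model"
        elif node == "model":
            reply = state["llm"](state["messages"])   # LiteLLM call in reality
            state["messages"].append(reply)
            # Conditional edge: tool calls go to the gate, otherwise finish.
            node = "gate" if reply.get("tool_calls") else "final"
        elif node == "gate":
            for call in state["messages"][-1]["tool_calls"]:
                result = state["tools"][call["name"]](**call["args"])
                state["messages"].append({"role": "tool", "content": result})
            node = "model"                    # feed results back to the model
        elif node == "final":
            # "Next speaker" decision: continue only if the model asked to.
            if state.get("next_speaker") == "model":
                state["next_speaker"] = None
                node = "model"
            else:
                return state
    return state
```

The gate node always routes back to the model so tool results can inform the next turn; only the final node can end the run.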
## Stop Requests
Stop requests are honored mid-stream. A Redis flag is set when the user cancels. The Engine checks this flag and terminates the current run gracefully.
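A minimal sketch of the mid-stream stop check, assuming a Redis-like client with `get`; the key format and the `[stopped]` terminal marker are hypothetical, not taken from the real codebase.

```python
# Hypothetical stop-flag key; the real Engine's key scheme may differ.
STOP_KEY = "agent:stop:{run_id}"

def should_stop(redis_client, run_id):
    """Return True if the user has requested cancellation for this run."""
    return redis_client.get(STOP_KEY.format(run_id=run_id)) is not None

def stream_tokens(tokens, redis_client, run_id):
    """Yield tokens, ending the run gracefully when the stop flag appears."""
    for token in tokens:
        if should_stop(redis_client, run_id):
            yield "[stopped]"          # emit a terminal marker, then stop
            return
        yield token
```

Checking the flag once per token keeps cancellation latency bounded by a single chunk rather than a full generation.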
## SSE Streaming
All events stream from `POST /api/v1/agent/chat`. Event types:
| Event | Description |
|---|---|
| `ready` | Stream initialized and ready |
| `heartbeat` | Connection keepalive |
| `thinking` | Model reasoning (collapsible in UI) |
| `thinking.complete` | Model reasoning phase finished |
| `token` | Streaming token chunks |
| `generation.start` | LLM generation started |
| `generation.complete` | LLM generation finished |
| `tools.pending` | Tool calls queued |
| `tool.executing` | Tool running |
| `tool.awaiting_approval` | Mutating tool paused for approval |
| `tool.approved` | Mutating tool approved by operator |
| `tool.denied` | Mutating tool denied by operator |
| `tool.result` | Tool output |
| `tool.error` | Tool execution failed |
| `token.usage` | LLM token consumption metrics |
| `ttft` | Time to first token measurement |
| `completed` | Run finished |
| `workflow_complete` | Full workflow finished successfully |
| `workflow_error` | Workflow terminated with error |
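A client can consume this stream with a minimal SSE line parser; this is a sketch for illustration (a real client would use an SSE library and handle reconnection), and it assumes each event's `data` field carries JSON.

```python
import json

def parse_sse(lines):
    """Yield (event, data) pairs from raw SSE lines.

    SSE events are 'event:'/'data:' fields terminated by a blank line.
    """
    event, data = None, []
    for line in lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":                      # blank line terminates an event
            if event or data:
                yield event, json.loads("\n".join(data)) if data else None
            event, data = None, []
```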
## Authentication
Authentication uses JWTs with refresh token rotation. Sessions are carried in HttpOnly cookies. The first user to register becomes the admin.
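Refresh token rotation means each refresh consumes the presented token and issues a new one, so a stolen token can be used at most once. A sketch with an in-memory store (class and method names are hypothetical; the real service issues JWTs and persists sessions server-side):

```python
import secrets

class RefreshStore:
    """Illustrative rotation store: every refresh invalidates the old token."""

    def __init__(self):
        self._tokens = {}                      # refresh_token -> user_id

    def issue(self, user_id):
        token = secrets.token_urlsafe(32)
        self._tokens[token] = user_id
        return token

    def rotate(self, old_token):
        user_id = self._tokens.pop(old_token, None)  # old token now invalid
        if user_id is None:
            return None                        # reuse or unknown token: reject
        return self.issue(user_id)
```

Rejecting an already-rotated token (rather than silently re-issuing) is what surfaces token theft: the attacker and the real user race, and the loser's refresh fails.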
## Persistence
- Conversations: Stored in PostgreSQL.
- Messages: Per-conversation with role and content.
- Token usage: Tracked for LLM calls per conversation.
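The persistence model above can be sketched as a schema. Table and column names here are assumptions for illustration; the real service uses PostgreSQL, and sqlite3 stands in only because it ships with Python.

```python
import sqlite3

# Hypothetical schema mirroring the three stores described above.
SCHEMA = """
CREATE TABLE conversations (
    id INTEGER PRIMARY KEY,
    title TEXT
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER REFERENCES conversations(id),
    role TEXT NOT NULL,            -- e.g. 'user' | 'assistant' | 'tool'
    content TEXT NOT NULL
);
CREATE TABLE token_usage (
    conversation_id INTEGER REFERENCES conversations(id),
    prompt_tokens INTEGER,
    completion_tokens INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```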
## Rate Limiting
Rate limiting is enforced via Redis. Limits apply per user and per endpoint.
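Per-user, per-endpoint limiting is commonly a fixed-window counter keyed on `(user, endpoint, window)`. A sketch under that assumption, with a dict standing in for Redis (the real limiter would use `INCR` plus `EXPIRE` on equivalent keys):

```python
import time

class RateLimiter:
    """Fixed-window counter; key scheme and limits are illustrative."""

    def __init__(self, limit, window=60):
        self.limit, self.window = limit, window
        self.counts = {}                       # (user, endpoint, window) -> n

    def allow(self, user, endpoint, now=None):
        now = time.time() if now is None else now
        key = (user, endpoint, int(now // self.window))
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit
```

Keying on the endpoint as well as the user means a burst against one route cannot exhaust a user's budget for the others.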
