Engine

The Engine is the core agent runtime. It runs a LangGraph workflow, orchestrates LLM calls via LiteLLM, executes MCP tools under an approval policy, and streams responses to the Command Center over SSE.

LangGraph Workflow

The graph has four nodes: entry → model → gate → final, with conditional routing between them.

  • Entry: Initializes the workflow for the user request.
  • Model: Runs LLM turns via LiteLLM. May produce tool calls. Feeds back into the graph for multi-turn reasoning.
  • Gate: Executes MCP tools. Applies approval policy per tool (read-only vs mutating). Feeds results back to the model node.
  • Final: Evaluates outcomes, summarizes, decides whether to continue or stop.

Auto-continue is applied conservatively: after each round, a "next speaker" decision determines whether the model requests another tool round or stops.
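The routing above can be sketched in plain Python. This is illustrative only: the real Engine builds a LangGraph StateGraph, and the node behavior, state fields, and tool names here are placeholder assumptions.

```python
# Sketch of entry -> model -> gate -> final routing (illustrative only).

def entry(state):
    # Initialize the workflow for the user request.
    state.setdefault("messages", []).append(
        {"role": "user", "content": state["request"]})
    return state

def model(state):
    # Stand-in for an LLM turn via LiteLLM: request one tool round, then answer.
    if not state.get("tool_results"):
        state["tool_calls"] = [{"tool": "read_file", "args": {"path": "README"}}]
    else:
        state["tool_calls"] = []
        state["answer"] = "done"
    return state

def gate(state):
    # Execute pending MCP tool calls (mutating tools would pause for approval).
    state["tool_results"] = [f"result of {c['tool']}" for c in state["tool_calls"]]
    return state

def final(state):
    # Evaluate outcomes and summarize.
    state["summary"] = state.get("answer", "")
    return state

def route_after_model(state):
    # Conditional edge: tool calls go to the gate, otherwise finish.
    return "gate" if state.get("tool_calls") else "final"

def run(request):
    state = entry({"request": request})
    while True:
        state = model(state)
        if route_after_model(state) == "gate":
            state = gate(state)  # results feed back into the model node
        else:
            return final(state)

state = run("summarize the repo")
print(state["summary"])  # -> done
```

The loop makes the multi-turn feedback explicit: tool results re-enter the model node until the model stops requesting tools.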

Stop Requests

Stop requests are honored mid-stream: when the user cancels, a Redis flag is set, and the Engine checks it between steps and terminates the current run gracefully.
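A minimal sketch of that check, with a dict standing in for Redis so the example is self-contained. The key format and the terminal event emitted on cancel are assumptions, not the Engine's actual names.

```python
# Mid-stream cancellation sketch; a dict stands in for Redis.
fake_redis = {}

def request_stop(run_id):
    # Set by the API when the user cancels.
    fake_redis[f"run:{run_id}:stop"] = "1"

def stream_run(run_id, chunks):
    # The Engine checks the flag between chunks and terminates gracefully.
    for chunk in chunks:
        if fake_redis.get(f"run:{run_id}:stop"):
            yield {"event": "workflow_error", "data": "stopped by user"}
            return
        yield {"event": "token", "data": chunk}

events = []
for i, ev in enumerate(stream_run("r1", ["a", "b", "c"])):
    events.append(ev)
    if i == 0:
        request_stop("r1")  # cancel after the first chunk
```

Checking the flag between chunks (rather than killing the task) lets the Engine emit a final event before closing the stream.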

SSE Streaming

All events stream from POST /api/v1/agent/chat. Event types:

Event                    Description
ready                    Stream initialized and ready
heartbeat                Connection keepalive
thinking                 Model reasoning (collapsible in UI)
thinking.complete        Model reasoning phase finished
token                    Streaming token chunks
generation.start         LLM generation started
generation.complete      LLM generation finished
tools.pending            Tool calls queued
tool.executing           Tool running
tool.awaiting_approval   Mutating tool paused for approval
tool.approved            Mutating tool approved by operator
tool.denied              Mutating tool denied by operator
tool.result              Tool output
tool.error               Tool execution failed
token.usage              LLM token consumption metrics
ttft                     Time to first token measurement
completed                Run finished
workflow_complete        Full workflow finished successfully
workflow_error           Workflow terminated with error
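On the wire, each event follows the standard SSE framing: an event line, a data line, and a blank-line terminator. The JSON payload shape below is an assumption.

```python
# Serialize one event from the table above as an SSE frame.
import json

def sse(event, payload):
    # SSE frames are "event:"/"data:" lines terminated by a blank line.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

frame = sse("token", {"text": "Hel"})
print(frame)
```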

Authentication

Authentication uses JWTs with refresh token rotation; sessions are carried in HttpOnly cookies. The first user to register becomes the admin.
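The rotation invariant can be sketched with stdlib primitives (the real Engine issues signed JWTs; token formats here are placeholders). The point is that each refresh token is single-use: refreshing invalidates the old token and issues a new pair.

```python
# Refresh token rotation sketch: each refresh token works exactly once.
import secrets

valid_refresh = set()

def issue_pair(user):
    # In the real Engine these would be signed JWTs.
    access = f"access-{user}-{secrets.token_hex(4)}"
    refresh = secrets.token_hex(16)
    valid_refresh.add(refresh)
    return access, refresh

def rotate(refresh):
    if refresh not in valid_refresh:
        raise PermissionError("refresh token reused or revoked")
    valid_refresh.discard(refresh)  # single use: rotation invalidates it
    return issue_pair("user")

access1, refresh1 = issue_pair("alice")
access2, refresh2 = rotate(refresh1)  # refresh1 is now dead
```

Single-use refresh tokens mean a stolen-and-replayed token fails loudly instead of silently extending a session.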

Persistence

  • Conversations: Stored in PostgreSQL.
  • Messages: Per-conversation with role and content.
  • Token usage: Tracked for LLM calls per conversation.
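One plausible schema for the three stores above, using SQLite in place of PostgreSQL so the example is self-contained; table and column names are assumptions.

```python
# Persistence schema sketch (SQLite stand-in for PostgreSQL).
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE conversations (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER REFERENCES conversations(id),
    role TEXT,        -- e.g. user / assistant / tool
    content TEXT
);
CREATE TABLE token_usage (
    conversation_id INTEGER REFERENCES conversations(id),
    prompt_tokens INTEGER,
    completion_tokens INTEGER
);
""")
db.execute("INSERT INTO conversations (id, title) VALUES (1, 'demo')")
db.execute("INSERT INTO messages (conversation_id, role, content) "
           "VALUES (1, 'user', 'hi')")
db.execute("INSERT INTO token_usage VALUES (1, 12, 34)")

# Per-conversation token total.
total = db.execute(
    "SELECT SUM(prompt_tokens + completion_tokens) FROM token_usage "
    "WHERE conversation_id = 1").fetchone()[0]
```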

Rate Limiting

Rate limiting is enforced via Redis, with limits applied per user and per endpoint.
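A fixed-window counter is one common way to implement this (in production the counter would live in Redis via INCR plus EXPIRE; a dict stands in here). The key format and limit are assumptions.

```python
# Per-user, per-endpoint fixed-window rate limiter sketch.
counters = {}

def allow(user, endpoint, window, limit=3):
    # window would be e.g. unix_time // 60 for a one-minute window.
    key = (user, endpoint, window)
    counters[key] = counters.get(key, 0) + 1
    return counters[key] <= limit

results = [allow("alice", "/api/v1/agent/chat", window=0) for _ in range(4)]
# results -> [True, True, True, False]
```

Keying on (user, endpoint, window) gives independent budgets per user and per endpoint, matching the policy described above.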