Agent
An autonomous AI system that reasons, plans, and takes sequences of actions—such as calling tools or delegating to sub-agents—to complete a goal.
Overview
An agent is an AI system that goes beyond a single LLM call. Agents use language models as a reasoning engine to decide what actions to take, execute those actions (tool calls, API requests, sub-agent delegation), and iterate until a goal is achieved or a stopping condition is met.
Agent vs. Single LLM Call
A standard LLM call takes an input, generates an output, and is done. An agent:
- Reasons about the task before acting
- Chooses from a set of available tools or actions
- Iterates across multiple steps
- May call other agents (multi-agent workflows)
- Maintains state across its execution
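The loop described above can be sketched in a few lines. This is a minimal illustration, not a real SDK: the "LLM" is a scripted stub (`plan_next_step`), and the tool registry, function names, and stopping conditions are all hypothetical.

```python
# Minimal agent loop: a reasoning step picks an action, the action is
# executed, and the result feeds back in until a goal or step budget is hit.
# plan_next_step stands in for an LLM call; TOOLS is a toy registry.

def lookup_weather(city: str) -> str:
    """Toy tool: pretend to fetch weather for a city."""
    return f"Sunny in {city}"

TOOLS = {"lookup_weather": lookup_weather}

def plan_next_step(goal: str, history: list) -> dict:
    """Stand-in for the LLM 'reasoning engine' deciding the next action."""
    if not history:  # first iteration: choose a tool
        return {"action": "lookup_weather", "args": {"city": "Berlin"}}
    return {"action": "finish", "answer": history[-1]}  # goal achieved

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = []                        # state maintained across steps
    for _ in range(max_steps):          # stopping condition: step budget
        step = plan_next_step(goal, history)
        if step["action"] == "finish":  # stopping condition: goal reached
            return step["answer"]
        result = TOOLS[step["action"]](**step["args"])
        history.append(result)          # iterate with the new observation
    return "step budget exhausted"

print(run_agent("What's the weather in Berlin?"))  # → Sunny in Berlin
```

The key contrast with a single LLM call is the loop: each iteration re-plans against the accumulated history instead of producing one output and stopping.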
Testing Agents with Rhesis
Agents introduce unique testing challenges because their behavior is non-deterministic and path-dependent. Rhesis provides specialized support for agent testing:
Trace Visualization: Rhesis captures the full execution graph of an agent run, showing each tool call, decision point, and sub-agent invocation as a span in the trace.
Handoff Tracking: When one agent delegates to another (a handoff), Rhesis tracks the transition including the state passed between agents.
Multi-Turn Execution: Penelope conducts multi-turn conversations against your agent application, evaluating whether the agent achieves the stated goal across the conversation.
Agent Invocation Counts: The Graph View shows how many times each agent in your workflow was invoked, helping identify loops or unexpected repeated invocations.
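To make the span and handoff concepts concrete, here is a hedged sketch of how an execution graph gets recorded: each tool call or agent handoff opens a span that remembers its parent, and the resulting parent links form the trace graph. The `Tracer` class and span names are illustrative, not the Rhesis API.

```python
# Illustrative tracer: nested context managers record spans with parent
# links, producing the kind of execution graph a trace view visualizes.
import time
from contextlib import contextmanager

class Tracer:
    def __init__(self):
        self.spans = []   # completed spans; parent links encode the graph
        self._stack = []  # currently open span names

    @contextmanager
    def span(self, name: str):
        parent = self._stack[-1] if self._stack else None
        record = {"name": name, "parent": parent, "start": time.time()}
        self._stack.append(name)
        try:
            yield record
        finally:
            record["end"] = time.time()
            self._stack.pop()
            self.spans.append(record)

tracer = Tracer()
with tracer.span("router_agent"):
    with tracer.span("tool:classify"):
        pass
    # A handoff to a sub-agent appears as a child span of the caller.
    with tracer.span("handoff:retrieval_agent"):
        pass

for s in tracer.spans:
    print(s["name"], "<- parent:", s["parent"])
```

Counting spans per agent name gives the invocation counts mentioned above, which is how repeated or looping invocations become visible.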
Multi-Agent Workflows
Complex AI applications often use multiple specialized agents working together. For example:
- A router agent that classifies incoming requests
- A retrieval agent that fetches relevant documents
- A response agent that generates the final answer
Rhesis traces the entire workflow, making it possible to pinpoint failures at any stage of the pipeline.
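The router/retrieval/response pipeline above can be sketched with deterministic stubs. Every function here is a hypothetical stand-in: a real router and response agent would call an LLM, and a real retrieval agent would query a vector store.

```python
# Toy multi-agent pipeline: router classifies, retrieval fetches,
# response composes. Each stage is a stub so the flow is easy to follow.

def router_agent(request: str) -> str:
    """Classify the incoming request (an LLM call in a real system)."""
    return "docs_question" if "how" in request.lower() else "chitchat"

def retrieval_agent(request: str) -> list:
    """Fetch relevant documents (stubbed with a static store)."""
    store = {"deploy": "Run `make deploy` from the repo root."}
    return [doc for key, doc in store.items() if key in request.lower()]

def response_agent(request: str, docs: list) -> str:
    """Generate the final answer from the retrieved context."""
    return docs[0] if docs else "Sorry, I couldn't find anything relevant."

def handle(request: str) -> str:
    if router_agent(request) != "docs_question":
        return "Hi! How can I help?"
    return response_agent(request, retrieval_agent(request))

print(handle("How do I deploy?"))  # → Run `make deploy` from the repo root.
```

Because each stage is a separate function, a failure (wrong route, empty retrieval, bad answer) can be attributed to exactly one stage, which is what end-to-end tracing makes possible at scale.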
Best Practices
- Instrument all tool-calling functions with tracing to capture the full execution graph in traces
- Use the Graph View to detect unexpected agent loops or redundant sub-agent invocations
- Define clear goal criteria in multi-turn tests so Penelope can accurately judge whether the agent succeeded
- Test edge cases where the agent should gracefully decline rather than hallucinate a tool call
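The last practice can be expressed as a plain test: assert that the agent declines when no tool matches instead of fabricating a call. The `run_agent` below is a toy agent with a guarded tool choice; names and behavior are illustrative, not a real test framework.

```python
# Edge-case test sketch: the agent must decline gracefully when no
# available tool fits the request, rather than hallucinating a tool call.

TOOLS = {"get_invoice": lambda invoice_id: f"Invoice {invoice_id}: $42"}

def run_agent(request: str) -> str:
    # A real agent lets the LLM choose; here the choice is guarded so the
    # decline path is explicit and testable.
    if "invoice" in request.lower():
        return TOOLS["get_invoice"]("INV-1")
    return "I don't have a tool for that, so I can't help with this request."

# No matching tool: the agent should decline, not invent a call.
assert "can't help" in run_agent("Book me a flight to Tokyo")
# Matching tool: the agent should use it.
assert "$42" in run_agent("Show my latest invoice")
print("edge-case tests passed")
```

In a multi-turn test, the same idea becomes a goal criterion: the conversation succeeds only if the agent either completes the task with a real tool or explicitly declines.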