Agent
An autonomous AI system that reasons, plans, and takes sequences of actions—such as calling tools or delegating to sub-agents—to complete a goal.
Overview
An agent is an AI system that goes beyond a single LLM call. Agents use language models as a reasoning engine to decide what actions to take, execute those actions (tool calls, API requests, sub-agent delegation), and iterate until a goal is achieved or a stopping condition is met.
Agent vs. Single LLM Call
A standard LLM call takes an input, generates an output, and is done. An agent:
- Reasons about the task before acting
- Chooses from a set of available tools or actions
- Iterates across multiple steps
- May call other agents (multi-agent workflows)
- Maintains state across its execution
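The loop described above can be sketched in a few lines. This is a minimal illustration, not a real SDK: the "LLM" is a scripted stub (`plan_next_step`), and the tool registry, function names, and stopping conditions are all hypothetical.

```python
# Minimal agent loop: a reasoning step picks an action, the action is
# executed, and the result feeds back in until a goal or step budget is hit.
# plan_next_step stands in for an LLM call; TOOLS is a toy registry.

def lookup_weather(city: str) -> str:
    """Toy tool: pretend to fetch weather for a city."""
    return f"Sunny in {city}"

TOOLS = {"lookup_weather": lookup_weather}

def plan_next_step(goal: str, history: list) -> dict:
    """Stand-in for the LLM 'reasoning engine' deciding the next action."""
    if not history:  # first iteration: choose a tool
        return {"action": "lookup_weather", "args": {"city": "Berlin"}}
    return {"action": "finish", "answer": history[-1]}  # goal achieved

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = []                        # state maintained across steps
    for _ in range(max_steps):          # stopping condition: step budget
        step = plan_next_step(goal, history)
        if step["action"] == "finish":  # stopping condition: goal reached
            return step["answer"]
        result = TOOLS[step["action"]](**step["args"])
        history.append(result)          # iterate with the new observation
    return "step budget exhausted"

print(run_agent("What's the weather in Berlin?"))  # → Sunny in Berlin
```

The key contrast with a single LLM call is the loop: each iteration re-plans against the accumulated history instead of producing one output and stopping.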
Testing Agents with Rhesis
Agents introduce unique testing challenges because their behavior is non-deterministic and path-dependent. Rhesis provides specialized support for agent testing:
Trace Visualization: Rhesis captures the full execution graph of an agent run, showing each tool call, decision point, and sub-agent invocation as a span in the trace.
Handoff Tracking: When one agent delegates to another (a handoff), Rhesis tracks the transition including the state passed between agents.
Multi-Turn Execution: Penelope conducts multi-turn conversations against your agent application, evaluating whether the agent achieves the stated goal across the conversation.
Agent Invocation Counts: The Graph View shows how many times each agent in your workflow was invoked, helping identify loops or unexpected repeated invocations.
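To make the span and handoff concepts concrete, here is a hedged sketch of how an execution graph gets recorded: each tool call or agent handoff opens a span that remembers its parent, and the resulting parent links form the trace graph. The `Tracer` class and span names are illustrative, not the Rhesis API.

```python
# Illustrative tracer: nested context managers record spans with parent
# links, producing the kind of execution graph a trace view visualizes.
import time
from contextlib import contextmanager

class Tracer:
    def __init__(self):
        self.spans = []   # completed spans; parent links encode the graph
        self._stack = []  # currently open span names

    @contextmanager
    def span(self, name: str):
        parent = self._stack[-1] if self._stack else None
        record = {"name": name, "parent": parent, "start": time.time()}
        self._stack.append(name)
        try:
            yield record
        finally:
            record["end"] = time.time()
            self._stack.pop()
            self.spans.append(record)

tracer = Tracer()
with tracer.span("router_agent"):
    with tracer.span("tool:classify"):
        pass
    # A handoff to a sub-agent appears as a child span of the caller.
    with tracer.span("handoff:retrieval_agent"):
        pass

for s in tracer.spans:
    print(s["name"], "<- parent:", s["parent"])
```

Counting spans per agent name gives the invocation counts mentioned above, which is how repeated or looping invocations become visible.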
Multi-Agent Workflows
Complex AI applications often use multiple specialized agents working together. For example:
- A router agent that classifies incoming requests
- A retrieval agent that fetches relevant documents
- A response agent that generates the final answer
Rhesis traces the entire workflow, making it possible to pinpoint failures at any stage of the pipeline.
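The router/retrieval/response pipeline above can be sketched with deterministic stubs. Every function here is a hypothetical stand-in: a real router and response agent would call an LLM, and a real retrieval agent would query a vector store.

```python
# Toy multi-agent pipeline: router classifies, retrieval fetches,
# response composes. Each stage is a stub so the flow is easy to follow.

def router_agent(request: str) -> str:
    """Classify the incoming request (an LLM call in a real system)."""
    return "docs_question" if "how" in request.lower() else "chitchat"

def retrieval_agent(request: str) -> list:
    """Fetch relevant documents (stubbed with a static store)."""
    store = {"deploy": "Run `make deploy` from the repo root."}
    return [doc for key, doc in store.items() if key in request.lower()]

def response_agent(request: str, docs: list) -> str:
    """Generate the final answer from the retrieved context."""
    return docs[0] if docs else "Sorry, I couldn't find anything relevant."

def handle(request: str) -> str:
    if router_agent(request) != "docs_question":
        return "Hi! How can I help?"
    return response_agent(request, retrieval_agent(request))

print(handle("How do I deploy?"))  # → Run `make deploy` from the repo root.
```

Because each stage is a separate function, a failure (wrong route, empty retrieval, bad answer) can be attributed to exactly one stage, which is what end-to-end tracing makes possible at scale.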
Best Practices
- Instrument all tool-calling functions with tracing to capture the full execution graph in traces
- Use the Graph View to detect unexpected agent loops or redundant sub-agent invocations
- Define clear goal criteria in multi-turn tests so Penelope can accurately judge whether the agent succeeded
- Test edge cases where the agent should gracefully decline rather than hallucinate a tool call
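The last practice can be expressed as a plain test: assert that the agent declines when no tool matches instead of fabricating a call. The `run_agent` below is a toy agent with a guarded tool choice; names and behavior are illustrative, not a real test framework.

```python
# Edge-case test sketch: the agent must decline gracefully when no
# available tool fits the request, rather than hallucinating a tool call.

TOOLS = {"get_invoice": lambda invoice_id: f"Invoice {invoice_id}: $42"}

def run_agent(request: str) -> str:
    # A real agent lets the LLM choose; here the choice is guarded so the
    # decline path is explicit and testable.
    if "invoice" in request.lower():
        return TOOLS["get_invoice"]("INV-1")
    return "I don't have a tool for that, so I can't help with this request."

# No matching tool: the agent should decline, not invent a call.
assert "can't help" in run_agent("Book me a flight to Tokyo")
# Matching tool: the agent should use it.
assert "$42" in run_agent("Show my latest invoice")
print("edge-case tests passed")
```

In a multi-turn test, the same idea becomes a goal criterion: the conversation succeeds only if the agent either completes the task with a real tool or explicitly declines.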