
Experiments

An experiment is a named bundle of parameter values for a project. Each save creates an immutable version identified by a content hash.
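
To make the idea of a content hash concrete, here is a minimal sketch of how a deterministic identifier could be derived from a set of parameter values. The SDK's actual hashing scheme is not described on this page, so the function name `content_hash`, the use of SHA-256, the canonical JSON encoding, and the 12-character truncation are all assumptions for illustration only.

```python
import hashlib
import json


def content_hash(values: dict) -> str:
    """Return a deterministic hash for a dict of parameter values."""
    # Sort keys and fix separators so equal dicts serialize identically.
    canonical = json.dumps(values, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # The "v_" prefix mirrors the identifiers shown on this page; the
    # algorithm and length are assumptions, not the SDK's documented scheme.
    return f"v_{digest[:12]}"


a = content_hash({"model": "gpt-4o", "temperature": 0.9})
b = content_hash({"temperature": 0.9, "model": "gpt-4o"})
c = content_hash({"model": "gpt-4o", "temperature": 0.3})
print(a == b)  # True: key order does not affect the hash
print(a == c)  # False: a changed value yields a different hash
```

Under a scheme like this, two commits with identical values resolve to the same identifier, which is what makes a version safe to treat as immutable.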

Properties

| Property | Type | Description |
| --- | --- | --- |
| `id` | `str` | Unique identifier |
| `name` | `str` | Display name |
| `description` | `str` | Optional description |
| `project_id` | `str` | Parent project ID |
| `visibility` | `str` | `"private"` (default) or `"shared"` |
| `versions_count` | `int` | Number of committed versions |
| `latest_version` | `str` | Content hash of the most recent version |

Creating experiments

create.py
from rhesis.sdk.entities import Experiment

exp = Experiment(
    name="tuning-v1",
    project_id="my-project",
    description="Initial tuning run",
)
exp.push()
print(f"Created: {exp.id}")

Committing values

Each commit creates an immutable version. Pass bare Python values; the SDK wraps them using the project schema:

commit.py
version = exp.commit(
    {"model": "gpt-4o", "temperature": 0.9},
    message="bump temp",
)
print(f"Version: {version['version']}")  # v_a3f9b8...

Chain versions using parent_version:

commit_chain.py
v1 = exp.commit({"temperature": 0.3}, message="cold")
v2 = exp.commit(
    {"temperature": 0.9},
    message="hot",
    parent_version=v1["version"],
)
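
Once versions are chained, it can be useful to walk a version's ancestry. The sketch below does this over plain dicts shaped like version entries. It assumes each entry exposes its parent under a `parent_version` key, which this page does not confirm; the helper `lineage` and the sample data are hypothetical.

```python
def lineage(versions: list[dict], start: str) -> list[str]:
    """Return version IDs from `start` back to the root, following parents."""
    by_id = {v["version"]: v for v in versions}
    chain = []
    seen = set()  # guards against accidental cycles in the input
    current = start
    while current is not None and current in by_id and current not in seen:
        chain.append(current)
        seen.add(current)
        # Assumption: the parent is stored under "parent_version".
        current = by_id[current].get("parent_version")
    return chain


# Hand-written sample data shaped like version entries.
sample = [
    {"version": "v_cold", "message": "cold", "parent_version": None},
    {"version": "v_hot", "message": "hot", "parent_version": "v_cold"},
]
print(lineage(sample, "v_hot"))  # ['v_hot', 'v_cold']
```

In real use the input would presumably come from `exp.list_versions()`, provided the returned entries carry a parent reference.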

Sharing and promoting

share_promote.py
# Make visible to the team
exp.share()

# Bind the latest version to an environment
exp.promote(environment="default")

# Revert to private
exp.unshare()

Only shared experiments can be promoted to an environment.
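
A small guard can enforce this ordering in calling code. The following is a sketch only: `StubExperiment` is a stand-in written for this example, not an SDK class, and the error it raises is an assumption about what promoting a private experiment might do. The helper `share_and_promote` is likewise hypothetical and relies only on the `visibility` property and the `share()` and `promote()` methods shown above.

```python
class StubExperiment:
    """Stand-in for an experiment object; not part of the SDK."""

    def __init__(self):
        self.visibility = "private"
        self.bound_to = None

    def share(self):
        self.visibility = "shared"

    def promote(self, environment):
        # Assumed behaviour: promotion of a private experiment is rejected.
        if self.visibility != "shared":
            raise RuntimeError("only shared experiments can be promoted")
        self.bound_to = environment


def share_and_promote(exp, environment="default"):
    """Share the experiment if it is still private, then promote it."""
    if exp.visibility != "shared":
        exp.share()
    exp.promote(environment=environment)


stub = StubExperiment()
share_and_promote(stub)
print(stub.visibility, stub.bound_to)  # shared default
```

Whether the real `share()` updates the local `visibility` attribute without a refresh is not stated here, so a `pull()` between the two calls may be needed in practice.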

One-liner publish

publish() creates, commits, shares, and promotes in a single call:

publish.py
exp = Experiment.publish(
    name="tuning-v2",
    project_id="my-project",
    values={"model": "claude-sonnet", "temperature": 0.5},
    message="switch to Claude",
    environment="default",
)
print(f"Live at version {exp.latest_version}")

Version history

versions.py
# All versions (oldest to newest)
for v in exp.list_versions():
    print(f"{v['version']}: {v['message']}")

# Latest version only
latest = exp.latest_version_data()
print(f"{latest['version']}: {latest['message']}")

# Specific version by content hash
v = exp.get_version("v_abc123")
print(v["values"])
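
With the values of two versions in hand, a before/after comparison is straightforward to compute locally. The sketch below is one way to do it over plain dicts. The helper `diff_values` is hypothetical, and the `before`/`after` shape is chosen to resemble the diff entries in the Results section rather than taken from a documented format.

```python
def diff_values(before: dict, after: dict) -> dict:
    """Map each changed key to its before/after values."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        # A key missing on one side is reported as None on that side.
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = {"before": old, "after": new}
    return changes


# Hand-written sample values, standing in for two versions' "values".
parent = {"model": "gpt-4o", "temperature": 0.3}
child = {"model": "gpt-4o", "temperature": 0.9, "top_p": 0.95}
changes = diff_values(parent, child)
for key, change in changes.items():
    print(f"{key}: {change['before']} -> {change['after']}")
```

Note that this simple version cannot distinguish a key that is absent from one that is explicitly set to `None`.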

Running experiments

Execute a test set with this experiment’s parameters using run():

run.py
from rhesis.sdk.entities import TestSets, Endpoints

test_set = TestSets.pull(name="Safety Tests")
endpoint = Endpoints.pull(name="GPT-4o")

# Run with the experiment's latest version
result = exp.run(test_set, endpoint)

# Inline parameters — commits automatically, then executes
result = exp.run(test_set, endpoint, parameters={"temperature": 0.9})

This is equivalent to test_set.execute(endpoint, experiment=exp). See Test Execution for the full execution API.

Results

Retrieve aggregated test-run statistics for an experiment. Results can be grouped by individual run or by parameter version:

results_by_run.py
# Per-run breakdown
data = exp.results(group_by="run")
for run in data["items"]:
    stats = run.get("stats", {})
    print(f"{run['name']}: {stats.get('passed', 0)}/{stats.get('total', 0)} passed")

results_by_version.py
# Grouped by parameter version, with diffs against parent
data = exp.results(group_by="version", limit=50)
for group in data["items"]:
    print(f"Version {group['version']}: {group['total_tests']} tests")
    for key, change in group["diff"].items():
        print(f"  {key}: {change['before']} -> {change['after']}")

Each run includes a stats object with total, passed, failed, and errors counts.
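
These counts are enough to derive a pass rate and rank runs. The following is a minimal sketch over hand-written sample data shaped like the per-run items above. The helper `pass_rate` is hypothetical, and treating a run without stats as 0% is a design choice for this example, not SDK behaviour.

```python
def pass_rate(stats: dict) -> float:
    """Return passed/total as a fraction, or 0.0 when there are no tests."""
    total = stats.get("total", 0)
    if not total:
        return 0.0
    return stats.get("passed", 0) / total


# Hand-written sample data, not real results.
sample_runs = [
    {"name": "run-b", "stats": {"total": 20, "passed": 15, "failed": 5, "errors": 0}},
    {"name": "run-a", "stats": {"total": 20, "passed": 18, "failed": 1, "errors": 1}},
    {"name": "run-c"},  # a run with no stats yet
]

# Rank runs from highest to lowest pass rate.
ranked = sorted(
    sample_runs,
    key=lambda r: pass_rate(r.get("stats", {})),
    reverse=True,
)
for run in ranked:
    print(f"{run['name']}: {pass_rate(run.get('stats', {})):.0%}")
```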

Fetching experiments

fetch.py
from rhesis.sdk.entities import Experiments

# List all
for exp in Experiments.all():
    print(f"{exp.name} ({exp.visibility})")

# By name
exp = Experiments.pull(name="tuning-v1")

# By ID
exp = Experiments.pull(id="exp-uuid")

Deleting experiments

Deleting an experiment automatically unbinds any environments that point to it:

delete.py
exp.delete()

Method reference

| Method | Returns | Description |
| --- | --- | --- |
| `push()` | `dict` | Create or update the experiment header |
| `pull()` | `Experiment` | Refresh from the server |
| `delete()` | `bool` | Delete (auto-unbinds environments) |
| `commit(values, *, message, parent_version)` | `dict` | Append an immutable version |
| `list_versions()` | `list[dict]` | All versions, oldest to newest |
| `latest_version_data()` | `dict \| None` | Most recent version entry |
| `get_version(version)` | `dict` | Single version by content hash |
| `share()` | `None` | Set visibility to shared |
| `unshare()` | `None` | Set visibility to private |
| `promote(environment)` | `None` | Bind latest version to environment |
| `run(test_set, endpoint, *, parameters, ...)` | `dict \| None` | Execute a test set with this experiment’s parameters |
| `results(*, group_by, limit)` | `dict` | Aggregated test-run statistics |
| `publish(*, name, project_id, values, ...)` | `Experiment` | Class method: create, commit, share, promote |

Next Steps