Polyphemus (Development)
Polyphemus is the model-serving service used for adversarial generation workloads. It proxies generation requests to Vertex AI and exposes authenticated REST endpoints.
Runtime and deployment notes
- Runtime baseline: Python >=3.12
- Router module: apps/polyphemus/src/rhesis/polyphemus/routers/services.py
- Request schemas: apps/polyphemus/src/rhesis/polyphemus/schemas/schemas.py
API endpoints
Polyphemus exposes two primary generation endpoints:
| Endpoint | Purpose | Auth |
|---|---|---|
| POST /generate | Single generation request | Bearer token required |
| POST /generate_batch | Batch generation for multiple requests | Bearer token required |
/generate_batch accepts up to 50 items per call (MAX_BATCH_SIZE).
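The 50-item limit means larger workloads have to be split on the client side. The following is a minimal sketch of that splitting, not code from the Polyphemus repository. The helper names `chunk_items` and `build_batch_request`, the base URL, the token value, and the `{"requests": [...]}` payload shape are all assumptions made for illustration; the real field names are defined in the request schemas module. The requests are built but never sent.

```python
import json
import urllib.request
from typing import Iterator, Sequence

MAX_BATCH_SIZE = 50  # per-call item limit documented for /generate_batch


def chunk_items(items: Sequence[dict], size: int = MAX_BATCH_SIZE) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_batch_request(base_url: str, token: str, batch: list[dict]) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to /generate_batch."""
    # Hypothetical payload shape: the real schema lives in schemas.py.
    body = json.dumps({"requests": batch}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/generate_batch",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # bearer token is required
            "Content-Type": "application/json",
        },
    )


# 120 hypothetical items are split into batches of 50, 50, and 20.
items = [{"prompt": f"example {i}"} for i in range(120)]
batches = list(chunk_items(items))
requests_to_send = [
    build_batch_request("http://localhost:8000", "TOKEN", batch) for batch in batches
]
```

Each `Request` object can then be passed to `urllib.request.urlopen` or translated to whichever HTTP client the caller already uses.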
Environment configuration
Polyphemus reads Vertex AI target configuration from environment variables:
| Variable | Required | Description |
|---|---|---|
| POLYPHEMUS_ENDPOINT_ID | Yes | Vertex AI endpoint identifier |
| POLYPHEMUS_PROJECT_ID | Yes | GCP project ID for endpoint invocation |
| POLYPHEMUS_LOCATION | No | Vertex AI region (defaults to us-central1) |
If required variables are missing, the service returns HTTP 400 with configuration error details.
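As an illustration of the table above, configuration loading along these lines could look like the sketch below. It is not the service's actual code: `VertexTarget`, `ConfigError`, and `load_vertex_target` are hypothetical names, and how the service maps a missing variable to its HTTP 400 response is not shown here. Only the three variable names and the `us-central1` default come from this page.

```python
import os
from dataclasses import dataclass
from typing import Mapping


class ConfigError(RuntimeError):
    """Raised when a required environment variable is missing (hypothetical)."""


@dataclass(frozen=True)
class VertexTarget:
    endpoint_id: str
    project_id: str
    location: str


def load_vertex_target(env: Mapping[str, str] = os.environ) -> VertexTarget:
    """Read the Vertex AI target from environment-style key/value pairs."""
    required = ("POLYPHEMUS_ENDPOINT_ID", "POLYPHEMUS_PROJECT_ID")
    # Treat unset and empty values the same way.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ConfigError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return VertexTarget(
        endpoint_id=env["POLYPHEMUS_ENDPOINT_ID"],
        project_id=env["POLYPHEMUS_PROJECT_ID"],
        # Optional variable: fall back to the documented default region.
        location=env.get("POLYPHEMUS_LOCATION") or "us-central1",
    )


# Example with a plain dict instead of the real process environment.
target = load_vertex_target(
    {"POLYPHEMUS_ENDPOINT_ID": "1234567890", "POLYPHEMUS_PROJECT_ID": "my-project"}
)
```

Passing the environment as a mapping keeps the function easy to exercise in tests without mutating `os.environ`.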
Batch request and response format
Rate limiting is applied through check_rate_limit. For batch calls, one HTTP request counts as one
rate-limit unit regardless of item count.
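Combined with the 50-item batch limit, this rule makes it easy to estimate how many rate-limit units a workload consumes. The helper below is a hypothetical sketch for that arithmetic only; `batch_calls_needed` is not part of the service, and the actual quota enforced by check_rate_limit is not described here.

```python
import math

MAX_BATCH_SIZE = 50  # documented per-call item limit for /generate_batch


def batch_calls_needed(item_count: int, batch_size: int = MAX_BATCH_SIZE) -> int:
    """Return how many /generate_batch calls (rate-limit units) cover item_count items."""
    if item_count < 0:
        raise ValueError("item_count must be non-negative")
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    # One HTTP request is one unit, so the cost is the number of batches.
    return math.ceil(item_count / batch_size)


units_batched = batch_calls_needed(120)   # 3 units via /generate_batch
units_single = 120                        # 120 units via /generate, one per request
```

Under these assumptions, sending 120 items through /generate_batch costs 3 units, compared with 120 units when each item goes through /generate separately.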