Polyphemus (Development)
Polyphemus is the model-serving service used for adversarial generation workloads. It proxies generation requests to Vertex AI and exposes authenticated REST endpoints.
Runtime and deployment notes
- Runtime baseline: Python >=3.12
- Router module: apps/polyphemus/src/rhesis/polyphemus/routers/services.py
- Request schemas: apps/polyphemus/src/rhesis/polyphemus/schemas/schemas.py
API endpoints
Polyphemus exposes two primary generation endpoints:
| Endpoint | Purpose | Auth |
|---|---|---|
| POST /generate | Single generation request | Bearer token required |
| POST /generate_batch | Batch generation for multiple requests | Bearer token required |
/generate_batch accepts up to 50 items per call (MAX_BATCH_SIZE).
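
A caller with more than 50 items has to split them across several /generate_batch calls. The following is a minimal client-side sketch of that splitting. The helper name `chunk_requests` is hypothetical and not part of Polyphemus, and the local `MAX_BATCH_SIZE` constant only mirrors the documented limit.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Mirrors the documented server-side limit; this client-side copy is an assumption.
MAX_BATCH_SIZE = 50


def chunk_requests(items: Sequence[T], size: int = MAX_BATCH_SIZE) -> Iterator[list[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(120)]
    # Each chunk would become the payload of one /generate_batch call.
    print([len(chunk) for chunk in chunk_requests(prompts)])
```

Each yielded chunk fits within the per-call limit, so 120 items need three calls.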
Environment configuration
Polyphemus reads Vertex AI target configuration from environment variables:
| Variable | Required | Description |
|---|---|---|
| POLYPHEMUS_ENDPOINT_ID | Yes | Vertex AI endpoint identifier |
| POLYPHEMUS_PROJECT_ID | Yes | GCP project ID for endpoint invocation |
| POLYPHEMUS_LOCATION | No | Vertex AI region (defaults to us-central1) |
If required variables are missing, the service returns HTTP 400 with configuration error details.
Deployment region variable mapping (v0.2.8+)
Region configuration uses a different variable in each context:
| Context | Variable | Source | Where it is consumed |
|---|---|---|---|
| GitHub Actions CI/CD workflow | REGION | secrets.REGION (falls back to secrets.TF_VAR_REGION, then us-central1) | .github/workflows/polyphemus.yml |
| Running Polyphemus service | POLYPHEMUS_LOCATION | Set to $REGION by the CI workflow | apps/polyphemus/src/rhesis/polyphemus/routers/services.py |
| Vertex model deployment script | GCP_REGION | Set directly in the local environment (not mapped from REGION) | apps/polyphemus/model_deployment/config.py |
The workflow maps REGION → POLYPHEMUS_LOCATION automatically for service deployments.
The model deployment script reads GCP_REGION independently; when running it locally you must
export GCP_REGION yourself (see apps/polyphemus/model_deployment/.env.example).
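
The fallback order for REGION and its mapping to POLYPHEMUS_LOCATION can be restated as a small Python sketch. The real logic lives in the GitHub Actions workflow YAML, so the function names here are hypothetical, and the sketch assumes an empty secret is treated the same as an unset one.

```python
from typing import Mapping

DEFAULT_REGION = "us-central1"


def resolve_region(secrets: Mapping[str, str]) -> str:
    """Apply the documented order: REGION, then TF_VAR_REGION, then the default."""
    # Assumption: an empty secret value falls through to the next candidate.
    return secrets.get("REGION") or secrets.get("TF_VAR_REGION") or DEFAULT_REGION


def service_env(secrets: Mapping[str, str]) -> dict[str, str]:
    """Build the region-related environment passed to the running service."""
    # GCP_REGION is deliberately absent: the model deployment script reads it
    # from the local environment and it is not derived from REGION.
    return {"POLYPHEMUS_LOCATION": resolve_region(secrets)}
```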
Batch request and response format
Rate limiting is applied through check_rate_limit. For batch calls, one HTTP request counts as one
rate-limit unit regardless of item count.
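
Because a batch call costs one unit regardless of size, the units consumed for a workload depend only on how many HTTP requests are sent. The arithmetic can be sketched as below. `batch_rate_limit_units` is a hypothetical helper, and it assumes the caller fills each batch up to the documented limit of 50 items.

```python
import math

MAX_BATCH_SIZE = 50  # documented per-call item limit for /generate_batch


def batch_rate_limit_units(item_count: int, batch_size: int = MAX_BATCH_SIZE) -> int:
    """Return the number of /generate_batch requests (one unit each) needed for item_count items."""
    if item_count < 0:
        raise ValueError("item_count must not be negative")
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    return math.ceil(item_count / batch_size)


if __name__ == "__main__":
    # 120 items sent in full batches need three requests, hence three units.
    print(batch_rate_limit_units(120))
```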