# Polyphemus (Development)
Polyphemus is the model-serving service used for adversarial generation workloads. It proxies generation requests to Vertex AI and exposes authenticated REST endpoints.
## Runtime and deployment notes
- Runtime baseline: Python >=3.12
- Router module: `apps/polyphemus/src/rhesis/polyphemus/routers/services.py`
- Request schemas: `apps/polyphemus/src/rhesis/polyphemus/schemas/schemas.py`
- Docker image: API-only service image. PyTorch is not bundled in the Polyphemus container; model weights and the serving runtime live behind Vertex AI.
## API endpoints
Polyphemus exposes two primary generation endpoints:
| Endpoint | Purpose | Auth |
|---|---|---|
| POST /generate | Single generation request | Bearer token required |
| POST /generate_batch | Batch generation for multiple requests | Bearer token required |
`/generate_batch` accepts up to 50 items per call (`MAX_BATCH_SIZE`).
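A client submitting more than 50 items must split the work across calls. The helper below is a minimal sketch of that client-side chunking; `chunk_batch` is a hypothetical name, not part of the Polyphemus API, and only the 50-item limit comes from the documentation above.

```python
MAX_BATCH_SIZE = 50  # per-call limit for /generate_batch, as documented above


def chunk_batch(items, max_batch_size=MAX_BATCH_SIZE):
    """Yield successive slices of `items`, each no larger than max_batch_size."""
    for start in range(0, len(items), max_batch_size):
        yield items[start:start + max_batch_size]


# 120 prompts fit into 3 calls: 50 + 50 + 20 items.
chunks = list(chunk_batch([f"prompt-{i}" for i in range(120)]))
```

Each chunk would then be sent as one `POST /generate_batch` request.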
## Environment configuration
Polyphemus reads Vertex AI target configuration from environment variables:
| Variable | Required | Description |
|---|---|---|
| POLYPHEMUS_ENDPOINT_ID | Yes | Vertex AI endpoint identifier |
| POLYPHEMUS_PROJECT_ID | Yes | GCP project ID for endpoint invocation |
| POLYPHEMUS_LOCATION | No | Vertex AI region (defaults to us-central1) |
| VLLM_LOGGING_LEVEL | No | vLLM container log verbosity for Vertex serving (for example, DEBUG, INFO) |
If required variables are missing, the service returns HTTP 400 with configuration error details.
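The validation logic can be sketched as follows. This is an illustrative assumption about the shape of the check, not the service's actual code: `load_vertex_config` and its return convention are hypothetical; the variable names, the `us-central1` default, and the 400-on-missing-config behavior come from the tables above.

```python
import os

REQUIRED_VARS = ("POLYPHEMUS_ENDPOINT_ID", "POLYPHEMUS_PROJECT_ID")


def load_vertex_config(env=os.environ):
    """Sketch: collect Vertex AI settings, reporting any missing required variables.

    Returns (config, None) on success, or (None, error_detail) when required
    variables are absent -- the real service surfaces that detail in an HTTP 400.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        return None, f"Missing required configuration: {', '.join(missing)}"
    return {
        "endpoint_id": env["POLYPHEMUS_ENDPOINT_ID"],
        "project_id": env["POLYPHEMUS_PROJECT_ID"],
        # POLYPHEMUS_LOCATION is optional and defaults to us-central1.
        "location": env.get("POLYPHEMUS_LOCATION", "us-central1"),
    }, None
```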
## Deployment region variable mapping (v0.2.8+)
Region configuration uses two separate variables depending on context:
| Context | Variable | Source | Where it is consumed |
|---|---|---|---|
| GitHub Actions CI/CD workflow | REGION | secrets.REGION (falls back to secrets.TF_VAR_REGION, then us-central1) | .github/workflows/polyphemus.yml |
| Running Polyphemus service | POLYPHEMUS_LOCATION | Set to $REGION by the CI workflow | apps/polyphemus/src/rhesis/polyphemus/routers/services.py |
| Vertex model deployment script | GCP_REGION | Set directly in the local environment (not mapped from REGION) | apps/polyphemus/model_deployment/config.py |
The workflow maps `REGION` → `POLYPHEMUS_LOCATION` automatically for service deployments. The model deployment script reads `GCP_REGION` independently; when running it locally you must export `GCP_REGION` yourself (see `apps/polyphemus/model_deployment/.env.example`).
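The CI fallback chain in the table above (`secrets.REGION`, then `secrets.TF_VAR_REGION`, then the `us-central1` default) can be expressed as a small resolver. This is a sketch of the precedence rule only; `resolve_region` is a hypothetical helper, not code from the workflow.

```python
def resolve_region(secrets):
    """Apply the CI precedence: REGION, then TF_VAR_REGION, then us-central1."""
    return secrets.get("REGION") or secrets.get("TF_VAR_REGION") or "us-central1"
```

The resolved value is what the workflow exports as `POLYPHEMUS_LOCATION` for the running service.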
## vLLM logging level (v0.2.9+)
When deploying Polyphemus to Vertex AI, you can control serving container verbosity with `VLLM_LOGGING_LEVEL`. If set, deployment injects `VLLM_LOGGING_LEVEL` into the serving container environment.
Polyphemus deployment separates the lightweight API container from the Vertex AI serving container. Configure vLLM logging on the Vertex deployment, not by installing PyTorch or model runtime dependencies into the Polyphemus API image.
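The conditional injection can be sketched like this. `serving_container_env` is a hypothetical helper for illustration; the only documented behavior it encodes is "pass `VLLM_LOGGING_LEVEL` through to the serving container only when it is set."

```python
import os


def serving_container_env(base_env=None, env=os.environ):
    """Sketch: build the Vertex serving container environment.

    VLLM_LOGGING_LEVEL is forwarded only when it is present in the deploying
    environment; otherwise the serving container keeps its own default.
    """
    container_env = dict(base_env or {})
    level = env.get("VLLM_LOGGING_LEVEL")
    if level:
        container_env["VLLM_LOGGING_LEVEL"] = level
    return container_env
```

Keeping this on the Vertex deployment side matches the split described above: the API image stays free of model-runtime concerns.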
## Batch request and response format
Rate limiting is applied through `check_rate_limit`. For batch calls, one HTTP request counts as one rate-limit unit regardless of item count.
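The per-request (rather than per-item) accounting can be illustrated with a minimal counter. This `RateLimiter` class is a hypothetical sketch, not the service's `check_rate_limit` implementation; it only demonstrates that a batch of any size consumes a single unit.

```python
class RateLimiter:
    """Sketch: count one unit per HTTP request, regardless of batch size."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def check_rate_limit(self, n_items=1):
        # n_items is deliberately ignored: a /generate_batch call with 50
        # items costs the same single unit as a /generate call with one.
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

Under this scheme, batching is the cheaper way to submit many generations: 50 items in one batch call consume one unit, while 50 single calls consume 50.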