Database Field Encryption for Sensitive Tokens

Status

Accepted

Context

The Rhesis backend currently stores sensitive credentials as plaintext in the PostgreSQL database, including:

  • API keys and authentication tokens in the endpoint table
  • Model provider API keys in the model table
  • OAuth tokens and client secrets

This creates security risks:

  • Database dumps or backups could expose sensitive credentials
  • Log files or error messages might inadvertently include plaintext secrets
  • Unauthorized database access (via SQL injection or compromised credentials) could reveal all tokens
  • Compliance and security best practices require encryption at rest for sensitive data

We need a transparent encryption solution that protects data at rest while remaining straightforward to implement and maintain.

Decision

We will implement field-level encryption for sensitive database columns using cryptography.fernet for symmetric encryption with the following approach:

1. Encryption Library: cryptography.fernet

Chosen library: cryptography.fernet

Rationale:

  • Part of the widely-used cryptography package in the Python ecosystem
  • Implements AES-128 in CBC mode with HMAC-SHA256 authentication
  • Provides authenticated encryption (prevents tampering and ensures integrity)
  • Simple, secure API: Fernet(key).encrypt() / decrypt()
  • Returns URL-safe base64-encoded ciphertext suitable for database storage
  • Well-documented and actively maintained
  • Battle-tested in production environments
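The authenticated-encryption property listed above can be illustrated with a minimal sketch. It assumes a throwaway key generated on the spot; the helper name decrypts_cleanly is hypothetical and not part of the Rhesis codebase. A token that has been modified, or that is decrypted with a different key, is rejected with InvalidToken rather than yielding garbage plaintext.

```python
import base64

from cryptography.fernet import Fernet, InvalidToken


def decrypts_cleanly(cipher: Fernet, token: bytes) -> bool:
    """Return True if the token authenticates and decrypts under this cipher."""
    try:
        cipher.decrypt(token)
        return True
    except InvalidToken:
        return False


# Throwaway key for illustration only
cipher = Fernet(Fernet.generate_key())
token = cipher.encrypt(b"my-api-key-12345")

# Flip a single bit inside the token (byte 20 lies in the IV field)
raw = bytearray(base64.urlsafe_b64decode(token))
raw[20] ^= 0x01
tampered = base64.urlsafe_b64encode(bytes(raw))

# A second, unrelated key
other_cipher = Fernet(Fernet.generate_key())

print(decrypts_cleanly(cipher, token))        # True
print(decrypts_cleanly(cipher, tampered))     # False: HMAC check fails
print(decrypts_cleanly(other_cipher, token))  # False: wrong key
```

Note that a wrong key and a tampered token are indistinguishable to the caller: both surface as InvalidToken.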

Installation:

install-cryptography.sh
cd apps/backend
uv add cryptography

Basic Usage:

fernet-basic-usage.py
from cryptography.fernet import Fernet

# Generate encryption key (one-time setup)
key = Fernet.generate_key()  # Returns 32 random bytes, URL-safe base64-encoded (44 ASCII bytes)

# Initialize cipher
cipher = Fernet(key)

# Encrypt
plaintext = "my-api-key-12345"
encrypted = cipher.encrypt(plaintext.encode())  # Returns: b'encrypted-base64-string'

# Decrypt
decrypted = cipher.decrypt(encrypted).decode()  # Returns: "my-api-key-12345"

2. Key Management Strategy

Environment Variable: DB_ENCRYPTION_KEY

Key Format:

  • 32 random bytes, URL-safe base64-encoded into a 44-character string (Fernet standard format)
  • Example: ZmDfcTF7_60GrrY167zsiPd67pEvs0aGOv2oasOM92s=
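As a sketch of how the key format could be checked at startup, the function below reads DB_ENCRYPTION_KEY and verifies that it decodes to exactly 32 bytes. The function name load_key_from_env and the error messages are assumptions for illustration; only the variable name and the key format come from this document. It uses the standard library only, so it can run before any cipher is constructed.

```python
import base64
import binascii
import os


def load_key_from_env(environ=os.environ, name="DB_ENCRYPTION_KEY") -> bytes:
    """Return the Fernet key from the environment, or raise if missing or malformed."""
    raw = environ.get(name)
    if not raw:
        raise RuntimeError(f"{name} is not set")
    try:
        decoded = base64.urlsafe_b64decode(raw.encode())
    except (binascii.Error, ValueError) as exc:
        raise RuntimeError(f"{name} is not valid URL-safe base64") from exc
    if len(decoded) != 32:
        raise RuntimeError(f"{name} must decode to 32 bytes, got {len(decoded)}")
    return raw.encode()


# Demo with the example key from this document (never use it for real data)
demo_env = {"DB_ENCRYPTION_KEY": "ZmDfcTF7_60GrrY167zsiPd67pEvs0aGOv2oasOM92s="}
key = load_key_from_env(demo_env)
print(len(key))  # 44
```

Failing fast here turns a missing or truncated secret into a clear startup error instead of a decryption failure on the first query.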

Key Generation: Developers can generate keys locally using:

generate-key.sh
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

Key Storage by Environment:

Phase 1 (Initial Implementation) - Environment Secrets:

  • Local Development: .env file (gitignored, never committed)
  • CI/CD: GitHub Secrets for automated testing
  • Staging/Production:
    • Kubernetes: Environment variables populated from Kubernetes secrets
    • Docker: Environment variable injection at runtime
    • Google Cloud Run: Environment variables with GCP Secret Manager reference

Phase 2 (Future Enhancement) - Cloud Secret Manager:

  • Direct GCP Secret Manager integration
  • Automatic key rotation support
  • Centralized audit logging of key access
  • Per-environment key isolation with access controls
  • Versioned secrets with rollback capability

Security Considerations:

  • Keys must never be committed to version control (enforce with .gitignore)
  • The same key must be used across all application instances within a single environment
  • Critical: Losing the encryption key means permanent loss of access to encrypted data
    • Document key backup procedures in deployment documentation
    • Store production keys in multiple secure locations
    • Consider key escrow for disaster recovery
  • Each environment (dev, staging, production) uses a separate encryption key
  • Key rotation strategy will be implemented in Phase 2

3. Migration Strategy: In-Place Updates

Approach: Update existing database columns in-place without schema changes

Rationale: Same column names and ORM types; encrypt in place during a controlled rollout instead of maintaining parallel columns.

Migration Flow:

migration-flow.txt
1. Deploy application code with EncryptedString TypeDecorator
 - Includes backward compatibility (reads both encrypted and plaintext)
2. Run data migration script to encrypt all existing plaintext values
 - Processes each row: plaintext → encrypted in same column
3. Monitor application logs for any remaining plaintext values
 - Log warnings when plaintext is encountered
4. After validation period (e.g., 1 week), remove backward compatibility fallback
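
Step 2 of the flow can be sketched as an idempotent per-value helper. This is an illustration under stated assumptions, not the actual migration script: the helper name encrypt_if_plaintext is hypothetical, and the list of dicts stands in for rows fetched from the endpoint table. Values that already decrypt under the current key are left untouched, so the script can be re-run safely after a partial failure.

```python
from cryptography.fernet import Fernet, InvalidToken


def encrypt_if_plaintext(cipher: Fernet, value):
    """Encrypt a plaintext value; leave NULLs and already-encrypted values as-is."""
    if value is None:
        return None
    try:
        cipher.decrypt(value.encode())
        return value  # already a valid token under this key
    except InvalidToken:
        return cipher.encrypt(value.encode()).decode()


# Throwaway key and in-memory rows stand in for the real key and table
cipher = Fernet(Fernet.generate_key())
rows = [
    {"id": 1, "auth_token": "my-api-key-12345"},
    {"id": 2, "auth_token": None},
]

for row in rows:
    row["auth_token"] = encrypt_if_plaintext(cipher, row["auth_token"])

first_pass = [row["auth_token"] for row in rows]

# A second pass changes nothing: the helper is idempotent
for row in rows:
    row["auth_token"] = encrypt_if_plaintext(cipher, row["auth_token"])

second_pass = [row["auth_token"] for row in rows]
print(first_pass == second_pass)  # True
```

One caveat of this detection approach: running it with the wrong key would treat existing ciphertext as plaintext and encrypt it a second time, so the key should be verified before the backfill starts.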

Backward Compatibility Implementation:

The EncryptedString SQLAlchemy TypeDecorator will gracefully handle both encrypted and plaintext values during the migration window:

backward-compatibility.py
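import logging

from cryptography.fernet import InvalidToken

logger = logging.getLogger(__name__)

# Method of EncryptedString, shown in isolation
def process_result_value(self, value, dialect):
    """Decrypt when reading from database"""
    if value is None:
        return value

    try:
        # Attempt decryption
        return self.cipher.decrypt(value.encode()).decode()
    except InvalidToken:
        # Value is not a valid token for this key: assume it is
        # not encrypted yet (plaintext). A wrong key raises the same
        # error, so verify the key before relying on this fallback.
        # Log a warning for monitoring; never log the value itself.
        logger.warning(
            "Encountered unencrypted value in encrypted column",
            # SQLAlchemy does not set column_name on the type; it is
            # only populated if the caller assigns it explicitly.
            extra={"column": getattr(self, "column_name", None)}
        )
        return value  # Return plaintext during migration window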

4. Database Schema: No Changes Required

Existing columns remain as-is:

  • endpoint.auth_token (Text) → stores encrypted base64 string
  • endpoint.client_secret (Text) → stores encrypted base64 string
  • endpoint.last_token (Text) → stores encrypted base64 string
  • model.key (String) → stores encrypted base64 string

Size Considerations: Fernet encryption adds overhead to the stored data:

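  • Overhead: 57 bytes of fixed framing (version, timestamp, IV, HMAC) plus 1-16 bytes of padding; base64 encoding then expands the total by about one third
  • Example:
    • Original: "my-api-key" (10 characters)
    • Encrypted: "gAAAAABmV8x..." (100 characters base64-encoded)
  • Impact: Most token columns are already Text type (effectively unlimited), so no schema changes are needed. model.key is a String; if it is declared with a length limit, confirm that the limit accommodates the ciphertext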
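
The stored length can be estimated from the Fernet token layout (1-byte version, 8-byte timestamp, 16-byte IV, padded ciphertext, 32-byte HMAC). The sketch below is a back-of-the-envelope calculator, not part of the implementation; the function name fernet_token_length is hypothetical.

```python
import math

# version (1) + timestamp (8) + IV (16) + HMAC-SHA256 (32)
FIXED_OVERHEAD = 1 + 8 + 16 + 32
AES_BLOCK = 16


def fernet_token_length(plaintext_bytes: int) -> int:
    """Length in characters of the base64-encoded Fernet token."""
    # PKCS7 padding always adds between 1 and 16 bytes
    padded = AES_BLOCK * (plaintext_bytes // AES_BLOCK + 1)
    raw = FIXED_OVERHEAD + padded
    # base64 emits 4 characters per 3 input bytes, padded with "="
    return 4 * math.ceil(raw / 3)


for n in (10, 16, 64):
    print(n, fernet_token_length(n))
# 10 -> 100, 16 -> 120, 64 -> 184
```

This is useful for checking whether any length-limited column (for example a bounded String) can hold the encrypted form of its longest expected value.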

5. Implementation Architecture

SQLAlchemy TypeDecorator: Encryption/decryption will be transparent to application code using SQLAlchemy’s TypeDecorator:

encrypted-string-type.py
from sqlalchemy import TypeDecorator, Text
from cryptography.fernet import Fernet
import os

class EncryptedString(TypeDecorator):
    """SQLAlchemy type for transparent field encryption"""

    impl = Text
    cache_ok = True

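    def __init__(self):
        super().__init__()
        key = os.getenv("DB_ENCRYPTION_KEY")
        if not key:
            # Fail with a clear message instead of AttributeError on None
            raise RuntimeError("DB_ENCRYPTION_KEY is not set")
        self.cipher = Fernet(key.encode())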

    def process_bind_param(self, value, dialect):
        """Encrypt when writing to database"""
        if value is None:
            return value
        return self.cipher.encrypt(value.encode()).decode()

    def process_result_value(self, value, dialect):
        """Decrypt when reading from database"""
        if value is None:
            return value
        return self.cipher.decrypt(value.encode()).decode()

Usage in Models:

encrypted-model-usage.py
from sqlalchemy import Column, Integer, String
from rhesis.backend.app.models.base import Base
from rhesis.backend.app.utils.encryption import EncryptedString

class Endpoint(Base):
    __tablename__ = "endpoint"

    id = Column(Integer, primary_key=True)
    name = Column(String)
    auth_token = Column(EncryptedString)  # Transparently encrypted
    client_secret = Column(EncryptedString)  # Transparently encrypted
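
To make the "transparent" behavior concrete, here is a self-contained round trip against an in-memory SQLite database. It is a sketch under several assumptions: SQLAlchemy 1.4 or newer, a throwaway key set in the process environment, a local declarative base instead of rhesis.backend.app.models.base, and a condensed copy of EncryptedString. The ORM attribute returns plaintext while a raw SQL query shows that the column actually holds a Fernet token.

```python
import os

from cryptography.fernet import Fernet
from sqlalchemy import Column, Integer, Text, create_engine, text
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.types import TypeDecorator

# Throwaway key for this demo only
os.environ["DB_ENCRYPTION_KEY"] = Fernet.generate_key().decode()


class EncryptedString(TypeDecorator):
    """Condensed copy of the type described above, for demonstration."""

    impl = Text
    cache_ok = True

    def __init__(self):
        super().__init__()
        self.cipher = Fernet(os.environ["DB_ENCRYPTION_KEY"].encode())

    def process_bind_param(self, value, dialect):
        if value is None:
            return None
        return self.cipher.encrypt(value.encode()).decode()

    def process_result_value(self, value, dialect):
        if value is None:
            return None
        return self.cipher.decrypt(value.encode()).decode()


Base = declarative_base()  # stand-in for the project's Base


class Endpoint(Base):
    __tablename__ = "endpoint"

    id = Column(Integer, primary_key=True)
    auth_token = Column(EncryptedString)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Endpoint(id=1, auth_token="my-api-key-12345"))
    session.commit()

    # Through the ORM: decrypted transparently
    via_orm = session.get(Endpoint, 1).auth_token
    # Through raw SQL: the stored ciphertext
    raw = session.execute(
        text("SELECT auth_token FROM endpoint WHERE id = 1")
    ).scalar_one()

print(via_orm)                      # my-api-key-12345
print(raw.startswith("gAAAAA"))     # True: Fernet tokens share this prefix
```

Anything that bypasses the ORM type (raw SQL, database dumps, ad hoc reporting queries) sees only ciphertext, which is the intended at-rest protection.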

6. Security Model

Threat model (stylized): at-rest ciphertext helps against backup/SQL-dump style exposure; it does not substitute for network access control, least-privilege DB users, or protecting DB_ENCRYPTION_KEY. If the same attacker obtains both the database and the key, they can decrypt every protected field.

Phase 1 (implemented): Fernet via EncryptedString, key in env, optional dual-read during migration.

Later: key rotation, Secret Manager/HSM, per-tenant keys, and decrypt audit trails are out of scope for phase 1.

Consequences

Upside: Transparent ORM encryption, no column rename, migration can read mixed plaintext/ciphertext for a window.

Downside: You must operate DB_ENCRYPTION_KEY like production credentials: loss of the key is data loss. There is also a small per-field CPU cost for encryption and decryption, most noticeable during the rollout backfill.

Every environment needs the key; document how you generate, store, and recover it.

Implementation Plan

Track issues #496–501: TypeDecorator and utils, wire Endpoint/Model (and related CRUD), data backfill script, then integration tests and deployment notes.

Security and operations

Treat DB_ENCRYPTION_KEY like a production secret: gitignored locally, injected from CI/CD or a secret manager in deployed envs, backed up with clear recovery steps, rotated on a schedule once rotation support exists (planned for Phase 2), and monitored for decrypt failures or missing env.

Future Work

Possible next steps: dual-key rotation, Secret Manager–backed keys, stricter audit of decrypt operations, and extending EncryptedString to other columns if the threat model requires it.
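
One way dual-key rotation could work is with MultiFernet from the same cryptography package. This is a sketch of a possible approach, not a committed design: both keys are generated on the spot, and how the two keys would be supplied to the application (for example a second environment variable) is left open.

```python
from cryptography.fernet import Fernet, InvalidToken, MultiFernet

# Throwaway keys standing in for the current and the replacement key
old_cipher = Fernet(Fernet.generate_key())
new_cipher = Fernet(Fernet.generate_key())

# A value written before rotation started
legacy_token = old_cipher.encrypt(b"my-api-key-12345")

# The first key encrypts; every key in the list is tried for decryption
multi = MultiFernet([new_cipher, old_cipher])

# Old tokens remain readable during the rotation window
readable = multi.decrypt(legacy_token)

# rotate() decrypts with whichever key works and re-encrypts with the first key
rotated_token = multi.rotate(legacy_token)

try:
    old_cipher.decrypt(rotated_token)
    old_key_still_works = True
except InvalidToken:
    old_key_still_works = False

print(readable)                           # b'my-api-key-12345'
print(new_cipher.decrypt(rotated_token))  # b'my-api-key-12345'
print(old_key_still_works)                # False
```

Once every row has been rotated, the old key can be dropped from the list and retired.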

References

  • Parent: #495 Support for Encrypted Auth Tokens in DB
  • Blocks:
    • #497 Implement Core Encryption Infrastructure
    • #498 Add Encryption to Endpoint Model
    • #499 Add Encryption to Model Table
    • #500 Data Migration Script
    • #501 Integration Testing & Documentation