
Image Storage Patterns - Project Reference Pattern

Component: MinIO Object Storage
Status: 🟡 In Development
Created: 2025-12-29
Last Updated: 2025-12-29


Overview

Purpose

This PRP defines implementation patterns for image storage using MinIO (S3-compatible object storage), including bucket structure, presigned URL generation, upload/download operations, and lifecycle management.

Scope

Responsibilities:
- MinIO bucket configuration and structure
- Object key naming conventions
- Presigned URL generation for secure access
- Image upload from edge collectors
- Image retrieval for web UI
- Lifecycle rules for retention

Out of Scope:
- Retention policy decisions (see data-retention.md)
- Image processing/compression (handled at edge)
- CDN caching (future enhancement)

Dependencies

Requires:
- MinIO server (S3-compatible)
- minio Python SDK
- PostgreSQL (stores object key references)

Used By:
- Detection ingestion service
- Web UI (image display)
- Export functionality
- Cleanup jobs


Quick Reference

When to Use This PRP

Use when:
- Storing detection images from edge collectors
- Generating URLs for image display in UI
- Implementing image lifecycle management
- Setting up MinIO bucket configuration

Don't use when:
- Storing non-image data (use PostgreSQL)
- Implementing image processing (see edge PRP)


Patterns

Pattern: Bucket Structure

Problem: Images need organized storage with efficient date-based cleanup and clear separation between image types.

Solution: Use path-based organization within a single bucket with date hierarchy.

Implementation:

alpr-images/                          # Primary bucket
├── full/                             # Full scene images
│   └── {year}/{month}/{day}/
│       └── {detection_id}_full.jpg
│
├── crop/                             # Plate crop images
│   └── {year}/{month}/{day}/
│       └── {detection_id}_crop.jpg
│
└── temp/                             # Temporary upload staging
    └── {upload_id}/
        └── pending files...

Object Key Format:

| Image Type | Key Pattern | Example |
|------------|-------------|---------|
| Full scene | full/{year}/{month}/{day}/{detection_id}_full.jpg | full/2025/01/15/evt_abc123_full.jpg |
| Plate crop | crop/{year}/{month}/{day}/{detection_id}_crop.jpg | crop/2025/01/15/evt_abc123_crop.jpg |

Key Components:

| Component | Format | Example |
|-----------|--------|---------|
| year | 4-digit year | 2025 |
| month | 2-digit month (zero-padded) | 01, 12 |
| day | 2-digit day (zero-padded) | 05, 31 |
| detection_id | UUID or prefixed ID | evt_abc123def456 |

Key Generation Helper:

from datetime import datetime

def generate_object_key(
    detection_id: str,
    captured_at: datetime,
    image_type: str  # "full" or "crop"
) -> str:
    """Generate consistent object key for image storage."""
    date_path = captured_at.strftime("%Y/%m/%d")
    return f"{image_type}/{date_path}/{detection_id}_{image_type}.jpg"


# Usage
key = generate_object_key("evt_abc123", datetime(2025, 1, 15), "full")
# Returns: "full/2025/01/15/evt_abc123_full.jpg"

When to Use:
- All image storage operations
- Date-based lifecycle rules

Trade-offs:
- Pros: Easy date-based cleanup, clear organization, efficient prefix queries
- Cons: Cannot reorganize by other dimensions (collector, camera) without duplication
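
As a concrete example of the date-based cleanup this layout enables, purging one day of images is a single prefix listing. A minimal sketch using minio-py (client construction as in the patterns below):

from minio import Minio
from minio.deleteobjects import DeleteObject

def delete_day(client: Minio, bucket: str, image_type: str, date_prefix: str) -> None:
    """Delete all images of one type for a given day, e.g. '2025/01/15'."""
    objects = client.list_objects(
        bucket, prefix=f"{image_type}/{date_prefix}/", recursive=True
    )
    delete_list = (DeleteObject(obj.object_name) for obj in objects)
    # remove_objects is lazy: iterate the result to execute it and surface errors
    for error in client.remove_objects(bucket, delete_list):
        print("deletion failed:", error)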


Pattern: Presigned URL Generation

Problem: Images in private bucket need secure, time-limited access without exposing credentials to clients.

Solution: Generate presigned URLs on-demand with short expiration times.

Implementation:

from datetime import datetime, timedelta
from minio import Minio

class ImageStorage:
    def __init__(
        self,
        endpoint: str,
        access_key: str,
        secret_key: str,
        secure: bool = True  # allow HTTP for local/dev servers (see MINIO_SECURE)
    ):
        self.client = Minio(
            endpoint,
            access_key=access_key,
            secret_key=secret_key,
            secure=secure
        )
        self.bucket_name = "alpr-images"
        self.default_expiry = timedelta(minutes=15)
        self.max_expiry = timedelta(hours=1)

    def get_presigned_url(
        self,
        detection_id: str,
        image_type: str,
        captured_at: datetime,
        expiry: timedelta | None = None
    ) -> str:
        """
        Generate presigned URL for image access.

        Args:
            detection_id: Detection identifier
            image_type: "full" or "crop"
            captured_at: Detection timestamp for key generation
            expiry: URL validity period (default 15 minutes)

        Returns:
            Presigned URL for direct image access
        """
        if expiry is None:
            expiry = self.default_expiry
        elif expiry > self.max_expiry:
            expiry = self.max_expiry

        object_key = generate_object_key(detection_id, captured_at, image_type)

        return self.client.presigned_get_object(
            bucket_name=self.bucket_name,
            object_name=object_key,
            expires=expiry
        )

URL Settings:

| Setting | Value | Rationale |
|---------|-------|-----------|
| Default expiration | 15 minutes | Sufficient for page view |
| Max expiration | 1 hour | For downloads/exports |
| Signature version | v4 | Current AWS standard |

Example Presigned URL:

https://minio.example.com/alpr-images/full/2025/01/15/evt_abc123_full.jpg
  ?X-Amz-Algorithm=AWS4-HMAC-SHA256
  &X-Amz-Credential=...
  &X-Amz-Date=20250115T143200Z
  &X-Amz-Expires=900
  &X-Amz-SignedHeaders=host
  &X-Amz-Signature=...
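
Usage sketch (endpoint and credentials are placeholders):

from datetime import datetime, timedelta

storage = ImageStorage("minio.example.com", "ACCESS_KEY", "SECRET_KEY")

# Default 15-minute URL for page display
url = storage.get_presigned_url(
    detection_id="evt_abc123",
    image_type="full",
    captured_at=datetime(2025, 1, 15)
)

# Export download: a 4-hour request is silently capped at the 1-hour maximum
export_url = storage.get_presigned_url(
    detection_id="evt_abc123",
    image_type="full",
    captured_at=datetime(2025, 1, 15),
    expiry=timedelta(hours=4)
)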

When to Use:
- All image URLs in API responses
- Web UI image display
- Export downloads

Trade-offs:
- Pros: No credential exposure, automatic expiry, revocable
- Cons: URLs not bookmarkable, require server round-trip to regenerate


Pattern: Image Upload

Problem: Edge collectors send images as multipart form-data with detection batch uploads; these need efficient parallel upload to MinIO.

Solution: Extract image binary from multipart request and upload to MinIO with proper metadata, using async I/O for parallelism.

Note: The architecture uses multipart form-data (not base64 JSON) to save 33% bandwidth. See Detection Batching for the upload format.
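
For context, a minimal receiving-side sketch assuming a FastAPI handler (the route path and field names here are illustrative, not the batching spec):

from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/v1/detections/{detection_id}/images")
async def receive_images(
    detection_id: str,
    full: UploadFile = File(...),
    crop: UploadFile = File(...)
):
    # Pull the raw JPEG bytes out of the multipart parts
    full_bytes = await full.read()
    crop_bytes = await crop.read()
    # ...hand off to store_detection_images() with detection metadata
    return {"status": "accepted", "detection_id": detection_id}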

Implementation:

import asyncio
from datetime import datetime
import io

async def store_detection_images(
    storage: ImageStorage,
    detection_id: str,
    captured_at: datetime,
    collector_id: str,
    full_image_bytes: bytes,
    crop_image_bytes: bytes
) -> tuple[str, str]:
    """
    Store both detection images and return object keys.

    Args:
        storage: ImageStorage instance
        detection_id: Detection identifier
        captured_at: Detection timestamp
        collector_id: Source collector UUID
        full_image_bytes: Full scene image data
        crop_image_bytes: Plate crop image data

    Returns:
        Tuple of (full_key, crop_key)
    """
    full_key = generate_object_key(detection_id, captured_at, "full")
    crop_key = generate_object_key(detection_id, captured_at, "crop")

    # Common metadata for both images
    metadata = {
        "detection-id": detection_id,
        "collector-id": collector_id,
        "captured-at": captured_at.isoformat()
    }

    # Upload both images in parallel
    await asyncio.gather(
        _upload_image(storage, full_key, full_image_bytes, metadata),
        _upload_image(storage, crop_key, crop_image_bytes, metadata)
    )

    return full_key, crop_key


async def _upload_image(
    storage: ImageStorage,
    object_key: str,
    image_bytes: bytes,
    metadata: dict[str, str]
) -> None:
    """Upload single image with metadata."""
    # minio's put_object is blocking; run it in a worker thread so the
    # asyncio.gather above actually performs the two uploads in parallel
    await asyncio.to_thread(
        storage.client.put_object,
        bucket_name=storage.bucket_name,
        object_name=object_key,
        data=io.BytesIO(image_bytes),
        length=len(image_bytes),
        content_type="image/jpeg",
        metadata=metadata
    )

Object Metadata:

| Header | Value |
|--------|-------|
| Content-Type | image/jpeg |
| Cache-Control | private, max-age=86400 |
| x-amz-meta-detection-id | {detection_id} |
| x-amz-meta-collector-id | {collector_id} |
| x-amz-meta-captured-at | {ISO timestamp} |
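
minio-py recognizes standard headers such as Cache-Control inside the same metadata dict and sends them unprefixed, while custom keys are stored as x-amz-meta-* headers; a sketch with illustrative values (verify against your SDK version):

metadata = {
    # Recognized standard header: stored as Cache-Control on the object
    "Cache-Control": "private, max-age=86400",
    # Custom keys: stored as x-amz-meta-* headers
    "detection-id": "evt_abc123",
    "collector-id": "col-7f2a",
    "captured-at": "2025-01-15T14:32:00+00:00"
}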

When to Use:
- Detection ingestion from edge collectors
- Batch image upload

Trade-offs:
- Pros: Parallel uploads, consistent metadata, decoupled from Detection storage
- Cons: Both images held in memory during upload; failed uploads must be tracked separately


Pattern: Error Handling with Retry

Problem: MinIO may be temporarily unavailable; image upload failures shouldn't block Detection storage.

Solution: Implement retry with exponential backoff; store events even if images fail.

Implementation:

import asyncio
from functools import wraps
from typing import TypeVar, Callable, Awaitable

T = TypeVar("T")

def with_retry(
    max_attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0
) -> Callable[[Callable[..., Awaitable[T]]], Callable[..., Awaitable[T]]]:
    """Decorator for async functions with exponential backoff retry."""
    def decorator(func: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
        @wraps(func)
        async def wrapper(*args, **kwargs) -> T:
            last_error = None
            for attempt in range(max_attempts):
                try:
                    return await func(*args, **kwargs)
                except Exception as e:
                    last_error = e
                    if attempt < max_attempts - 1:
                        delay = min(base_delay * (2 ** attempt), max_delay)
                        await asyncio.sleep(delay)
            raise last_error
        return wrapper
    return decorator


@with_retry(max_attempts=3)
async def upload_with_retry(
    storage: ImageStorage,
    object_key: str,
    image_bytes: bytes,
    metadata: dict[str, str]
) -> None:
    """Upload image with automatic retry on failure."""
    await _upload_image(storage, object_key, image_bytes, metadata)

Error Scenarios:

| Scenario | Behavior |
|----------|----------|
| Image upload fails | Retry 3x with backoff; log error; Detection stored without images |
| Presigned URL expired | Client re-requests; server generates new URL |
| Image deleted (retention) | Return placeholder image; UI shows "Image expired" |
| MinIO unavailable | Detection stored; image upload queued for retry |
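
A sketch of how an ingestion handler might wire this together (save_detection_row and enqueue_failed_upload are hypothetical helpers): persist the event first, then attempt the images, and never let a final image failure propagate.

import asyncio
import logging

logger = logging.getLogger(__name__)

async def ingest_detection(storage, detection, full_bytes, crop_bytes) -> None:
    # Persist the detection row first so the event is never lost
    await save_detection_row(detection)  # hypothetical DB helper

    full_key = generate_object_key(detection.id, detection.captured_at, "full")
    crop_key = generate_object_key(detection.id, detection.captured_at, "crop")
    metadata = {"detection-id": detection.id}
    try:
        await asyncio.gather(
            upload_with_retry(storage, full_key, full_bytes, metadata),
            upload_with_retry(storage, crop_key, crop_bytes, metadata)
        )
    except Exception:
        # All retries exhausted: log and queue, but keep the detection
        logger.exception("image upload failed for %s", detection.id)
        await enqueue_failed_upload(detection.id)  # hypothetical retry queue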

When to Use:
- All MinIO operations
- Any network-dependent storage call

Trade-offs:
- Pros: Resilient to transient failures, events never lost
- Cons: Delayed image availability, complexity in tracking failed uploads


Pattern: Bucket Configuration

Problem: MinIO bucket needs proper security, access policies, and lifecycle rules.

Solution: Configure bucket for private access with lifecycle-based cleanup.

Implementation:

from minio import Minio
from minio.commonconfig import Filter
from minio.lifecycleconfig import (
    AbortIncompleteMultipartUpload,
    Expiration,
    LifecycleConfig,
    Rule,
)

def configure_bucket(client: Minio, bucket_name: str = "alpr-images") -> None:
    """Configure bucket with proper settings and lifecycle rules."""

    # Create bucket if not exists
    if not client.bucket_exists(bucket_name):
        client.make_bucket(bucket_name, location="us-east-1")

    # Set lifecycle rules for automatic cleanup
    lifecycle_config = LifecycleConfig(
        [
            # Delete images after 1 year
            Rule(
                rule_id="expire-images",
                status="Enabled",
                expiration=Expiration(days=365),
                rule_filter=Filter(prefix="")  # All objects
            ),
            # Clean up incomplete multipart uploads
            Rule(
                rule_id="cleanup-incomplete",
                status="Enabled",
                rule_filter=Filter(prefix=""),
                abort_incomplete_multipart_upload=AbortIncompleteMultipartUpload(
                    days_after_initiation=7
                )
            )
        ]
    )

    client.set_bucket_lifecycle(bucket_name, lifecycle_config)

Bucket Settings:

bucket_name: alpr-images
region: us-east-1  # Required even for self-hosted

# Access policy: private (presigned URLs only)
policy: private

# Versioning: disabled (WORM handled at DB level)
versioning: false

# Object locking: disabled (legal holds via DB)
object_lock: false

Lifecycle Rules:

| Rule | Trigger | Action |
|------|---------|--------|
| expire-images | 365 days | Delete object |
| cleanup-incomplete | 7 days | Abort multipart upload |
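
One-time setup from a provisioning script might look like this (placeholder credentials; get_bucket_lifecycle reads the rules back for verification):

from minio import Minio

client = Minio(
    "minio.example.com",
    access_key="PROVISIONER_KEY",
    secret_key="PROVISIONER_SECRET",
    secure=True
)
configure_bucket(client)

# Read back the lifecycle configuration to confirm the rules applied
print(client.get_bucket_lifecycle("alpr-images"))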

When to Use:
- Initial MinIO setup
- Environment provisioning scripts


Pattern: Tiered Storage (Optional)

Problem: Older images accessed less frequently; storing all on fast SSD is expensive.

Solution: Configure MinIO tiering to move older images to cheaper storage.

Implementation:

# Register a warm tier backed by a second, HDD-backed MinIO deployment
# ("myminio" is the mc alias of the primary server; the warm endpoint and
# credentials below are deployment-specific placeholders)
mc admin tier add minio myminio WARM_HDD \
    --endpoint https://warm-minio.example.com:9000 \
    --access-key "${WARM_ACCESS_KEY}" \
    --secret-key "${WARM_SECRET_KEY}" \
    --bucket alpr-images-archive

# Transition objects older than 90 days to the warm tier
mc ilm add myminio/alpr-images \
    --transition-days 90 \
    --transition-tier "WARM_HDD"

Tier Configuration:

| Tier | Age | Storage Type | Access Speed |
|------|-----|--------------|--------------|
| Hot | 0-90 days | SSD | Fast (active investigations) |
| Warm | 90-365 days | HDD | Slower (historical lookups) |
| Expired | >365 days | Deleted | N/A |

Hardware Layout:

MinIO Server
├── /mnt/ssd (Hot tier)
│   └── Fast NVMe/SSD storage
│   └── Size: 500GB - 1TB
│
└── /mnt/hdd (Warm tier)
    └── Spinning disk array
    └── Size: 4TB+ (expandable)

When to Use:
- Production deployments with >500GB images
- Cost-sensitive environments

Trade-offs:
- Pros: Significant cost savings, automatic management
- Cons: Slower access to older images, additional hardware


Anti-Patterns

Anti-Pattern: Storing Images in Database

Problem: Storing binary image data directly in PostgreSQL.

Example of BAD code:

# BAD: Images as bytea columns
class Detection(Base):
    __tablename__ = "detections"
    id = Column(BigInteger, primary_key=True)
    full_image = Column(LargeBinary)  # Don't do this!
    crop_image = Column(LargeBinary)  # Don't do this!

Why It's Wrong:
- Database bloat (100KB per row)
- Slow queries (pulling image data)
- Expensive backups
- Cannot use CDN

Correct Approach:

# GOOD: Store reference to object storage
class EventImage(Base):
    __tablename__ = "event_images"
    id = Column(BigInteger, primary_key=True)
    detection_id = Column(BigInteger, ForeignKey("detections.id"))
    full_image_key = Column(String(255))  # MinIO object key
    crop_image_key = Column(String(255))  # MinIO object key
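
At read time the stored key is exchanged for a presigned URL. A hypothetical convenience that signs a key loaded from event_images:

def url_for_stored_key(storage: ImageStorage, object_key: str) -> str:
    """Sign an object key read from event_images (hypothetical helper)."""
    return storage.client.presigned_get_object(
        bucket_name=storage.bucket_name,
        object_name=object_key,
        expires=storage.default_expiry
    )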

Anti-Pattern: Long-Lived Presigned URLs

Problem: Generating presigned URLs with long expiration (days/weeks).

Example of BAD code:

# BAD: 30-day presigned URL
url = client.presigned_get_object(
    bucket_name="alpr-images",
    object_name=key,
    expires=timedelta(days=30)  # Too long!
)

Why It's Wrong:
- Cannot revoke access
- URLs shared/leaked remain valid
- Complicates audit trail
- Signature v4 rejects expirations beyond 7 days, so the request above fails outright

Correct Approach:

# GOOD: Short-lived URLs generated on demand
url = client.presigned_get_object(
    bucket_name="alpr-images",
    object_name=key,
    expires=timedelta(minutes=15)  # Short-lived
)

Testing Strategies

Unit Testing

from datetime import datetime

def test_generate_object_key():
    """Test consistent key generation."""
    key = generate_object_key(
        detection_id="evt_abc123",
        captured_at=datetime(2025, 1, 15, 10, 30),
        image_type="full"
    )
    assert key == "full/2025/01/15/evt_abc123_full.jpg"


def test_generate_object_key_zero_padded():
    """Test zero-padding for single digit month/day."""
    key = generate_object_key(
        detection_id="evt_xyz",
        captured_at=datetime(2025, 3, 5),
        image_type="crop"
    )
    assert key == "crop/2025/03/05/evt_xyz_crop.jpg"

Integration Testing

import pytest
from datetime import datetime

@pytest.fixture
def storage():
    """ImageStorage wired to a local MinIO test server."""
    storage = ImageStorage(
        endpoint="localhost:9000",
        access_key="minioadmin",
        secret_key="minioadmin",
        secure=False  # local test server runs without TLS
    )
    # Setup test bucket
    if not storage.client.bucket_exists(storage.bucket_name):
        storage.client.make_bucket(storage.bucket_name)
    yield storage
    # Cleanup after tests
    # ... remove test objects


@pytest.mark.asyncio  # requires the pytest-asyncio plugin
async def test_store_and_retrieve_image(storage):
    """Test full upload/download cycle."""
    captured_at = datetime.utcnow()
    test_image = b"\xff\xd8\xff\xe0..."  # JPEG bytes

    # Store both test images
    full_key, crop_key = await store_detection_images(
        storage=storage,
        detection_id="test_evt_123",
        captured_at=captured_at,
        collector_id="test-collector",
        full_image_bytes=test_image,
        crop_image_bytes=test_image
    )

    # Verify retrieval using the same timestamp the keys were built from
    url = storage.get_presigned_url(
        detection_id="test_evt_123",
        image_type="full",
        captured_at=captured_at
    )
    assert "test_evt_123" in url
    assert "X-Amz-Signature" in url

Configuration

Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| MINIO_ENDPOINT | Yes | - | MinIO server endpoint (host:port) |
| MINIO_ACCESS_KEY | Yes | - | Access key ID |
| MINIO_SECRET_KEY | Yes | - | Secret access key |
| MINIO_BUCKET | No | alpr-images | Bucket name |
| MINIO_SECURE | No | true | Use HTTPS |
| MINIO_REGION | No | us-east-1 | Bucket region |
| IMAGE_URL_EXPIRY_MINUTES | No | 15 | Presigned URL expiry |
| IMAGE_MAX_URL_EXPIRY_MINUTES | No | 60 | Maximum URL expiry |
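
A loading sketch that maps these variables onto the ImageStorage class above (storage_from_env is a hypothetical helper; MINIO_REGION is consumed by bucket provisioning rather than the client):

import os
from datetime import timedelta

def storage_from_env() -> ImageStorage:
    """Build ImageStorage from the environment variables above."""
    storage = ImageStorage(
        endpoint=os.environ["MINIO_ENDPOINT"],
        access_key=os.environ["MINIO_ACCESS_KEY"],
        secret_key=os.environ["MINIO_SECRET_KEY"],
        secure=os.getenv("MINIO_SECURE", "true").lower() == "true"
    )
    storage.bucket_name = os.getenv("MINIO_BUCKET", "alpr-images")
    storage.default_expiry = timedelta(
        minutes=int(os.getenv("IMAGE_URL_EXPIRY_MINUTES", "15"))
    )
    storage.max_expiry = timedelta(
        minutes=int(os.getenv("IMAGE_MAX_URL_EXPIRY_MINUTES", "60"))
    )
    return storage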

Docker Compose Example

services:
  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: ${MINIO_ACCESS_KEY}
      MINIO_ROOT_PASSWORD: ${MINIO_SECRET_KEY}
    volumes:
      - minio_data:/data
    ports:
      - "9000:9000"  # API
      - "9001:9001"  # Console
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3
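
  # Optional one-shot init container that provisions the bucket with mc.
  # A sketch only: the service name and "local" alias are illustrative.
  minio-init:
    image: minio/mc:latest
    depends_on:
      - minio
    environment:
      MINIO_ACCESS_KEY: ${MINIO_ACCESS_KEY}
      MINIO_SECRET_KEY: ${MINIO_SECRET_KEY}
    entrypoint: >
      /bin/sh -c "mc alias set local http://minio:9000 $$MINIO_ACCESS_KEY $$MINIO_SECRET_KEY
      && mc mb --ignore-existing local/alpr-images"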

volumes:
  minio_data:


Maintainer: Development Team
Review Cycle: Quarterly