Image Storage Patterns - Project Reference Pattern¶
Component: MinIO Object Storage
Status: 🟡 In Development
Created: 2025-12-29
Last Updated: 2025-12-29
Overview¶
Purpose¶
This PRP defines implementation patterns for image storage using MinIO (S3-compatible object storage), including bucket structure, presigned URL generation, upload/download operations, and lifecycle management.
Scope¶
Responsibilities:
- MinIO bucket configuration and structure
- Object key naming conventions
- Presigned URL generation for secure access
- Image upload from edge collectors
- Image retrieval for web UI
- Lifecycle rules for retention
Out of Scope:
- Retention policy decisions (see data-retention.md)
- Image processing/compression (handled at edge)
- CDN caching (future enhancement)
Dependencies¶
Requires:
- MinIO server (S3-compatible)
- minio Python SDK
- PostgreSQL (stores object key references)
Used By:
- Detection ingestion service
- Web UI (image display)
- Export functionality
- Cleanup jobs
Quick Reference¶
When to Use This PRP¶
Use when:
- Storing detection images from edge collectors
- Generating URLs for image display in UI
- Implementing image lifecycle management
- Setting up MinIO bucket configuration
Don't use when:
- Storing non-image data (use PostgreSQL)
- Implementing image processing (see edge PRP)
Patterns¶
Pattern: Bucket Structure¶
Problem: Images need organized storage with efficient date-based cleanup and clear separation between image types.
Solution: Use path-based organization within a single bucket with date hierarchy.
Implementation:
alpr-images/                        # Primary bucket
├── full/                           # Full scene images
│   └── {year}/{month}/{day}/
│       └── {detection_id}_full.jpg
│
├── crop/                           # Plate crop images
│   └── {year}/{month}/{day}/
│       └── {detection_id}_crop.jpg
│
└── temp/                           # Temporary upload staging
    └── {upload_id}/
        └── pending files...
Object Key Format:
| Image Type | Key Pattern | Example |
|---|---|---|
| Full scene | `full/{year}/{month}/{day}/{detection_id}_full.jpg` | `full/2025/01/15/evt_abc123_full.jpg` |
| Plate crop | `crop/{year}/{month}/{day}/{detection_id}_crop.jpg` | `crop/2025/01/15/evt_abc123_crop.jpg` |
Key Components:
| Component | Format | Example |
|---|---|---|
| `year` | 4-digit year | `2025` |
| `month` | 2-digit month (zero-padded) | `01`, `12` |
| `day` | 2-digit day (zero-padded) | `05`, `31` |
| `detection_id` | UUID or prefixed ID | `evt_abc123def456` |
Key Generation Helper:
from datetime import datetime

def generate_object_key(
    detection_id: str,
    captured_at: datetime,
    image_type: str  # "full" or "crop"
) -> str:
    """Generate consistent object key for image storage."""
    date_path = captured_at.strftime("%Y/%m/%d")
    return f"{image_type}/{date_path}/{detection_id}_{image_type}.jpg"

# Usage
key = generate_object_key("evt_abc123", datetime(2025, 1, 15), "full")
# Returns: "full/2025/01/15/evt_abc123_full.jpg"
When to Use: - All image storage operations - Date-based lifecycle rules
Trade-offs: - Pros: Easy date-based cleanup, clear organization, efficient prefix queries - Cons: Cannot reorganize by other dimensions (collector, camera) without duplication
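The "efficient prefix queries" above are what make date-based cleanup cheap: one prefix selects exactly one day of images. A minimal sketch of such a cleanup helper, assuming the minio Python SDK used throughout this PRP; the function name delete_images_for_day is illustrative, not part of the PRP:

from datetime import date
from minio import Minio
from minio.deleteobjects import DeleteObject

def delete_images_for_day(client: Minio, bucket: str, image_type: str, day: date) -> int:
    """Remove every object under one date prefix, e.g. full/2025/01/15/."""
    prefix = f"{image_type}/{day.strftime('%Y/%m/%d')}/"
    to_delete = [
        DeleteObject(obj.object_name)
        for obj in client.list_objects(bucket, prefix=prefix, recursive=True)
    ]
    # remove_objects is lazy; iterating the result executes the deletes and surfaces errors
    errors = list(client.remove_objects(bucket, to_delete))
    for err in errors:
        print(f"failed to delete: {err}")
    return len(to_delete) - len(errors)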
Pattern: Presigned URL Generation¶
Problem: Images in private bucket need secure, time-limited access without exposing credentials to clients.
Solution: Generate presigned URLs on-demand with short expiration times.
Implementation:
from datetime import datetime, timedelta

from minio import Minio

class ImageStorage:
    def __init__(
        self,
        endpoint: str,
        access_key: str,
        secret_key: str,
        secure: bool = True,              # set False for local/dev MinIO without TLS
        bucket_name: str = "alpr-images"
    ):
        self.client = Minio(
            endpoint,
            access_key=access_key,
            secret_key=secret_key,
            secure=secure
        )
        self.bucket_name = bucket_name
        self.default_expiry = timedelta(minutes=15)
        self.max_expiry = timedelta(hours=1)

    def get_presigned_url(
        self,
        detection_id: str,
        image_type: str,
        captured_at: datetime,
        expiry: timedelta | None = None
    ) -> str:
        """
        Generate presigned URL for image access.

        Args:
            detection_id: Detection identifier
            image_type: "full" or "crop"
            captured_at: Detection timestamp for key generation
            expiry: URL validity period (default 15 minutes)

        Returns:
            Presigned URL for direct image access
        """
        if expiry is None:
            expiry = self.default_expiry
        elif expiry > self.max_expiry:
            expiry = self.max_expiry

        object_key = generate_object_key(detection_id, captured_at, image_type)

        return self.client.presigned_get_object(
            bucket_name=self.bucket_name,
            object_name=object_key,
            expires=expiry
        )
URL Settings:
| Setting | Value | Rationale |
|---|---|---|
| Default expiration | 15 minutes | Sufficient for page view |
| Max expiration | 1 hour | For downloads/exports |
| Signature version | v4 | Current AWS standard |
Example Presigned URL:
https://minio.example.com/alpr-images/full/2025/01/15/evt_abc123_full.jpg
?X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=...
&X-Amz-Date=20250115T143200Z
&X-Amz-Expires=900
&X-Amz-SignedHeaders=host
&X-Amz-Signature=...
When to Use: - All image URLs in API responses - Web UI image display - Export downloads
Trade-offs: - Pros: No credential exposure, automatic expiry, revocable - Cons: URLs not bookmarkable, require server round-trip to regenerate
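Because expired URLs require a server round-trip to regenerate, the UI typically asks the API for a fresh URL on demand. A minimal sketch of such an endpoint, assuming FastAPI; the route path, the module-level storage instance, and the hard-coded captured_at are illustrative assumptions (in practice the timestamp comes from the detection row in PostgreSQL):

from datetime import datetime
from fastapi import FastAPI, HTTPException

app = FastAPI()
storage = ImageStorage("minio.example.com", "ACCESS_KEY", "SECRET_KEY")  # placeholder credentials

@app.get("/detections/{detection_id}/image-url")
def get_image_url(detection_id: str, image_type: str = "full") -> dict:
    """Return a fresh short-lived URL; the UI calls this again after expiry."""
    if image_type not in ("full", "crop"):
        raise HTTPException(status_code=400, detail="image_type must be 'full' or 'crop'")
    # captured_at would normally be looked up from the detection record; hard-coded here
    captured_at = datetime(2025, 1, 15)
    url = storage.get_presigned_url(detection_id, image_type, captured_at)
    return {"url": url, "expires_in_seconds": 900}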
Pattern: Image Upload¶
Problem: Edge collectors send images as multipart form-data with detection batch uploads; these need efficient parallel upload to MinIO.
Solution: Extract image binary from multipart request and upload to MinIO with proper metadata, using async I/O for parallelism.
Note: The architecture uses multipart form-data (not base64-encoded JSON) to avoid the ~33% size overhead of base64 encoding. See Detection Batching for the upload format.
Implementation:
import asyncio
import io
from datetime import datetime

async def store_detection_images(
    storage: ImageStorage,
    detection_id: str,
    captured_at: datetime,
    collector_id: str,
    full_image_bytes: bytes,
    crop_image_bytes: bytes
) -> tuple[str, str]:
    """
    Store both detection images and return object keys.

    Args:
        storage: ImageStorage instance
        detection_id: Detection identifier
        captured_at: Detection timestamp
        collector_id: Source collector UUID
        full_image_bytes: Full scene image data
        crop_image_bytes: Plate crop image data

    Returns:
        Tuple of (full_key, crop_key)
    """
    full_key = generate_object_key(detection_id, captured_at, "full")
    crop_key = generate_object_key(detection_id, captured_at, "crop")

    # Common metadata for both images
    metadata = {
        "detection-id": detection_id,
        "collector-id": collector_id,
        "captured-at": captured_at.isoformat()
    }

    # Upload both images in parallel
    await asyncio.gather(
        _upload_image(storage, full_key, full_image_bytes, metadata),
        _upload_image(storage, crop_key, crop_image_bytes, metadata)
    )

    return full_key, crop_key

async def _upload_image(
    storage: ImageStorage,
    object_key: str,
    image_bytes: bytes,
    metadata: dict[str, str]
) -> None:
    """Upload single image with metadata."""
    # put_object is a blocking call; run it in a thread so gather() actually overlaps the uploads
    await asyncio.to_thread(
        storage.client.put_object,
        bucket_name=storage.bucket_name,
        object_name=object_key,
        data=io.BytesIO(image_bytes),
        length=len(image_bytes),
        content_type="image/jpeg",
        metadata=metadata
    )
Object Metadata:
| Header | Value |
|---|---|
| Content-Type | image/jpeg |
| Cache-Control | private, max-age=86400 |
| x-amz-meta-detection-id | {detection_id} |
| x-amz-meta-collector-id | {collector_id} |
| x-amz-meta-captured-at | {ISO timestamp} |
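For context, a sketch of how an ingestion endpoint might pull the image bytes out of the multipart request before calling store_detection_images. The framework (FastAPI), route path, and form field names are assumptions; see Detection Batching for the actual upload format:

from datetime import datetime
from fastapi import FastAPI, File, Form, UploadFile

app = FastAPI()
storage = ImageStorage("minio.example.com", "ACCESS_KEY", "SECRET_KEY")  # placeholder credentials

@app.post("/detections/{detection_id}/images")
async def ingest_images(
    detection_id: str,
    collector_id: str = Form(...),
    captured_at: datetime = Form(...),
    full_image: UploadFile = File(...),   # field names are assumptions
    crop_image: UploadFile = File(...),
) -> dict:
    """Read the multipart file parts and hand the raw bytes to the storage helper above."""
    full_bytes = await full_image.read()
    crop_bytes = await crop_image.read()
    full_key, crop_key = await store_detection_images(
        storage, detection_id, captured_at, collector_id, full_bytes, crop_bytes
    )
    return {"full_image_key": full_key, "crop_image_key": crop_key}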
When to Use: - Detection ingestion from edge collectors - Batch image upload
Trade-offs: - Pros: Parallel uploads, consistent metadata, decoupled from Detection storage - Cons: Image availability depends on MinIO being reachable at ingest time (see Error Handling with Retry below)
Pattern: Error Handling with Retry¶
Problem: MinIO may be temporarily unavailable; image upload failures shouldn't block Detection storage.
Solution: Implement retry with exponential backoff; store events even if images fail.
Implementation:
import asyncio
from functools import wraps
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

def with_retry(
    max_attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0
) -> Callable[[Callable[..., Awaitable[T]]], Callable[..., Awaitable[T]]]:
    """Decorator for async functions with exponential backoff retry."""
    def decorator(func: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
        @wraps(func)
        async def wrapper(*args, **kwargs) -> T:
            last_error = None
            for attempt in range(max_attempts):
                try:
                    return await func(*args, **kwargs)
                except Exception as e:
                    last_error = e
                    if attempt < max_attempts - 1:
                        delay = min(base_delay * (2 ** attempt), max_delay)
                        await asyncio.sleep(delay)
            raise last_error
        return wrapper
    return decorator

@with_retry(max_attempts=3)
async def upload_with_retry(
    storage: ImageStorage,
    object_key: str,
    image_bytes: bytes,
    metadata: dict[str, str]
) -> None:
    """Upload image with automatic retry on failure."""
    await _upload_image(storage, object_key, image_bytes, metadata)
Error Scenarios:
| Scenario | Behavior |
|---|---|
| Image upload fails | Retry 3x with backoff; log error; Detection stored without images |
| Presigned URL expired | Client re-requests; server generates new URL |
| Image deleted (retention) | Return placeholder image; UI shows "Image expired" |
| MinIO unavailable | Detection stored; image upload queued for retry |
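A minimal sketch tying the first and last rows together: the detection row is persisted first, each image is then uploaded via upload_with_retry, and a final failure only logs. save_detection and the module-level storage instance are placeholders, not part of this PRP:

import logging

logger = logging.getLogger(__name__)

async def ingest_detection(detection: dict, full_bytes: bytes, crop_bytes: bytes) -> None:
    """Persist the detection first; image upload failures must not lose the event."""
    # save_detection() is a placeholder for the PostgreSQL insert (see database-prp.md)
    await save_detection(detection)

    metadata = {
        "detection-id": detection["id"],
        "collector-id": detection["collector_id"],
        "captured-at": detection["captured_at"].isoformat(),
    }
    for image_type, data in (("full", full_bytes), ("crop", crop_bytes)):
        key = generate_object_key(detection["id"], detection["captured_at"], image_type)
        try:
            await upload_with_retry(storage, key, data, metadata)
        except Exception:
            # after 3 attempts: log and continue; a follow-up job can re-queue the upload
            logger.exception("image upload failed for %s (%s)", detection["id"], key)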
When to Use: - All MinIO operations - Any network-dependent storage call
Trade-offs: - Pros: Resilient to transient failures, events never lost - Cons: Delayed image availability, complexity in tracking failed uploads
Pattern: Bucket Configuration¶
Problem: MinIO bucket needs proper security, access policies, and lifecycle rules.
Solution: Configure bucket for private access with lifecycle-based cleanup.
Implementation:
from minio import Minio
from minio.commonconfig import ENABLED, Filter
from minio.lifecycleconfig import (
    AbortIncompleteMultipartUpload,
    Expiration,
    LifecycleConfig,
    Rule,
)

def configure_bucket(client: Minio, bucket_name: str = "alpr-images") -> None:
    """Configure bucket with proper settings and lifecycle rules."""
    # Create bucket if not exists
    if not client.bucket_exists(bucket_name):
        client.make_bucket(bucket_name, location="us-east-1")

    # Set lifecycle rules for automatic cleanup
    lifecycle_config = LifecycleConfig(
        [
            # Delete images after 1 year
            Rule(
                ENABLED,
                rule_id="expire-images",
                rule_filter=Filter(prefix=""),  # All objects
                expiration=Expiration(days=365),
            ),
            # Clean up incomplete multipart uploads
            Rule(
                ENABLED,
                rule_id="cleanup-incomplete",
                rule_filter=Filter(prefix=""),
                abort_incomplete_multipart_upload=AbortIncompleteMultipartUpload(
                    days_after_initiation=7
                ),
            ),
        ]
    )
    client.set_bucket_lifecycle(bucket_name, lifecycle_config)
Bucket Settings:
bucket_name: alpr-images
region: us-east-1 # Required even for self-hosted
# Access policy: private (presigned URLs only)
policy: private
# Versioning: disabled (WORM handled at DB level)
versioning: false
# Object locking: disabled (legal holds via DB)
object_lock: false
Lifecycle Rules:
| Rule | Trigger | Action |
|---|---|---|
| expire-images | 365 days | Delete object |
| cleanup-incomplete | 7 days | Abort multipart upload |
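To confirm the rules were actually applied (from a provisioning script or an integration test), the configuration can be read back. A small sketch assuming the minio SDK's get_bucket_lifecycle; the helper name is illustrative:

from minio import Minio

def print_lifecycle_rules(client: Minio, bucket_name: str = "alpr-images") -> None:
    """Read back the lifecycle configuration applied by configure_bucket()."""
    config = client.get_bucket_lifecycle(bucket_name)
    if config is None:
        print("no lifecycle configuration set")
        return
    for rule in config.rules:
        print(rule.rule_id, rule.status)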
When to Use: - Initial MinIO setup - Environment provisioning scripts
Pattern: Tiered Storage (Optional)¶
Problem: Older images are accessed less frequently; storing everything on fast SSD is expensive.
Solution: Configure MinIO tiering to move older images to cheaper storage.
Implementation:
# Add cold tier using MinIO Client (mc)
# "myminio" is the mc alias for the primary deployment; the access/secret keys are
# placeholders for the remote tier credentials (check `mc admin tier add --help`
# for your mc release, as flags vary between versions)
mc admin tier add minio myminio WARM_HDD \
  --endpoint https://localhost:9000 \
  --access-key "${WARM_TIER_ACCESS_KEY}" \
  --secret-key "${WARM_TIER_SECRET_KEY}" \
  --bucket alpr-images-archive \
  --prefix "" \
  --storage-class STANDARD

# Add transition rule
mc ilm add myminio/alpr-images \
  --transition-days 90 \
  --transition-tier "WARM_HDD" \
  --prefix ""
Tier Configuration:
| Tier | Age | Storage Type | Access Speed |
|---|---|---|---|
| Hot | 0-90 days | SSD | Fast (active investigations) |
| Warm | 90-365 days | HDD | Slower (historical lookups) |
| Expired | >365 days | Deleted | N/A |
Hardware Layout:
MinIO Server
├── /mnt/ssd (Hot tier)
│   └── Fast NVMe/SSD storage
│       └── Size: 500GB - 1TB
│
└── /mnt/hdd (Warm tier)
    └── Spinning disk array
        └── Size: 4TB+ (expandable)
When to Use: - Production deployments with >500GB images - Cost-sensitive environments
Trade-offs: - Pros: Significant cost savings, automatic management - Cons: Slower access to older images, additional hardware
Anti-Patterns¶
Anti-Pattern: Storing Images in Database¶
Problem: Storing binary image data directly in PostgreSQL.
Example of BAD code:
# BAD: Images as bytea columns
class Detection(Base):
    __tablename__ = "detections"

    id = Column(BigInteger, primary_key=True)
    full_image = Column(LargeBinary)  # Don't do this!
    crop_image = Column(LargeBinary)  # Don't do this!
Why It's Wrong: - Database bloat (100KB per row) - Slow queries (pulling image data) - Expensive backups - Cannot use CDN
Correct Approach:
# GOOD: Store reference to object storage
class EventImage(Base):
    __tablename__ = "event_images"

    id = Column(BigInteger, primary_key=True)
    detection_id = Column(BigInteger, ForeignKey("events.id"))
    full_image_key = Column(String(255))  # MinIO object key
    crop_image_key = Column(String(255))  # MinIO object key
Anti-Pattern: Long-Lived Presigned URLs¶
Problem: Generating presigned URLs with long expiration (days/weeks).
Example of BAD code:
# BAD: 30-day presigned URL
url = client.presigned_get_object(
    bucket_name="alpr-images",
    object_name=key,
    expires=timedelta(days=30)  # Too long!
)
Why It's Wrong: - Cannot revoke access - URLs shared/leaked remain valid - Complicates audit trail
Correct Approach:
# GOOD: Short-lived URLs generated on demand
url = client.presigned_get_object(
    bucket_name="alpr-images",
    object_name=key,
    expires=timedelta(minutes=15)  # Short-lived
)
Testing Strategies¶
Unit Testing¶
import pytest
from unittest.mock import Mock, patch
from datetime import datetime

def test_generate_object_key():
    """Test consistent key generation."""
    key = generate_object_key(
        detection_id="evt_abc123",
        captured_at=datetime(2025, 1, 15, 10, 30),
        image_type="full"
    )
    assert key == "full/2025/01/15/evt_abc123_full.jpg"

def test_generate_object_key_zero_padded():
    """Test zero-padding for single digit month/day."""
    key = generate_object_key(
        detection_id="evt_xyz",
        captured_at=datetime(2025, 3, 5),
        image_type="crop"
    )
    assert key == "crop/2025/03/05/evt_xyz_crop.jpg"
Integration Testing¶
import pytest
from datetime import datetime
from minio import Minio

@pytest.fixture
def minio_client():
    """MinIO test client."""
    client = Minio(
        endpoint="localhost:9000",
        access_key="minioadmin",
        secret_key="minioadmin",
        secure=False
    )
    # Setup test bucket
    if not client.bucket_exists("test-images"):
        client.make_bucket("test-images")
    yield client
    # Cleanup after tests
    # ... remove test objects

@pytest.mark.asyncio
async def test_store_and_retrieve_image(minio_client):
    """Test full upload/download cycle."""
    storage = ImageStorage(
        endpoint="localhost:9000",
        access_key="minioadmin",
        secret_key="minioadmin",
        secure=False,
        bucket_name="test-images"
    )

    # Store test image
    test_image = b"\xff\xd8\xff\xe0..."  # JPEG bytes
    full_key, crop_key = await store_detection_images(
        storage=storage,
        detection_id="test_evt_123",
        captured_at=datetime.utcnow(),
        collector_id="test-collector",
        full_image_bytes=test_image,
        crop_image_bytes=test_image
    )

    # Verify retrieval
    url = storage.get_presigned_url(
        detection_id="test_evt_123",
        image_type="full",
        captured_at=datetime.utcnow()
    )
    assert "test_evt_123" in url
    assert "X-Amz-Signature" in url
Configuration¶
Environment Variables¶
| Variable | Required | Default | Description |
|---|---|---|---|
| `MINIO_ENDPOINT` | Yes | - | MinIO server endpoint (host:port) |
| `MINIO_ACCESS_KEY` | Yes | - | Access key ID |
| `MINIO_SECRET_KEY` | Yes | - | Secret access key |
| `MINIO_BUCKET` | No | `alpr-images` | Bucket name |
| `MINIO_SECURE` | No | `true` | Use HTTPS |
| `MINIO_REGION` | No | `us-east-1` | Bucket region |
| `IMAGE_URL_EXPIRY_MINUTES` | No | `15` | Presigned URL expiry |
| `IMAGE_MAX_URL_EXPIRY_MINUTES` | No | `60` | Maximum URL expiry |
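A small settings helper is one way to load these variables at startup. A sketch using a plain dataclass and os.environ; the MinioSettings class itself is illustrative, while the variable names and defaults match the table above:

import os
from dataclasses import dataclass
from datetime import timedelta

@dataclass(frozen=True)
class MinioSettings:
    endpoint: str
    access_key: str
    secret_key: str
    bucket: str
    secure: bool
    region: str
    url_expiry: timedelta
    max_url_expiry: timedelta

def load_minio_settings() -> MinioSettings:
    """Build settings from the environment variables listed above."""
    return MinioSettings(
        endpoint=os.environ["MINIO_ENDPOINT"],        # required
        access_key=os.environ["MINIO_ACCESS_KEY"],    # required
        secret_key=os.environ["MINIO_SECRET_KEY"],    # required
        bucket=os.getenv("MINIO_BUCKET", "alpr-images"),
        secure=os.getenv("MINIO_SECURE", "true").lower() == "true",
        region=os.getenv("MINIO_REGION", "us-east-1"),
        url_expiry=timedelta(minutes=int(os.getenv("IMAGE_URL_EXPIRY_MINUTES", "15"))),
        max_url_expiry=timedelta(minutes=int(os.getenv("IMAGE_MAX_URL_EXPIRY_MINUTES", "60"))),
    )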
Docker Compose Example¶
services:
  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: ${MINIO_ACCESS_KEY}
      MINIO_ROOT_PASSWORD: ${MINIO_SECRET_KEY}
    volumes:
      - minio_data:/data
    ports:
      - "9000:9000"  # API
      - "9001:9001"  # Console
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

volumes:
  minio_data:
Related Documentation¶
- Architecture: data-retention.md - Retention policy decisions
- Architecture: ARCHITECTURE.md - System overview
- PRP: camera-adapters-prp.md - Camera image capture
- PRP: database-prp.md - Detection metadata storage
- Testing: TEST_CONFIG.md - MinIO test fixtures
Maintainer: Development Team Review Cycle: Quarterly