Data Retention Policy

Overview

Data retention balances investigative needs (long history) with storage costs (images are large).

Retention Periods

| Data Type | Retention | Storage | Rationale |
|---|---|---|---|
| Detection metadata | 7 years | PostgreSQL | Legal/investigation requirements |
| Plate text & confidence | 7 years | PostgreSQL | Core search data |
| Vehicle info (color, type) | 7 years | PostgreSQL | Investigation context |
| Camera/collector info | 7 years | PostgreSQL | Audit trail |
| Alert history | 7 years | PostgreSQL | Compliance records |
| Full scene images | 1 year | MinIO | Storage cost management |
| Plate crop images | 1 year | MinIO | Storage cost management |

Image Storage

Images are stored in MinIO (S3-compatible object storage) with date-based organization for efficient lifecycle management.

Key Design Decisions:

  • Single bucket with path prefixes (full/, crop/) for image types
  • Date-based paths (YYYY/MM/DD) for efficient cleanup
  • Presigned URLs for secure, time-limited access (no public bucket)
  • Private bucket policy: all access via signed URLs
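As an illustration of the prefix and date-based layout, here is a minimal sketch of an object-key builder. The exact key shape (`<prefix>/YYYY/MM/DD/<detection_id>.jpg`), the `.jpg` extension, the use of UTC dates, and the function name `build_object_key` are assumptions made for the example; the Image Storage PRP remains the authoritative source for the real bucket structure.

```python
from datetime import datetime, timezone

# Path prefixes named in the design decisions above.
IMAGE_PREFIXES = {"full", "crop"}


def build_object_key(image_type: str, detection_id: str, captured_at: datetime) -> str:
    """Build a date-partitioned object key (hypothetical layout)."""
    if image_type not in IMAGE_PREFIXES:
        raise ValueError(f"unknown image type: {image_type!r}")
    if captured_at.tzinfo is None:
        raise ValueError("captured_at must be timezone-aware")
    # Assumption: paths are keyed on the UTC capture date.
    day = captured_at.astimezone(timezone.utc).strftime("%Y/%m/%d")
    return f"{image_type}/{day}/{detection_id}.jpg"


if __name__ == "__main__":
    ts = datetime(2025, 12, 29, 23, 30, tzinfo=timezone.utc)
    print(build_object_key("crop", "abc123", ts))
```

Because every key starts with a type prefix followed by the date, a lifecycle rule or cleanup job can target a whole day or image type without listing individual objects.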

Access Pattern:

  • Images accessed via 15-minute presigned URLs
  • URLs generated on-demand by API
  • Expired URLs prompt client to re-request
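The re-request behaviour can be sketched from the client's point of view. This is only an illustration: `SignedUrl`, `get_image_url`, and the `request_new` callback (standing in for a call to the API) are hypothetical names, and tracking expiry on the client side is an assumption rather than something the design mandates.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

URL_TTL = timedelta(minutes=15)  # presigned URL lifetime from the access pattern


@dataclass(frozen=True)
class SignedUrl:
    url: str
    expires_at: datetime

    def is_expired(self, now: datetime) -> bool:
        return now >= self.expires_at


def get_image_url(
    cached: Optional[SignedUrl],
    request_new: Callable[[], str],
    now: datetime,
) -> SignedUrl:
    """Reuse a cached URL while it is valid; otherwise ask the API for a new one."""
    if cached is None or cached.is_expired(now):
        return SignedUrl(url=request_new(), expires_at=now + URL_TTL)
    return cached


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    first = get_image_url(None, lambda: "https://example.invalid/signed-1", now)
    print(first.url, first.expires_at.isoformat())
```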

Error Handling:

  • Image upload failures don't block Detection storage
  • Failed uploads are retried with exponential backoff
  • Events stored even if images fail (metadata preserved)
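A minimal sketch of the retry behaviour, assuming the upload is wrapped in a zero-argument callable. The attempt count, base delay, and cap are illustrative values, not figures from this policy, and `upload_with_backoff` is a hypothetical helper name. The function returns a boolean instead of raising so that the caller can store the detection metadata regardless of the outcome.

```python
import time
from typing import Callable


def upload_with_backoff(
    upload: Callable[[], None],
    max_attempts: int = 5,      # assumption: illustrative value
    base_delay: float = 1.0,    # assumption: seconds before the first retry
    max_delay: float = 60.0,    # assumption: upper bound on any single wait
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try an upload, doubling the wait after each failure. Returns True on success."""
    for attempt in range(max_attempts):
        try:
            upload()
            return True
        except Exception:  # a real implementation would catch specific storage errors
            if attempt == max_attempts - 1:
                break
            sleep(min(max_delay, base_delay * 2 ** attempt))
    return False


if __name__ == "__main__":
    def always_fails() -> None:
        raise OSError("object store unavailable")

    # Metadata would be stored first; the image outcome does not block it.
    ok = upload_with_backoff(always_fails, max_attempts=3, sleep=lambda s: None)
    print("image stored:", ok)
```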

Implementation Details: See Image Storage PRP for bucket structure, presigned URL generation, upload patterns, and lifecycle configuration.


Storage Estimates

Image Storage (MinIO)

| Scenario | Daily Volume | 1-Year Storage | Notes |
|---|---|---|---|
| Month 1: 7 cameras (5 regular + 2 high-volume) | ~11,000 detections | ~400 GB | ~100 KB/detection |
| Month 6: 100 cameras (90 regular + 10 high-volume) | ~100,000 detections | ~3.6 TB | |

Detection rate assumptions:

  • Regular camera: ~720 detections/day (1/min × 12 hours)
  • High-volume camera: ~3,600 detections/day (1/sec during peak periods, about one hour per day in total)
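The estimates can be reproduced from the stated assumptions with a few lines of arithmetic. The sketch below uses decimal units (1 GB = 10^9 bytes) and treats the ~100 KB/detection figure as covering all images stored for one detection; both are assumptions of this example.

```python
REGULAR_PER_DAY = 1 * 60 * 12          # 1 detection/min for 12 hours = 720
HIGH_VOLUME_PER_DAY = 3_600            # 1 detection/sec over ~1 hour of peak traffic
IMAGE_BYTES_PER_DETECTION = 100_000    # ~100 KB per detection
DAYS_PER_YEAR = 365


def daily_detections(regular_cameras: int, high_volume_cameras: int) -> int:
    return regular_cameras * REGULAR_PER_DAY + high_volume_cameras * HIGH_VOLUME_PER_DAY


def yearly_image_bytes(regular_cameras: int, high_volume_cameras: int) -> int:
    per_day = daily_detections(regular_cameras, high_volume_cameras)
    return per_day * IMAGE_BYTES_PER_DETECTION * DAYS_PER_YEAR


if __name__ == "__main__":
    for label, regular, high in [("Month 1", 5, 2), ("Month 6", 90, 10)]:
        per_day = daily_detections(regular, high)
        gb = yearly_image_bytes(regular, high) / 1e9
        print(f"{label}: {per_day:,} detections/day, {gb:,.1f} GB/year")
```

Unrounded, this gives 10,800 detections/day and about 394 GB/year for Month 1, and 100,800 detections/day and about 3.7 TB/year for Month 6. The table's figures are rounded from these.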

Database Storage (PostgreSQL)

Detection metadata is small (~500 bytes/detection). 100,000 detections/day for 7 years (100,000 × 500 bytes × 2,555 days) ≈ 128 GB.

Image Lifecycle

Detection
Hot Storage (SSD) ──── 0-90 days ──── Fast access for active investigations
Warm Storage (HDD) ─── 90-365 days ── Slower access, lower cost (optional)
Deletion ───────────── After 1 year ─ Automatic cleanup

Tiered Storage (Optional)

MinIO supports tiered storage to move older objects to cheaper storage.

Tier Configuration

| Tier | Age | Storage Type | Access Speed |
|---|---|---|---|
| Hot | 0-90 days | SSD | Fast (active investigations) |
| Warm | 90-365 days | HDD | Slower (historical lookups) |
| Expired | >365 days | Deleted | N/A |
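To make the age boundaries concrete, the following sketch classifies an image by age. The `Tier` enum and `tier_for` function are hypothetical names. The table leaves open which tier an image belongs to at exactly 90 days; this example treats day 90 as warm and anything older than 365 days as expired, which is an assumption.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Tier(Enum):
    HOT = "hot"
    WARM = "warm"
    EXPIRED = "expired"


def tier_for(
    captured_at: datetime,
    now: datetime,
    hot_days: int = 90,
    retention_days: int = 365,
) -> Tier:
    """Map an image's age onto the tiers in the table above."""
    age = now - captured_at
    if age > timedelta(days=retention_days):
        return Tier.EXPIRED
    if age >= timedelta(days=hot_days):
        return Tier.WARM  # assumption: the 90-day boundary itself counts as warm
    return Tier.HOT


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for days in (10, 120, 400):
        print(days, tier_for(now - timedelta(days=days), now).value)
```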

MinIO Lifecycle Rules

Configure via MinIO Console or mc ilm CLI:

Transition rule (Hot → Warm at 90 days):

  • Applies to: images/*
  • Transition after: 90 days
  • Target tier: WARM_HDD

Expiration rule (Delete at 1 year):

  • Applies to: images/*
  • Expire after: 365 days
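For reference, the two rules can be written down as an S3-style lifecycle document. The sketch below builds one as a plain Python dictionary and prints it as JSON. The field names follow the general S3 lifecycle configuration shape and should be checked against the MinIO documentation before use. The rule IDs, the `images/` prefix (read from "images/*" above), and the function name are assumptions, and the WARM_HDD tier must already exist as a configured remote tier.

```python
import json


def build_lifecycle_document(
    prefix: str = "images/",        # assumption: "images/*" is an object prefix
    transition_days: int = 90,
    transition_tier: str = "WARM_HDD",
    expire_days: int = 365,
) -> dict:
    """Return an S3-style lifecycle document with one transition and one expiration rule."""
    if transition_days >= expire_days:
        raise ValueError("transition must happen before expiration")
    return {
        "Rules": [
            {
                "ID": "images-hot-to-warm",       # hypothetical rule ID
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transition": {"Days": transition_days, "StorageClass": transition_tier},
            },
            {
                "ID": "images-expire-1y",         # hypothetical rule ID
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Expiration": {"Days": expire_days},
            },
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_lifecycle_document(), indent=2))
```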

Hardware Setup for Tiering

MinIO Server
├── /mnt/ssd (Hot tier)
│   ├── Fast NVMe/SSD storage
│   └── Size: 500GB - 1TB
└── /mnt/hdd (Warm tier)
    ├── Spinning disk array
    └── Size: 4TB+ (expandable)

Database Retention

Automatic Cleanup

A daily scheduled job removes data older than its retention period:

| Table | Retention | Cleanup Strategy |
|---|---|---|
| events | 7 years | Partition by month, drop old partitions |
| alerts | 7 years | Cascade delete with events |
| audit_logs | 7 years | Partition by month |
| legal_holds | 7 years | Never deleted; retained with events |
| alert_queue | 7 days | Delete completed/failed entries |
| sessions | 30 days | Delete expired sessions |

Table Partitioning

Partition large tables by time for efficient cleanup:

events
├── events_2025_01
├── events_2025_02
├── ...
└── events_2031_12

Benefits:

  • Drop entire partition instead of DELETE (instant, no bloat)
  • Query performance (partition pruning)
  • Easier backups (per-partition)
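A cleanup job built on this scheme needs to decide which monthly partitions have aged out entirely. The sketch below shows one way to do that, assuming the `events_YYYY_MM` naming shown above and a retention window of 84 months (7 years). The function names are hypothetical. A partition is selected only once every row in it is older than the window, and a real job would also have to skip partitions containing detections under an active legal hold.

```python
from datetime import date
from typing import Iterable, List


def partition_name(year: int, month: int) -> str:
    return f"events_{year:04d}_{month:02d}"


def expired_partitions(
    existing: Iterable[str],
    today: date,
    retention_months: int = 84,  # 7 years
) -> List[str]:
    """Return partitions whose whole month lies outside the retention window."""
    # Months are counted as year * 12 + (month - 1) so they can be compared directly.
    oldest_kept = today.year * 12 + (today.month - 1) - retention_months
    expired = []
    for name in existing:
        _, year, month = name.rsplit("_", 2)
        if int(year) * 12 + (int(month) - 1) < oldest_kept:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = [partition_name(2025, m) for m in range(1, 4)]
    # Legal-hold checks are omitted here; see the legal hold notes in this document.
    print(expired_partitions(names, today=date(2032, 2, 15)))
```

With `today` set to 2032-02-15, only `events_2025_01` is returned: `events_2025_02` still holds rows from late February 2025 that are less than seven years old.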

Image Deletion Handling

When images expire but metadata remains:

  1. Image URLs in database become invalid
  2. UI shows "Image expired" placeholder
  3. Metadata (plate, timestamp, location) still searchable
  4. Alert history preserved with context

For active investigations, exempt specific data from automatic deletion.

Note: Legal holds are managed via a separate legal_holds junction table (not columns on events). This maintains WORM immutability of the detections table. See data-integrity.md for full schema and implementation details.

| Field (in legal_holds table) | Purpose |
|---|---|
| case_id | Investigation/case reference number |
| detection_id | FK to protected detection |
| reason | Description of why hold was placed |
| hold_until | Optional explicit expiration date |
| released_at | NULL = active hold; timestamp = released |

Detections with an active legal hold (unreleased, not expired) are excluded from cleanup jobs.
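The "unreleased, not expired" condition can be expressed as a small predicate. This sketch mirrors the legal_holds fields listed above in a hypothetical `LegalHold` dataclass. The field types and the choice to treat a hold as expired at the exact `hold_until` instant are assumptions; data-integrity.md has the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class LegalHold:
    case_id: str
    detection_id: int                        # assumption: integer key
    reason: str
    hold_until: Optional[datetime] = None    # None = no explicit expiration
    released_at: Optional[datetime] = None   # None = hold has not been released

    def is_active(self, now: datetime) -> bool:
        """Active means not released and not past an explicit expiration."""
        if self.released_at is not None:
            return False
        if self.hold_until is not None and now >= self.hold_until:
            return False
        return True


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    hold = LegalHold(case_id="CASE-001", detection_id=42, reason="example hold")
    print("exclude from cleanup:", hold.is_active(now))
```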

Backup Strategy

| Data | Backup Frequency | Retention |
|---|---|---|
| PostgreSQL | Daily full, hourly incremental | 30 days |
| MinIO (hot tier) | Daily incremental | 7 days |
| MinIO (warm tier) | Weekly incremental | 30 days |
| Collector configs | On change | 90 days |

Configuration

Environment variables:

# Retention periods (days)
RETENTION_EVENTS_DAYS=2555          # 7 years
RETENTION_IMAGES_DAYS=365           # 1 year
RETENTION_SESSIONS_DAYS=30
RETENTION_ALERT_QUEUE_DAYS=7

# Tiering thresholds (days)
IMAGE_TIER_HOT_DAYS=90              # Move to warm after 90 days

# Cleanup schedule
CLEANUP_SCHEDULE="0 3 * * *"        # Daily at 3am
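A service reading these variables might load and validate them at startup along the following lines. This is a sketch, not the project's actual loader: `RetentionConfig` and `load_retention_config` are hypothetical names, and the defaults simply repeat the values listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RetentionConfig:
    events_days: int
    images_days: int
    sessions_days: int
    alert_queue_days: int
    image_tier_hot_days: int
    cleanup_schedule: str


def load_retention_config(env: Mapping[str, str] = os.environ) -> RetentionConfig:
    """Read retention settings from the environment, falling back to the documented values."""

    def positive_days(name: str, default: int) -> int:
        value = int(env.get(name, default))
        if value <= 0:
            raise ValueError(f"{name} must be a positive number of days")
        return value

    config = RetentionConfig(
        events_days=positive_days("RETENTION_EVENTS_DAYS", 2555),
        images_days=positive_days("RETENTION_IMAGES_DAYS", 365),
        sessions_days=positive_days("RETENTION_SESSIONS_DAYS", 30),
        alert_queue_days=positive_days("RETENTION_ALERT_QUEUE_DAYS", 7),
        image_tier_hot_days=positive_days("IMAGE_TIER_HOT_DAYS", 90),
        cleanup_schedule=env.get("CLEANUP_SCHEDULE", "0 3 * * *"),
    )
    # Images must move to the warm tier before they are deleted.
    if config.image_tier_hot_days >= config.images_days:
        raise ValueError("IMAGE_TIER_HOT_DAYS must be less than RETENTION_IMAGES_DAYS")
    return config


if __name__ == "__main__":
    print(load_retention_config())
```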

Compliance Notes

  • 7-year retention aligns with common legal discovery requirements
  • Audit logs track all access to detection data
  • Legal hold prevents accidental deletion during investigations
  • Deletion is permanent: there is no soft delete, which keeps storage use down

  • Architecture: Alert Engine - Alert generation and notification processing
  • Architecture: Data Integrity - WORM protection and legal hold implementation
  • PRP: Image Storage PRP - MinIO bucket structure and lifecycle rules

Decision Date: 2025-12-29
Status: Approved
Rationale: Balances long-term investigative needs (7-year metadata) with storage costs (1-year images)