Data Retention Policy
Overview
This policy balances investigative needs (long searchable history) against storage costs (images are large): detection metadata is kept for 7 years, images for 1 year.
Retention Periods
| Data Type | Retention | Storage | Rationale |
|---|---|---|---|
| Detection metadata | 7 years | PostgreSQL | Legal/investigation requirements |
| Plate text & confidence | 7 years | PostgreSQL | Core search data |
| Vehicle info (color, type) | 7 years | PostgreSQL | Investigation context |
| Camera/collector info | 7 years | PostgreSQL | Audit trail |
| Alert history | 7 years | PostgreSQL | Compliance records |
| Full scene images | 1 year | MinIO | Storage cost management |
| Plate crop images | 1 year | MinIO | Storage cost management |
Image Storage
Images are stored in MinIO (S3-compatible object storage) with date-based organization for efficient lifecycle management.
Key Design Decisions:
- Single bucket with path prefixes (full/, crop/) for image types
- Date-based paths (YYYY/MM/DD) for efficient cleanup
- Presigned URLs for secure, time-limited access (no public bucket)
- Private bucket policy - all access via signed URLs
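A minimal sketch of the date-based key layout. It assumes keys of the form prefix/YYYY/MM/DD/detection_id.jpg built from the detection timestamp in UTC. The exact key format, the .jpg extension and the function names are illustrative assumptions; the Image Storage PRP defines the real bucket structure.

```python
from datetime import datetime, timezone

# Hypothetical mapping onto the full/ and crop/ path prefixes described above.
IMAGE_PREFIXES = {"full": "full", "crop": "crop"}


def image_key(kind: str, detection_id: str, detected_at: datetime, ext: str = "jpg") -> str:
    """Build an object key such as 'full/2025/12/29/<detection_id>.jpg'.

    detected_at should be timezone-aware; it is normalized to UTC so that all
    images captured on one UTC day share a single YYYY/MM/DD prefix.
    """
    if kind not in IMAGE_PREFIXES:
        raise ValueError(f"unknown image kind: {kind!r}")
    ts = detected_at.astimezone(timezone.utc)
    return f"{IMAGE_PREFIXES[kind]}/{ts:%Y/%m/%d}/{detection_id}.{ext}"


def day_prefix(kind: str, day: datetime) -> str:
    """Prefix covering every image of one kind captured on one UTC day."""
    ts = day.astimezone(timezone.utc)
    return f"{IMAGE_PREFIXES[kind]}/{ts:%Y/%m/%d}/"
```

With this layout, listing or auditing a single day's images touches exactly one prefix per image type.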
Access Pattern:
- Images accessed via 15-minute presigned URLs
- URLs generated on demand by the API
- Expired URLs prompt the client to re-request
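The access pattern can be sketched with the S3 presigning call that boto3 provides, assuming the API holds a boto3 S3 client configured against the MinIO endpoint. The function and constant names are hypothetical. generate_presigned_url signs the request locally and does not contact the server, so generating a fresh URL per request is cheap.

```python
from datetime import timedelta

# 15-minute URL lifetime, per the access pattern above.
PRESIGN_TTL = timedelta(minutes=15)


def presigned_image_url(s3_client, bucket: str, key: str) -> str:
    """Return a time-limited GET URL for one object in the private bucket.

    s3_client is assumed to be a boto3 S3 client pointed at MinIO
    (e.g. created with boto3.client("s3", endpoint_url=...)).
    """
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=int(PRESIGN_TTL.total_seconds()),  # 900 seconds
    )
```

Because the bucket stays private, a client holding an expired URL gets an access error and simply asks the API for a new one.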
Error Handling:
- Image upload failures don't block detection storage
- Failed uploads are retried with exponential backoff
- Detection metadata is stored even if image uploads fail
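The error-handling rules above can be sketched as follows: persist the metadata first, then retry the image upload with exponential backoff, and report failure instead of raising. The function names, the attempt limit, the delays and the use of full jitter (a random delay between zero and the backoff ceiling) are all assumptions; the policy itself only specifies exponential backoff.

```python
import random
import time
from typing import Callable


def upload_with_retry(
    upload: Callable[[], None],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call upload() until it succeeds or attempts run out.

    The backoff ceiling doubles per attempt (base_delay, 2x, 4x, ...) up to
    max_delay; each wait is a random value below that ceiling (full jitter).
    Returns False instead of raising so the caller is never blocked.
    """
    for attempt in range(max_attempts):
        try:
            upload()
            return True
        except Exception:  # a real client would retry only transient errors
            if attempt == max_attempts - 1:
                return False
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    return False


def store_detection(save_metadata: Callable[[], None],
                    upload_image: Callable[[], None], **retry_options) -> bool:
    """Save metadata first so the detection survives even if the image is lost."""
    save_metadata()
    return upload_with_retry(upload_image, **retry_options)
```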
Implementation Details: See Image Storage PRP for bucket structure, presigned URL generation, upload patterns, and lifecycle configuration.
Storage Estimates
Image Storage (MinIO)
| Scenario | Daily Volume | Storage After 1 Year | Notes |
|---|---|---|---|
| Month 1: 7 cameras (5 regular + 2 high-volume) | ~11,000 detections | ~400 GB | ~100KB/detection |
| Month 6: 100 cameras (90 regular + 10 high-volume) | ~100,000 detections | ~3.6 TB | ~100KB/detection |
Detection rate assumptions:
- Regular camera: ~720 detections/day (1/min × 12 hours)
- High-volume camera: ~3,600 detections/day (1/sec during peak periods totaling about 1 hour/day)
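The table's figures follow from the per-camera rates above. This worked calculation reproduces them, taking 100 KB as 100,000 bytes (decimal units, an assumption) and using the table's ~100KB per detection.

```python
BYTES_PER_DETECTION = 100_000  # ~100 KB per detection, decimal units
REGULAR_PER_DAY = 720          # 1/min x 12 hours
HIGH_VOLUME_PER_DAY = 3_600    # 1/sec for about 1 hour of peak traffic


def daily_detections(regular: int, high_volume: int) -> int:
    """Expected detections per day for a given camera mix."""
    return regular * REGULAR_PER_DAY + high_volume * HIGH_VOLUME_PER_DAY


def image_bytes_per_year(regular: int, high_volume: int) -> int:
    """Image storage accumulated over one year of retention."""
    return daily_detections(regular, high_volume) * BYTES_PER_DETECTION * 365
```

Month 1 (5 regular + 2 high-volume) gives 10,800 detections/day and about 394 GB per year. Month 6 (90 + 10) gives 100,800 detections/day and about 3.68 TB per year. The table rounds these to ~11,000 / ~400 GB and ~100,000 / ~3.6 TB.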
Database Storage (PostgreSQL)
Detection metadata is small (~500 bytes/detection). At 100,000 detections/day kept for 7 years (2,555 days), that comes to about 128 GB (roughly 119 GiB), i.e. ≈ 120 GB.
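The database estimate as a worked calculation. It uses the ~500-byte row size stated above and shows both decimal (GB) and binary (GiB) units, since the ≈120 GB figure sits between the two.

```python
METADATA_BYTES_PER_DETECTION = 500  # ~500 bytes per detection (estimate above)
DETECTIONS_PER_DAY = 100_000        # Month 6 volume
RETENTION_DAYS = 7 * 365            # 2,555 days, same as RETENTION_EVENTS_DAYS

total_bytes = METADATA_BYTES_PER_DETECTION * DETECTIONS_PER_DAY * RETENTION_DAYS
total_gb = total_bytes / 10**9   # decimal gigabytes
total_gib = total_bytes / 2**30  # binary gibibytes

print(f"{total_gb:.1f} GB ({total_gib:.1f} GiB)")
```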
Image Lifecycle
Detection
│
▼
Hot Storage (SSD) ──── 0-90 days ──── Fast access for active investigations
│
▼
Warm Storage (HDD) ─── 90-365 days ── Slower access, lower cost (optional)
│
▼
Deletion ───────────── After 1 year ─ Automatic cleanup
Tiered Storage (Optional)
MinIO supports tiered storage to move older objects to cheaper storage.
Tier Configuration
| Tier | Age | Storage Type | Access Speed |
|---|---|---|---|
| Hot | 0-90 days | SSD | Fast (active investigations) |
| Warm | 90-365 days | HDD | Slower (historical lookups) |
| Expired | >365 days | Deleted | N/A |
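The tier table amounts to an age classification. The following minimal sketch assumes each tier's lower bound is inclusive, so an image becomes warm on day 90 and expires on day 365. MinIO enforces the real transitions itself. A helper like this would only be useful elsewhere, for example for displaying where an image is expected to live.

```python
from datetime import date

HOT_DAYS = 90      # matches IMAGE_TIER_HOT_DAYS
EXPIRE_DAYS = 365  # matches RETENTION_IMAGES_DAYS


def image_tier(captured: date, today: date) -> str:
    """Classify an image as 'hot', 'warm' or 'expired' by its age in days."""
    age = (today - captured).days
    if age >= EXPIRE_DAYS:
        return "expired"
    if age >= HOT_DAYS:
        return "warm"
    return "hot"
```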
MinIO Lifecycle Rules
Configure via the MinIO Console or the mc ilm CLI:
Transition rule (Hot → Warm at 90 days):
- Applies to: images/*
- Transition after: 90 days
- Target tier: WARM_HDD
Expiration rule (Delete at 1 year):
- Applies to: images/*
- Expire after: 365 days
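The same two rules can also be expressed as an S3-style lifecycle document and applied through the S3 API. This sketch rests on several assumptions. First, images/ is treated as a key prefix; if images is actually the bucket name, drop the prefix filter. Second, WARM_HDD must already be registered as a tier in MinIO. Third, MinIO is assumed to accept that tier name as the transition StorageClass. The function name and rule IDs are hypothetical.

```python
def lifecycle_configuration(prefix: str = "images/", warm_tier: str = "WARM_HDD",
                            transition_days: int = 90, expire_days: int = 365) -> dict:
    """Build the transition and expiration rules listed above as an S3 lifecycle document."""
    if not 0 < transition_days < expire_days:
        raise ValueError("transition_days must be positive and less than expire_days")
    return {
        "Rules": [
            {   # Hot -> Warm: move objects to the tier named warm_tier
                "ID": "images-transition-warm",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [{"Days": transition_days, "StorageClass": warm_tier}],
            },
            {   # Delete objects once they are expire_days old
                "ID": "images-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Expiration": {"Days": expire_days},
            },
        ]
    }
```

With a boto3 client this could be applied via s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=lifecycle_configuration()). That call replaces the bucket's whole lifecycle configuration, so both rules must always be sent together.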
Hardware Setup for Tiering
MinIO Server
├── /mnt/ssd (Hot tier)
│ └── Fast NVMe/SSD storage
│ └── Size: 500GB - 1TB
│
└── /mnt/hdd (Warm tier)
└── Spinning disk array
└── Size: 4TB+ (expandable)
Database Retention
Automatic Cleanup
A scheduled daily job removes data older than each table's retention period:
| Table | Retention | Cleanup Strategy |
|---|---|---|
| events | 7 years | Partition by month, drop old partitions |
| alerts | 7 years | Cascade delete with events |
| audit_logs | 7 years | Partition by month |
| legal_holds | 7 years | Never deleted; retained with events |
| alert_queue | 7 days | Delete completed/failed entries |
| sessions | 30 days | Delete expired sessions |
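The cleanup job's first step can be sketched as computing a cutoff timestamp per table from its retention period. The dictionary and function names are hypothetical. The sketch leaves out alerts, which follow events by cascade, and legal_holds, which is never deleted. Status filters, such as only completed or failed alert_queue entries, and legal-hold exclusions would be applied in the actual delete queries.

```python
from datetime import datetime, timedelta, timezone

# Retention per table in days, mirroring the table above and the env defaults.
RETENTION_DAYS = {
    "events": 2555,      # RETENTION_EVENTS_DAYS
    "audit_logs": 2555,
    "alert_queue": 7,    # RETENTION_ALERT_QUEUE_DAYS
    "sessions": 30,      # RETENTION_SESSIONS_DAYS
}


def cleanup_cutoffs(now: datetime) -> dict:
    """Rows created before the returned timestamp are eligible for cleanup."""
    return {table: now - timedelta(days=days) for table, days in RETENTION_DAYS.items()}
```

When run on 2025-12-29, the events cutoff lands on 2018-12-31. That is two days later than a calendar 7-year cutoff, because 2,555 days does not count the 2020 and 2024 leap days.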
Table Partitioning
Partition large tables by time for efficient cleanup:
Benefits:
- Drop entire partition instead of DELETE (instant, no bloat)
- Query performance (partition pruning)
- Easier backups (per-partition)
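Deciding which monthly partitions can be dropped is a small date calculation. In this sketch, a partition is dropped only when its whole month ends on or before the retention cutoff, so no row still inside the window is removed. The naming convention (events_2025_12) and the function names are hypothetical. The job would then detach or drop each named partition in PostgreSQL.

```python
from datetime import date, timedelta


def partition_name(table: str, month_start: date) -> str:
    """Hypothetical naming convention: events_2025_12 for December 2025."""
    return f"{table}_{month_start:%Y_%m}"


def droppable_partitions(table: str, oldest_month: date, today: date,
                         retention_days: int) -> list:
    """Names of monthly partitions whose entire month is older than the cutoff."""
    cutoff = today - timedelta(days=retention_days)
    names = []
    month = oldest_month.replace(day=1)
    while True:
        # Day 28 plus 4 days always lands in the next month; snap to its 1st.
        next_month = (month.replace(day=28) + timedelta(days=4)).replace(day=1)
        if next_month > cutoff:  # this month still holds rows inside the window
            break
        names.append(partition_name(table, month))
        month = next_month
    return names
```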
Image Deletion Handling
When images expire but metadata remains:
- Image URLs in database become invalid
- UI shows "Image expired" placeholder
- Metadata (plate, timestamp, location) still searchable
- Alert history preserved with context
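This sketch shows how the API might build a detection response once images can disappear. It assumes the API checks whether the object still exists (for example, with a HEAD request) instead of trusting stored URLs. The row fields (plate, detected_at, camera_id, image_key), the status values and both callables are hypothetical. Comparing the detection's age against RETENTION_IMAGES_DAYS is a cheaper alternative. Lifecycle deletion may not happen exactly at the 365-day mark, however.

```python
from typing import Callable, Optional


def detection_view(row: dict,
                   object_exists: Callable[[str], bool],
                   presign: Callable[[str], str]) -> dict:
    """Return searchable metadata always; attach an image URL only if the object exists."""
    view = {k: row[k] for k in ("plate", "detected_at", "camera_id")}
    key: Optional[str] = row.get("image_key")
    if key is None:
        # Assumption: no key is stored when the upload never succeeded.
        view["image_url"], view["image_status"] = None, "not_uploaded"
    elif object_exists(key):
        view["image_url"], view["image_status"] = presign(key), "available"
    else:
        # Object removed by lifecycle expiry: the UI shows "Image expired".
        view["image_url"], view["image_status"] = None, "expired"
    return view
```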
Legal Holds
For active investigations, exempt specific data from automatic deletion.
Note: Legal holds are managed in a separate legal_holds junction table rather than as columns on events. Keeping hold state out of the detection rows preserves the WORM (write once, read many) immutability of the detections table. See data-integrity.md for the full schema and implementation details.
| Field (in legal_holds table) | Purpose |
|---|---|
| case_id | Investigation/case reference number |
| detection_id | FK to the protected detection |
| reason | Description of why the hold was placed |
| hold_until | Optional explicit expiration date |
| released_at | NULL = active hold; timestamp = released |
Detections with an active legal hold (unreleased, not expired) are excluded from cleanup jobs.
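The "active hold" rule (not released and not past hold_until) can be sketched as below. The class and function names and the integer detection_id are assumptions. The sketch treats hold_until as exclusive, so a hold stops applying at that instant. Because one detection may be covered by several cases, the cleanup job should skip it if any of its holds is active.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Set


@dataclass
class LegalHold:
    case_id: str
    detection_id: int
    reason: str
    hold_until: Optional[datetime] = None   # optional explicit expiration
    released_at: Optional[datetime] = None  # None means the hold is active

    def is_active(self, now: datetime) -> bool:
        """Active = not released and not past an explicit hold_until."""
        if self.released_at is not None:
            return False
        return self.hold_until is None or now < self.hold_until


def protected_detection_ids(holds: Iterable[LegalHold], now: datetime) -> Set[int]:
    """Detection IDs the cleanup job must exclude."""
    return {h.detection_id for h in holds if h.is_active(now)}
```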
Backup Strategy
| Data | Backup Frequency | Retention |
|---|---|---|
| PostgreSQL | Daily full, hourly incremental | 30 days |
| MinIO (hot tier) | Daily incremental | 7 days |
| MinIO (warm tier) | Weekly incremental | 30 days |
| Collector configs | On change | 90 days |
Configuration
Environment variables:
# Retention periods (days)
RETENTION_EVENTS_DAYS=2555 # 7 × 365 days (about 7 years; leap days not counted)
RETENTION_IMAGES_DAYS=365 # 1 year
RETENTION_SESSIONS_DAYS=30
RETENTION_ALERT_QUEUE_DAYS=7
# Tiering thresholds (days)
IMAGE_TIER_HOT_DAYS=90 # Move to warm after 90 days
# Cleanup schedule
CLEANUP_SCHEDULE="0 3 * * *" # Daily at 3am
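A minimal sketch of how a service might load these variables. Each variable falls back to the default shown above when unset. The class and field names are hypothetical. The one sanity check is also an assumption: images should reach the warm tier before they expire.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RetentionConfig:
    """Retention settings with the defaults listed above (periods in days)."""

    events_days: int = 2555
    images_days: int = 365
    sessions_days: int = 30
    alert_queue_days: int = 7
    tier_hot_days: int = 90
    cleanup_schedule: str = "0 3 * * *"

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "RetentionConfig":
        """Read overrides from the environment, falling back to the defaults."""
        d = cls()
        cfg = cls(
            events_days=int(env.get("RETENTION_EVENTS_DAYS", d.events_days)),
            images_days=int(env.get("RETENTION_IMAGES_DAYS", d.images_days)),
            sessions_days=int(env.get("RETENTION_SESSIONS_DAYS", d.sessions_days)),
            alert_queue_days=int(env.get("RETENTION_ALERT_QUEUE_DAYS", d.alert_queue_days)),
            tier_hot_days=int(env.get("IMAGE_TIER_HOT_DAYS", d.tier_hot_days)),
            cleanup_schedule=env.get("CLEANUP_SCHEDULE", d.cleanup_schedule),
        )
        # Images must move to the warm tier before they expire.
        if not 0 < cfg.tier_hot_days < cfg.images_days:
            raise ValueError("IMAGE_TIER_HOT_DAYS must be between 1 and RETENTION_IMAGES_DAYS - 1")
        return cfg
```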
Compliance Notes
- 7-year retention aligns with common legal discovery requirements
- Audit logs track all access to detection data
- Legal hold prevents accidental deletion during investigations
- Deletion is permanent (no soft delete for storage efficiency)
Related Documentation
- Architecture: Alert Engine - Alert generation and notification processing
- Architecture: Data Integrity - WORM protection and legal hold implementation
- PRP: Image Storage PRP - MinIO bucket structure and lifecycle rules
Decision Date: 2025-12-29
Status: Approved
Rationale: Balances long-term investigative needs (7-year metadata) with storage costs (1-year images)