# Backup and Disaster Recovery
## Overview
Backup and disaster recovery strategy for a community watch platform that stores sensitive PII (license plates, vehicle images, location data).
## Data Classification
| Data | PII Level | Backup Priority |
|---|---|---|
| Detection metadata (plates, locations) | High | Critical |
| Vehicle images | High | Critical |
| User accounts | Medium | High |
| Watchlists | Medium | High |
| Collector configurations | Low | Medium |
| Application logs | Low | Low |
## Backup Strategy
### PostgreSQL (pg_dump)
| Backup Type | Frequency | Retention | Method |
|---|---|---|---|
| Full dump | Daily | 30 days | `pg_dump --format=custom` |
| Incremental (WAL) | Continuous | 7 days | WAL archiving |
Full Backup Schedule:
- Time: 3:00 AM (low traffic window)
- Format: custom (compressed; supports parallel restore with `pg_restore --jobs`)
- Naming: `alpr_backup_YYYYMMDD_HHMMSS.dump`
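The naming convention and dump format above can be captured in a small helper that the nightly job calls. This is a minimal Python sketch, not the project's actual script. The database name `alpr` and the staging path are hypothetical placeholders, because the document specifies neither.

```python
from datetime import datetime

DB_NAME = "alpr"  # hypothetical database name
STAGING_DIR = "/var/backups/alpr/staging"  # hypothetical local staging path


def backup_filename(ts: datetime) -> str:
    """Name a dump using the alpr_backup_YYYYMMDD_HHMMSS.dump convention."""
    return f"alpr_backup_{ts:%Y%m%d_%H%M%S}.dump"


def pg_dump_argv(ts: datetime, staging_dir: str = STAGING_DIR, db_name: str = DB_NAME) -> list[str]:
    """Build the pg_dump command for one custom-format (already compressed) dump."""
    out_path = f"{staging_dir}/{backup_filename(ts)}"
    return ["pg_dump", "--format=custom", f"--file={out_path}", db_name]


if __name__ == "__main__":
    # Print the command that would run now; nothing is executed here.
    print(" ".join(pg_dump_argv(datetime.now())))
```

The helper returns an argument list rather than a shell string, so the caller can pass it straight to `subprocess.run(argv, check=True)` without shell quoting issues. `check=True` raises on a non-zero exit, and the job can turn that exception into a failure alert.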
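WAL archiving:
- Enables point-in-time recovery (PITR)
- Captures all changes between full backups
- Required to meet the RPO target (< 1 hour)
- Can only be replayed on top of a physical base backup (taken with `pg_basebackup`), not on a database restored from a `pg_dump` file, so PITR also needs a recent base backup within the 7-day WAL retention window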
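WAL archiving is turned on through a few `postgresql.conf` settings. The sketch below renders them from Python. The archive directory and the 15-minute `archive_timeout` are assumptions: the backup target is still TBD, and the timeout was picked so that a partially filled WAL segment is archived well within the one-hour RPO. The `archive_command` follows the copy-if-absent pattern shown in the PostgreSQL documentation.

```python
ARCHIVE_DIR = "/mnt/backup/wal_archive"  # hypothetical; the backup target is still TBD


def wal_archiving_settings(archive_dir: str = ARCHIVE_DIR, timeout: str = "15min") -> dict[str, str]:
    """postgresql.conf settings for continuous WAL archiving."""
    return {
        # Minimum WAL level that supports archiving and point-in-time recovery.
        "wal_level": "replica",
        "archive_mode": "on",
        # Copy each completed segment, refusing to overwrite an existing archive file.
        "archive_command": f"test ! -f {archive_dir}/%f && cp %p {archive_dir}/%f",
        # Force a segment switch at least this often so quiet periods still get archived.
        "archive_timeout": timeout,
    }


def render_conf(settings: dict[str, str]) -> str:
    """Render settings as postgresql.conf lines, single-quoting every value."""
    return "\n".join(f"{key} = '{value}'" for key, value in settings.items())


if __name__ == "__main__":
    print(render_conf(wal_archiving_settings()))
```

Changing `wal_level` or `archive_mode` requires a server restart. A forced segment switch still produces a full-size segment file (16 MB by default), so very short `archive_timeout` values inflate archive storage. The RPO is only protected once the archived segments leave the database host.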
### MinIO (Images)
| Backup Type | Frequency | Retention | Method |
|---|---|---|---|
| Incremental sync | Daily | Per retention policy | `mc mirror` or replication |
| Full sync | Weekly | 4 weeks | `mc mirror --overwrite` |
Backup target options (cloud options pending security approval):
| Target | Pros | Cons | PII Concern |
|---|---|---|---|
| AWS S3 | Reliable, scalable | Cost, latency | Data leaves premises |
| AWS Glacier | Very cheap long-term | Slow retrieval | Data leaves premises |
| On-premise NAS | Data stays local | Hardware cost, maintenance | None |
| Second MinIO instance | Simple replication | Hardware cost | None |
DECISION REQUIRED: Cloud backup (S3/Glacier) requires security review and approval for PII data storage. See Security Considerations.
### Backup Target Host
TBD: The target backup host/location has not yet been determined.

Options under consideration:
- Dedicated backup server (on-premise)
- NAS device with RAID
- Cloud storage (pending security approval)
- Off-site co-location
## Recovery Objectives
| Metric | Target | Notes |
|---|---|---|
| RPO (Recovery Point Objective) | < 1 hour | Max data loss acceptable |
| RTO (Recovery Time Objective) | < 4 hours | Max downtime acceptable |
### RPO Breakdown
| Component | RPO | Method |
|---|---|---|
| PostgreSQL | < 1 hour | WAL archiving |
| MinIO (images) | < 24 hours | Daily sync |
| Configurations | < 24 hours | Daily backup |
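The breakdown shows that only PostgreSQL is covered by the overall < 1 hour RPO. Image and configuration backups can lose up to a day of data. The check below makes such gaps explicit so they can either be accepted deliberately or closed. It is a sketch that treats each "< N" target from the tables as an upper bound of N.

```python
from datetime import timedelta

# Targets copied from the tables above, treating each "< N" as an upper bound of N.
OVERALL_RPO = timedelta(hours=1)
COMPONENT_RPO = {
    "PostgreSQL": timedelta(hours=1),
    "MinIO (images)": timedelta(hours=24),
    "Configurations": timedelta(hours=24),
}


def rpo_gaps(overall: timedelta, components: dict[str, timedelta]) -> dict[str, timedelta]:
    """Return components whose RPO is looser than the overall objective, with the excess."""
    return {name: rpo - overall for name, rpo in components.items() if rpo > overall}


if __name__ == "__main__":
    for name, excess in rpo_gaps(OVERALL_RPO, COMPONENT_RPO).items():
        print(f"{name}: exceeds the overall RPO by {excess}")
```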
### RTO Breakdown
| Component | RTO | Restore Method |
|---|---|---|
| PostgreSQL | 1-2 hours | Physical base backup restore + WAL replay (`pg_restore` from dump if PITR is not needed) |
| MinIO | 2-4 hours | mc mirror from backup |
| Application | 30 minutes | Docker redeploy |
## Disaster Scenarios
### Scenario 1: Database Corruption
| Step | Action | Time |
|---|---|---|
| 1 | Stop application | 5 min |
| 2 | Assess corruption extent | 15 min |
| 3 | Restore latest physical base backup (or `pg_restore` the latest dump if PITR is not needed) | 30-60 min |
| 4 | Replay archived WAL to a point before the corruption | 15-30 min |
| 5 | Verify data integrity | 15 min |
| 6 | Restart application | 5 min |
Total RTO: ~2 hours
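Replaying WAL to a point before the corruption is PostgreSQL point-in-time recovery, and it requires a data directory restored from a physical base backup. On PostgreSQL 12 and later, recovery is configured by setting `restore_command` and a recovery target, then creating an empty `recovery.signal` file in that data directory. The sketch below makes two assumptions: `postgresql.conf` lives in the data directory (packaged installs such as Debian's keep it under /etc/postgresql instead), and the archive path is a hypothetical placeholder.

```python
import tempfile
from pathlib import Path


def prepare_pitr(data_dir: Path, archive_dir: str, target_time: str) -> None:
    """Configure a restored physical base backup for point-in-time recovery (PostgreSQL 12+)."""
    settings = {
        # Copy archived WAL segments back from the archive as the server requests them.
        "restore_command": f'cp {archive_dir}/%f "%p"',
        # Stop replay at this timestamp, chosen to fall before the corruption.
        "recovery_target_time": target_time,
        # Pause at the target so the data can be checked before recovery ends.
        "recovery_target_action": "pause",
    }
    # Appended lines win over earlier settings of the same name in postgresql.conf.
    with open(data_dir / "postgresql.conf", "a", encoding="utf-8") as conf:
        for key, value in settings.items():
            conf.write(f"{key} = '{value}'\n")
    # An empty recovery.signal file makes the server start in targeted recovery mode.
    (data_dir / "recovery.signal").touch()


if __name__ == "__main__":
    # Demo against a throwaway directory instead of a real data directory.
    with tempfile.TemporaryDirectory() as demo:
        prepare_pitr(Path(demo), "/mnt/backup/wal_archive", "2025-12-29 02:45:00+00")
        print((Path(demo) / "postgresql.conf").read_text())
```

Once the paused server's data has been checked, `SELECT pg_wal_replay_resume();` ends recovery. If the chosen target is still too late, the restore has to start again from the base backup with an earlier target.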
### Scenario 2: Server Hardware Failure
| Step | Action | Time |
|---|---|---|
| 1 | Provision new server | 30-60 min |
| 2 | Install Docker, restore configs | 30 min |
| 3 | Restore PostgreSQL | 60 min |
| 4 | Restore MinIO data | 2-3 hours |
| 5 | Update DNS/networking | 15 min |
| 6 | Verify and resume | 30 min |
Total RTO: ~4.75-6.25 hours (285-375 min) if steps run sequentially, exceeding the < 4 hour RTO target
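The scenario totals are sums of the per-step ranges. The tables do not say whether any steps overlap, so the sketch assumes they run strictly one after another. Under that assumption, Scenario 1 takes 85-130 minutes and Scenario 2 takes 285-375 minutes, so the hardware-failure path exceeds the < 4 hour RTO target as written.

```python
# Step durations in minutes as (min, max), copied from the Scenario 1 and 2 tables.
SCENARIO_1 = [(5, 5), (15, 15), (30, 60), (15, 30), (15, 15), (5, 5)]
SCENARIO_2 = [(30, 60), (30, 30), (60, 60), (120, 180), (15, 15), (30, 30)]


def total_rto(steps: list[tuple[int, int]]) -> tuple[int, int]:
    """Sum step ranges, assuming the steps run strictly one after another."""
    low = sum(lo for lo, _ in steps)
    high = sum(hi for _, hi in steps)
    return low, high


def fmt(minutes: int) -> str:
    """Format minutes as e.g. 4h45m."""
    return f"{minutes // 60}h{minutes % 60:02d}m"


if __name__ == "__main__":
    for name, steps in (("Scenario 1", SCENARIO_1), ("Scenario 2", SCENARIO_2)):
        low, high = total_rto(steps)
        print(f"{name}: {fmt(low)} to {fmt(high)}")
```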
### Scenario 3: Ransomware/Security Incident
| Step | Action | Time |
|---|---|---|
| 1 | Isolate affected systems | Immediate |
| 2 | Assess scope of compromise | 1-4 hours |
| 3 | Provision clean infrastructure | 1-2 hours |
| 4 | Restore from known-good backup | 2-4 hours |
| 5 | Security review before reconnecting | Variable |
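Note: Maintain offline or immutable (write-once) backups for ransomware protection. Backups that stay writable from the compromised environment can be encrypted or deleted along with the primary data.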
### Scenario 4: Accidental Data Deletion
| Data Type | Recovery Method |
|---|---|
| Single detection | Restore pg_dump into a scratch database, extract the record |
| Bulk detections | Point-in-time recovery via WAL into a separate instance, then copy the rows back |
| Images | Restore from MinIO backup |
| Watchlist | Restore from pg_dump |
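Recovering a single record does not require touching production: restore the dump into a scratch database and copy the record from there. The sketch below builds the commands. It assumes the dump has already been decrypted and that detections live in a table named `detections`, which is hypothetical because the schema is not described in this document.

```python
SCRATCH_DB = "alpr_restore_scratch"  # hypothetical scratch database name


def single_table_restore_argv(dump_file: str, table: str, scratch_db: str = SCRATCH_DB) -> list[list[str]]:
    """Commands to restore one table from a custom-format dump into a scratch database."""
    return [
        # Create an empty database to restore into; never target production here.
        ["createdb", scratch_db],
        # --table limits pg_restore to the named table's definition and data.
        ["pg_restore", f"--dbname={scratch_db}", f"--table={table}", dump_file],
    ]
```

Each command list can be run with `subprocess.run(cmd, check=True)`. The scratch database holds PII, so drop it (`dropdb`) once the record has been extracted.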
## Security Considerations
### PII in Backups
Backups contain sensitive PII:
- License plate numbers
- Vehicle images
- Location/timestamp data
- User information

Requirements for any backup target:
| Requirement | On-Premise | Cloud (S3/Glacier) |
|---|---|---|
| Encryption at rest | Required | Required (SSE-S3 or SSE-KMS) |
| Encryption in transit | Required | Required (TLS) |
| Access controls | Local ACLs | IAM policies |
| Audit logging | Local logs | CloudTrail |
| Data residency | N/A | May require specific region |
| Compliance review | Internal | Legal/security approval |
### Cloud Storage Decision
Before enabling cloud backup (S3/Glacier):
- [ ] Security team review of PII implications
- [ ] Legal review of data residency requirements
- [ ] Compliance check against applicable regulations
- [ ] Encryption key management plan
- [ ] Access audit procedures
- [ ] Incident response plan update
Status: Pending security approval
### Backup Encryption
All backups are encrypted regardless of target:
| Component | Encryption Method |
|---|---|
| pg_dump output | GPG symmetric encryption |
| MinIO backup | Server-side encryption (SSE) |
| Config backups | GPG symmetric encryption |
Encryption keys are stored separately from the backups.
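Symmetric GPG encryption can run unattended if the passphrase is read from a file kept apart from the backups, which matches the key-separation rule above. The sketch below builds the command. The passphrase file path is hypothetical, and `AES256` is an assumed cipher choice that this document does not state.

```python
def gpg_encrypt_argv(plain_path: str, passphrase_file: str) -> list[str]:
    """Build a non-interactive GPG symmetric-encryption command writing <file>.gpg."""
    return [
        "gpg",
        "--batch", "--yes",               # no prompts; overwrite an existing output file
        "--pinentry-mode", "loopback",    # lets --passphrase-file work with GnuPG 2.1+
        "--passphrase-file", passphrase_file,
        "--symmetric",
        "--cipher-algo", "AES256",
        "--output", plain_path + ".gpg",
        plain_path,
    ]
```

The GnuPG manual notes that the passphrase file is only used together with `--batch`, and since version 2.1 also requires `--pinentry-mode loopback`. The unencrypted staging copy should be deleted after encryption succeeds.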
## Implementation
### PostgreSQL Backup Script
Scheduled via cron or systemd timer:
Daily at 3:00 AM:
1. Run `pg_dump --format=custom` to a local staging directory (custom format is already compressed, so no separate gzip step is needed)
2. Encrypt with GPG
3. Transfer to backup target
4. Verify backup integrity
5. Clean up old backups (>30 days)
6. Log results and alert on failure
### MinIO Backup
Options:

Option A: `mc mirror` (simple)
- Daily sync to backup target
- Incremental (only changed objects)
- Suitable for on-premise NAS

Option B: MinIO replication (robust)
- Real-time replication to a second MinIO instance
- Replica can serve as a failover target
- Higher infrastructure cost

Option C: S3 sync (pending approval)
- Daily sync to AWS S3
- Lifecycle policy transitions objects to Glacier after 30 days
- Lowest long-term cost
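Option A amounts to running `mc mirror` from the primary MinIO alias to a backup alias. The sketch below builds the command; the alias and bucket names (`minio/alpr-images`, `backup/alpr-images`) are hypothetical.

```python
def mc_mirror_argv(source: str, target: str, overwrite: bool = False) -> list[str]:
    """Build an `mc mirror` command; alias/bucket names are caller-supplied placeholders."""
    argv = ["mc", "mirror"]
    if overwrite:
        # Weekly full sync: replace target objects that differ from the source.
        argv.append("--overwrite")
    # --remove is deliberately not added, so source deletions do not propagate.
    argv += [source, target]
    return argv
```

Without `--remove`, objects deleted on the source remain on the target. That protects against accidental deletion. It also means that images expired under the retention policy would persist on the backup target, so honoring "Per retention policy" for PII would need a separate expiry step or a lifecycle rule on the backup bucket.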
## Verification
| Check | Frequency | Method |
|---|---|---|
| Backup completion | Daily | Automated alerting |
| Backup integrity | Weekly | Checksum verification |
| Restore test | Monthly | Restore to test environment |
| Full DR drill | Quarterly | Complete recovery simulation |
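The weekly integrity check can compare each backup file's SHA-256 digest with the value recorded when the backup was written. The sketch below assumes a manifest that maps file names to hex digests; the manifest itself is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly large) backup file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(manifest: dict[str, str], backup_dir: Path) -> list[str]:
    """Return files that are missing or whose hash differs from the recorded one."""
    failures = []
    for name, expected in manifest.items():
        path = backup_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(name)
    return failures


if __name__ == "__main__":
    # Demo with a throwaway file rather than real backups.
    with tempfile.TemporaryDirectory() as demo:
        sample = Path(demo) / "sample.dump.gpg"
        sample.write_bytes(b"example")
        print(verify_checksums({sample.name: sha256_of(sample)}, Path(demo)))
```

A matching checksum shows that the file is unchanged since it was written, not that it restores cleanly. The monthly restore test covers that.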
## Monitoring and Alerts
| Event | Alert Level | Response |
|---|---|---|
| Backup job failed | Critical | Investigate immediately |
| Backup size anomaly (>50% change) | Warning | Review for issues |
| Backup target disk >80% | Warning | Plan capacity |
| No backup in 48 hours | Critical | Investigate immediately |
## Open Decisions
| Decision | Options | Status |
|---|---|---|
| Backup target host | On-premise NAS, dedicated server, cloud | TBD |
| Cloud backup for images | AWS S3, Glacier, none | Pending security approval |
| Replication vs backup for MinIO | mc mirror, MinIO replication | TBD |
| Off-site backup location | Cloud, co-location, none | TBD |
Decision Date: 2025-12-29

Status: Draft (pending security review for cloud options)

Rationale: Protect critical PII data while meeting RPO/RTO targets