Rate Limiting Strategy
Problem
Traditional rate limiting (reject after X requests/minute) doesn't fit edge-to-central Detection ingestion:
- Edge collectors are authenticated and trusted
- High traffic from busy locations is legitimate, not abuse
- Rejecting events loses evidence (unacceptable for community watch)
- Rejected events would queue up on the edge, potentially for days during sustained high traffic
Strategy: Trust + Backpressure + Monitor
Principle
Accept all legitimate traffic from authenticated collectors. Use backpressure signaling instead of rejection.
Endpoint Categories
1. Edge Collector Endpoints (No Rate Limits)
These endpoints serve authenticated, trusted edge collectors:
| Endpoint | Rate Limit | Rationale |
|---|---|---|
| `POST /api/v1/events/batch` | None | All detections are legitimate data |
| `POST /api/v1/heartbeat` | None | Expected 1/minute per collector |
| `POST /api/v1/commands/{id}/ack` | None | Response to server commands |
Protection via: API key authentication, backpressure signaling, monitoring
2. User Authentication Endpoints (Strict Limits)
Prevent brute force and credential stuffing:
| Endpoint | Rate Limit | Rationale |
|---|---|---|
| `POST /auth/login` | 5/minute per IP | Prevent brute force |
| `POST /auth/register` | 3/minute per IP | Prevent spam accounts |
| `POST /auth/forgot-password` | 3/minute per IP | Prevent email bombing |
On limit exceeded: Return 429 with Retry-After header
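The Implementation Notes in this document suggest a token bucket for auth endpoints. The following is a minimal sketch of how such a bucket could also produce the `Retry-After` value for a 429 response. The class name `TokenBucket`, the helper `login_bucket`, and the choice to pass `now` explicitly are assumptions for illustration, not part of the approved design.

```python
import math
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Token bucket holding up to `capacity` tokens, one refilled every `seconds_per_token`."""
    capacity: float
    seconds_per_token: float
    tokens: float
    updated_at: float

    def try_acquire(self, now: float) -> tuple[bool, int]:
        """Return (allowed, retry_after_seconds). `now` is a timestamp in seconds."""
        elapsed = max(0.0, now - self.updated_at)
        # Refill for the time that has passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed / self.seconds_per_token)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0
        # Seconds until one whole token exists, rounded up for the Retry-After header.
        wait = (1.0 - self.tokens) * self.seconds_per_token
        return False, math.ceil(wait)


def login_bucket(now: float) -> TokenBucket:
    # 5/minute per IP, as in the table above: one such bucket per client IP (assumption).
    return TokenBucket(capacity=5, seconds_per_token=60 / 5, tokens=5, updated_at=now)
```

A caller would keep one bucket per client IP and, when `try_acquire` returns `False`, respond with 429 and set `Retry-After` to the returned number of seconds.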
3. User API Endpoints (Moderate Limits)
Prevent abuse by authenticated users:
| Endpoint | Rate Limit | Rationale |
|---|---|---|
| `GET /api/v1/search` | 60/minute per user | Database-intensive queries |
| `GET /api/v1/events/*` | 120/minute per user | Normal browsing patterns |
| `POST /api/v1/watchlist/*` | 30/minute per user | Prevent bulk creation |
| `GET /api/v1/export/*` | 5/minute per user | Resource-intensive exports |
4. Admin Endpoints (Moderate Limits)
| Endpoint | Rate Limit | Rationale |
|---|---|---|
| `POST /api/v1/admin/*` | 30/minute per user | Normal admin operations |
| `DELETE /api/v1/*` | 10/minute per user | Prevent accidental bulk deletes |
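The four endpoint categories above can be collected into a single lookup table. The sketch below is one possible encoding, assuming first-match-wins ordering and glob-style patterns; the names `Limit`, `RULES`, and `limit_for` are hypothetical, and `{id}` is written as `*` for matching.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple, Optional


class Limit(NamedTuple):
    per_minute: int
    scope: str  # "ip" or "user": the identity the counter is keyed on


# (method, path pattern, limit); None means the endpoint is never rate limited.
RULES: list[tuple[str, str, Optional[Limit]]] = [
    ("POST", "/api/v1/events/batch", None),
    ("POST", "/api/v1/heartbeat", None),
    ("POST", "/api/v1/commands/*/ack", None),
    ("POST", "/auth/login", Limit(5, "ip")),
    ("POST", "/auth/register", Limit(3, "ip")),
    ("POST", "/auth/forgot-password", Limit(3, "ip")),
    ("GET", "/api/v1/search", Limit(60, "user")),
    ("GET", "/api/v1/events/*", Limit(120, "user")),
    ("POST", "/api/v1/watchlist/*", Limit(30, "user")),
    ("GET", "/api/v1/export/*", Limit(5, "user")),
    ("POST", "/api/v1/admin/*", Limit(30, "user")),
    ("DELETE", "/api/v1/*", Limit(10, "user")),
]


def limit_for(method: str, path: str) -> Optional[Limit]:
    """Return the limit of the first matching rule, or None when no limit applies."""
    for rule_method, pattern, limit in RULES:
        if method == rule_method and fnmatchcase(path, pattern):
            return limit
    return None  # unlisted routes are left unlimited in this sketch
```

Listing the edge collector endpoints explicitly with `None` keeps the "no rate limit" decision visible in configuration rather than leaving it implicit.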
Backpressure Signaling
When the central server is under load, it signals edge collectors to slow down without rejecting events.
Response Format
Normal response:
Backpressure response:

    HTTP 202 Accepted
    {
      "accepted": 50,
      "results": [...],
      "backpressure": {
        "active": true,
        "recommended_delay_seconds": 10,
        "reason": "high_load"
      }
    }
Edge Behavior on Backpressure
- Process response normally (events were accepted)
- Wait `recommended_delay_seconds` before next batch upload
- Events continue buffering locally during delay
- Resume normal upload schedule after delay
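The edge-side steps above could be sketched as a small helper that decides how long to wait before the next upload. The function name `next_upload_delay` and the parameter `normal_interval_seconds` are hypothetical; treating the normal interval as a lower bound on the wait is also an assumption.

```python
from typing import Any


def next_upload_delay(response: dict[str, Any], normal_interval_seconds: float) -> float:
    """Seconds to wait before the next batch upload, given a parsed batch response."""
    backpressure = response.get("backpressure") or {}
    if backpressure.get("active"):
        delay = float(backpressure.get("recommended_delay_seconds", 0))
        # Assumption: never upload sooner than the normal schedule would.
        return max(delay, normal_interval_seconds)
    # No backpressure: keep the normal upload schedule.
    return normal_interval_seconds
```

Events were accepted in both cases, so the collector does not retry the batch; it only changes when the next one is sent while new events keep buffering locally.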
Backpressure Triggers
| Condition | Delay | Reason |
|---|---|---|
| DB write latency > 500ms | 10s | Database under load |
| Detection queue depth > 10,000 | 30s | Processing falling behind |
| MinIO latency > 1s | 15s | Storage under load |
| Server CPU > 80% | 20s | General resource pressure |
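As an illustration of the trigger table, the sketch below turns a snapshot of server load into the `backpressure` block of a response. The `ServerLoad` type, the reason codes other than `high_load`, and the rule that the longest delay wins when several triggers fire are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerLoad:
    db_write_latency_ms: float
    detection_queue_depth: int
    minio_latency_ms: float
    cpu_percent: float


def backpressure_for(load: ServerLoad) -> Optional[dict]:
    """Return the `backpressure` block for a response, or None when no trigger fires."""
    # (trigger fired, delay in seconds, reason code); thresholds and delays from the table above.
    triggers = [
        (load.db_write_latency_ms > 500, 10, "db_write_latency"),
        (load.detection_queue_depth > 10_000, 30, "queue_depth"),
        (load.minio_latency_ms > 1_000, 15, "storage_latency"),
        (load.cpu_percent > 80, 20, "high_load"),
    ]
    fired = [(delay, reason) for hit, delay, reason in triggers if hit]
    if not fired:
        return None
    # Assumption: when several triggers fire, recommend the longest delay.
    delay, reason = max(fired, key=lambda item: item[0])
    return {"active": True, "recommended_delay_seconds": delay, "reason": reason}
```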
Monitoring (Not Blocking)
Track traffic patterns without rejecting legitimate events:
Per-Collector Metrics
| Metric | Expected | Alert Threshold |
|---|---|---|
| Events/minute | 1-100 | >500 (investigate) |
| Batch size | 10-50 | >100 (check config) |
| Request size | 1-5 MB | >10 MB (log warning) |
| Error rate | <1% | >5% (investigate) |
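A check against these thresholds might look like the sketch below. It only returns warnings and never rejects anything, in line with the monitoring-only approach. The function name and the warning strings are hypothetical, and `error_rate` is assumed to be a fraction between 0 and 1.

```python
def collector_warnings(events_per_minute: float, batch_size: int,
                       request_size_mb: float, error_rate: float) -> list[str]:
    """Return warnings for one collector; the caller still accepts the events."""
    warnings = []
    if events_per_minute > 500:
        warnings.append("events_per_minute: investigate")
    if batch_size > 100:
        warnings.append("batch_size: check config")
    if request_size_mb > 10:
        warnings.append("request_size: log warning")
    if error_rate > 0.05:  # 5% expressed as a fraction
        warnings.append("error_rate: investigate")
    return warnings
```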
System-Wide Metrics
| Metric | Alert Threshold | Action |
|---|---|---|
| Total events/minute | >5,000 | Consider scaling |
| Backpressure activations/hour | >10 | Investigate load |
| Failed uploads/hour | >100 | Check connectivity |
Anomaly Detection
Flag unusual patterns for review (but don't auto-block):
- Collector suddenly sending 10x its normal volume
- Events from a deactivated collector
- Unusual time patterns (spike at 3am)
- Geographic anomalies (collector in NY sending events tagged LA)
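Three of the patterns above (volume spike, deactivated collector, geographic mismatch) can be expressed as simple flag checks; time-of-day patterns would need historical data and are left out. This is a sketch only: the `CollectorSample` fields, the flag names, and the idea of a precomputed baseline are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CollectorSample:
    events_per_minute: float
    baseline_events_per_minute: float  # e.g. a trailing average, computed elsewhere
    collector_active: bool
    collector_region: str
    event_region: str


def anomaly_flags(sample: CollectorSample, volume_factor: float = 10.0) -> list[str]:
    """Return flags for human review; nothing here blocks or rejects events."""
    flags = []
    baseline = sample.baseline_events_per_minute
    # Skip the volume check until a baseline exists.
    if baseline > 0 and sample.events_per_minute >= volume_factor * baseline:
        flags.append("volume_spike")
    if not sample.collector_active:
        flags.append("deactivated_collector")
    if sample.collector_region != sample.event_region:
        flags.append("region_mismatch")
    return flags
```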
Implementation Notes
Rate Limiter Configuration
Match the algorithm to the endpoint category: sliding window where smoother limiting matters, token bucket where short bursts should be tolerated:
| Algorithm | Use Case |
|---|---|
| Sliding window | User API endpoints (smoother) |
| Token bucket | Auth endpoints (burst tolerance) |
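For the sliding window row, a minimal in-memory sketch (matching the POC storage option) could keep a log of recent request timestamps per key. The class name `SlidingWindowLimiter` and its interface are hypothetical; a Redis-backed version for production would need a different data structure.

```python
from collections import deque


class SlidingWindowLimiter:
    """Sliding window log: allow at most `limit` requests per key in any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._hits: dict[str, deque] = {}

    def allow(self, key: str, now: float) -> bool:
        """Record and allow the request if the key is under its limit at time `now`."""
        hits = self._hits.setdefault(key, deque())
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

For example, `SlidingWindowLimiter(60, 60.0)` keyed by user ID would correspond to the 60/minute per user limit on search. Unlike a fixed window, the count never resets all at once, which is what makes the limiting smoother.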
Headers
Always return rate limit headers on limited endpoints:
On 429 response:
Storage
- POC: In-memory (resets on restart, acceptable)
- Production: Redis (shared across instances)
What This Strategy Does NOT Do
- Does not reject legitimate high-volume traffic from edge collectors
- Does not require edge collectors to pre-register expected volume
- Does not create artificial caps that lose evidence data
- Does not treat authenticated collectors as untrusted
Summary
| Traffic Source | Strategy |
|---|---|
| Edge collectors (authenticated) | Accept all, backpressure signal, monitor |
| User login attempts | Strict rate limit (5/min) |
| User API calls | Moderate rate limit (60-120/min) |
| Admin operations | Moderate rate limit (30/min) |
Decision Date: 2025-12-29
Status: Approved
Rationale: Traditional rate limiting would reject legitimate evidence data; backpressure preserves all events while managing server load.