Rate Limiting Strategy

Problem

Traditional rate limiting (reject after X requests per minute) does not fit edge-to-central detection ingestion:

Edge collectors are authenticated and trusted
High traffic from busy locations is legitimate, not abuse
Rejecting events loses evidence (unacceptable for community watch)
Rejected events would queue up on the edge, potentially for days during sustained high traffic

Strategy: Trust + Backpressure + Monitor

Principle

Accept all legitimate traffic from authenticated collectors. Use backpressure signaling instead of rejection.

Endpoint Categories

1. Edge Collector Endpoints (No Rate Limits)

These endpoints serve authenticated, trusted edge collectors:

Endpoint	Rate Limit	Rationale
`POST /api/v1/events/batch`	None	All detections are legitimate data
`POST /api/v1/heartbeat`	None	Expected 1/minute per collector
`POST /api/v1/commands/{id}/ack`	None	Response to server commands

Protection comes from API key authentication, backpressure signaling, and monitoring rather than from rate limits.

2. User Authentication Endpoints (Strict Limits)

Prevent brute force and credential stuffing:

Endpoint	Rate Limit	Rationale
`POST /auth/login`	5/minute per IP	Prevent brute force
`POST /auth/register`	3/minute per IP	Prevent spam accounts
`POST /auth/forgot-password`	3/minute per IP	Prevent email bombing

On limit exceeded: return HTTP 429 (Too Many Requests) with a Retry-After header.
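
As a rough illustration, the login limit could be enforced with a token bucket, the algorithm this document assigns to auth endpoints under Implementation Notes. This is a sketch under stated assumptions, not the project's implementation: `TokenBucket`, `check`, and `login_bucket` are hypothetical names, the burst capacity is assumed to equal the per-minute limit, and the caller is assumed to keep one bucket per client IP.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenBucket:
    """Hypothetical per-IP bucket; the caller keeps one instance per client IP."""
    capacity: int             # burst size (assumed equal to the per-minute limit)
    seconds_per_token: float  # refill interval, e.g. 12.0 for 5/minute
    tokens: float             # tokens currently available
    updated_at: float         # time of the last refill, in seconds

    def check(self, now: float) -> Optional[int]:
        """Return None if the request is allowed, else seconds for Retry-After."""
        elapsed = max(0.0, now - self.updated_at)
        refilled = self.tokens + elapsed / self.seconds_per_token
        self.tokens = min(float(self.capacity), refilled)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return None
        # Time until one whole token has been refilled.
        return math.ceil((1.0 - self.tokens) * self.seconds_per_token)


def login_bucket(now: float) -> TokenBucket:
    """Bucket for POST /auth/login: 5 requests/minute per IP."""
    return TokenBucket(capacity=5, seconds_per_token=60 / 5, tokens=5.0, updated_at=now)
```

When `check` returns a number, the handler would respond with 429 and put that number in the Retry-After header.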

3. User API Endpoints (Moderate Limits)

Prevent abuse by authenticated users:

Endpoint	Rate Limit	Rationale
`GET /api/v1/search`	60/minute per user	Database-intensive queries
`GET /api/v1/events/*`	120/minute per user	Normal browsing patterns
`POST /api/v1/watchlist/*`	30/minute per user	Prevent bulk creation
`GET /api/v1/export/*`	5/minute per user	Resource-intensive exports

4. Admin Endpoints (Moderate Limits)

Endpoint	Rate Limit	Rationale
`POST /api/v1/admin/*`	30/minute per user	Normal admin operations
`DELETE /api/v1/*`	10/minute per user	Prevent accidental bulk deletes
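
The four endpoint categories above could be collected into a single lookup table. The sketch below is illustrative only: `Rule`, `RULES`, and `find_rule` are hypothetical names, glob matching via `fnmatch` is an assumed convention (note that its `*` also matches `/`), and the `{id}` path segment is written as `*`. Returning `None` for a route simply means the tables above do not cover it.

```python
import fnmatch
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    method: str
    pattern: str               # glob pattern; "*" also matches "/"
    per_minute: Optional[int]  # None means no rate limit
    scope: Optional[str]       # "ip" or "user"; None when unlimited


RULES: List[Rule] = [
    # 1. Edge collector endpoints: no rate limits
    Rule("POST", "/api/v1/events/batch", None, None),
    Rule("POST", "/api/v1/heartbeat", None, None),
    Rule("POST", "/api/v1/commands/*/ack", None, None),
    # 2. User authentication endpoints: strict, per IP
    Rule("POST", "/auth/login", 5, "ip"),
    Rule("POST", "/auth/register", 3, "ip"),
    Rule("POST", "/auth/forgot-password", 3, "ip"),
    # 3. User API endpoints: moderate, per user
    Rule("GET", "/api/v1/search", 60, "user"),
    Rule("GET", "/api/v1/events/*", 120, "user"),
    Rule("POST", "/api/v1/watchlist/*", 30, "user"),
    Rule("GET", "/api/v1/export/*", 5, "user"),
    # 4. Admin endpoints: moderate, per user
    Rule("POST", "/api/v1/admin/*", 30, "user"),
    Rule("DELETE", "/api/v1/*", 10, "user"),
]


def find_rule(method: str, path: str) -> Optional[Rule]:
    """Return the first matching rule, or None if the route is not listed."""
    for rule in RULES:
        if rule.method == method and fnmatch.fnmatchcase(path, rule.pattern):
            return rule
    return None
```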

Backpressure Signaling

When the central server is under load, it signals edge collectors to slow down without rejecting any events.

Response Format

Normal response:

HTTP 201 Created
{
  "accepted": 50,
  "results": [...]
}

Backpressure response:

HTTP 202 Accepted
{
  "accepted": 50,
  "results": [...],
  "backpressure": {
    "active": true,
    "recommended_delay_seconds": 10,
    "reason": "high_load"
  }
}
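
One way the server could assemble these two responses is sketched below. `batch_response` is a hypothetical helper, and how the delay and reason are chosen is left to the caller; only the status codes and field names come from the examples above.

```python
from typing import Any, Dict, List, Optional, Tuple


def batch_response(
    accepted: int,
    results: List[Any],
    delay_seconds: Optional[int] = None,
    reason: str = "high_load",
) -> Tuple[int, Dict[str, Any]]:
    """Return (status_code, body); events are accepted in both cases."""
    body: Dict[str, Any] = {"accepted": accepted, "results": results}
    if delay_seconds is None:
        return 201, body  # normal response
    body["backpressure"] = {
        "active": True,
        "recommended_delay_seconds": delay_seconds,
        "reason": reason,
    }
    return 202, body  # accepted, but the collector is asked to slow down
```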

Edge Behavior on Backpressure

Process the response normally (the events were accepted)
Wait `recommended_delay_seconds` before the next batch upload
Continue buffering events locally during the delay
Resume the normal upload schedule after the delay
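
On the collector side, the steps above might reduce to a small scheduling decision. In this sketch `seconds_until_next_upload` and `normal_interval` are hypothetical names; the collector's regular upload interval is not specified in this document and is passed in as a parameter.

```python
from typing import Any, Dict


def seconds_until_next_upload(body: Dict[str, Any], normal_interval: float) -> float:
    """Pick the wait before the next batch upload from a parsed response body."""
    backpressure = body.get("backpressure") or {}
    if backpressure.get("active"):
        # Server asked us to slow down; events keep buffering locally meanwhile.
        return float(backpressure.get("recommended_delay_seconds", normal_interval))
    # No backpressure: stay on the normal schedule.
    return float(normal_interval)
```

Because the function is evaluated per response, the collector returns to its normal schedule as soon as a response arrives without an active backpressure block.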

Backpressure Triggers

Condition	Delay	Reason
DB write latency > 500ms	10s	Database under load
Detection queue depth > 10,000	30s	Processing falling behind
MinIO latency > 1s	15s	Storage under load
Server CPU > 80%	20s	General resource pressure
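
The trigger table could be evaluated roughly as follows. Several points here are assumptions rather than decisions recorded in this document: `LoadSnapshot` and `backpressure_for` are hypothetical names, the reason codes are invented for illustration (the response example only shows "high_load"), and when several conditions hold at once the sketch picks the longest delay.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class LoadSnapshot:
    db_write_latency_ms: float
    detection_queue_depth: int
    minio_latency_ms: float
    cpu_percent: float


def backpressure_for(load: LoadSnapshot) -> Optional[Dict[str, Any]]:
    """Return a backpressure block, or None when no trigger fires."""
    fired: List[Tuple[int, str]] = []  # (delay_seconds, hypothetical reason code)
    if load.db_write_latency_ms > 500:
        fired.append((10, "db_write_latency"))
    if load.detection_queue_depth > 10_000:
        fired.append((30, "detection_queue_depth"))
    if load.minio_latency_ms > 1000:
        fired.append((15, "storage_latency"))
    if load.cpu_percent > 80:
        fired.append((20, "cpu"))
    if not fired:
        return None
    delay, reason = max(fired)  # assumption: the longest delay wins
    return {"active": True, "recommended_delay_seconds": delay, "reason": reason}
```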

Monitoring (Not Blocking)

Track traffic patterns without rejecting legitimate events:

Per-Collector Metrics

Metric	Expected	Alert Threshold
Events/minute	1-100	>500 (investigate)
Batch size	10-50	>100 (check config)
Request size	1-5 MB	>10 MB (log warning)
Error rate	<1%	>5% (investigate)
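
A check against these thresholds might look like the sketch below. `collector_alerts` and its alert strings are hypothetical; the function only reports, in keeping with the monitor-not-block approach, and it expects the error rate as a fraction (0.05 for 5%).

```python
from typing import List


def collector_alerts(
    events_per_minute: float,
    batch_size: int,
    request_mb: float,
    error_rate: float,
) -> List[str]:
    """Return alert labels for one collector; never blocks traffic."""
    alerts: List[str] = []
    if events_per_minute > 500:
        alerts.append("events_per_minute: investigate")
    if batch_size > 100:
        alerts.append("batch_size: check config")
    if request_mb > 10:
        alerts.append("request_size: log warning")
    if error_rate > 0.05:
        alerts.append("error_rate: investigate")
    return alerts
```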

System-Wide Metrics

Metric	Alert Threshold	Action
Total events/minute	>5,000	Consider scaling
Backpressure activations/hour	>10	Investigate load
Failed uploads/hour	>100	Check connectivity

Anomaly Detection

Flag unusual patterns for review (but don't auto-block):

Collector suddenly sending 10x its normal volume
Events from a deactivated collector
Unusual time patterns (e.g., a spike at 3 am)
Geographic anomalies (e.g., a collector in NY sending events tagged LA)
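
Two of these checks are simple enough to sketch. The names `review_flags` and the flag labels are hypothetical, and how the baseline ("normal volume") is computed is an assumption left to the caller, for example a trailing average. The function returns flags for human review and never blocks.

```python
from typing import List


def review_flags(
    current_events_per_minute: float,
    baseline_events_per_minute: float,
    collector_active: bool,
    spike_factor: float = 10.0,
) -> List[str]:
    """Return review flags for one collector; flagged traffic is still accepted."""
    flags: List[str] = []
    if not collector_active:
        flags.append("events_from_deactivated_collector")
    # Skip the spike check when there is no usable baseline yet.
    if baseline_events_per_minute > 0 and (
        current_events_per_minute >= spike_factor * baseline_events_per_minute
    ):
        flags.append("volume_spike")
    return flags
```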

Implementation Notes

Rate Limiter Configuration

Choose the algorithm per endpoint category: sliding window where smooth limiting matters, token bucket where short bursts should be tolerated:

Algorithm	Use Case
Sliding window	User API endpoints (smoother)
Token bucket	Auth endpoints (burst tolerance)
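
For reference, a sliding window limiter can be sketched in a few lines. This is the sliding window log variant, which stores one timestamp per accepted request; it is shown as an illustration only, and `SlidingWindowLimiter` is a hypothetical name. One instance would be kept per user and endpoint.

```python
from collections import deque
from typing import Deque


class SlidingWindowLimiter:
    """Sliding window log: allow at most `limit` requests in any window."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.hits: Deque[float] = deque()  # timestamps of accepted requests

    def allow(self, now: float) -> bool:
        cutoff = now - self.window_seconds
        # Drop timestamps that have slid out of the window.
        while self.hits and self.hits[0] <= cutoff:
            self.hits.popleft()
        if len(self.hits) >= self.limit:
            return False
        self.hits.append(now)
        return True
```

Unlike a fixed per-minute counter, the window moves with each request, so a client cannot double its rate by straddling a minute boundary.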

Headers

Always return rate limit headers on limited endpoints:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1704067200

On a 429 response, also return:

Retry-After: 30
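
A helper that produces these headers might look like this. It rests on two assumptions: that `X-RateLimit-Reset` is a Unix timestamp in seconds (which the example value suggests), and that `Retry-After` is derived from the time remaining until that reset. `rate_limit_headers` is a hypothetical name.

```python
from typing import Dict


def rate_limit_headers(
    limit: int,
    remaining: int,
    reset_epoch: int,
    now_epoch: int,
    limited: bool = False,
) -> Dict[str, str]:
    """Build rate limit headers; add Retry-After only for a 429 response."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if limited:
        # Assumption: the client may retry once the current window resets.
        headers["Retry-After"] = str(max(0, reset_epoch - now_epoch))
    return headers
```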

Storage

POC: In-memory (resets on restart, acceptable)
Production: Redis (shared across instances)

What This Strategy Does NOT Do

Does not reject legitimate high-volume traffic from edge collectors
Does not require edge collectors to pre-register expected volume
Does not create artificial caps that lose evidence data
Does not treat authenticated collectors as untrusted

Summary

Traffic Source	Strategy
Edge collectors (authenticated)	Accept all, backpressure signal, monitor
User authentication requests	Strict rate limit (3-5/min per IP)
User API calls	Moderate rate limit (5-120/min per user, by endpoint)
Admin operations	Moderate rate limit (10-30/min per user)

Decision Date: 2025-12-29
Status: Approved
Rationale: Traditional rate limiting would reject legitimate evidence data; backpressure preserves all events while managing server load.