
Rate Limiting Strategy

Problem

Traditional rate limiting (reject after X requests/minute) doesn't fit edge-to-central Detection ingestion:

  • Edge collectors are authenticated and trusted
  • High traffic from busy locations is legitimate, not abuse
  • Rejecting events loses evidence (unacceptable for community watch)
  • Events would queue up on the edge, potentially for days, during sustained high traffic

Strategy: Trust + Backpressure + Monitor

Principle

Accept all legitimate traffic from authenticated collectors. Use backpressure signaling instead of rejection.

Endpoint Categories

1. Edge Collector Endpoints (No Rate Limits)

These endpoints serve authenticated, trusted edge collectors:

| Endpoint | Rate Limit | Rationale |
| --- | --- | --- |
| POST /api/v1/events/batch | None | All detections are legitimate data |
| POST /api/v1/heartbeat | None | Expected 1/minute per collector |
| POST /api/v1/commands/{id}/ack | None | Response to server commands |

Protection via API key authentication, backpressure signaling, and monitoring instead of rate limits.

2. User Authentication Endpoints (Strict Limits)

Prevent brute force and credential stuffing:

| Endpoint | Rate Limit | Rationale |
| --- | --- | --- |
| POST /auth/login | 5/minute per IP | Prevent brute force |
| POST /auth/register | 3/minute per IP | Prevent spam accounts |
| POST /auth/forgot-password | 3/minute per IP | Prevent email bombing |

On limit exceeded: Return 429 with Retry-After header
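
A minimal sketch of how the strict per-IP limit could be enforced, assuming an in-memory store and a 60-second window; the endpoint map and helper function are illustrative, not a prescribed implementation:

import time
from collections import defaultdict, deque

# Per-IP limits for the auth endpoints above (requests per 60-second window).
AUTH_LIMITS = {"/auth/login": 5, "/auth/register": 3, "/auth/forgot-password": 3}
WINDOW_SECONDS = 60

# In-memory request log: (endpoint, ip) -> timestamps of recent requests.
_requests = defaultdict(deque)

def check_auth_limit(endpoint: str, ip: str) -> tuple[bool, int]:
    """Return (allowed, retry_after_seconds) for a single request."""
    limit = AUTH_LIMITS.get(endpoint)
    if limit is None:
        return True, 0                      # endpoint is not rate limited
    now = time.monotonic()
    window = _requests[(endpoint, ip)]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()                    # drop requests older than the window
    if len(window) >= limit:
        # The oldest request in the window determines when a slot frees up.
        return False, int(WINDOW_SECONDS - (now - window[0])) + 1
    window.append(now)
    return True, 0

A request handler would translate a False result into HTTP 429 and place the second return value in the Retry-After header.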

3. User API Endpoints (Moderate Limits)

Prevent abuse by authenticated users:

| Endpoint | Rate Limit | Rationale |
| --- | --- | --- |
| GET /api/v1/search | 60/minute per user | Database-intensive queries |
| GET /api/v1/events/* | 120/minute per user | Normal browsing patterns |
| POST /api/v1/watchlist/* | 30/minute per user | Prevent bulk creation |
| GET /api/v1/export/* | 5/minute per user | Resource-intensive exports |

4. Admin Endpoints (Moderate Limits)

| Endpoint | Rate Limit | Rationale |
| --- | --- | --- |
| POST /api/v1/admin/* | 30/minute per user | Normal admin operations |
| DELETE /api/v1/* | 10/minute per user | Prevent accidental bulk deletes |

Backpressure Signaling

When the central server is under load, it signals edge collectors to slow down instead of rejecting events.

Response Format

Normal response:

HTTP 201 Created
{
  "accepted": 50,
  "results": [...]
}

Backpressure response:

HTTP 202 Accepted
{
  "accepted": 50,
  "results": [...],
  "backpressure": {
    "active": true,
    "recommended_delay_seconds": 10,
    "reason": "high_load"
  }
}

Edge Behavior on Backpressure

  1. Process response normally (events were accepted)
  2. Wait recommended_delay_seconds before next batch upload
  3. Events continue buffering locally during delay
  4. Resume normal upload schedule after delay
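
A sketch of an edge collector upload loop honoring the backpressure hint; the URL, credential, upload interval, and next_batch() helper are placeholders for this illustration, not the actual collector code:

import time
import requests

UPLOAD_URL = "https://central.example.org/api/v1/events/batch"  # placeholder host
API_KEY = "collector-api-key"                                   # placeholder credential
UPLOAD_INTERVAL_SECONDS = 60                                    # assumed schedule

def upload_loop(next_batch):
    """next_batch() returns the locally buffered events to send (possibly [])."""
    while True:
        batch = next_batch()
        if batch:
            try:
                resp = requests.post(
                    UPLOAD_URL,
                    json={"events": batch},
                    headers={"Authorization": f"Bearer {API_KEY}"},
                    timeout=30,
                )
                resp.raise_for_status()
                # 201 and 202 both mean the batch was accepted; 202 adds backpressure.
                bp = resp.json().get("backpressure", {})
                if bp.get("active"):
                    # Honor the hint before resuming the normal schedule.
                    time.sleep(bp.get("recommended_delay_seconds", 10))
            except requests.RequestException:
                pass  # upload failed; events stay buffered and are retried next cycle
        time.sleep(UPLOAD_INTERVAL_SECONDS)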

Backpressure Triggers

| Condition | Delay | Reason |
| --- | --- | --- |
| DB write latency > 500ms | 10s | Database under load |
| Detection queue depth > 10,000 | 30s | Processing falling behind |
| MinIO latency > 1s | 15s | Storage under load |
| Server CPU > 80% | 20s | General resource pressure |
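
A sketch of how the server side might turn these conditions into the response's backpressure block; the metric values are assumed to come from existing health checks, and the reason labels are illustrative:

def evaluate_backpressure(db_write_latency_ms: float,
                          detection_queue_depth: int,
                          minio_latency_ms: float,
                          cpu_percent: float):
    """Return a backpressure dict for the batch response, or None under normal load.
    Thresholds and delays mirror the trigger table; the longest delay wins."""
    triggers = [
        (detection_queue_depth > 10_000, 30, "processing_backlog"),
        (cpu_percent > 80, 20, "high_cpu"),
        (minio_latency_ms > 1_000, 15, "storage_latency"),
        (db_write_latency_ms > 500, 10, "db_latency"),
    ]
    active = [(delay, reason) for hit, delay, reason in triggers if hit]
    if not active:
        return None
    delay, reason = max(active)  # most severe condition sets the recommended delay
    return {"active": True, "recommended_delay_seconds": delay, "reason": reason}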

Monitoring (Not Blocking)

Track traffic patterns without rejecting legitimate events:

Per-Collector Metrics

| Metric | Expected | Alert Threshold |
| --- | --- | --- |
| Events/minute | 1-100 | >500 (investigate) |
| Batch size | 10-50 | >100 (check config) |
| Request size | 1-5 MB | >10 MB (log warning) |
| Error rate | <1% | >5% (investigate) |

System-Wide Metrics

| Metric | Alert Threshold | Action |
| --- | --- | --- |
| Total events/minute | >5,000 | Consider scaling |
| Backpressure activations/hour | >10 | Investigate load |
| Failed uploads/hour | >100 | Check connectivity |
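
This document doesn't name a metrics stack; if a Prometheus-style client were in use, the series behind these thresholds could be declared roughly as follows (metric names are illustrative assumptions):

from prometheus_client import Counter, Gauge

# Per-collector counters; the collector_id label drives the per-collector alerts.
events_ingested = Counter(
    "ingest_events_total", "Detections accepted from edge collectors", ["collector_id"]
)
upload_errors = Counter(
    "ingest_upload_errors_total", "Failed batch uploads", ["collector_id"]
)

# System-wide series used by the scaling and backpressure alerts.
backpressure_activations = Counter(
    "backpressure_activations_total", "Responses sent with backpressure active"
)
detection_queue_depth = Gauge(
    "detection_queue_depth", "Detections waiting to be processed"
)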

Anomaly Detection

Flag unusual patterns for review, but don't auto-block (a sketch of the first check follows this list):

  • Collector suddenly 10x normal volume
  • Events from deactivated collector
  • Unusual time patterns (spike at 3am)
  • Geographic anomalies (collector in NY sending events tagged LA)
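
As an illustration of the first pattern, a volume-spike check could compare the current minute against the collector's own rolling baseline; the baseline source and 10x multiplier are assumptions for this sketch:

from statistics import mean

def volume_anomaly(recent_per_minute: list, current_minute: int,
                   multiplier: float = 10.0) -> bool:
    """Flag (never block) a collector whose current event rate is far above
    its recent baseline, e.g. the last 24 hours of per-minute counts."""
    if not recent_per_minute:
        return False
    baseline = mean(recent_per_minute)
    return baseline > 0 and current_minute > baseline * multiplier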

Implementation Notes

Rate Limiter Configuration

Choose the limiting algorithm per endpoint category (a sliding-window sketch follows the table):

| Algorithm | Use Case |
| --- | --- |
| Sliding window | User API endpoints (smoother) |
| Token bucket | Auth endpoints (burst tolerance) |
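
A minimal in-memory sliding-window limiter for the user API endpoints might look like this; the key format is an illustrative choice:

import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Counts requests per key (e.g. "user:42:/api/v1/search") in a rolling window."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)

    def hit(self, key: str):
        """Record a request; return (allowed, remaining, reset_epoch_seconds)."""
        now = time.time()
        hits = self._hits[key]
        while hits and now - hits[0] > self.window:
            hits.popleft()                  # discard requests outside the window
        if len(hits) >= self.limit:
            return False, 0, int(hits[0] + self.window)
        hits.append(now)
        return True, self.limit - len(hits), int(hits[0] + self.window)

The remaining and reset values map directly onto the X-RateLimit-Remaining and X-RateLimit-Reset headers described below.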

Headers

Always return rate limit headers on limited endpoints:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1704067200

On 429 response:

Retry-After: 30

Storage

  • POC: In-memory (resets on restart, acceptable)
  • Production: Redis (shared across instances); see the sketch below
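
For the Redis-backed production store, one common sliding-window approach (an assumption here, not a mandated design) keeps request timestamps in a sorted set per user/endpoint key:

import time
import redis

r = redis.Redis()  # shared instance reachable from all API servers

def allow_request(key: str, limit: int, window_seconds: int = 60) -> bool:
    """Sliding-window check backed by a Redis sorted set of request timestamps."""
    now = time.time()
    pipe = r.pipeline()
    pipe.zremrangebyscore(key, 0, now - window_seconds)  # drop entries outside the window
    pipe.zadd(key, {str(now): now})                      # record this request
    pipe.zcard(key)                                      # count requests in the window
    pipe.expire(key, window_seconds)                     # garbage-collect idle keys
    _, _, count, _ = pipe.execute()
    return count <= limit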

What This Strategy Does NOT Do

  • Does not reject legitimate high-volume traffic from edge collectors
  • Does not require edge collectors to pre-register expected volume
  • Does not create artificial caps that lose evidence data
  • Does not treat authenticated collectors as untrusted

Summary

| Traffic Source | Strategy |
| --- | --- |
| Edge collectors (authenticated) | Accept all, backpressure signal, monitor |
| User login attempts | Strict rate limit (5/min) |
| User API calls | Moderate rate limit (60-120/min) |
| Admin operations | Moderate rate limit (30/min) |

Decision Date: 2025-12-29
Status: Approved
Rationale: Traditional rate limiting would reject legitimate evidence data; backpressure preserves all events while managing server load.