Production Hardening for Government Web Services
Real Metrics from Securing a Live AI-Powered Public Website
Core Thesis: A $0-budget, 6-phase hardening pipeline transformed an unprotected demo site into a production-ready government web service, with 50+ verified metrics proving every claim.
February 2026 · vanderdev.net · Innovation Fellowship
What Is Production Hardening?
Taking a working website and making it safe for real users: preventing crashes, blocking abuse, adding monitoring, and proving it all works under load.
💡
Think of it like getting a building up to code before opening day. The building works (the lights turn on, doors open) but it hasn't been inspected for fire exits, sprinklers, emergency lighting, or maximum occupancy.
Production hardening is the inspection and retrofit process for web servers.
It answers three questions:
- Can it survive abuse? (rate limiting, memory caps, auth)
- Will we know when something breaks? (alerts, monitors, health checks)
- Does it perform under real load? (caching, CDN, load testing)
The Problem: What Was at Risk
Our site (vanderdev.net) hosts three AI chatbots, a workflow engine, a database, and monitoring tools, all on one server. Before hardening:
⚠️
The site was functional but unprotected. Any of these could happen during a live demo:
Anyone could crash the server by exhausting memory: programs had no safety limits. When a computer's memory (RAM) fills up, the operating system starts force-killing running programs (an OOM kill, "out-of-memory kill"). There was no overflow space (swap, emergency disk-based memory) either.
Anyone could run up AI costs by spamming the chatbots: no limits on how many requests one person could make per second (rate limiting).
No one would know if something broke: no automatic alerts, no health checks, no uptime monitoring.
No protection against flood attacks: a single attacker could knock the site offline by overwhelming it with fake traffic (DDoS, distributed denial-of-service).
The Architecture
vanderdev.net runs 28 programs (called containers: isolated mini-servers, each running one job) on a single VPS with 4 vCPU and 16 GB RAM:
🤖
3 AI Chatbots
WaterBot, BizBot, KiddoBot
⚙️
Workflow Engine
n8n (like Zapier, self-hosted)
🗄️
Database + Auth
Supabase (PostgreSQL)
📊
Monitoring Stack
Prometheus + Grafana
🌐
Web Server
nginx reverse proxy
🛡️
Edge Protection
Cloudflare CDN + DNS
VPS = Virtual Private Server, a rented cloud computer. Docker = the platform that runs all 28 containers.
The 6-Phase Approach
We tackled hardening in order: security first, then visibility, then speed, then edge protection, then testing, then documentation.
1
Lock It Down
Phase 1: Add memory limits, traffic caps, and authentication to block abuse
2
Watch It
Phase 2: Add alerts, health checks, and monitors so we know the instant something breaks
3
Speed It Up
Phase 3: Cache repeat AI answers, reduce background traffic, add security headers
4
Shield It
Phase 4: Put Cloudflare in front of the server to absorb attacks and cache files globally
5
Prove It
Phase 5: Simulate 50 simultaneous users and verify every security measure holds
6
Teach It
Phase 6: Compile real metrics into this training deck for the Fellowship
Phase 1: Lock It Down
Goal: Stop crashes and block abuse before anything else.
Swap (emergency overflow memory on disk) gives the OS a safety net when RAM fills up, instead of killing programs. Container limits cap each program so one runaway can't consume everything. Rate limiting (speed limits for web traffic) prevents one user from flooding the system. Webhook auth (password-protecting the AI chatbot endpoints) blocks unauthorized callers. fail2ban (auto-blocking repeat offenders) watches logs and bans bad IPs.
- Memory tiers: Heavy (2 GB) for n8n and the database. Medium (512 MB) for 7 services. Light (256 MB) for 15 services. Minimal (128 MB) for the SSL certificate manager.
- Rate limiting zones: Bot webhooks (URLs that trigger AI workflows) at 10 req/s, API at 2 req/s, general traffic at 30 req/s; see the sketch after this list.
- fail2ban: Upgraded from 1 jail (SSH only) to 4 jails by adding nginx auth failures, bot scanners, and rate limit violators. Uses a polling backend because Docker container logs aren't visible to the standard system journal.
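A minimal sketch of how three such zones can be declared in nginx; the zone names and 10 MB sizes here are illustrative assumptions, not the live config:

```nginx
# In the http {} block. Each zone tracks clients by IP address.
limit_req_zone $binary_remote_addr zone=webhooks:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=api:10m      rate=2r/s;
limit_req_zone $binary_remote_addr zone=general:10m  rate=30r/s;
```

Each zone is then attached to the matching location block with a limit_req directive (an example appears in the rate limiting deep dive below).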
Phase 2: Watch It
Goal: If something breaks at 2 AM, know before users report it.
Prometheus (a metrics collection system, like a health monitor for servers) already collected data but never acted on it. We added alert rules that trigger automatic push notifications when thresholds are crossed. Uptime Kuma sends synthetic test requests every 60 seconds to verify each service is actually responding.
- 16 alert rules across 4 categories: host health (CPU, memory, disk), service availability (container down), dead man's switch (proves monitoring itself is running), and platform compatibility; an example rule follows this list.
- Health endpoint at /health: a cron job writes a status file every 30 seconds. This works even if the main API server is down, because it's just a static file served by nginx.
- 11 monitors: 4 HTTP (website, API, webhooks, health), 4 TCP (database, Prometheus, Grafana, n8n), 3 Docker container checks.
- Notification path: Prometheus → Alertmanager → ntfy.sh (a free push notification service) → phone.
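One of the 16 rules might look like the sketch below; the group name, rule name, threshold, and 5-minute window are assumptions for illustration, not the deployed rule:

```yaml
groups:
  - name: host-health                  # illustrative group name
    rules:
      - alert: HostLowMemory           # hypothetical rule name
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10
        for: 5m                        # must stay low for 5 minutes before firing
        labels:
          severity: warning
        annotations:
          summary: "Less than 10% of RAM available on {{ $labels.instance }}"
```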
Phase 3: Speed It Up
Goal: Stop paying the AI to answer the same question twice.
When two users ask the same question, the AI used to generate a fresh (and expensive) response every time. Redis (a fast in-memory cache, like keeping the answer sheet next to the phone) stores recent answers for 10 minutes. A cache "hit" returns the stored answer in milliseconds instead of waiting seconds for the AI.
129x
Faster Cached Responses
6x
Fewer Background Requests
- Dashboard poll optimization: The frontend used to ask "anything new?" every 5 seconds (1,200 requests/minute at 50 users). Now it asks every 30 seconds (200 req/min), with browser-level caching on the response.
- Security headers (CSP, CORS, and more): browser-level rules that prevent unauthorized scripts and cross-site attacks. Frontend scores A+ (7 of 7 headers); API scores B (4 of 6; CSP intentionally excluded for JSON APIs).
- CORS (Cross-Origin Resource Sharing, browser rules controlling which websites can talk to your API): Changed from wildcard (any site) to restricted (only vanderdev.net).
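A hedged sketch of what the restricted-CORS and security-header directives can look like in nginx; this is a subset with illustrative values, not the exact header set on the live server:

```nginx
# Security headers (partial; values are assumptions)
add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "DENY" always;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

# Restricted CORS: only the site's own origin, never "*"
add_header Access-Control-Allow-Origin "https://vanderdev.net" always;
add_header Access-Control-Max-Age 86400 always;   # 24-hour preflight cache
```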
Phase 4: Shield It
Goal: Put a global shield between the internet and our server.
Cloudflare (a content delivery network, or CDN; a global network that sits in front of your server) absorbs attack traffic and caches files at edge servers worldwide. We migrated DNS (the "phone book" that translates domain names to server addresses) from Hostinger to Cloudflare.
TLS 1.3
Encryption Standard
1 CA
Certificate Authority
- SSL/TLS (the encryption that makes https:// work, the lock icon) upgraded to Full Strict mode: encrypted end-to-end, with TLS 1.3 as the primary protocol and TLS 1.1 rejected entirely.
- CAA records (DNS entries that control which companies can issue security certificates for your domain): Reduced from 12 authorized certificate authorities to 1, Let's Encrypt only.
- nginx real_ip configured BEFORE enabling the Cloudflare proxy: this ensures rate limiting sees the real visitor's IP address, not Cloudflare's. Order matters; see the sketch after this list.
- 4 domains proxied through Cloudflare (vanderdev.net, www, api, n8n), 5 DNS-only (grafana, portainer, telemetry, pgadmin, studio; internal tools accessed via Tailscale).
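A minimal sketch of that real_ip configuration, assuming Cloudflare's published IPv4 ranges (only two shown here; the live config lists all of them, from cloudflare.com/ips):

```nginx
# Trust Cloudflare's edges so $remote_addr becomes the visitor's IP.
set_real_ip_from 173.245.48.0/20;    # two example Cloudflare ranges;
set_real_ip_from 103.21.244.0/22;    # the real config enumerates them all
real_ip_header CF-Connecting-IP;     # header Cloudflare sets to the true client IP
```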
Phase 5: Prove It
Goal: Simulate real load and verify every security measure.
We used k6 (a load testing tool that simulates many users at once) to hit the site with 50 concurrent virtual users, plus targeted flood tests against rate-limited endpoints; a sketch of the test script appears below.
48.4ms
Response Time (p95)
44%
Requests Shed by Rate Limiter
- p95 latency (95% of requests finish faster than this time): 48.4ms, meaning the server responds in under 50 milliseconds for the vast majority of traffic, even under load.
- The 44% error rate is a PASS: those aren't failures, they're the rate limiter correctly blocking excess traffic. The server was protecting itself by returning "429 Too Many Requests" to requests that exceeded the speed limit.
- Latency under load increased only 4-9ms over the idle baseline (30-35ms): negligible degradation at 50 concurrent users.
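A minimal k6 sketch under stated assumptions: the URL, duration, and threshold below are illustrative, not our exact test plan.

```javascript
// k6 load test: 50 concurrent virtual users hitting the homepage.
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 50,                              // 50 simultaneous virtual users
  duration: '2m',                       // assumed duration
  thresholds: {
    http_req_duration: ['p(95)<200'],   // fail the run if p95 exceeds 200 ms
  },
};

export default function () {
  http.get('https://vanderdev.net/');   // each VU loops: request, pause, repeat
  sleep(1);
}
```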
Before/After: Security Posture
Before Hardening
- No traffic speed limits: any user could flood the server
- AI chatbot endpoints completely open to the public
- 1 auto-block rule (SSH only; nothing for web traffic)
- 3 security headers (minimal browser protection)
- Basic encryption (TLS 1.2+, no enforcement)
- Any website could call our API (wildcard CORS)
- Basic DNS with 12 authorized certificate authorities
- No DDoS protection
After Hardening
- 3 rate limiting zones (10/2/30 requests per second)
- Webhook endpoints password-protected at the nginx level
- 4 auto-block rules (SSH + auth + bot scanners + rate limit)
- 7 security headers (A+ grade on frontend)
- TLS 1.3 primary, TLS 1.1 rejected, Full Strict mode
- Only vanderdev.net can call the API (restricted CORS)
- Cloudflare DNS with 1 authorized CA (Let's Encrypt only)
- Cloudflare DDoS protection active
Before/After: Performance
Before
- Every repeat question costs AI processing time (4-10 seconds)
- Dashboard polls every 5 seconds (1,200 req/min at scale)
- No browser caching on API responses
- No CDN: all traffic hits the origin server directly
- No CORS preflight cache
After
- Cached answers return in 80-115ms via Redis (free)
- Dashboard polls every 30 seconds (200 req/min at scale)
- 15-second browser cache on API responses
- Cloudflare global edge caching configured
- 24-hour CORS preflight cache eliminates redundant checks
Before/After: Monitoring
Before
- Metrics collected by Prometheus but never acted on
- Zero notifications when things break
- No health check endpoint
- No synthetic uptime testing
- No dead man's switch
After
- 16 alert rules across 4 categories trigger push notifications
- Phone notifications via ntfy.sh (2 topics: infra + uptime)
- /health endpoint updated every 30 seconds (works even if API is down)
- 11 synthetic monitors (HTTP, TCP, Docker container checks)
- Cron-based dead man's switch (proves monitoring is still running)
Deep Dive: Rate Limiting in Action
We flooded the chatbot webhook endpoint with 30 simultaneous requests. The rate limiter (a speed limit for web traffic) did exactly what it should: let legitimate traffic through and block the excess with 429 "Too Many Requests" responses.
Webhook flood test (30 parallel requests):
Webhooks
21 passed · 9 blocked (429)
API flood test (15 serial requests):
API
9 passed · 6 blocked (429)
The rate limiter uses a "nodelay burst" strategy: it processes the initial burst of requests immediately (no queueing delay), then hard-blocks anything over the limit. This means legitimate users get instant responses while abuse is stopped cold.
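In nginx terms, that strategy maps to limit_req with burst and nodelay; the location path, burst size, and upstream below are assumptions, not the live config:

```nginx
location /webhook/ {
    limit_req zone=webhooks burst=20 nodelay;  # serve the burst instantly, reject the rest
    limit_req_status 429;                      # report "Too Many Requests" instead of the default 503
    proxy_pass http://n8n:5678;                # hypothetical upstream
}
```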
Auth verification also passed: no token → 403 (blocked), wrong token → 403 (blocked), valid token → 200 (success).
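The same checks can be reproduced with curl; the endpoint path and header name here are hypothetical stand-ins for the real ones:

```bash
# Expect 403 with no token, 403 with a bad token, 200 with the right one.
curl -s -o /dev/null -w '%{http_code}\n' https://vanderdev.net/webhook/demo-bot
curl -s -o /dev/null -w '%{http_code}\n' -H 'X-Webhook-Token: wrong' https://vanderdev.net/webhook/demo-bot
curl -s -o /dev/null -w '%{http_code}\n' -H 'X-Webhook-Token: <valid-token>' https://vanderdev.net/webhook/demo-bot
```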
Deep Dive: AI Response Caching
When two users ask the same question, why pay for the AI to answer twice? Redis (fast in-memory database used as a cache) stores answers for 10 minutes using DJB2 hash keys.
| Bot | Without Cache | With Cache | Speedup |
| --- | --- | --- | --- |
| WaterBot | 9.7 seconds | 95ms | 103x faster |
| BizBot | 4.5 seconds | 115ms | 39x faster |
| KiddoBot | 10.3 seconds | 80ms | 128x faster |
| Permit Finder | 10.2 seconds | 79ms | 129x faster |
ℹ️
How it works: Each chatbot workflow has 8 Redis nodes that check the cache first, skip the AI if an answer is found, and store new answers for 10 minutes. Cache keys use a DJB2 hash (a fast, simple hashing algorithm) instead of MD5, because n8n's sandboxed code environment blocks external modules; the hash is plain JavaScript, sketched below.
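A minimal sketch of a DJB2 key function, assuming it runs as plain JavaScript inside the workflow; the function name and key format are illustrative:

```javascript
// DJB2 needs no require(), so it works inside n8n's sandboxed Code environment.
function djb2(str) {
  let hash = 5381;
  for (let i = 0; i < str.length; i++) {
    // hash * 33 + charCode, truncated to a 32-bit integer
    hash = ((hash << 5) + hash + str.charCodeAt(i)) | 0;
  }
  return (hash >>> 0).toString(16); // unsigned hex string, e.g. "1a2b3c4d"
}

// Hypothetical usage: derive the Redis cache key from the user's question.
const cacheKey = `bot:answer:${djb2('how do I renew my permit?')}`;
```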
Key Decisions and Tradeoffs
Every security decision is a tradeoff between protection, cost, and usability. Here are the ones that shaped this project:
Client-Side Auth Token
The chatbot's webhook token lives in a JavaScript config file, visible in browser dev tools. It's a speed bump (stops casual abuse), not vault-level security. Acceptable for a demo site with no sensitive data.
Free Notification Service
Using ntfy.sh's public server, where the topic name is the only "security." Fine for a demo VPS. Production government systems should self-host the notification server.
$0 Total Cost
Every tool in this pipeline is free-tier: Cloudflare Free, Let's Encrypt, Redis (self-hosted), Prometheus, Grafana, Uptime Kuma, ntfy.sh, fail2ban, k6. Zero procurement needed.
Nodelay Burst Strategy
Rate limiter processes the initial burst immediately (no queue delay for real users), then hard-blocks excess. Prioritizes user experience over strict throttling.
Lessons Learned
Gotchas for anyone doing this themselves; each one cost us debugging time:
Docker logs aren't in the system journal
Fix: fail2ban needs a polling backend and explicit log volume mounts to read container logs. The default systemd backend can't see Docker output. A jail sketch follows.
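A hedged sketch of such a jail; the file path, jail name, log path, and limits are assumptions:

```ini
# Hypothetical jail file, e.g. /etc/fail2ban/jail.d/nginx-limit-req.local
[nginx-limit-req]
enabled  = true
# Poll the log file directly; the systemd journal can't see Docker container logs.
backend  = polling
# Stock fail2ban filter that matches nginx limit_req violations.
filter   = nginx-limit-req
# The nginx container's log must be volume-mounted to a host path like this one.
logpath  = /var/log/nginx/error.log
maxretry = 10
bantime  = 3600
```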
SSH + special characters = broken configs
Fix: Sending configuration files over SSH? YAML, Prometheus rules, and nginx configs contain characters that break shell escaping. Base64-encode the file locally and decode it on the server, as in the one-liner below.
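A minimal example of the pattern; the file names and host are placeholders:

```bash
# Encode locally, decode remotely: nothing in the file touches shell quoting.
base64 < alerts.yml | ssh user@vps 'base64 -d > /etc/prometheus/alerts.yml'
```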
Real IP must be configured BEFORE Cloudflare proxy
Fix: nginx must be told to trust Cloudflare's IP ranges and extract the real visitor IP from the CF-Connecting-IP header. If you enable the proxy first, rate limiting sees Cloudflare's IP, not the user's, and blocks everyone.
n8n sandbox blocks external modules
Fix: n8n's Task Runner sandbox blocks process.env and require(). Use n8n's built-in Redis nodes instead of Code nodes with ioredis. Use DJB2 hash instead of crypto.createHash().
Build Metrics โ How Long Did This Take?
ℹ️
Average: 13 minutes per plan. The entire hardening project, from first swap file to final load test, took under 3 hours of active work using Claude Code. Wall-clock time was longer due to breaks and decisions between phases, but the actual build time was 157 minutes across 13 plans.
| Phase | Plans | Time | Avg/Plan |
| --- | --- | --- | --- |
| 1. Lock It Down | 3 | 27 min | 9 min |
| 2. Watch It | 3 | 26 min | 9 min |
| 3. Speed It Up | 2 | 56 min | 28 min |
| 4. Shield It | 2 | 29 min | 15 min |
| 5. Prove It | 2 | 19 min | 10 min |
| 6. Teach It | 1 | This deck | - |
What's Next
This hardening covers 10-50 concurrent unauthenticated users, the demo-day tier. Scaling beyond that requires:
- Streaming chatbot responses: a real-time typing effect so users see answers as they're generated, not after a multi-second wait
- Queue-based processing: at 100+ concurrent users, webhook requests should be queued and processed in order, not all at once
- Self-hosted notification server: production government systems need ntfy on their own infrastructure, not a public service
- Fix Supabase health checks: 4 containers report "unhealthy" because their Docker images lack the curl binary used by health check commands (the services work fine; it's cosmetic)
💡
The hardening approach scales. The same 6-phase pipeline (lock down, observe, optimize, shield, validate, document) applies to any web service. The specific tools change; the methodology doesn't.
Resources
From this project:
Tools used (all free tier): Cloudflare Free, Let's Encrypt, Redis (self-hosted), Prometheus, Grafana, Uptime Kuma, ntfy.sh, fail2ban, k6
Questions? This deck is part of the Governor's Innovation Fellowship training library.