Production Hardening for Government Web Services
Real Metrics from Securing a Live AI-Powered Public Website
Core Thesis: A $0-budget, 6-phase hardening pipeline transformed an unprotected demo site into a production-ready government web service, with 50+ verified metrics proving every claim.
February 2026 · vanderdev.net · Innovation Fellowship
What Is Production Hardening?
Taking a working website and making it safe for real users: preventing crashes, blocking abuse, adding monitoring, and proving it all works under load.
💡
Think of it like getting a building up to code before opening day. The building works (the lights turn on, doors open) but it hasn't been inspected for fire exits, sprinklers, emergency lighting, or maximum occupancy.
Production hardening is the inspection and retrofit process for web servers.
It answers three questions:
- Can it survive abuse? (rate limiting, memory caps, auth)
- Will we know when something breaks? (alerts, monitors, health checks)
- Does it perform under real load? (caching, CDN, load testing)
The Problem: What Was at Risk
Our site (vanderdev.net) hosts three AI chatbots, a workflow engine, a database, and monitoring tools, all on one server. Before hardening:
⚠️
The site was functional but unprotected. Any of these could happen during a live demo:
Anyone could crash the server by exhausting memory: programs had no safety limits. When a computer's memory (RAM) fills up, the operating system starts force-killing running programs (an OOM kill, "out-of-memory kill"). There was no overflow space (swap, emergency disk-based memory) either.
Anyone could run up AI costs by spamming the chatbots: no limits on how many requests one person could make per second (rate limiting).
No one would know if something broke: no automatic alerts, no health checks, no uptime monitoring.
No protection against flood attacks: a single attacker could knock the site offline by overwhelming it with fake traffic (DDoS, distributed denial-of-service).
The Architecture
vanderdev.net runs 28 programs (called containers: isolated mini-servers, each running one job) on a single VPS with 4 vCPU and 16 GB RAM:
🤖
3 AI Chatbots
WaterBot, BizBot, KiddoBot
⚙️
Workflow Engine
n8n (like Zapier, self-hosted)
🗄️
Database + Auth
Supabase (PostgreSQL)
📊
Monitoring Stack
Prometheus + Grafana
🌐
Web Server
nginx reverse proxy
🛡️
Edge Protection
Cloudflare CDN + DNS
VPS = Virtual Private Server, a rented cloud computer. Docker = the platform that runs all 28 containers.
The 6-Phase Approach
We tackled hardening in order: security first, then visibility, then speed, then edge protection, then testing, then documentation.
1
Lock It Down
Phase 1: Add memory limits, traffic caps, and authentication to block abuse
2
Watch It
Phase 2: Add alerts, health checks, and monitors so we know the instant something breaks
3
Speed It Up
Phase 3: Cache repeat AI answers, reduce background traffic, add security headers
4
Shield It
Phase 4: Put Cloudflare in front of the server to absorb attacks and cache files globally
5
Prove It
Phase 5: Simulate 50 simultaneous users and verify every security measure holds
6
Teach It
Phase 6: Compile real metrics into this training deck for the Fellowship
Phase 1: Lock It Down
Goal: Stop crashes and block abuse before anything else.
Swap (emergency overflow memory on disk) gives the OS a safety net when RAM fills up, instead of killing programs. Container limits cap each program so one runaway can't consume everything. Rate limiting (speed limits for web traffic) prevents one user from flooding the system. Webhook auth (password-protecting the AI chatbot endpoints) blocks unauthorized callers. fail2ban (auto-blocking repeat offenders) watches logs and bans bad IPs.
- Memory tiers: Heavy (2 GB) for n8n and the database. Medium (512 MB) for 7 services. Light (256 MB) for 15 services. Minimal (128 MB) for the SSL certificate manager.
- Rate limiting zones: Bot webhooks (URLs that trigger AI workflows) at 10 req/s, API at 2 req/s, general traffic at 30 req/s; see the sketch after this list.
- fail2ban: Upgraded from 1 jail (SSH only) to 4 jails by adding nginx auth failures, bot scanners, and rate limit violators. Uses a polling backend because Docker container logs aren't visible to the standard system journal.
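A minimal sketch of how three such zones can be declared in nginx; the zone names and 10 MB sizes here are illustrative assumptions, not the live config:

```nginx
# In the http {} block. Each zone tracks clients by IP address.
limit_req_zone $binary_remote_addr zone=webhooks:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=api:10m      rate=2r/s;
limit_req_zone $binary_remote_addr zone=general:10m  rate=30r/s;
```

Each zone is then attached to the matching location block with a limit_req directive (an example appears in the rate limiting deep dive below).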
Phase 2: Watch It
Goal: If something breaks at 2 AM, know before users report it.
Prometheus (a metrics collection system, like a health monitor for servers) already collected data but never acted on it. We added alert rules that trigger automatic push notifications when thresholds are crossed. Uptime Kuma sends synthetic test requests every 60 seconds to verify each service is actually responding.
- 16 alert rules across 4 categories: host health (CPU, memory, disk), service availability (container down), dead man's switch (proves monitoring itself is running), and platform compatibility; an example rule follows this list.
- Health endpoint at /health: a cron job writes a status file every 30 seconds. This works even if the main API server is down, because it's just a static file served by nginx.
- 11 monitors: 4 HTTP (website, API, webhooks, health), 4 TCP (database, Prometheus, Grafana, n8n), 3 Docker container checks.
- Notification path: Prometheus → Alertmanager → ntfy.sh (a free push notification service) → phone.
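One of the 16 rules might look like the sketch below; the group name, rule name, threshold, and 5-minute window are assumptions for illustration, not the deployed rule:

```yaml
groups:
  - name: host-health                  # illustrative group name
    rules:
      - alert: HostLowMemory           # hypothetical rule name
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10
        for: 5m                        # must stay low for 5 minutes before firing
        labels:
          severity: warning
        annotations:
          summary: "Less than 10% of RAM available on {{ $labels.instance }}"
```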
Phase 3: Speed It Up
Goal: Stop paying the AI to answer the same question twice.
When two users ask the same question, the AI used to generate a fresh (and expensive) response every time. Redis (a fast in-memory cache, like keeping the answer sheet next to the phone) stores recent answers for 10 minutes. A cache "hit" returns the stored answer in milliseconds instead of waiting seconds for the AI.
129x
Faster Cached Responses
6x
Fewer Background Requests
- Dashboard poll optimization: The frontend used to ask "anything new?" every 5 seconds (1,200 requests/minute at 50 users). Now it asks every 30 seconds (200 req/min), with browser-level caching on the response.
- Security headers (CSP, CORS, and more): browser-level rules that prevent unauthorized scripts and cross-site attacks. Frontend scores A+ (7 of 7 headers); API scores B (4 of 6; CSP intentionally excluded for JSON APIs).
- CORS (Cross-Origin Resource Sharing, browser rules controlling which websites can talk to your API): Changed from wildcard (any site) to restricted (only vanderdev.net).
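A hedged sketch of what the restricted-CORS and security-header directives can look like in nginx; this is a subset with illustrative values, not the exact header set on the live server:

```nginx
# Security headers (partial; values are assumptions)
add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "DENY" always;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

# Restricted CORS: only the site's own origin, never "*"
add_header Access-Control-Allow-Origin "https://vanderdev.net" always;
add_header Access-Control-Max-Age 86400 always;   # 24-hour preflight cache
```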
Phase 4: Shield It
Goal: Put a global shield between the internet and our server.
Cloudflare (a content delivery network, or CDN; a global network that sits in front of your server) absorbs attack traffic and caches files at edge servers worldwide. We migrated DNS (the "phone book" that translates domain names to server addresses) from Hostinger to Cloudflare.
TLS 1.3
Encryption Standard
1 CA
Certificate Authority
- SSL/TLS (the encryption that makes https:// work, the lock icon) upgraded to Full Strict mode: encrypted end-to-end, with TLS 1.3 as the primary protocol and TLS 1.1 rejected entirely.
- CAA records (DNS entries that control which companies can issue security certificates for your domain): Reduced from 12 authorized certificate authorities to 1, Let's Encrypt only.
- nginx real_ip configured BEFORE enabling the Cloudflare proxy: this ensures rate limiting sees the real visitor's IP address, not Cloudflare's. Order matters; see the sketch after this list.
- 4 domains proxied through Cloudflare (vanderdev.net, www, api, n8n), 5 DNS-only (grafana, portainer, telemetry, pgadmin, studio; internal tools accessed via Tailscale).
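A minimal sketch of that real_ip configuration, assuming Cloudflare's published IPv4 ranges (only two shown here; the live config lists all of them, from cloudflare.com/ips):

```nginx
# Trust Cloudflare's edges so $remote_addr becomes the visitor's IP.
set_real_ip_from 173.245.48.0/20;    # two example Cloudflare ranges;
set_real_ip_from 103.21.244.0/22;    # the real config enumerates them all
real_ip_header CF-Connecting-IP;     # header Cloudflare sets to the true client IP
```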
Phase 5: Prove It
Goal: Simulate real load and verify every security measure.
We used k6 (a load testing tool that simulates many users at once) to hit the site with 50 concurrent virtual users, plus targeted flood tests against rate-limited endpoints; a sketch of the test script appears below.
48.4ms
Response Time (p95)
44%
Requests Shed by Rate Limiter
- p95 latency (95% of requests finish faster than this time): 48.4ms, meaning the server responds in under 50 milliseconds for the vast majority of traffic, even under load.
- The 44% error rate is a PASS: those aren't failures, they're the rate limiter correctly blocking excess traffic. The server was protecting itself by returning "429 Too Many Requests" to requests that exceeded the speed limit.
- Latency under load increased only 4-9ms over the idle baseline (30-35ms): negligible degradation at 50 concurrent users.
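A minimal k6 sketch under stated assumptions: the URL, duration, and threshold below are illustrative, not our exact test plan.

```javascript
// k6 load test: 50 concurrent virtual users hitting the homepage.
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 50,                              // 50 simultaneous virtual users
  duration: '2m',                       // assumed duration
  thresholds: {
    http_req_duration: ['p(95)<200'],   // fail the run if p95 exceeds 200 ms
  },
};

export default function () {
  http.get('https://vanderdev.net/');   // each VU loops: request, pause, repeat
  sleep(1);
}
```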
Before/After: Security Posture
Before Hardening
- No traffic speed limits: any user could flood the server
- AI chatbot endpoints completely open to the public
- 1 auto-block rule (SSH only; nothing for web traffic)
- 3 security headers (minimal browser protection)
- Basic encryption (TLS 1.2+, no enforcement)
- Any website could call our API (wildcard CORS)
- Basic DNS with 12 authorized certificate authorities
- No DDoS protection
After Hardening
- 3 rate limiting zones (10/2/30 requests per second)
- Webhook endpoints password-protected at the nginx level
- 4 auto-block rules (SSH + auth + bot scanners + rate limit)
- 7 security headers (A+ grade on frontend)
- TLS 1.3 primary, TLS 1.1 rejected, Full Strict mode
- Only vanderdev.net can call the API (restricted CORS)
- Cloudflare DNS with 1 authorized CA (Let's Encrypt only)
- Cloudflare DDoS protection active
Before/After: Performance
Before
- Every repeat question costs AI processing time (4-10 seconds)
- Dashboard polls every 5 seconds (1,200 req/min at scale)
- No browser caching on API responses
- No CDN: all traffic hits the origin server directly
- No CORS preflight cache
After
- Cached answers return in 80-115ms via Redis (free)
- Dashboard polls every 30 seconds (200 req/min at scale)
- 15-second browser cache on API responses
- Cloudflare global edge caching configured
- 24-hour CORS preflight cache eliminates redundant checks
Before/After: Monitoring
Before
- Metrics collected by Prometheus but never acted on
- Zero notifications when things break
- No health check endpoint
- No synthetic uptime testing
- No dead man's switch
After
- 16 alert rules across 4 categories trigger push notifications
- Phone notifications via ntfy.sh (2 topics: infra + uptime)
- /health endpoint updated every 30 seconds (works even if API is down)
- 11 synthetic monitors (HTTP, TCP, Docker container checks)
- Cron-based dead man's switch (proves monitoring is still running)
Deep Dive: Rate Limiting in Action
We flooded the chatbot webhook endpoint with 30 simultaneous requests. The rate limiter (a speed limit for web traffic) did exactly what it should: let legitimate traffic through and block the excess with 429 "Too Many Requests" responses.
Webhook flood test (30 parallel requests):
Webhooks
21 passed · 9 blocked (429)
API flood test (15 serial requests):
API
9 passed · 6 blocked (429)
The rate limiter uses a "nodelay burst" strategy: it processes the initial burst of requests immediately (no queueing delay), then hard-blocks anything over the limit. This means legitimate users get instant responses while abuse is stopped cold.
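In nginx terms, that strategy maps to limit_req with burst and nodelay; the location path, burst size, and upstream below are assumptions, not the live config:

```nginx
location /webhook/ {
    limit_req zone=webhooks burst=20 nodelay;  # serve the burst instantly, reject the rest
    limit_req_status 429;                      # report "Too Many Requests" instead of the default 503
    proxy_pass http://n8n:5678;                # hypothetical upstream
}
```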
Auth verification also passed: no token → 403 (blocked), wrong token → 403 (blocked), valid token → 200 (success).
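The same checks can be reproduced with curl; the endpoint path and header name here are hypothetical stand-ins for the real ones:

```bash
# Expect 403 with no token, 403 with a bad token, 200 with the right one.
curl -s -o /dev/null -w '%{http_code}\n' https://vanderdev.net/webhook/demo-bot
curl -s -o /dev/null -w '%{http_code}\n' -H 'X-Webhook-Token: wrong' https://vanderdev.net/webhook/demo-bot
curl -s -o /dev/null -w '%{http_code}\n' -H 'X-Webhook-Token: <valid-token>' https://vanderdev.net/webhook/demo-bot
```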
Deep Dive: AI Response Caching
When two users ask the same question, why pay for the AI to answer twice? Redis (fast in-memory database used as a cache) stores answers for 10 minutes using DJB2 hash keys.
| Bot | Without Cache | With Cache | Speedup |
| --- | --- | --- | --- |
| WaterBot | 9.7 seconds | 95ms | 103x faster |
| BizBot | 4.5 seconds | 115ms | 39x faster |
| KiddoBot | 10.3 seconds | 80ms | 128x faster |
| Permit Finder | 10.2 seconds | 79ms | 129x faster |
ℹ️
How it works: Each chatbot workflow has 8 Redis nodes that check the cache first, skip the AI if an answer is found, and store new answers for 10 minutes. Cache keys use a DJB2 hash (a fast, simple hashing algorithm) instead of MD5, because n8n's sandboxed code environment blocks external modules; the hash is plain JavaScript, sketched below.
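A minimal sketch of a DJB2 key function, assuming it runs as plain JavaScript inside the workflow; the function name and key format are illustrative:

```javascript
// DJB2 needs no require(), so it works inside n8n's sandboxed Code environment.
function djb2(str) {
  let hash = 5381;
  for (let i = 0; i < str.length; i++) {
    // hash * 33 + charCode, truncated to a 32-bit integer
    hash = ((hash << 5) + hash + str.charCodeAt(i)) | 0;
  }
  return (hash >>> 0).toString(16); // unsigned hex string, e.g. "1a2b3c4d"
}

// Hypothetical usage: derive the Redis cache key from the user's question.
const cacheKey = `bot:answer:${djb2('how do I renew my permit?')}`;
```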
Key Decisions and Tradeoffs
Every security decision is a tradeoff between protection, cost, and usability. Here are the ones that shaped this project:
Client-Side Auth Token
The chatbot's webhook token lives in a JavaScript config file, visible in browser dev tools. It's a speed bump (stops casual abuse), not vault-level security. Acceptable for a demo site with no sensitive data.
Free Notification Service
Using ntfy.sh's public server, where the topic name is the only "security." Fine for a demo VPS. Production government systems should self-host the notification server.
$0 Total Cost
Every tool in this pipeline is free-tier: Cloudflare Free, Let's Encrypt, Redis (self-hosted), Prometheus, Grafana, Uptime Kuma, ntfy.sh, fail2ban, k6. Zero procurement needed.
Nodelay Burst Strategy
Rate limiter processes the initial burst immediately (no queue delay for real users), then hard-blocks excess. Prioritizes user experience over strict throttling.
Lessons Learned
Gotchas for anyone doing this themselves; each one cost us debugging time:
Docker logs aren't in the system journal
Fix: fail2ban needs a polling backend and explicit log volume mounts to read container logs. The default systemd backend can't see Docker output. A jail sketch follows.
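A hedged sketch of such a jail; the file path, jail name, log path, and limits are assumptions:

```ini
# Hypothetical jail file, e.g. /etc/fail2ban/jail.d/nginx-limit-req.local
[nginx-limit-req]
enabled  = true
# Poll the log file directly; the systemd journal can't see Docker container logs.
backend  = polling
# Stock fail2ban filter that matches nginx limit_req violations.
filter   = nginx-limit-req
# The nginx container's log must be volume-mounted to a host path like this one.
logpath  = /var/log/nginx/error.log
maxretry = 10
bantime  = 3600
```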
SSH + special characters = broken configs
Fix: Sending configuration files over SSH? YAML, Prometheus rules, and nginx configs contain characters that break shell escaping. Base64-encode the file locally and decode it on the server, as in the one-liner below.
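A minimal example of the pattern; the file names and host are placeholders:

```bash
# Encode locally, decode remotely: nothing in the file touches shell quoting.
base64 < alerts.yml | ssh user@vps 'base64 -d > /etc/prometheus/alerts.yml'
```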
Real IP must be configured BEFORE Cloudflare proxy
Fix: nginx must be told to trust Cloudflare's IP ranges and extract the real visitor IP from the CF-Connecting-IP header. If you enable the proxy first, rate limiting sees Cloudflare's IP, not the user's, and blocks everyone.
n8n sandbox blocks external modules
Fix: n8n's Task Runner sandbox blocks process.env and require(). Use n8n's built-in Redis nodes instead of Code nodes with ioredis. Use DJB2 hash instead of crypto.createHash().
Build Metrics โ How Long Did This Take?
ℹ️
Average: 13 minutes per plan. The entire hardening project, from first swap file to final load test, took under 3 hours of active work using Claude Code. Wall-clock time was longer due to breaks and decisions between phases, but the actual build time was 157 minutes across 13 plans.
| Phase | Plans | Time | Avg/Plan |
| --- | --- | --- | --- |
| 1. Lock It Down | 3 | 27 min | 9 min |
| 2. Watch It | 3 | 26 min | 9 min |
| 3. Speed It Up | 2 | 56 min | 28 min |
| 4. Shield It | 2 | 29 min | 15 min |
| 5. Prove It | 2 | 19 min | 10 min |
| 6. Teach It | 1 | This deck | - |
What's Next
This hardening covers 10-50 concurrent unauthenticated users, the demo-day tier. Scaling beyond that requires:
- Streaming chatbot responses: a real-time typing effect so users see answers as they're generated, not after a multi-second wait
- Queue-based processing: at 100+ concurrent users, webhook requests should be queued and processed in order, not all at once
- Self-hosted notification server: production government systems need ntfy on their own infrastructure, not a public service
- Fix Supabase health checks: 4 containers report "unhealthy" because their Docker images lack the curl binary used by health check commands (the services work fine; it's cosmetic)
💡
The hardening approach scales. The same 6-phase pipeline (lock down, observe, optimize, shield, validate, document) applies to any web service. The specific tools change; the methodology doesn't.
Resources
From this project:
Tools used (all free tier): Cloudflare Free, Let's Encrypt, Redis (self-hosted), Prometheus, Grafana, Uptime Kuma, ntfy.sh, fail2ban, k6
Questions? This deck is part of the Governor's Innovation Fellowship training library.