WaterBot: Building a Government AI Assistant
A RAG Chatbot Case Study for California Water Boards
Core Thesis: The AI model is the easy part. The knowledge base is what separates a hallucinating toy from a production system.
February 2026
The Complexity Problem
California's water regulatory landscape: a maze of agencies, permits, and programs that overwhelms the people who need it most.
What if we could give every Californian access to the equivalent of a water regulation expert, 24/7, for free?
Meet WaterBot
A live, production AI assistant for California Water Boards — four ways to get help.

1. Personalized Help — intake form gathers project context first
2. Just Chat — ask anything about CA water regulations
3. Permit Finder — interactive decision tree for permit types
4. Funding Navigator — eligibility checker for infrastructure funding
5. Disclaimer — always directs to official Regional Board sources
WaterBot in Action
A real conversation — user asks about NPDES permits, WaterBot responds with structured guidance and official links.

1. User question — plain English, no jargon required
2. Structured response — headings and bullets, not a wall of text
3. Inline link to SMARTS Portal — actionable, not just informational
4. Fee formula with current year — grounded in real regulatory data
The Full Experience
Personalized Help mode walks users through a 5-step intake so the AI knows their situation before answering.
Step 1 — Project Type
Construction, agricultural, municipal, industrial, or habitat restoration
Step 2 — Location
County selection → maps to the correct Regional Water Board
Step 3 — Discharge Details
Type and volume — determines which permits apply
Step 4 — Water Rights & Federal Nexus
Existing rights? Federal permits? Changes the regulatory path.
Step 5 — Applicant Profile
Business, individual, or municipality — plus DAC status for funding eligibility
The Permit Decision Tree
An interactive decision tree built from a 107KB JSON structure covering every permit pathway.
How Most People Find Permits
- Google "California water permit"
- Land on the wrong Regional Board site
- Still not sure which permit applies
- Give up and call a consultant
How the Decision Tree Works
- Answer 4-6 plain-language questions
- Get the specific permit type you need
- Direct link to the application portal
- Total time: under 2 minutes
Design principle: The tree doesn't replace the regulatory process — it helps people find the right starting point.
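To make the mechanics concrete, here is a minimal sketch of how a tree like this can be encoded and walked. The node shape, questions, and outcomes below are illustrative placeholders, not excerpts from the actual 107KB JSON.

```python
# Hypothetical node shape for a permit decision tree. Field names, questions,
# and outcomes are illustrative only; the production JSON uses its own schema.
TREE = {
    "question": "Will your project disturb one acre or more of soil?",
    "options": {
        "yes": {"result": "Construction General Permit (CGP)",
                "next_step": "Apply through the SMARTS portal"},
        "no": {
            "question": "Will you discharge to surface water or a storm drain?",
            "options": {
                "yes": {"result": "Ask your Regional Water Board about NPDES/WDR coverage"},
                "no": {"result": "A Water Boards permit may not be required; confirm with your Regional Board"},
            },
        },
    },
}

def walk(node, ask):
    """Follow answers down the tree until a leaf with a 'result' is reached."""
    while "result" not in node:
        node = node["options"][ask(node["question"])]  # ask() returns "yes" or "no" here
    return node

# Example: walk(TREE, lambda q: "yes") reaches the CGP leaf after one question.
```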
The Funding Navigator
58 state and federal funding programs, matched by answering 5 plain-language questions — no AI hallucination risk.
How Most People Find Funding
- Google scattered agency sites
- Read 200-page NOFAs
- Miss programs they qualify for
- Give up and hire a grant writer
How the Navigator Works
- Answer 5 questions: org type, project, population, DAC status, matching funds
- Get tiered results: Eligible, Likely, May Qualify
- Direct links to applications
- AI enriches results with tips — but matching is deterministic
Deterministic matching — no AI hallucination risk. Hard filters narrow the field; the AI only enriches results with application tips and deadline context.
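A rough sketch of that hard-filter idea, with invented program records and only two of the three tiers modeled; the real Navigator's 58-program dataset, field names, and tiering rules are not shown here.

```python
# Illustrative program records: invented names and rules, not the real dataset.
PROGRAMS = [
    {"name": "Example SRF Loan", "org_types": {"municipality", "special district"},
     "projects": {"wastewater", "stormwater"}, "dac_only": False, "match_required": True},
    {"name": "Example DAC Grant", "org_types": {"municipality", "small water system"},
     "projects": {"drinking water"}, "dac_only": True, "match_required": False},
]

def match_programs(org_type, project, is_dac, has_matching_funds):
    """Hard filters first; tiering is a plain rule, so no program can be hallucinated."""
    results = []
    for p in PROGRAMS:
        if org_type not in p["org_types"] or project not in p["projects"]:
            continue  # hard filter: wrong applicant or project type
        if p["dac_only"] and not is_dac:
            continue  # hard filter: program restricted to disadvantaged communities
        tier = "Eligible" if (has_matching_funds or not p["match_required"]) else "May Qualify"
        results.append({"program": p["name"], "tier": tier})
    return results  # the LLM only annotates these results with tips and deadlines
```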
How It Actually Works: RAG
How It Works
RAG — Retrieval-Augmented Generation. Think of it as giving the AI a research assistant who pulls the right files before speaking.
Without RAG
- AI answers from memory (training data)
- Can't cite specific sources
- Hallucinates confidently
- "I think the permit fee is around..."
With RAG
- Searches a curated knowledge base first
- Every claim cites a real source
- Knowledge updates without retraining
- "The CGP fee formula is $511 + ($54 × acres)"
The Full Architecture
Six systems cooperate in about 2-3 seconds. None are exotic — React, PostgreSQL, and webhook APIs.
[Pipeline diagram: user asks via the React frontend → four intermediate systems → response from Claude with citations]
Key insight: The AI (steps 5-6) is the simplest piece. 80% of the effort is in curating, embedding, and tuning retrieval.
n8n gotchas: alwaysOutputData: true prevents silent pipeline death on zero results. escapeBraces() in prompt templates avoids n8n expression collisions. Top-K = 8 chunks.
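The retrieval step itself reduces to a single pgvector query. A hedged sketch, assuming a `chunks` table and the psycopg2 driver; WaterBot runs this step inside n8n rather than a standalone script.

```python
import psycopg2

def search_chunks(query_embedding, k=8, threshold=0.40):
    """Return the top-k most similar chunks, dropping weak matches."""
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"  # pgvector literal
    conn = psycopg2.connect("postgresql://user:pass@localhost/waterbot")  # placeholder DSN
    with conn, conn.cursor() as cur:
        # pgvector's <=> operator is cosine distance; similarity = 1 - distance.
        cur.execute(
            """
            SELECT content, source_url, 1 - (embedding <=> %s::vector) AS similarity
            FROM chunks
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (vec, vec, k),
        )
        rows = cur.fetchall()
    conn.close()
    # 0.40 is the relevance bar quoted in the testing section below.
    return [{"content": c, "source_url": u, "similarity": s}
            for c, u, s in rows if s >= threshold]
```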
The Stack
Every component is open-source or free-tier.
- ⚛️ React + Tailwind: frontend UI
- 🧠 OpenAI Embeddings: text-embedding-3-small
- 🗄️ PostgreSQL + pgvector: vector database
- 🤖 Claude (Anthropic): response generation
- 🐳 Docker on VPS: Tailscale mesh networking
The Knowledge Base
Knowledge Engineering
The knowledge base is the product. The LLM is a commodity.
Every fact is sourced from official California Water Boards publications with direct URLs.
Inside the Knowledge Base
| Category | Coverage |
| --- | --- |
| Permits & Compliance | NPDES, WDR, 401 Cert, MS4, enforcement |
| Funding Programs | CWSRF, DWSRF, SAFER, Prop 4, federal grants |
| Regional Boards | All 9 regions — jurisdictions, contacts, priorities |
| Water Quality | TMDLs, impaired waters, beneficial uses |
| Pollutants | PFAS, lead, arsenic, nitrate, chromium-6 |
| Consumer FAQ | Tap safety, CCR reports, billing, hard water |
| Conservation | Usage targets, drought rules, Save Our Water |
Why Chunking Strategy Matters
If you chunk badly, your retrieval will be bad — and no LLM can fix bad retrieval.
Wrong: Arbitrary Splitting
Split every 500 characters regardless of content:
...discharge requirements for car
washes. The permit fee
schedule is as follows: $500 for
minor facilities...
Information split mid-sentence. Related content scattered.
Right: Semantic Splitting
Split on H2 headers — each chunk is a complete thought:
## Fee Schedule
The permit fee for car washes is
$500 for minor facilities and
$1,200 for major facilities...
Complete section stays together. Self-contained and retrievable.
Our rule: Split on H2 headers. Max 2,000 chars, min 100. Overflow splits on paragraph boundaries — never mid-sentence.
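That rule is simple enough to sketch in a few lines of Python. Only the H2 split, the 2,000/100 character bounds, and the paragraph-boundary overflow come from the rule above; the helper name and structure are illustrative.

```python
# Sketch of the stated chunking rule: split on H2 headers, cap at 2,000 characters,
# drop fragments under 100, and split oversized sections on paragraph boundaries.
import re

MAX_CHARS, MIN_CHARS = 2000, 100

def chunk_markdown(text):
    chunks = []
    # Each "## " header starts a new candidate chunk; the header stays with its body.
    for section in re.split(r"\n(?=## )", text):
        if len(section) <= MAX_CHARS:
            candidates = [section]
        else:
            # Overflow: accumulate whole paragraphs, never splitting mid-sentence.
            candidates, current = [], ""
            for para in section.split("\n\n"):
                if current and len(current) + len(para) + 2 > MAX_CHARS:
                    candidates.append(current)
                    current = para
                else:
                    current = f"{current}\n\n{para}" if current else para
            candidates.append(current)
        chunks.extend(c.strip() for c in candidates if len(c.strip()) >= MIN_CHARS)
    return chunks
```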
Quality Assurance Pipeline
This pipeline is what separates a demo from production.
1 — Semantic Chunking
Split on meaning, not character count. H2 headers define chunk boundaries.
2 — Embedding Generation
OpenAI text-embedding-3-small → 1,536-dimension vectors per chunk.
3 — Deduplication
MD5 hash check — duplicate chunks are invisible poison for retrieval (see the sketch after step 6).
4 — URL Validation
313 URLs tested. Dead links destroy credibility.
5 — Adversarial Testing
35 real-world queries from Reddit and forums — NOT self-generated.
6 — Gap Analysis & Retest
Find what's missing, add content, re-embed, repeat until 100%.
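Steps 3 and 4 are the easiest to show in code. A minimal sketch, assuming the `requests` library; the real pipeline's batching and retry behavior may differ.

```python
# Step 3: drop exact-duplicate chunks. Step 4: flag dead links before launch.
import hashlib
import requests

def dedupe(chunks):
    seen, unique = set(), []
    for chunk in chunks:
        digest = hashlib.md5(chunk["content"].encode("utf-8")).hexdigest()
        if digest not in seen:  # identical text embeds identically, so keep one copy
            seen.add(digest)
            unique.append(chunk)
    return unique

def validate_urls(urls):
    dead = []
    for url in urls:
        try:
            resp = requests.head(url, allow_redirects=True, timeout=10)
            if resp.status_code >= 400:
                dead.append((url, resp.status_code))
        except requests.RequestException:
            dead.append((url, "unreachable"))
    return dead  # every dead link gets fixed before launch
```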
Testing with Real Questions
Critical: We tested with real questions from Reddit, water operator forums, and agency FAQ pages — NOT questions we wrote ourselves. This prevents circular testing.
We measure cosine similarity — how closely a user's question matches content in the knowledge base. 0.0 = no match, 1.0 = identical. Above 0.40 means the system found relevant content to answer from.
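For readers who want the math: the score is ordinary cosine similarity between two 1,536-dimension embeddings. A small numpy sketch, not WaterBot's actual test harness.

```python
import numpy as np

def cosine_similarity(a, b):
    """1.0 = identical direction, 0.0 = unrelated."""
    a, b = np.asarray(a), np.asarray(b)
    # OpenAI embeddings are already unit-length, so the dot product alone gives
    # the same number; the norms are kept here for generality.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_score(query_emb, chunk_embs):
    """A query passes when its best-matching chunk clears the 0.40 bar."""
    return max(cosine_similarity(query_emb, c) for c in chunk_embs)
```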
35/35 passed — every query found relevant content
0.59 average similarity — strong retrieval across all queries
Sample Results
| Real-World Query | Score | Verdict |
| --- | --- | --- |
| "Recycled water regulations California" | 0.79 | STRONG |
| "TMDL pollution limits explained" | 0.76 | STRONG |
| "SAFER funding eligibility" | 0.72 | STRONG |
| "How do I report a sewage spill?" | 0.51 | STRONG |
| "Is chromium-6 in my tap water?" | 0.63 | STRONG |
All 35 queries sourced from outside the content creation process. Non-circular methodology.
What Testing Revealed
Experts build knowledge bases that answer expert questions. Real users ask beginner questions.
Before: 64% Coverage
- Strong on permits and enforcement
- Good on regional board jurisdictions
- Missing: "Is my tap water safe?"
- Missing: "How do I read my water bill?"
After: 100% Coverage
- Added 25 consumer FAQ documents
- Added conservation program guides
- Chromium-6: 0.34 → 0.63
- Water billing: new → 0.52
Less Is More: The Clean-Slate Rebuild
The counterintuitive move that made WaterBot production-ready.
When your RAG system underperforms, your instinct will be to add more content. Often the right move is to remove content. Noise drowns signal.
Trust Architecture
Every design decision in WaterBot prioritizes accuracy and transparency.
Source Citations
Every response links to original documents. 313 URLs to official waterboards.ca.gov pages.
Grounded Generation
System prompt: "Answer using ONLY the provided context." No creative fill-in.
Disclaimer Transparency
"This is not legal advice. For official guidance, contact your Regional Water Board."
No PII Collection
No login. No personal data stored. Session state lives in the browser only.
Infrastructure
The Docker Stack
25 Docker containers on a single VPS:
- n8n — Workflow automation
- Supabase — PostgreSQL + pgvector, Auth, REST API
- nginx-proxy — SSL/TLS via Let's Encrypt
- Tailscale — Encrypted mesh networking
- Portainer — Container management UI
Backups follow the 3-2-1 rule: local, remote sync, offsite cloud.
Monitoring
Real-time health monitoring:
- Prometheus — Metrics collection (4 targets)
- Grafana — Visual dashboards (109 panels)
- Uptime Kuma — Health checks every 60 seconds
- ntfy — Push notifications on failure
Alerts fire in under 2 minutes. Full rebuild from scratch in under an hour via Docker Compose.
One Pattern, Three Bots
The same architecture powers two other chatbots — proving the pattern is reusable.
Same Infrastructure
- Same PostgreSQL + pgvector database
- Same n8n workflow pattern
- Same OpenAI embedding model
- Shared component library — ChatMessage, DecisionTreeView, RAGButton all reused
Different Domains
- WaterBot — Water regs + permits + funding navigator
- BizBot — Business permits + license finder
- KiddoBot — Childcare resources + program finder
- Each has its own KB, test suite, and specialized tools
Lessons Learned
Every "don't" below is something we did and had to fix.
Do
- Spend 80% of time on the knowledge base, 20% on technology
- Test with real user queries from outside your team
- Use semantic chunking — split on meaning, not character count
- Start with a narrow domain and go deep, not wide
Don't
- Assume more data means better answers — noise drowns signal
- Test with questions you wrote yourself — that's circular testing
- Skip deduplication — duplicate chunks are invisible poison
- Launch without URL validation — dead links destroy credibility
Biggest lesson: We had 2 weeks of false confidence from circular tests. Real user questions from Reddit dropped coverage from "100%" to 64%.
Your Blueprint
Your Turn
WaterBot was built in about 20 hours by one person. Here's the breakdown.
Day 1 — Pick Your Domain & Write the KB
"All of Caltrans" is too wide. "Encroachment permits for District 4" is right. Write markdown files with source URLs — this is the bulk of the work.
Day 2 — Database + Chunking
PostgreSQL + pgvector (Supabase = one container). Split on headers, embed with OpenAI, load. A 50-line script (sketched at the end of this blueprint).
Day 3 — Orchestration + Frontend
n8n webhook wires the pipeline visually. React chat UI POSTs to your webhook. No server code.
Day 4-5 — QA Gauntlet
Deduplicate. Validate URLs. Test with real queries from outside your team. Find gaps, add content, retest.
Total: ~20 hours with one technical person. The knowledge base is the longest part — plan accordingly.
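As a starting point for Day 2, here is a hedged sketch of what that loader script can look like. The table name, columns, and connection string are placeholders, not WaterBot's actual schema; chunking itself is sketched earlier in the chunking section.

```python
# Day 2 sketch: create the pgvector table, then embed and load each chunk.
import psycopg2
from openai import OpenAI

DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id          serial PRIMARY KEY,
    content     text NOT NULL,
    source_url  text NOT NULL,
    embedding   vector(1536)          -- matches text-embedding-3-small
);
"""

def load(chunks, dsn="postgresql://user:pass@localhost/waterbot"):  # placeholder DSN
    client = OpenAI()
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(DDL)
        for chunk in chunks:
            emb = client.embeddings.create(
                model="text-embedding-3-small", input=chunk["content"]
            ).data[0].embedding
            cur.execute(
                "INSERT INTO chunks (content, source_url, embedding) VALUES (%s, %s, %s::vector)",
                (chunk["content"], chunk["source_url"],
                 "[" + ",".join(str(x) for x in emb) + "]"),
            )
```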
Resources
Try WaterBot now — vanderdev.net/waterbot
Open source — The knowledge base and frontend are available on GitHub
Related Training:
Tools Used: