Building GenAI Chatbots for Government

A Reusable Pattern

Core Thesis: Your agency already has the knowledge. It's in your regulations, guidance documents, and program descriptions. This pattern makes that knowledge searchable, conversational, and available 24/7 — without writing a single new document.

State Water Resources Control Board · Innovation Fellowship · March 2026

The Problem We're Solving

Government agencies sit on mountains of domain knowledge — regulations, program guides, compliance procedures, FAQs. Citizens can't find what they need.

The Citizen Experience

  • Regulations buried in 200-page PDFs
  • Program eligibility scattered across 15 web pages
  • Phone lines backed up with routine questions
  • Different answers from different staff members
  • Business hours only — no weekend or evening access

The Staff Experience

  • Answering the same 50 questions repeatedly
  • Directing people to documents they can't navigate
  • Stretched thin between complex cases and routine inquiries
  • Knowledge trapped in individual expertise
  • No way to scale without hiring

:::callout info
The opportunity isn't creating new knowledge — it's unlocking what you already have. Every agency has regulations, guidance documents, program descriptions, and FAQs sitting in PDFs and web pages. This pattern structures that existing content and makes it conversational.
:::

The Policy Landscape

California has built the framework for responsible GenAI adoption. The mandate is clear — the barrier is execution.

State Policy Framework

| Policy | What It Does |
|---|---|
| SAM 4986 Series | 13-section governance framework for GenAI in state government (updated Feb 2025 via TL 25-01) |
| EO N-12-23 | Directs agencies to adopt GenAI responsibly and streamline procurement |
| CDT GenAI Sandbox | Cloud-based POC testing environment — won NASCIO State CIO award |
| Poppy AI Assistant | CDT-built platform: 2,600+ users across 66 departments, 11 LLMs |

The Momentum Is Real

State Level

  • 66 departments already using Poppy
  • 20+ AI training courses built by CalHR/ODI/CDT
  • Little Hoover Report #284: "Give all state workers AI access"
  • Caltrans, Finance, CDTFA all running GenAI pilots

Federal Tailwinds

  • OMB M-25-21/22: Accelerating federal AI adoption + procurement modernization
  • America's AI Action Plan: Open-model support, infrastructure investment
  • NIST AI Risk Management Framework remains the gold standard
  • Federal posture is pro-adoption — urgency to build competency now

The Compliance Fast Lane

Not all GenAI projects carry the same compliance burden. A public-data RAG chatbot is the lowest-risk GenAI project a department can build.

Full Process (High-Risk GenAI)

  • PAL (Project Approval Lifecycle)
  • PDL (Project Delivery Lifecycle)
  • Full SIMM 5305-F risk assessment
  • CDT consultation before procurement
  • Timeline: 6-12+ months

Public-Data RAG Path (Streamlined)

  • SIMM 150 EZ (streamlined assessment)
  • Possibly SIMM 71B certification
  • Standard ISO briefing
  • Timeline: Weeks, not months

Why Public-Data RAG Qualifies for the Fast Lane

| Criterion | Status | Why |
|---|---|---|
| No PII ingested | ✅ | Only publicly available documents — regulations, guides, FAQs |
| No decisions made | ✅ | Bot retrieves and cites existing information — doesn't adjudicate |
| Source attribution | ✅ | Every answer cites the specific document it came from — built-in audit trail |
| No model training | ✅ | Uses commercial APIs with enterprise agreements — no state data in training |
| No citizen data stored | ✅ | Session context lives in the browser only — nothing persists server-side |

:::callout success
The paperwork is manageable — and it runs in parallel with your build. Start your SIMM paperwork on Day 1, brief your ISO early, and the compliance review wraps up while development is underway.
:::

Phase 1: Curate the Knowledge

[Architecture diagram]

This is where your business and program staff do the work — not IT. They know the domain. They know what citizens ask. They know which documents matter.

The Two-Team Model

| Track | Who | What | Hours |
|---|---|---|---|
| Knowledge | Business/Program SMEs | Identify existing sources, curate content, define test questions | 8-16 hours |
| Build | IT staff (1-2 developers) | Structure content, build UI, wire infrastructure, test | 40-60 hours |

The handoff is simple: SMEs deliver a folder of organized documents + a list of 20-30 test questions with expected answers. IT builds from there.

:::callout info
This pattern curates existing public content — it doesn't require writing new material. If your agency has regulations, guidance documents, and program descriptions (every agency does), you have what you need to start.
:::

The Content Pipeline

Your SMEs identify the source material. AI agents help organize, cross-reference, and identify gaps. The output is structured, searchable knowledge.

From Scattered Documents to Structured Knowledge

What Goes In

  • Regulations and statutes (PDFs, web pages)
  • Program guidance documents
  • FAQ pages and help articles
  • Compliance procedures
  • Application instructions
  • Fee schedules and eligibility criteria

What Comes Out

  • Organized markdown files with semantic headers
  • Categorized by topic (permits, funding, compliance, etc.)
  • Cross-referenced and deduplicated
  • URLs verified and validated
  • Ready for embedding and retrieval

Quality Gates (Non-Negotiable)

| Gate | What It Catches | Method |
|---|---|---|
| Duplicate detection | Same content ingested twice | MD5 content hashing |
| URL verification | Broken links in source material | Batch URL testing |
| Coverage testing | Missing topics citizens actually ask about | 20-30 adversarial test questions from SMEs |
| Gap remediation | Holes found during testing | SMEs identify additional source docs |
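
The first two gates are scriptable. Here is a minimal sketch, assuming the curated content is a folder of markdown files; the function names and folder layout are illustrative, not the WaterBot pipeline:

```python
import hashlib
import urllib.request
from pathlib import Path

def find_duplicates(content_dir: str) -> dict[str, list[Path]]:
    """Gate 1: group files by MD5 of their bytes; any group of 2+ is a duplicate."""
    by_hash: dict[str, list[Path]] = {}
    for path in sorted(Path(content_dir).rglob("*.md")):
        by_hash.setdefault(hashlib.md5(path.read_bytes()).hexdigest(), []).append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

def url_is_live(url: str, timeout: int = 10) -> bool:
    """Gate 2: HEAD-request a source URL and report whether it still resolves."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False
```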

Real example: WaterBot's knowledge base — 128 documents across 8 categories (permits, funding, compliance, water quality, entities, water rights, climate, public resources), curated entirely from existing SWRCB public content.

From Documents to Vectors

This is the technical core — how human-readable documents become machine-searchable knowledge. IT handles this step using the structured content from Phase 1.

[Architecture diagram]

How It Works

| Step | What Happens | Why It Matters |
|---|---|---|
| Chunking | Documents split on section headers (H2) — not arbitrary character counts | Preserves semantic meaning; a chunk about "NPDES permits" stays together |
| Embedding | Each chunk converted to a 1,536-dimensional vector | Captures meaning, not just keywords — "water discharge permit" matches "NPDES" |
| Indexing | Vectors stored in PostgreSQL with pgvector extension | Standard database technology — no exotic infrastructure required |
| Similarity search | User questions matched against chunks by cosine distance | Returns the 5-8 most relevant chunks for any question |

Technical Details

Chunking strategy: Semantic splitting on H2 headers. If a section exceeds 2,000 characters, it splits on paragraph boundaries. Document title (H1) is prepended to every chunk for context.
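
A minimal sketch of that chunking rule, assuming each source document is markdown with a single H1 title and H2 section headers; `chunk_document` and the regex details are illustrative, not the production implementation:

```python
import re

MAX_CHARS = 2000  # sections longer than this split on paragraph boundaries

def chunk_document(markdown: str) -> list[str]:
    """Split a doc on H2 headers; prepend the H1 title to each chunk for context."""
    m = re.search(r"(?m)^# (.+)$", markdown)
    title = m.group(1).strip() if m else ""
    body = re.sub(r"(?m)^# .+$", "", markdown)  # drop the H1 line; carried via `title`
    sections = [s.strip() for s in re.split(r"(?m)^(?=## )", body) if s.strip()]
    chunks = []
    for section in sections:
        if len(section) <= MAX_CHARS:
            pieces = [section]
        else:
            # Oversized section: regroup its paragraphs into <= MAX_CHARS pieces.
            pieces, buf = [], ""
            for para in section.split("\n\n"):
                if buf and len(buf) + len(para) > MAX_CHARS:
                    pieces.append(buf.strip())
                    buf = ""
                buf += para + "\n\n"
            if buf.strip():
                pieces.append(buf.strip())
        chunks.extend(f"{title}\n\n{p}" if title else p for p in pieces)
    return chunks
```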

Embedding model: OpenAI text-embedding-3-small (1,536 dimensions). Cost: ~$0.02 per million tokens — the entire WaterBot knowledge base costs pennies to embed.

Vector index: IVFFlat (inverted file with flat compression) for cosine similarity. Critical lesson: build or rebuild the index after bulk inserts; IVFFlat samples its cluster centroids from the rows present when the index is created, so an index built before the data is loaded gives poor recall.

Top-K retrieval: 5-8 chunks per query, filtered by similarity threshold. More chunks = more context but higher API costs and potential noise.
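
Putting those details together, here is a sketch of the retrieval layer, assuming psycopg 3, the `openai` Python client, and a `chunks` table like the one in the comments; the connection string, table and index names, and the 0.5 distance cutoff are illustrative:

```python
# Schema assumed by this sketch (illustrative names):
#   CREATE EXTENSION vector;
#   CREATE TABLE chunks (id serial PRIMARY KEY, doc_title text,
#                        content text, embedding vector(1536));
#   -- Build the index AFTER bulk-loading so IVFFlat's centroids reflect real data:
#   CREATE INDEX chunks_embedding_idx ON chunks
#     USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
import psycopg
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def top_k_chunks(question: str, k: int = 8, max_dist: float = 0.5) -> list[tuple[str, str]]:
    """Embed the question, then return the k nearest chunks under the distance cutoff."""
    emb = client.embeddings.create(model="text-embedding-3-small", input=question)
    qvec = "[" + ",".join(str(x) for x in emb.data[0].embedding) + "]"  # pgvector literal
    with psycopg.connect("dbname=waterbot") as conn:
        return conn.execute(
            """SELECT doc_title, content
                 FROM chunks
                WHERE embedding <=> %(q)s::vector < %(d)s  -- <=> is cosine distance
             ORDER BY embedding <=> %(q)s::vector
                LIMIT %(k)s""",
            {"q": qvec, "d": max_dist, "k": k},
        ).fetchall()
```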

Phase 2: Build the Bot

[Architecture diagram]

IT takes the structured knowledge from Phase 1 and builds the bot. AI-assisted development means agents write components, configure workflows, and test integrations — a 1-2 person team can build production-quality bots.

What Gets Built

| Component | Purpose | Effort |
|---|---|---|
| Chat interface | Conversational AI with RAG retrieval | Core — every bot has this |
| Decision tree tool | Step-by-step navigator for complex processes | Per domain need |
| Smart calculator/matcher | Eligibility checker, program finder | Per domain need |
| Intake form | Personalize responses by user type, location | Recommended |
| Workflow engine | Orchestrate embed → search → LLM → response | Core infrastructure |

First bot: 40-60 hours. Second and third bots: 30-40 hours each — shared infrastructure already exists.

The Three-Mode UI Pattern

Every bot supports three ways for citizens to get help — each optimized for a different type of question.

Mode 1: Guided Chat

Ask any question in natural language. The bot searches the knowledge base, retrieves relevant documents, and generates an answer with source citations.

Best for: open-ended questions, exploring unfamiliar topics, "what do I need to know about..."

Mode 2: Decision Tree

Step-by-step guided navigation through complex processes. Each choice narrows the path until the citizen reaches their specific answer.

Best for: permit selection, program eligibility, "which form do I need..."

Mode 3: Smart Tool

Domain-specific calculators and matchers. Enter criteria, get filtered results with eligibility details.

Best for: funding programs, fee calculations, "am I eligible for..."

Why Three Modes?

Not every question fits a chatbot. A citizen who needs a specific permit shouldn't have to describe their project in conversation — a decision tree gets them there in 4-5 clicks. A citizen exploring funding options gets more value from a matcher that filters 58 programs by their criteria than from a chat response.

The pattern: Build the chat first (every bot needs it), then add decision trees and tools based on your domain's most common workflows.

:::callout info
User profiling without authentication. An optional intake form collects context (user type, location, primary concern) and stores it in the browser only — no accounts, no PII, no database. This context personalizes chat responses without creating compliance burden.
:::

The Workflow Engine

When a citizen asks a question, here's the 2-3 second journey from query to answer.

[Architecture diagram]

What Happens at Each Step

| Step | Component | What It Does |
|---|---|---|
| 1. Receive | Webhook endpoint | Accepts the question, session context, and conversation history |
| 2. Embed | Embedding API | Converts the question into a 1,536-dimensional vector |
| 3. Search | Vector database | Finds the 5-8 most semantically similar knowledge chunks |
| 4. Assemble | Prompt builder | Combines: system instructions + retrieved chunks + user context + question |
| 5. Generate | LLM (Claude, GPT-4, etc.) | Produces a natural language answer grounded in the retrieved documents |
| 6. Return | Response formatter | Delivers markdown response with source citations back to the frontend |
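
A compact sketch of that journey as a single function, reusing `top_k_chunks` and `client` from the retrieval sketch above and `build_system_prompt` from the next section; the model name, fallback wording, and response shape are placeholders, not the production code:

```python
def answer_question(question: str, history: list[dict], profile: dict) -> dict:
    """One RAG turn: embed -> search -> assemble -> generate -> format."""
    chunks = top_k_chunks(question)                    # steps 2-3: embed + vector search
    if not chunks:
        # Graceful fallback: admit the gap instead of letting the model guess.
        return {"answer": "That topic isn't covered in my knowledge base yet. "
                          "Please contact the program office directly.",
                "sources": []}
    system = build_system_prompt(chunks, profile)      # step 4: layered system prompt
    messages = [{"role": "system", "content": system},
                *history,                              # prior turns, oldest first
                {"role": "user", "content": question}]
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)  # step 5
    return {"answer": resp.choices[0].message.content,  # step 6: markdown answer
            "sources": sorted({title for title, _ in chunks})}  # citation list
```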

Key Design Decisions

  • Source attribution on every answer — citizens can verify; auditors can trace
  • Conversation history included — multi-turn dialogue without citizens repeating themselves
  • Graceful fallbacks — if no relevant chunks are found, the bot says so instead of hallucinating
  • Rate limiting + token validation — security at the API gateway level

How the System Prompt Works

The system prompt is assembled dynamically for each request:

  1. Bot personality — "You are WaterBot, an expert on California water regulations..."
  2. Instructions — "Always cite sources. If the knowledge base doesn't cover a topic, say so."
  3. RAG context — The 5-8 most relevant knowledge chunks, injected verbatim
  4. User profile — "The user is a homeowner in Santa Clara concerned about water quality"
  5. Conversation history — Previous Q&As in this session

This means the LLM is never inventing answers — it's synthesizing from specific, cited documents with instructions to stay grounded.
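
A sketch of that assembly, matching the five layers above; the profile field names are illustrative, and conversation history rides along as prior chat messages rather than inside the system prompt (a common design choice, noted in the comment):

```python
def build_system_prompt(chunks: list[tuple[str, str]], profile: dict) -> str:
    """Assemble the layered system prompt described above."""
    context = "\n\n---\n\n".join(f"Source: {title}\n{content}"
                                 for title, content in chunks)
    profile_line = (f"The user is a {profile.get('user_type', 'member of the public')} "
                    f"in {profile.get('location', 'California')} "
                    f"concerned about {profile.get('concern', 'general topics')}.")
    return (
        "You are WaterBot, an expert on California water regulations.\n"  # 1. personality
        "Always cite sources by document title. If the knowledge base "   # 2. instructions
        "doesn't cover a topic, say so rather than guessing.\n\n"
        f"Knowledge base excerpts:\n{context}\n\n"                        # 3. RAG context
        f"User profile: {profile_line}"                                   # 4. user profile
    )  # 5. conversation history is passed as prior messages, not inlined here
```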

The Infrastructure Stack

This pattern runs on infrastructure most departments already have. Here's how each component maps to Azure Gov and AWS GovCloud — the two platforms California state agencies primarily use.

| Layer | What It Does | Azure Gov | AWS GovCloud |
|---|---|---|---|
| Frontend | User interface (the website) | Azure Static Web Apps | S3 + CloudFront / Amplify |
| Workflow | Orchestrates the RAG pipeline | Azure Logic Apps / Functions | Step Functions / Lambda |
| Vector DB | Stores and searches knowledge embeddings | Azure PostgreSQL Flex (pgvector) | RDS PostgreSQL (pgvector) |
| LLM | Generates natural language responses | Azure OpenAI Service | Amazon Bedrock |
| Embedding | Converts text to vectors | Azure OpenAI Embeddings | Bedrock Embeddings |
| Hosting | Runs the application | Azure Container Apps | ECS / Fargate |

What It Costs

Monthly Operating Cost (Single Bot)

| Component | Estimated Cost |
|---|---|
| Cloud hosting | $20-50/mo |
| LLM API calls | $50-200/mo |
| Embedding API | < $5/mo |
| Vector database | $15-30/mo |
| Total | $85-285/mo |

For Comparison

| Alternative | Cost |
|---|---|
| One full-time staff member | ~$8,000-12,000/mo |
| Enterprise chatbot vendor license | ~$2,000-10,000/mo |
| This pattern (3 bots) | ~$200-500/mo |

:::callout success
This runs on commodity cloud infrastructure you already have. No special hardware. No GPU clusters. No massive budget line items. The marginal cost of adding a second or third bot is minimal — shared infrastructure means most costs are already covered.
:::

Phase 3: Serve Citizens

[Architecture diagram]

The bot is live. Citizens ask questions and get accurate, sourced answers in seconds — 24 hours a day, 7 days a week.

What Citizens Experience

| Aspect | What They See |
|---|---|
| Response time | 2-3 seconds per answer |
| Source citations | Every answer links to the specific regulation or document |
| Three ways to ask | Chat for open questions, decision trees for processes, tools for eligibility |
| Personalized context | Answers tailored to their user type and location |
| No account required | Start asking immediately — no signup, no login |

What Staff Gain

  • Routine questions handled automatically — staff focus on complex cases
  • Consistent answers — every citizen gets the same accurate information
  • 24/7 availability — nights, weekends, holidays
  • Scalable — handles 10 or 10,000 concurrent users with the same infrastructure
  • Measurable — every interaction is trackable for service improvement

Quality and Maintenance

A bot is a garden, not a monument — it needs tending. Quality assurance happens at launch and on an ongoing cycle.

Launch Testing: SMEs Are the Quality Gate

Your subject matter experts write the test questions and evaluate the answers. They don't need to understand embeddings — they need to ask the bot questions and say "that's right" or "that's wrong."

| Test Type | What It Catches | Who Does It |
|---|---|---|
| Happy path | Common questions answered correctly | SMEs — 10-15 standard questions |
| Edge cases | Unusual or compound questions | SMEs — questions from real citizen calls |
| Out of scope | Bot admits when it doesn't know | IT — test questions outside the knowledge base |
| Adversarial | Attempts to get incorrect or inappropriate responses | IT — deliberate trick questions |

Scoring: 5-point scale per response. Target: average ≥ 4.0, no response below 3.0.
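
That launch bar can be checked with a few lines of code. A minimal sketch, assuming SME scores are recorded in a CSV with one row per test response; the column names are illustrative:

```python
import csv

def check_launch_bar(results_csv: str, target_avg: float = 4.0, floor: float = 3.0) -> bool:
    """Apply the launch bar: average score >= 4.0 and no response below 3.0."""
    with open(results_csv, newline="") as f:
        # Assumed columns: question, expected, answer, score (1-5, from SME review)
        scores = [float(row["score"]) for row in csv.DictReader(f)]
    if not scores:
        raise ValueError("no scored responses found")
    avg, worst = sum(scores) / len(scores), min(scores)
    print(f"{len(scores)} responses | average {avg:.2f} | worst {worst:.1f}")
    return avg >= target_avg and worst >= floor
```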

Quarterly Refresh Cycle

| Step | What Happens | Who |
|---|---|---|
| ASSESS | Check content freshness, verify URLs, identify regulation changes | SMEs (2-4 hours) |
| UPDATE | Revise source documents with current information | SMEs (2-4 hours) |
| INGEST | Re-chunk, re-embed, rebuild vector indexes | IT (1-2 hours) |
| TEST | Run regression suite, verify no degradation | Both (2-3 hours) |

:::callout warning
Content goes stale. Fee schedules change. Regulations update. Programs sunset. A quarterly check catches drift before citizens get outdated answers. Budget 4-8 hours per quarter per bot for maintenance.
:::
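
For the INGEST step, rebuilding the vector index after re-embedding can be a one-liner, assuming the `chunks` table and IVFFlat index from the retrieval sketch earlier (the index name is hypothetical):

```python
import psycopg

# After re-embedding updated chunks, rebuild the IVFFlat index so its cluster
# centroids are recomputed from the current data.
with psycopg.connect("dbname=waterbot") as conn:
    conn.execute("REINDEX INDEX chunks_embedding_idx;")
```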

The Real Timeline

[Gantt chart: six-week build with parallel SME, IT, and Compliance tracks and three decision gates (Scope, RAG Works, Launch); the table below gives the details]

The Work Is ~50-75 Hours. The Calendar Depends on Dependencies.

| Track | Activity | Hours | Calendar |
|---|---|---|---|
| SME | Content curation from existing docs | 8-16h | Weeks 1-2 |
| IT | Structure content, chunk, embed, validate | 8-12h | Weeks 2-3 |
| IT | Build UI, wire workflows, integrate | 24-36h | Weeks 3-5 |
| SME + IT | Adversarial testing + response review | 4-8h | Weeks 5-6 |
| IT | Testing + polish | 8-12h | Weeks 5-6 |
| Compliance | SIMM paperwork + ISO briefing | 4-6h | Parallel from Week 1 |

Decision Gates

| Gate | When | What Gets Decided |
|---|---|---|
| Gate 1 | After content audit | Scope confirmed — we know what the bot will cover |
| Gate 2 | After RAG retrieval tested | "It works" — questions return relevant answers |
| Gate 3 | After adversarial testing | Ready to launch — quality meets the bar |

The Real Bottleneck

:::callout warning
40 hours built 3 bots. The bottleneck isn't the technology — it's procurement.
:::

| Dependency | If You Have It | If You Don't |
|---|---|---|
| Cloud resources (Azure/AWS) | Days to provision | Months to procure |
| LLM API access | Check CDT master agreements | New vendor onboarding: 2-4 months |
| ISO availability | Brief on Day 1 → parallel review | Backlogged ISO → 4-6 week wait |

Start the procurement and compliance paperwork today so the infrastructure is ready when your developer sits down.

After the first bot: Second and third bots take 30-40 hours IT + 8 hours SME — shared infrastructure already exists.

Getting Started

Your Next Steps

  1. Pick your domain — Which program area gets the most routine citizen inquiries?
  2. Identify your SMEs — Who answers those questions today? They're your knowledge team.
  3. Check your infrastructure — Do you have Azure Gov or AWS GovCloud access? LLM API agreements?
  4. Start compliance early — File your SIMM 150 EZ, brief your ISO, check CDT master agreements
  5. Curate your knowledge — SMEs: gather your top 50-100 source documents
  6. Build your bot — IT: structure, embed, build, test, launch

Resources

| Resource | Link |
|---|---|
| SAM 4986 Series (GenAI governance) | dgs.ca.gov/Resources/SAM |
| CDT GenAI Toolkit | genai.cdt.ca.gov |
| GenAI Risk Assessment | genai.cdt.ca.gov/risk-assessment |
| CDT GenAI Sandbox | genai.cdt.ca.gov |
| Poppy AI Platform | genai.ca.gov/poppy |
| Little Hoover Report #284 | lhc.ca.gov |
| NIST AI Risk Management Framework | nist.gov/itl/ai-risk-management-framework |

Live Examples

Three bots built with this pattern are live at vanderdev.net:

| Bot | Domain | Knowledge Base | Modes |
|---|---|---|---|
| WaterBot | CA water regulations (SWRCB) | 128 docs, 8 categories | Chat, Permit Finder, Funding Navigator |
| BizBot | CA business licensing | Licensing guides, compliance | Chat, License Finder |
| KiddoBot | CA childcare programs | Program guides, eligibility | Chat, Program Finder, Eligibility Calculator |

Pick your domain. Curate your knowledge. Build your bot.

The pattern is proven. The policy framework exists. The technology is commodity. The only thing missing is your agency's decision to start.

State Water Resources Control Board · Innovation Fellowship · March 2026