Building GenAI Chatbots for Government

A Reusable Pattern

Core Thesis: Your agency already has the knowledge. It's in your regulations, guidance documents, and program descriptions. This pattern makes that knowledge searchable, conversational, and available 24/7 — without writing a single new document.

State Water Resources Control Board · Innovation Fellowship · March 2026

The Problem We're Solving

Government agencies sit on mountains of domain knowledge — regulations, program guides, compliance procedures, FAQs. Citizens can't find what they need.

The Citizen Experience

  • Regulations buried in 200-page PDFs
  • Program eligibility scattered across 15 web pages
  • Phone lines backed up with routine questions
  • Different answers from different staff members
  • Business hours only — no weekend or evening access

The Staff Experience

  • Answering the same 50 questions repeatedly
  • Directing people to documents they can't navigate
  • Stretched thin between complex cases and routine inquiries
  • Knowledge trapped in individual expertise
  • No way to scale without hiring

:::callout info
The opportunity isn't creating new knowledge — it's unlocking what you already have. Every agency has regulations, guidance documents, program descriptions, and FAQs sitting in PDFs and web pages. This pattern structures that existing content and makes it conversational.
:::

The Policy Landscape

California has built the framework for responsible GenAI adoption. The mandate is clear — the barrier is execution.

State Policy Framework

| Policy | What It Does |
|---|---|
| SAM 4986 Series | 13-section governance framework for GenAI in state government (updated Feb 2025 via TL 25-01) |
| EO N-12-23 | Directs agencies to adopt GenAI responsibly and streamline procurement |
| CDT GenAI Sandbox | Cloud-based POC testing environment — won NASCIO State CIO award |
| Poppy AI Assistant | CDT-built platform: 2,600+ users across 66 departments, 11 LLMs |

The Momentum Is Real

State Level

  • 66 departments already using Poppy
  • 20+ AI training courses built by CalHR/ODI/CDT
  • Little Hoover Report #284: "Give all state workers AI access"
  • Caltrans, Finance, CDTFA all running GenAI pilots

Federal Tailwinds

  • OMB M-25-21/22: Accelerating federal AI adoption + procurement modernization
  • America's AI Action Plan: Open-model support, infrastructure investment
  • NIST AI Risk Management Framework remains the gold standard
  • Federal posture is pro-adoption — urgency to build competency now

The Compliance Fast Lane

Not all GenAI projects carry the same compliance burden. A public-data RAG chatbot is the lowest-risk GenAI project a department can build.

Full Process (High-Risk GenAI)

  • PAL (Project Approval Lifecycle)
  • PDL (Project Delivery Lifecycle)
  • Full SIMM 5305-F risk assessment
  • CDT consultation before procurement
  • Timeline: 6-12+ months

Public-Data RAG Path (Streamlined)

  • SIMM 150 EZ (streamlined assessment)
  • Possibly SIMM 71B certification
  • Standard ISO briefing
  • Timeline: Weeks, not months

Why Public-Data RAG Qualifies for the Fast Lane

| Criterion | Status | Why |
|---|---|---|
| No PII ingested | ✅ | Only publicly available documents — regulations, guides, FAQs |
| No decisions made | ✅ | Bot retrieves and cites existing information — doesn't adjudicate |
| Source attribution | ✅ | Every answer cites the specific document it came from — built-in audit trail |
| No model training | ✅ | Uses commercial APIs with enterprise agreements — no state data in training |
| No citizen data stored | ✅ | Session context lives in the browser only — nothing persists server-side |

:::callout success
The paperwork is manageable — and it runs in parallel with your build. Start your SIMM paperwork on Day 1, brief your ISO early, and the compliance review wraps up while development is underway.
:::

Phase 1: Curate the Knowledge

[Architecture diagram]

This is where your business and program staff do the work — not IT. They know the domain. They know what citizens ask. They know which documents matter.

The Two-Team Model

| Track | Who | What | Hours |
|---|---|---|---|
| Knowledge | Business/Program SMEs | Identify existing sources, curate content, define test questions | 8-16 hours |
| Build | IT staff (1-2 developers) | Structure content, build UI, wire infrastructure, test | 40-60 hours |

The handoff is simple: SMEs deliver a folder of organized documents + a list of 20-30 test questions with expected answers. IT builds from there.

:::callout info
This pattern curates existing public content — it doesn't require writing new material. If your agency has regulations, guidance documents, and program descriptions (every agency does), you have what you need to start.
:::

The Content Pipeline

Your SMEs identify the source material. AI agents help organize, cross-reference, and identify gaps. The output is structured, searchable knowledge.

From Scattered Documents to Structured Knowledge

What Goes In

  • Regulations and statutes (PDFs, web pages)
  • Program guidance documents
  • FAQ pages and help articles
  • Compliance procedures
  • Application instructions
  • Fee schedules and eligibility criteria

What Comes Out

  • Organized markdown files with semantic headers
  • Categorized by topic (permits, funding, compliance, etc.)
  • Cross-referenced and deduplicated
  • URLs verified and validated
  • Ready for embedding and retrieval

Quality Gates (Non-Negotiable)

| Gate | What It Catches | Method |
|---|---|---|
| Duplicate detection | Same content ingested twice | MD5 content hashing |
| URL verification | Broken links in source material | Batch URL testing |
| Coverage testing | Missing topics citizens actually ask about | 20-30 adversarial test questions from SMEs |
| Gap remediation | Holes found during testing | SMEs identify additional source docs |
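
The first two gates are scriptable. Here is a minimal sketch, assuming the curated content is a folder of markdown files; the function names and folder layout are illustrative, not the WaterBot pipeline:

```python
import hashlib
import urllib.request
from pathlib import Path

def find_duplicates(content_dir: str) -> dict[str, list[Path]]:
    """Gate 1: group files by MD5 of their bytes; any group of 2+ is a duplicate."""
    by_hash: dict[str, list[Path]] = {}
    for path in sorted(Path(content_dir).rglob("*.md")):
        by_hash.setdefault(hashlib.md5(path.read_bytes()).hexdigest(), []).append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

def url_is_live(url: str, timeout: int = 10) -> bool:
    """Gate 2: HEAD-request a source URL and report whether it still resolves."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False
```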

Real example: WaterBot's knowledge base — 128 documents across 8 categories (permits, funding, compliance, water quality, entities, water rights, climate, public resources), curated entirely from existing SWRCB public content.

From Documents to Vectors

This is the technical core — how human-readable documents become machine-searchable knowledge. IT handles this step using the structured content from Phase 1.

[Architecture diagram]

How It Works

| Step | What Happens | Why It Matters |
|---|---|---|
| Chunking | Documents split on section headers (H2) — not arbitrary character counts | Preserves semantic meaning; a chunk about "NPDES permits" stays together |
| Embedding | Each chunk converted to a 1,536-dimensional vector | Captures meaning, not just keywords — "water discharge permit" matches "NPDES" |
| Indexing | Vectors stored in PostgreSQL with pgvector extension | Standard database technology — no exotic infrastructure required |
| Similarity search | User questions matched against chunks by cosine distance | Returns the 5-8 most relevant chunks for any question |

Technical Details

Chunking strategy: Semantic splitting on H2 headers. If a section exceeds 2,000 characters, it splits on paragraph boundaries. Document title (H1) is prepended to every chunk for context.
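
A minimal sketch of that chunking rule, assuming each source document is markdown with a single H1 title and H2 section headers; `chunk_document` and the regex details are illustrative, not the production implementation:

```python
import re

MAX_CHARS = 2000  # sections longer than this split on paragraph boundaries

def chunk_document(markdown: str) -> list[str]:
    """Split a doc on H2 headers; prepend the H1 title to each chunk for context."""
    m = re.search(r"(?m)^# (.+)$", markdown)
    title = m.group(1).strip() if m else ""
    body = re.sub(r"(?m)^# .+$", "", markdown)  # drop the H1 line; carried via `title`
    sections = [s.strip() for s in re.split(r"(?m)^(?=## )", body) if s.strip()]
    chunks = []
    for section in sections:
        if len(section) <= MAX_CHARS:
            pieces = [section]
        else:
            # Oversized section: regroup its paragraphs into <= MAX_CHARS pieces.
            pieces, buf = [], ""
            for para in section.split("\n\n"):
                if buf and len(buf) + len(para) > MAX_CHARS:
                    pieces.append(buf.strip())
                    buf = ""
                buf += para + "\n\n"
            if buf.strip():
                pieces.append(buf.strip())
        chunks.extend(f"{title}\n\n{p}" if title else p for p in pieces)
    return chunks
```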

Embedding model: OpenAI text-embedding-3-small (1,536 dimensions). Cost: ~$0.02 per million tokens — the entire WaterBot knowledge base costs pennies to embed.

Vector index: IVFFlat (inverted file with flat compression) for cosine similarity. Critical lesson: build or rebuild the index after bulk inserts; IVFFlat samples its cluster centroids from the rows present when the index is created, so an index built before the data is loaded gives poor recall.

Top-K retrieval: 5-8 chunks per query, filtered by similarity threshold. More chunks = more context but higher API costs and potential noise.
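
Putting those details together, here is a sketch of the retrieval layer, assuming psycopg 3, the `openai` Python client, and a `chunks` table like the one in the comments; the connection string, table and index names, and the 0.5 distance cutoff are illustrative:

```python
# Schema assumed by this sketch (illustrative names):
#   CREATE EXTENSION vector;
#   CREATE TABLE chunks (id serial PRIMARY KEY, doc_title text,
#                        content text, embedding vector(1536));
#   -- Build the index AFTER bulk-loading so IVFFlat's centroids reflect real data:
#   CREATE INDEX chunks_embedding_idx ON chunks
#     USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
import psycopg
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def top_k_chunks(question: str, k: int = 8, max_dist: float = 0.5) -> list[tuple[str, str]]:
    """Embed the question, then return the k nearest chunks under the distance cutoff."""
    emb = client.embeddings.create(model="text-embedding-3-small", input=question)
    qvec = "[" + ",".join(str(x) for x in emb.data[0].embedding) + "]"  # pgvector literal
    with psycopg.connect("dbname=waterbot") as conn:
        return conn.execute(
            """SELECT doc_title, content
                 FROM chunks
                WHERE embedding <=> %(q)s::vector < %(d)s  -- <=> is cosine distance
             ORDER BY embedding <=> %(q)s::vector
                LIMIT %(k)s""",
            {"q": qvec, "d": max_dist, "k": k},
        ).fetchall()
```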

Phase 2: Build the Bot

[Architecture diagram]

IT takes the structured knowledge from Phase 1 and builds the bot. AI-assisted development means agents write components, configure workflows, and test integrations — a 1-2 person team can build production-quality bots.

What Gets Built

| Component | Purpose | Effort |
|---|---|---|
| Chat interface | Conversational AI with RAG retrieval | Core — every bot has this |
| Decision tree tool | Step-by-step navigator for complex processes | Per domain need |
| Smart calculator/matcher | Eligibility checker, program finder | Per domain need |
| Intake form | Personalize responses by user type, location | Recommended |
| Workflow engine | Orchestrate embed → search → LLM → response | Core infrastructure |

First bot: 40-60 hours. Second and third bots: 30-40 hours each — shared infrastructure already exists.

The Three-Mode UI Pattern

Every bot supports three ways for citizens to get help — each optimized for a different type of question.

Mode 1: Guided Chat

Ask any question in natural language. The bot searches the knowledge base, retrieves relevant documents, and generates an answer with source citations.

Best for: open-ended questions, exploring unfamiliar topics, "what do I need to know about..."

Mode 2: Decision Tree

Step-by-step guided navigation through complex processes. Each choice narrows the path until the citizen reaches their specific answer.

Best for: permit selection, program eligibility, "which form do I need..."

Mode 3: Smart Tool

Domain-specific calculators and matchers. Enter criteria, get filtered results with eligibility details.

Best for: funding programs, fee calculations, "am I eligible for..."

Why Three Modes?

Not every question fits a chatbot. A citizen who needs a specific permit shouldn't have to describe their project in conversation — a decision tree gets them there in 4-5 clicks. A citizen exploring funding options gets more value from a matcher that filters 58 programs by their criteria than from a chat response.

The pattern: Build the chat first (every bot needs it), then add decision trees and tools based on your domain's most common workflows.

:::callout info
User profiling without authentication. An optional intake form collects context (user type, location, primary concern) and stores it in the browser only — no accounts, no PII, no database. This context personalizes chat responses without creating compliance burden.
:::

The Workflow Engine

When a citizen asks a question, here's the 2-3 second journey from query to answer.

[Architecture diagram]

What Happens at Each Step

| Step | Component | What It Does |
|---|---|---|
| 1. Receive | Webhook endpoint | Accepts the question, session context, and conversation history |
| 2. Embed | Embedding API | Converts the question into a 1,536-dimensional vector |
| 3. Search | Vector database | Finds the 5-8 most semantically similar knowledge chunks |
| 4. Assemble | Prompt builder | Combines: system instructions + retrieved chunks + user context + question |
| 5. Generate | LLM (Claude, GPT-4, etc.) | Produces a natural language answer grounded in the retrieved documents |
| 6. Return | Response formatter | Delivers markdown response with source citations back to the frontend |
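
A compact sketch of that journey as a single function, reusing `top_k_chunks` and `client` from the retrieval sketch above and `build_system_prompt` from the next section; the model name, fallback wording, and response shape are placeholders, not the production code:

```python
def answer_question(question: str, history: list[dict], profile: dict) -> dict:
    """One RAG turn: embed -> search -> assemble -> generate -> format."""
    chunks = top_k_chunks(question)                    # steps 2-3: embed + vector search
    if not chunks:
        # Graceful fallback: admit the gap instead of letting the model guess.
        return {"answer": "That topic isn't covered in my knowledge base yet. "
                          "Please contact the program office directly.",
                "sources": []}
    system = build_system_prompt(chunks, profile)      # step 4: layered system prompt
    messages = [{"role": "system", "content": system},
                *history,                              # prior turns, oldest first
                {"role": "user", "content": question}]
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)  # step 5
    return {"answer": resp.choices[0].message.content,  # step 6: markdown answer
            "sources": sorted({title for title, _ in chunks})}  # citation list
```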

Key Design Decisions

  • Source attribution on every answer — citizens can verify; auditors can trace
  • Conversation history included — multi-turn dialogue without citizens repeating themselves
  • Graceful fallbacks — if no relevant chunks are found, the bot says so instead of hallucinating
  • Rate limiting + token validation — security at the API gateway level

How the System Prompt Works

The system prompt is assembled dynamically for each request:

  1. Bot personality — "You are WaterBot, an expert on California water regulations..."
  2. Instructions — "Always cite sources. If the knowledge base doesn't cover a topic, say so."
  3. RAG context — The 5-8 most relevant knowledge chunks, injected verbatim
  4. User profile — "The user is a homeowner in Santa Clara concerned about water quality"
  5. Conversation history — Previous Q&As in this session

This means the LLM is never inventing answers — it's synthesizing from specific, cited documents with instructions to stay grounded.
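
A sketch of that assembly, matching the five layers above; the profile field names are illustrative, and conversation history rides along as prior chat messages rather than inside the system prompt (a common design choice, noted in the comment):

```python
def build_system_prompt(chunks: list[tuple[str, str]], profile: dict) -> str:
    """Assemble the layered system prompt described above."""
    context = "\n\n---\n\n".join(f"Source: {title}\n{content}"
                                 for title, content in chunks)
    profile_line = (f"The user is a {profile.get('user_type', 'member of the public')} "
                    f"in {profile.get('location', 'California')} "
                    f"concerned about {profile.get('concern', 'general topics')}.")
    return (
        "You are WaterBot, an expert on California water regulations.\n"  # 1. personality
        "Always cite sources by document title. If the knowledge base "   # 2. instructions
        "doesn't cover a topic, say so rather than guessing.\n\n"
        f"Knowledge base excerpts:\n{context}\n\n"                        # 3. RAG context
        f"User profile: {profile_line}"                                   # 4. user profile
    )  # 5. conversation history is passed as prior messages, not inlined here
```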

The Infrastructure Stack

This pattern runs on infrastructure most departments already have. Here's how each component maps to Azure Gov and AWS GovCloud — the two platforms California state agencies primarily use.

| Layer | What It Does | Azure Gov | AWS GovCloud |
|---|---|---|---|
| Frontend | User interface (the website) | Azure Static Web Apps | S3 + CloudFront / Amplify |
| Workflow | Orchestrates the RAG pipeline | Azure Logic Apps / Functions | Step Functions / Lambda |
| Vector DB | Stores and searches knowledge embeddings | Azure PostgreSQL Flex (pgvector) | RDS PostgreSQL (pgvector) |
| LLM | Generates natural language responses | Azure OpenAI Service | Amazon Bedrock |
| Embedding | Converts text to vectors | Azure OpenAI Embeddings | Bedrock Embeddings |
| Hosting | Runs the application | Azure Container Apps | ECS / Fargate |

What It Costs

Monthly Operating Cost (Single Bot)

| Component | Estimated Cost |
|---|---|
| Cloud hosting | $20-50/mo |
| LLM API calls | $50-200/mo |
| Embedding API | < $5/mo |
| Vector database | $15-30/mo |
| Total | $85-285/mo |

For Comparison

| Alternative | Cost |
|---|---|
| One full-time staff member | ~$8,000-12,000/mo |
| Enterprise chatbot vendor license | ~$2,000-10,000/mo |
| This pattern (3 bots) | ~$200-500/mo |

:::callout success
This runs on commodity cloud infrastructure you already have. No special hardware. No GPU clusters. No massive budget line items. The marginal cost of adding a second or third bot is minimal — shared infrastructure means most costs are already covered.
:::

Phase 3: Serve Citizens

[Architecture diagram]

The bot is live. Citizens ask questions and get accurate, sourced answers in seconds — 24 hours a day, 7 days a week.

What Citizens Experience

| Aspect | What They See |
|---|---|
| Response time | 2-3 seconds per answer |
| Source citations | Every answer links to the specific regulation or document |
| Three ways to ask | Chat for open questions, decision trees for processes, tools for eligibility |
| Personalized context | Answers tailored to their user type and location |
| No account required | Start asking immediately — no signup, no login |

What Staff Gain

  • Routine questions handled automatically — staff focus on complex cases
  • Consistent answers — every citizen gets the same accurate information
  • 24/7 availability — nights, weekends, holidays
  • Scalable — handles 10 or 10,000 concurrent users with the same infrastructure
  • Measurable — every interaction is trackable for service improvement

Quality and Maintenance

A bot is a garden, not a monument — it needs tending. Quality assurance happens at launch and on an ongoing cycle.

Launch Testing: SMEs Are the Quality Gate

Your subject matter experts write the test questions and evaluate the answers. They don't need to understand embeddings — they need to ask the bot questions and say "that's right" or "that's wrong."

| Test Type | What It Catches | Who Does It |
|---|---|---|
| Happy path | Common questions answered correctly | SMEs — 10-15 standard questions |
| Edge cases | Unusual or compound questions | SMEs — questions from real citizen calls |
| Out of scope | Bot admits when it doesn't know | IT — test questions outside the knowledge base |
| Adversarial | Attempts to get incorrect or inappropriate responses | IT — deliberate trick questions |

Scoring: 5-point scale per response. Target: average ≥ 4.0, no response below 3.0.
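
That launch bar can be checked with a few lines of code. A minimal sketch, assuming SME scores are recorded in a CSV with one row per test response; the column names are illustrative:

```python
import csv

def check_launch_bar(results_csv: str, target_avg: float = 4.0, floor: float = 3.0) -> bool:
    """Apply the launch bar: average score >= 4.0 and no response below 3.0."""
    with open(results_csv, newline="") as f:
        # Assumed columns: question, expected, answer, score (1-5, from SME review)
        scores = [float(row["score"]) for row in csv.DictReader(f)]
    if not scores:
        raise ValueError("no scored responses found")
    avg, worst = sum(scores) / len(scores), min(scores)
    print(f"{len(scores)} responses | average {avg:.2f} | worst {worst:.1f}")
    return avg >= target_avg and worst >= floor
```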

Quarterly Refresh Cycle

| Step | What Happens | Who |
|---|---|---|
| ASSESS | Check content freshness, verify URLs, identify regulation changes | SMEs (2-4 hours) |
| UPDATE | Revise source documents with current information | SMEs (2-4 hours) |
| INGEST | Re-chunk, re-embed, rebuild vector indexes | IT (1-2 hours) |
| TEST | Run regression suite, verify no degradation | Both (2-3 hours) |

:::callout warning
Content goes stale. Fee schedules change. Regulations update. Programs sunset. A quarterly check catches drift before citizens get outdated answers. Budget 4-8 hours per quarter per bot for maintenance.
:::
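
For the INGEST step, rebuilding the vector index after re-embedding can be a one-liner, assuming the `chunks` table and IVFFlat index from the retrieval sketch earlier (the index name is hypothetical):

```python
import psycopg

# After re-embedding updated chunks, rebuild the IVFFlat index so its cluster
# centroids are recomputed from the current data.
with psycopg.connect("dbname=waterbot") as conn:
    conn.execute("REINDEX INDEX chunks_embedding_idx;")
```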

The Real Timeline

[Gantt chart: six-week build with parallel SME, IT, and Compliance tracks and three decision gates (Scope, RAG Works, Launch); the table below gives the details]

The Work Is ~50-75 Hours. The Calendar Depends on Dependencies.

| Track | Activity | Hours | Calendar |
|---|---|---|---|
| SME | Content curation from existing docs | 8-16h | Weeks 1-2 |
| IT | Structure content, chunk, embed, validate | 8-12h | Weeks 2-3 |
| IT | Build UI, wire workflows, integrate | 24-36h | Weeks 3-5 |
| SME + IT | Adversarial testing + response review | 4-8h | Weeks 5-6 |
| IT | Testing + polish | 8-12h | Weeks 5-6 |
| Compliance | SIMM paperwork + ISO briefing | 4-6h | Parallel from Week 1 |

Decision Gates

| Gate | When | What Gets Decided |
|---|---|---|
| Gate 1 | After content audit | Scope confirmed — we know what the bot will cover |
| Gate 2 | After RAG retrieval tested | "It works" — questions return relevant answers |
| Gate 3 | After adversarial testing | Ready to launch — quality meets the bar |

The Real Bottleneck

:::callout warning
40 hours built 3 bots. The bottleneck isn't the technology — it's procurement.
:::

| Dependency | If You Have It | If You Don't |
|---|---|---|
| Cloud resources (Azure/AWS) | Days to provision | Months to procure |
| LLM API access | Check CDT master agreements | New vendor onboarding: 2-4 months |
| ISO availability | Brief on Day 1 → parallel review | Backlogged ISO → 4-6 week wait |

Start the procurement and compliance paperwork today so the infrastructure is ready when your developer sits down.

After the first bot: Second and third bots take 30-40 hours IT + 8 hours SME — shared infrastructure already exists.

Getting Started

Your Next Steps

  1. Pick your domain — Which program area gets the most routine citizen inquiries?
  2. Identify your SMEs — Who answers those questions today? They're your knowledge team.
  3. Check your infrastructure — Do you have Azure Gov or AWS GovCloud access? LLM API agreements?
  4. Start compliance early — File your SIMM 150 EZ, brief your ISO, check CDT master agreements
  5. Curate your knowledge — SMEs: gather your top 50-100 source documents
  6. Build your bot — IT: structure, embed, build, test, launch

Resources

| Resource | Link |
|---|---|
| SAM 4986 Series (GenAI governance) | dgs.ca.gov/Resources/SAM |
| CDT GenAI Toolkit | genai.cdt.ca.gov |
| GenAI Risk Assessment | genai.cdt.ca.gov/risk-assessment |
| CDT GenAI Sandbox | genai.cdt.ca.gov |
| Poppy AI Platform | genai.ca.gov/poppy |
| Little Hoover Report #284 | lhc.ca.gov |
| NIST AI Risk Management Framework | nist.gov/itl/ai-risk-management-framework |

Live Examples

Three bots built with this pattern are live at vanderdev.net:

| Bot | Domain | Knowledge Base | Modes |
|---|---|---|---|
| WaterBot | CA water regulations (SWRCB) | 128 docs, 8 categories | Chat, Permit Finder, Funding Navigator |
| BizBot | CA business licensing | Licensing guides, compliance | Chat, License Finder |
| KiddoBot | CA childcare programs | Program guides, eligibility | Chat, Program Finder, Eligibility Calculator |

Pick your domain. Curate your knowledge. Build your bot.

The pattern is proven. The policy framework exists. The technology is commodity. The only thing missing is your agency's decision to start.

State Water Resources Control Board · Innovation Fellowship · March 2026