From Vibe Coding to Disciplined AI-Assisted Development
Core Thesis: AI can write code now. The question isn't whether to use it — it's whether to use it responsibly. Agentic engineering is the disciplined approach that makes AI-assisted development safe, auditable, and production-ready for government.
California Department of Technology | February 2026
Software development is being transformed by AI — faster than any technology shift in decades.
Two distinct approaches have emerged for working with AI in development:
The difference between them is the difference between a hobby project and a government system.
The term was coined by Andrej Karpathy (former Tesla AI Director, OpenAI co-founder) in February 2025:
"You just give in to the vibes, embrace exponentials, and forget that the code even exists."
Vibe coding produces code that works — until it doesn't. And in government, "until it doesn't" means data breaches, compliance violations, and systems that fail when people need them most.
Karpathy coined this term in February 2026 as the mature successor to vibe coding:
"You are not writing the code directly 99% of the time. You are orchestrating agents who do, and acting as oversight. 'Engineering' to emphasize that there is an art & science and expertise to it."
| Practice | Vibe Coding | Agentic Engineering |
|---|---|---|
| Planning | None | Design doc / spec first |
| Testing | Manual, ad hoc | Automated test suites |
| Review | Glance and ship | Structured code review |
| Audit Trail | None | Full version control history |
| Security | Hope for the best | Security scanning, human review |
| Quality | "It works on my machine" | CI/CD, quality gates |
| Governance | None | Risk-tiered oversight |
Sources: Karpathy's X post (Feb 2026), Addy Osmani deep-dive
This is not a theoretical concern. Unstructured AI-generated code in government systems creates real risks:
AI models can generate code with injection vulnerabilities, hardcoded credentials, and insecure data handling. Without security review, these ship to production. In government, that means potential exposure of PII, financial data, and critical infrastructure.
State systems must comply with SIMM 5305-F, SAM 4986.1, NIST frameworks, and department-specific regulations. Vibe-coded systems have no documentation, no audit trail, and no evidence of compliance.
NYC's MyCity chatbot cost $600K and was found giving incorrect legal advice on labor law, harassment, and tips. No domain validation. No human-in-the-loop. Classic vibe coding at city scale. (AI Now Institute testimony, 2024)
When AI-generated code fails, who owns it? Vibe coding has no answer. Agentic engineering does: the human who reviewed, approved, and deployed it.
The position is clear: vibe coding has no place in production government systems. Experimentation is healthy. Shipping unreviewed AI code to systems Californians depend on is not.
The Numbers
California's Track Record
| Project | Budget | Outcome |
|---|---|---|
| FI$Cal (accounting) | $1.6B est. | 20 years, still incomplete |
| MyCalPAYS (payroll) | $373M spent | Terminated — never produced one clean paycheck |
| CCMS (courts) | $500M+ spent | Terminated — estimate grew from $260M to $1.9B |
| BreEZe (licensing) | $96M spent | Terminated — original budget was $28M |
| EDD Systems | $250M+ to Deloitte | Buckled during COVID — 40M calls/month unanswered |
Between 1994 and 2013, California terminated or suspended 7 IT projects after spending nearly $1 billion. (CA State Auditor 2014-602)
The failures aren't random. Waldo Jaquith (former 18F, White House OSTP) identified the structural causes (source):
30-year cost spiral — Each failed project becomes the cost benchmark for the next one. A $100M failure makes $200M seem reasonable.
No defensible pricing — Vendors provide ballpark figures, agencies average the responses, bids come back near that amount. No internal logic underlying the price tag.
Mega-contracts bundle everything — Build, host, help desk, documentation, maintenance — all crammed into single contracts that are too big to succeed.
Outsourced control — Agencies outsource so thoroughly they lose the ability to control outcomes. The contractor knows the system; the agency doesn't.
Procurement-development disconnect — Contracting says "specify everything upfront" (safest). Software development says "specifying everything upfront is dangerous."
Small projects work. Projects under $1M succeed 90% of the time (Standish Group). When 18F helped California apply modular contracting to the Child Welfare System, they reduced the RFP from 1,500 pages to 10 and created a pool of 11 agile vendors. (18F Blog)
Agentic engineering naturally produces small, testable, modular deliverables — the exact pattern that works in government.
1. PLAN Human defines architecture, requirements, constraints
↓
2. DECOMPOSE Break into well-defined tasks with acceptance criteria
↓
3. EXECUTE AI agent writes code, runs tests, iterates
↓
4. REVIEW Human reviews output against requirements
↓
5. TEST Automated test suite validates correctness
↓
6. GOVERN Security scan, code review, deployment approval
↓
7. DEPLOY Production release through standard CI/CD
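The seven steps above can be sketched as a minimal gated pipeline. This is an illustrative sketch, not a CDT standard: the `Task` structure, gate names, and the stub "agent" are all hypothetical, chosen only to show the core discipline — deployment is blocked unless human review, automated tests, security scanning, and a named approval all pass.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One well-defined unit of work with explicit acceptance criteria (step 2)."""
    description: str
    acceptance_criteria: list[str]
    code: str = ""
    approvals: set[str] = field(default_factory=set)


def execute(task: Task, agent) -> Task:
    """Step 3: the AI agent drafts code for the task."""
    task.code = agent(task.description)
    return task


def gates_passed(task: Task, tests_pass: bool, scan_clean: bool) -> bool:
    """Steps 4-6: human review, automated tests, and security scan
    must ALL pass before deployment (step 7) is allowed."""
    return (
        "human_review" in task.approvals          # step 4: reviewer signed off
        and tests_pass                            # step 5: automated suite green
        and scan_clean                            # step 6: security scan clean
        and "deploy_approval" in task.approvals   # step 6: named human approver
    )


# Illustrative run: a lambda stands in for a real coding agent.
task = Task("add CSV export", ["exports all rows", "handles empty input"])
task = execute(task, agent=lambda desc: f"# generated code for: {desc}")
task.approvals.update({"human_review", "deploy_approval"})
print(gates_passed(task, tests_pass=True, scan_clean=True))   # True: all gates green
print(gates_passed(task, tests_pass=False, scan_clean=True))  # False: deployment blocked
```

The design point is that every gate is conjunctive: an AI agent can accelerate step 3, but no single actor — human or agent — can skip a gate, which is what produces the audit trail and accountability the comparison table describes.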
For an IT Specialist building an internal tool, the whole cycle takes days to weeks, not months to years.
This is the primary workstream. IT Specialists, developers, cybersecurity analysts, and system engineers use agentic engineering to dramatically increase throughput — with the same headcount.
IT Specialists add agentic engineering as a core competency at every level — not a replacement for technical skill, but an amplifier of it:
| Level | Current Role | With Agentic Engineering |
|---|---|---|
| IT Specialist I | Implements features, fixes bugs | Orchestrates agents for routine development; reviews AI output |
| IT Specialist II | Leads projects, advises management | Designs agent workflows; mentors team on agentic practices |
| IT Specialist III | Expert advisor, strategic leadership | Architects multi-agent systems; sets quality standards |
| IT Supervisor I/II | Manages IT staff, day-to-day ops | Governs agentic workflows; approves deployment of AI-generated code |
| IT Manager I/II | Strategic IT management | Sets department agentic engineering policy; manages risk framework |
The line between "business" and "IT" is blurring. That's fine — we accommodate it with a spectrum of capability, not a binary.
Any state employee uses tools like Poppy, Copilot, or domain-specific AI to accelerate their existing work.
No code. No technical risk. These are productivity tools, governed by existing CDT GenAI guidelines (TL 24-01, SAM 4986.1).
Some business staff have technical skills — scientists, data analysts, engineers, researchers. They can and do build their own tools using AI-assisted development.
These are not enterprise systems. They don't handle PII, don't connect to production databases, and don't serve the public. They're team-level productivity tools built by people who understand both the domain and the technology.
Governance: Department IT reviews for security and data handling. Standard acceptable-use policies apply.
The highest-impact pattern: domain experts partner with IT staff. The domain expert brings irreplaceable knowledge of business rules, eligibility logic, and edge cases. The IT specialist brings architecture, security, and production engineering.
This is where the contractor cost reduction is most dramatic. The domain expert who used to brief a $200/hr contractor on policy rules now works directly with an IT specialist using AI agents. The contractor is no longer needed.
Agentic engineering requires governance that scales with risk — not one-size-fits-all bureaucracy that kills adoption, and not zero oversight that creates liability.
| Tier | Description | Examples | Governance |
|---|---|---|---|
| Tier 1 | Personal productivity | Poppy, AI-assisted drafting, data analysis | Existing CDT GenAI guidelines (TL 24-01, SAM 4986.1) |
| Tier 2 | Internal team tools | Dashboards, automations, non-PII utilities | Department IT review; standard acceptable-use |
| Tier 3 | Enterprise / production | Citizen-facing apps, PII systems, financial systems | Full CDT PAL process; security review; IT-owned deployment |
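The tier table above can be read as a simple decision rule. The sketch below is an illustrative simplification — the boolean inputs stand in for a real intake questionnaire, and the function name is hypothetical — but the ordering matches the table: PII or public exposure dominates everything else.

```python
def governance_tier(handles_pii: bool, public_facing: bool,
                    produces_code: bool, team_shared: bool) -> int:
    """Map a proposed AI use to a governance tier per the risk-tiered table.

    Tier 3: enterprise/production -- PII, public-facing, or financial systems.
    Tier 2: internal team tools -- shared code/automation without PII.
    Tier 1: personal productivity -- drafting, analysis, chat assistance.
    """
    if handles_pii or public_facing:
        return 3  # full CDT PAL process, security review, IT-owned deployment
    if produces_code and team_shared:
        return 2  # department IT review, standard acceptable-use
    return 1      # existing GenAI guidelines (TL 24-01, SAM 4986.1)


# A citizen-facing form lands in Tier 3 regardless of who builds it:
print(governance_tier(handles_pii=True, public_facing=True,
                      produces_code=True, team_shared=True))  # 3
# A non-PII team dashboard built by a technical analyst is Tier 2:
print(governance_tier(handles_pii=False, public_facing=False,
                      produces_code=True, team_shared=True))  # 2
```

Encoding the rule this way makes the key property explicit: a tool cannot drop to a lighter tier by virtue of who built it — only by what data it touches and who it serves.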
| Role | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|
| Individual contributor | Owns their use | Builds with IT review | Provides domain expertise |
| IT Specialist | N/A | Reviews and approves | Builds and deploys |
| IT Supervisor/Manager | N/A | Oversees governance | Approves production release |
| CDT | Sets GenAI policy | Sets standards | Oversees via PAL process |
This aligns with California's existing frameworks: SIMM 5305-F (GenAI risk assessment), CDT Technology Letters (TL 24-01, 24-03, 25-01), and SAM 4986.1 (GenAI policy for responsible use).
Government pays a premium for outsourced development — and the knowledge leaves when the contract ends.
Contractor Costs
State Employee Costs
Sources: CalHR Pay Scales, CDT Career Paths (PDF)
An IT Specialist II with agentic engineering tools producing at 3-5x throughput costs the state roughly $63/hr fully loaded — versus $150-300/hr for the contractor they replace.
That's not a marginal improvement. That's a structural shift in the cost of building government technology.
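The cost claim above can be checked with quick arithmetic: dividing fully loaded hourly cost by the throughput multiplier gives an effective cost per contractor-equivalent hour. The figures are the ones cited in this document; `effective_rate` is just an illustrative helper.

```python
def effective_rate(loaded_hourly: float, throughput_multiplier: float) -> float:
    """Cost per unit of output, normalized to 1x-throughput hours."""
    return loaded_hourly / throughput_multiplier


# State IT Specialist II at ~$63/hr fully loaded, at 3x-5x throughput:
low = effective_rate(63, 5)   # best case: ~$13 per contractor-equivalent hour
high = effective_rate(63, 3)  # conservative: ~$21 per contractor-equivalent hour
print(f"${low:.2f}-${high:.2f}/hr vs $150-300/hr contractor")
```

Even at the conservative 3x multiplier, the effective rate is roughly a seventh of the low end of the contractor range — which is why the document calls this a structural shift rather than a marginal saving.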
"Outsourcing builds capacity, but insourcing builds capability." Nearly half of top-performing organizations plan to increase insourcing. (McKinsey Global Tech Agenda 2026)
Contractors are still needed for specialized, non-routine work.
The goal isn't zero contractors. It's reducing dependency on contractors for routine development work that state staff can do better, faster, and cheaper with agentic tools.
| Government | Initiative | Results |
|---|---|---|
| UK Government | AI coding assistant trial — 2,500 licenses across 50+ public sector orgs | 56 min/day saved per developer (28 working days/year); 58% wouldn't go back (GOV.UK Report) |
| Singapore GovTech | Pair chatbot + GitHub Copilot for public officers | 60,000+ users, 46% admin time savings, 21-28% coding speed increase (GovTech Pair) |
| Australia DTA | AI assistant across 60 agencies | 1 hr/day saved per employee across 7,600 users (Microsoft) |
| Agency | Initiative | Results |
|---|---|---|
| US Treasury | AI fraud detection | $4B prevented/recovered in FY2024, up from $652.7M — 6x increase (Treasury) |
| GSA | OneGov deal with Anthropic (Aug 2025) | Claude available to all 3 branches for $1 — FedRAMP High (GSA) |
| GAO | Federal AI use case tracking | Use cases nearly doubled from 571 to 1,110 in one year; GenAI up 9x (GAO-25-107653) |
| DoD | GenAI.mil | Secure LLMs for 3 million military, civil service, and contractor personnel (Breaking Defense) |
California is already building the foundation.
The foundation is laid. The next step is moving from AI-as-chatbot to AI-as-engineering-tool.
This is not speculation. The data is in.
Radiology: Geoffrey Hinton said "stop training radiologists" in 2016. By 2025, residency positions hit a record 1,208 — a 4% increase. 75% of FDA-approved medical AI devices target radiology. Demand for radiologists has never been higher. (TS2 Tech)
UK coding trial: Developers didn't lose jobs. They saved 28 working days per year for higher-value work. 58% said they wouldn't go back. (GOV.UK)
Harvard Business School: Study of 19,000 job tasks across 900 occupations found jobs requiring analytical/creative work saw 20% growth in demand post-ChatGPT. (HBS)
Gartner: By 2030, CIOs expect 0% of IT work done by humans without AI, 75% done by humans augmented with AI, 25% by AI alone. The augmented worker IS the future. (Gartner)
Agentic engineering aligns with organized labor's AI principles:
California has the policy foundation (EO N-12-23, CDT Technology Letters) and the awareness training (Foundations of GenAI certificate, 20+ programs). What doesn't exist yet:
| What Exists | What's Needed |
|---|---|
| GenAI awareness (Foundations certificate, ~42 hrs) | Agentic engineering for IT staff (structured development with AI agents) |
| Poppy chatbot training | AI-assisted productivity training for business staff by role |
| Ad hoc experimentation | Formalized training paths with certification |
| What Exists | What's Needed |
|---|---|
| SIMM 5305-F (GenAI risk assessment for procurement) | Risk-tiered governance for AI-generated code |
| CDT PAL process (for major projects) | Lightweight governance for Tier 2 internal tools |
| GenAI acceptable use policies | Code-specific policies: version control, testing, review requirements |
| What Exists | What's Needed |
|---|---|
| Poppy (chatbot/assistant) | Development-grade AI tools (Claude Code, Copilot) with state security |
| Individual experimentation | Shared infrastructure: repos, CI/CD, testing frameworks for AI-assisted development |
The question is no longer whether AI changes how government builds technology.
The question is whether California leads or follows.