Writing SQL with AI in 2026: what works, what does not
An honest review of LLM-generated SQL after two years in production: where the models are excellent, where they still fail, and the engineering patterns that close the gap.
After running LLM-generated SQL in production for over two years across PostgreSQL, MySQL, SQL Server and ClickHouse, the picture is clearer than it was. The models got dramatically better. The failure modes got more specific.
A practical, opinionated review.
What works well
The current generation of frontier models (mid-2026) is excellent at:
- Single-table aggregations: totals, counts, group-bys with one or two filters. Essentially never wrong.
- Two- and three-table joins when the schema is reasonably normalised and column names are clear.
- Window functions (
ROW_NUMBER,LAG,LEAD,SUM OVER), including the rightPARTITION BY90%+ of the time. - CTE structure: when given a complex question, models tend to decompose into 2–4 CTEs that are readable. This was unreliable in 2024.
- Translating between dialects: Postgres → MySQL, MySQL → SQL Server. With minor cleanup.
- Explaining existing SQL in plain language. Very strong.
For these cases, prompt the model with the schema and a few-shots from your domain and you are 95%+ accurate.
What still does not work
The failure list is more interesting:
1. Implicit business logic
“Active customers” means different things in different companies: paying, not churned, with at least one order in 90 days, with non-zero balance. The model can’t know which. It picks one and runs.
Fix: encode business definitions as views (active_customers, core_revenue, gross_margin) and only expose those views to the NL layer. The semantic layer earns its keep here.
2. Date handling
Astonishingly still a top failure mode. “Last quarter”: fiscal or calendar? “Yesterday”: IST or UTC? “Month-to-date”: including today or up to yesterday?
Fix: in the system prompt, explicitly define fiscal year start, timezone, “today” reference and what “last X” means. Two paragraphs, prevents 30% of the bugs.
3. NULL semantics
The classic ones still trip models:
WHERE x != 'a'excludes NULL rows. The user usually wanted them included.COUNT(col)ignores NULL butCOUNT(*)does not. Subtle, often wrong.LEFT JOIN ... WHERE right.col IS NULLfor “missing from right” gets confused with regular filters.
Fix: train a few-shots set with explicit NULL handling and reject SQL that uses != on a nullable column without a IS NULL companion clause.
4. Performance-unaware joins
Models will happily write a five-table cross-join that runs in 40 seconds. They don’t see EXPLAIN.
Fix: run EXPLAIN before execution. If estimated cost > threshold, route to a “this looks expensive, confirm?” path.
5. Subtle math errors
Percentage growth where the model writes (new − old) / new instead of / old. Ratios where it forgets to cast to a float and gets integer division.
Fix: integration tests with golden questions, run nightly. When a model version changes, the bad SQL gets caught before the user does.
The engineering patterns that close the gap
Five patterns that take an LLM-SQL system from “demo-grade” to “production-grade”:
Pattern 1: Semantic layer
Pre-build views (or dbt models) that encode business definitions. The LLM only sees the semantic layer, not raw tables. This is the highest-ROI investment.
Pattern 2: Few-shot rotation
A curated set of 30–80 (question, SQL, output) tuples per major data domain. Retrieve the 3–5 most relevant to the user’s question before prompting. Curate weekly from real user failures.
Pattern 3: Two-pass generation
For complex questions, two LLM calls:
- Plan: list the tables, the filters, the joins, the metric, the date range. No SQL.
- SQL: given the plan, write the SQL.
The plan pass is fast (small output) and surfaces ambiguity. If the plan is wrong, ask the user before generating SQL.
Pattern 4: Validation layer
Before execution:
- Parse to AST.
- Reject DDL/DML.
- Allow-list tables and schemas.
- Inject mandatory predicates (tenancy, soft-delete).
- Wrap in
LIMITif not present.
This is non-negotiable for production.
Pattern 5: Result review loop
After execution, a quick LLM pass to sanity-check the result:
- Does the row count look reasonable?
- Does the unit match? (₹ vs %)
- Are there obviously wrong values (negative for non-signed metrics)?
If the check fails, retry the SQL with a corrected prompt before showing the user.
When to NOT use LLMs for SQL
Three cases where deterministic SQL beats LLM SQL:
- Production reports: daily owner dashboard, monthly board pack. Write the SQL once, schedule it, version-control it. LLMs are for ad-hoc.
- Compliance numbers: GST returns, statutory filings. Use deterministic SQL with explicit sign-off.
- Customer-facing metrics: what your app shows to users. Hardcode it; don’t have an LLM mediate.
LLM SQL shines for the long tail of unique questions that nobody will ask twice. That is where it earns its place.
Doing this with AnalytAI
AnalytAI runs the patterns above by default: semantic layer, two-pass generation, validation, result review, scheduled reports for the things that should be deterministic. Connect a SQL database, ask anything, see the SQL and the rationale.
Book a 20-minute demo. Bring a read replica connection string.
Related reads:
More on Developer
-
10 May 2026
Natural-language query layer for your SaaS: build, buy or embed
If you run a B2B SaaS and customers want to ask questions of their own data, here is the build-vs-buy-vs-embed analysis, with the architecture sketches for each path.
-
1 May 2026
Auto-generate charts from any SQL database: patterns that actually work
How to go from an arbitrary SQL result to the right chart: column-shape rules, a default-picker algorithm, when to override, and the common traps.
-
28 Apr 2026
Talk to your PostgreSQL database in plain English: a developer's guide
How natural-language-to-SQL actually works on PostgreSQL: schema introspection, retrieval, prompt design, sandboxing, and the failure modes that don't show up in demos.
See AnalytAI on your data
Bring one Tally company or one SQL database. We will turn it into a live dashboard on a 20-minute call.
Book a demo