🖙'AI will make formal verification go mainstream' Sure Seems Like Wishful Thinking
Martin Kleppman has recently argued that Prediction: AI will make formal verification go mainstream. Huge if true!
To be clear, this isn't one of those thoroughly rotten takes that makes me think less of the writer, like the stuff Sam Altman says approximately once every six news cycles, or Thomas H. Ptacek's weirdly hostile post about his friends who don't like LLMs.
That being said...I have a lot of doubts.
Part of my reason for skepticism is that AI is relentlessly pitched as a way to go faster. Formal verification, for all its benefits, generally makes you go slower. Kleppman's logic seems to be that Speed(Formal verification + LLMs) >= Speed(regular coding). I'm not sure that's true, but even if it is, I'm not sure companies will want to give up much of the (perceived) speed benefit, even if they're still somewhat faster on net.
The other thing is that formal verification seems like the kind of thing that would take a long time to pay off, and, with the exception of people planning multi-year data center construction projects, AI has not been a situation where people are exercising patience.
I also strongly suspect AI is appealing because it allows for spending less on employees. Having them learn formal verification would make them cost more, both initially as your current developers have to take time to learn it, and over time as your less-senior developers have a longer learning curve.
Now, these observations are for the industry as a whole, given that Kleppman was saying it would go "mainstream." But perhaps LLMs will still make them mainstream among more quality-conscious developers, the sort of people who test pretty aggressively but haven't added formal verification yet. As critical as I was of the start, maybe someone like Thomas Ptacek would do it at a place like Fastly. Ptacek seems like a genuinely capable guy, hostility in that area aside. That's more likely to me, and I would be genuinely interested in seeing their results, even if what I really want is for developers to demand more and higher ethical standards for what they produce and the tools they use.
We'll certainly see experimentation along these lines. Simon Willison recently ported an HTML parsing library to JavaScript as an experiment, so there's certainly a lot of experimentation with LLMs happening.
Beyond industry and societal issues, the more technical reason I'm skeptical is that AI coding assistants sometimes will just delete stuff leading to errors rather than meaningfully fix them. It seems like an AI might just loosen specifications, add exceptions, or even delete them. Perhaps it's clear enough when they do that for it not to be a problem or there are other features of formal verification that make that harder. However, given the average developer doesn't have much experience, I'm not very confident they will build that experience. And even if they catch the assistant cheating, they might not know enough to prod it in the right direction or fix it themselves.
If I'd have to guess, Martin Kleppman has a lot more familiarity with formal verification than I do, so perhaps he's thought of this issue. Of course, a PhD doesn't prevent you from talking out of your ass. (For some people, it merely enables them to do it with greater authority in front of larger audiences.) Still, my money's on Kleppman knowing more than I do.
The other technical reason is that there's not as much formal verification code to learn from, and I don't think formal verification is quite similar enough to other languages.
Interestingly, formal verification has been seeing more attention on Google. It does get less than other testing methodologies. Looking at the data behind GitHub's "innovation graph" shows that some of the most common theorem provers and formal verification languages are varying and seem to have peaked as a group in 2023, although 2025 is incomplete and if scaled up (doubled because we have just the first half of the year), it would be in the ballpark of 2023. This depicts the number of people who've pushed code in that language. (GitHub chose the...interesting...name "pushers" as shorthand.)
| language | '20H1 | '20H2 | '21H1 | '21H2 | '22H1 | '22H2 | '23H1 | '23H2 | '24H1 | '24H2 | '25H1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Agda | 244 | 429 | 259 | 271 | 306 | 283 | 982 | 1011 | 274 | 267 | 347 |
| Alloy | 156 | 287 | 1013 | 1015 | 318 | 259 | 311 | ||||
| Coq | 5387 | 4315 | 3060 | 2671 | 3220 | 1805 | 1782 | 1736 | 2038 | 2566 | 2294 |
| Frege | 115 | ||||||||||
| Idris | 104 | 405 | 103 | 212 | 104 | 110 | 790 | 720 | 118 | 109 | |
| Lean | 112 | 232 | 254 | 391 | 348 | 464 | 475 | 821 | 1141 | 1636 | 2219 |
| TLA | 390 | 394 | 430 | 464 | 483 | 612 | 697 | 1234 | 916 | 715 | 933 |
| Total Result | 6237 | 5890 | 4106 | 4009 | 4617 | 3561 | 5739 | 6537 | 4687 | 5561 | 6213 |
The total row probably counts people twice.
Of course, the prediction is that formal verification will go mainstream, not that it already has. So the fact it there isn't clear growth is mostly absence of evidence, especially since the first half of 2025 was pretty strong and the Google Trends data suggest that either it got started very early following the first releases of GitHub copilot or there's some other trend increasing people's interest.
I'd love to be wrong about Kleppman's prediction. Formal verification has always seemed like an underutilized tool and one I've been meaning to learn some day myself. And of course, as a software user I'd love if software were more correct and reliable. The best case is that formal verification ends up being less scary once they've seen LLMs produce it, developers actually learn what it is about, and software gets genuinely better. The worst case is probably that developer unfamiliarity prevents the vast majority of verification from actually verifying much of anything, and it becomes a reflexive defense against critics, a rhetorical counter-move for when people bring up hallucinations.
Since I made this table aggregated by year already, here it is
| language | 2020 | 2021 | 2022 | 2023 | 2024 | 2025 |
|---|---|---|---|---|---|---|
| Agda | 673 | 530 | 589 | 1993 | 541 | 347 |
| Alloy | 443 | 2028 | 577 | 311 | ||
| Coq | 9702 | 5731 | 5025 | 3518 | 4604 | 2294 |
| Idris | 509 | 315 | 214 | 1510 | 118 | 109 |
| Lean | 344 | 645 | 812 | 1296 | 2777 | 2219 |
| TLA | 784 | 894 | 1095 | 1931 | 1631 | 933 |
| Total Result (Selected) | 12012 | 8115 | 8178 | 12276 | 10248 | 6213 |