Skip to main content

Alys+

You like actuallyalys. You've always liked posts. Don't you deserve the smooth, luxury takes of...Alys Plus?

🖙'AI will make formal verification go mainstream' Sure Seems Like Wishful Thinking

written by alys on

Martin Kleppman has recently argued that Prediction: AI will make formal verification go mainstream. Huge if true!

To be clear, this isn't one of those thoroughly rotten takes that makes me think less of the writer, like the stuff Sam Altman says approximately once every six news cycles, or Thomas H. Ptacek's weirdly hostile post about his friends who don't like LLMs.

That being said...I have a lot of doubts.

Part of my reason for skepticism is that AI is relentlessly pitched as a way to go faster. Formal verification, for all its benefits, generally makes you go slower. Kleppman's logic seems to be that Speed(Formal verification + LLMs) >= Speed(regular coding). I'm not sure that's true, but even if it is, I'm not sure companies will want to give up much of the (perceived) speed benefit, even if they're still somewhat faster on net.

The other thing is that formal verification seems like the kind of thing that would take a long time to pay off, and, with the exception of people planning multi-year data center construction projects, AI has not been a situation where people are exercising patience.

I also strongly suspect AI is appealing because it allows for spending less on employees. Having them learn formal verification would make them cost more, both initially as your current developers have to take time to learn it, and over time as your less-senior developers have a longer learning curve.

Now, these observations are for the industry as a whole, given that Kleppman was saying it would go "mainstream." But perhaps LLMs will still make them mainstream among more quality-conscious developers, the sort of people who test pretty aggressively but haven't added formal verification yet. As critical as I was of the start, maybe someone like Thomas Ptacek would do it at a place like Fastly. Ptacek seems like a genuinely capable guy, hostility in that area aside. That's more likely to me, and I would be genuinely interested in seeing their results, even if what I really want is for developers to demand more and higher ethical standards for what they produce and the tools they use.

We'll certainly see experimentation along these lines. Simon Willison recently ported an HTML parsing library to JavaScript as an experiment, so there's certainly a lot of experimentation with LLMs happening.

Beyond industry and societal issues, the more technical reason I'm skeptical is that AI coding assistants sometimes will just delete stuff leading to errors rather than meaningfully fix them. It seems like an AI might just loosen specifications, add exceptions, or even delete them. Perhaps it's clear enough when they do that for it not to be a problem or there are other features of formal verification that make that harder. However, given the average developer doesn't have much experience, I'm not very confident they will build that experience. And even if they catch the assistant cheating, they might not know enough to prod it in the right direction or fix it themselves.

If I'd have to guess, Martin Kleppman has a lot more familiarity with formal verification than I do, so perhaps he's thought of this issue. Of course, a PhD doesn't prevent you from talking out of your ass. (For some people, it merely enables them to do it with greater authority in front of larger audiences.) Still, my money's on Kleppman knowing more than I do.

The other technical reason is that there's not as much formal verification code to learn from, and I don't think formal verification is quite similar enough to other languages.

Interestingly, formal verification has been seeing more attention on Google. It does get less than other testing methodologies. Looking at the data behind GitHub's "innovation graph" shows that some of the most common theorem provers and formal verification languages are varying and seem to have peaked as a group in 2023, although 2025 is incomplete and if scaled up (doubled because we have just the first half of the year), it would be in the ballpark of 2023. This depicts the number of people who've pushed code in that language. (GitHub chose the...interesting...name "pushers" as shorthand.)

language '20H1 '20H2 '21H1 '21H2 '22H1 '22H2 '23H1 '23H2 '24H1 '24H2 '25H1
Agda 244 429 259 271 306 283 982 1011 274 267 347
Alloy 156 287 1013 1015 318 259 311
Coq 5387 4315 3060 2671 3220 1805 1782 1736 2038 2566 2294
Frege 115
Idris 104 405 103 212 104 110 790 720 118 109
Lean 112 232 254 391 348 464 475 821 1141 1636 2219
TLA 390 394 430 464 483 612 697 1234 916 715 933
Total Result 6237 5890 4106 4009 4617 3561 5739 6537 4687 5561 6213

The total row probably counts people twice.

Of course, the prediction is that formal verification will go mainstream, not that it already has. So the fact it there isn't clear growth is mostly absence of evidence, especially since the first half of 2025 was pretty strong and the Google Trends data suggest that either it got started very early following the first releases of GitHub copilot or there's some other trend increasing people's interest.

I'd love to be wrong about Kleppman's prediction. Formal verification has always seemed like an underutilized tool and one I've been meaning to learn some day myself. And of course, as a software user I'd love if software were more correct and reliable. The best case is that formal verification ends up being less scary once they've seen LLMs produce it, developers actually learn what it is about, and software gets genuinely better. The worst case is probably that developer unfamiliarity prevents the vast majority of verification from actually verifying much of anything, and it becomes a reflexive defense against critics, a rhetorical counter-move for when people bring up hallucinations.

Since I made this table aggregated by year already, here it is

language 2020 2021 2022 2023 2024 2025
Agda 673 530 589 1993 541 347
Alloy 443 2028 577 311
Coq 9702 5731 5025 3518 4604 2294
Idris 509 315 214 1510 118 109
Lean 344 645 812 1296 2777 2219
TLA 784 894 1095 1931 1631 933
Total Result (Selected) 12012 8115 8178 12276 10248 6213
1186 words; 42 sentences
Stats for nerds
  • 1186 words
  • 42 sentences
  • 28.24 words/sentence
  • Flesch-Kincaid Reading Ease: 10.99 (grade: 11)
  • Dale-Chall Reading Ease: 10.94 (grades: college_graduate)

posts from friends

Untitled post

when I first think about it I think that people who are like "(service game) is bad right now. I still play it every day and I follow it very closely, but it's just really miserable right now" have a sickness, but then I remember that sports t…

via topposts.net January 19, 2026

Review: The Isle in the Silver Sea by Tasha Suri

My review of the sapphic romantasy novel The Isle in the Silver Sea by Tasha Suri. I read the hardcover edition published in 2025 by Orbit.

via Nullrouted Space January 17, 2026

new music roundup: june 2025

hey friends, gonna start this one off with a request: i’m currently without stable housing and without employment, and i’m in extremely dire financial straits. as of this writing my account is overdrafted and i have very limited options for fixing that. i…

via BLOOD CHURCH June 6, 2025

Generated by openring

alys