Evolving the OCaml Programming Language (2025) [pdf]

148 matt_d 34 9/5/2025, 12:05:50 AM kcsrk.info ↗

Comments (34)

kcsrk · 15h ago
I am the author of the talk here o/.

This talk is a _subjective_ take on how the OCaml programming language evolves, based on my observations over the last 10 years I've been involved with it. My aim/hope is to demystify the compiler development process and the tradeoffs involved and encourage more developers to take a shot at contributing to the OCaml compiler.

Happy to answer questions, but also, more importantly, hear your comments and criticisms around the compiler development process, ideas to make it more approachable, etc.

rwmj · 13h ago
I'm at a conference at the moment so can't give a lengthy answer, but I'm the maintainer of virt-v2v, one large open source OCaml project (large if you include all the dependencies) which generates actual multi-millions in annual revenue, but is often overlooked in all this discussion of the OCaml ecosystem. Glad to talk by email some time.

[BTW we currently have open positions for two developers]

kcsrk · 7h ago
It was overlooked because I didn't know!

Great to hear about virt-v2v. I will reach out to you by email.

alabhyajindal · 9h ago
This is not a direct comment on compiler development but on industrial projects in general: how do you begin contributing to something that is so large?

What should a beginner in compiler development, someone who has written a few compilers of their own, do to get involved in a project such as OCaml? I understand this issue is not specific to compilers, but is faced by any sufficiently large project. Still, I think it's an important issue. I believe there are many resources for people to get up and running in a field but not enough for them to make the next jump into industrial projects.

rwmj · 7h ago
(I think this advice applies to most large open source projects)

Make sure you have installed and are using the software. Ideally you'd have an ongoing interest in it because it's something you use regularly (whether personally or for work).

Read first, especially the documentation, guidelines to contributing, mailing lists / Github issues / however else the upstream maintainers engage with each other.

Start small. Actually a great place is just to go and fix spelling mistakes and typos in documentation, code, comments, etc. Follow the guidelines for contributing to the letter, even if they appear over-complicated at first.

After you've engaged with small patches, build up. Look through their issues and (since you're using the software every day) find something that is an "itch" that you want to "scratch", and attempt to fix that.

I don't really need to go further because either at some point in this process you'll have become discouraged (for good or bad reasons), or you'll have found your community and will want to contribute more and more.

alabhyajindal · 7h ago
Thank you - I appreciate it!
ofrzeta · 14h ago
To be honest the story about the two closed PRs for dynamic arrays doesn't really inspire contributions :)
kcsrk · 14h ago
You are right that the dynamic arrays story does not read like a straightforward “how to inspire contributions.” But part of what I wanted to do in the talk was to show things as they actually unfolded. In OCaml compiler development, there is a very strong emphasis on correctness and long-term stability. That can make contributions, especially to core language features, feel harder than they might in faster-moving ecosystems.

The dynamic arrays case is a good illustration. What began as a small PR grew into years of design iterations, debates about representation, performance, and multicore safety, and eventually a couple of thousand lines of code and more than 500 comments before it landed. From one perspective, that looks discouraging. From another, it shows the weight we place on getting things right, because once a feature ships, it is very hard to undo.

That tension, between wanting to be open and encouraging contributions but also needing to protect stability, is something I think we should be talking about openly. My hope is that by making the process more visible we can demystify it and help contributors understand not just what happened, but why.

octachron · 12h ago
A point that I find missing in the timeline for dynamic arrays is that implementations of dynamic arrays have been available in libraries for more than twenty years.

However, none of the authors of those libraries were really happy with their own implementation because those implementations had to choose between performance, API usability or thread safety.

When I closed the student pull request (which was a naive implementation with no unsafe features), it was with the idea that it was unfair to expect a beginner to solve those issues.

The subsequent iterations explored different parts of the design space before the final iteration, which converged on safely using unsafe language features to reach a new local API optimum.
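
To make the tradeoff concrete, here is a minimal sketch of what a naive dynamic array without unsafe features can look like. It is not the student PR or the implementation that eventually landed; every name in it is hypothetical. Storing elements as options keeps the code safe without needing a dummy value for unused slots, at the cost of boxing each element, and the sketch makes no attempt at behaving well under concurrent use.

```ocaml
(* A naive growable array in safe OCaml. All names here are hypothetical. *)
type 'a t = {
  mutable data : 'a option array;  (* boxed slots: safe, but one extra allocation per element *)
  mutable len : int;
}

let create () = { data = Array.make 4 None; len = 0 }

let push t x =
  if t.len = Array.length t.data then begin
    (* Grow by doubling; unused slots hold None, so no dummy 'a value is needed. *)
    let bigger = Array.make (2 * Array.length t.data) None in
    Array.blit t.data 0 bigger 0 t.len;
    t.data <- bigger
  end;
  t.data.(t.len) <- Some x;
  t.len <- t.len + 1

let get t i =
  if i < 0 || i >= t.len then invalid_arg "get: index out of bounds";
  match t.data.(i) with
  | Some x -> x
  | None -> assert false  (* unreachable while the len invariant holds *)

let length t = t.len
```

Dropping the option wrapper is what pushes an implementation toward either a user-supplied dummy element (worse API) or unsafe primitives (harder to get right), which is the three-way choice described above.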

sidkshatriya · 11h ago
I think this is the tension in most software. If you want to have excellent and correct software it will take time.

And if you want more features with a "fix as you go" approach, you will often have huge technical debt and get saddled with poor interfaces, often forever.

But, I think OCaml errs too much on the side of getting it right the first time. The result is that state of the art keeps moving far ahead. By the time OCaml "catches up" the field of programming languages has moved far ahead. So OCaml always remains the Jack of all trades and the master of none (IMHO).

I like the direction OxCaml is taking. But the problem is that no one has another 10 years to see its learnings get folded back into OCaml. There is a real chance that OxCaml may diverge so much that it becomes impractical to merge it into OCaml. Flambda2 is another great piece of software that may also take a long time to come into OCaml proper.

So I feel that things need to be "speeded up" if OCaml is to become a bigger ecosystem. You can see that some big projects are moving away from OCaml: Facebook, for instance, used to have their Python typechecker in OCaml. Their new one, Pyrefly, is in Rust. This could be an isolated story, no doubt.

sidkshatriya · 11h ago
Now OCaml values adding features carefully to the language so that there is no future regret. But being slow and conservative has _not_ minimized regret. The "O" in OCaml, i.e. Objects (and Classes), is almost ignored nowadays. Jane Street, a large industrial user, seems to be actively against using the "O" part of OCaml.

So here we have gotten the worst of both worlds -- a language that is evolving slowly and a language that has large features that are almost soft discouraged. My primary language is Rust and not OCaml (mostly dabble in OCaml) so I may not fully know what I'm talking about when it comes to OCaml.

jact · 8h ago
The distaste for the OCaml object system is mostly misplaced in the community. While first-class modules can mostly replace them, sometimes you really need open recursion. Object types are also a very useful feature used by core libraries.
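
A minimal sketch of what open recursion means here, with hypothetical class names: a method calls another method through `self`, so a subclass that overrides the callee changes the behavior of the inherited caller.

```ocaml
class greeter = object (self)
  method name = "world"
  (* greet calls name through self, so subclasses can change what it sees. *)
  method greet = "hello, " ^ self#name
end

class ocaml_greeter = object
  inherit greeter
  (* method! marks an explicit override of the inherited method. *)
  method! name = "OCaml"
end

let base = (new greeter)#greet
let derived = (new ocaml_greeter)#greet
```

Here `derived` picks up the overridden `name` even though `greet` was written only once, in the parent class.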
StopDisinfo910 · 7h ago
OCaml objects are structurally typed, which can also be very nice. They definitely have their place.
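
As a small illustration of structural typing (all names below are made up), a function can accept any object that has the methods it calls, with no class or interface declared up front:

```ocaml
(* Inferred type: < area : float; .. > -> string.
   Any object with an area method of type float is accepted. *)
let describe_shape shape = Printf.sprintf "area = %.1f" shape#area

let square = object
  method area = 4.0
  method side = 2.0
end

let circle = object
  method area = 3.0
end

let descriptions = [ describe_shape square; describe_shape circle ]
```

The two objects share no class, and `square` carries an extra method, yet both type-check as arguments to `describe_shape`.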
spit2wind · 2h ago
Interesting. I'm still working my way through Correct+Efficient+Beautiful. My takeaway so far has been that Modules _are_ the "O" part of OCaml. I guess there's something more "traditionally" an object? Does that mean there were modules in Caml (or whatever the predecessor was) and it was decided classes might be a good feature to add?
debugnik · 9h ago
> and a language that has large features that are almost soft discouraged

It's literally just objects, one large (and early!) feature. Arguably too large compared to the rest of the language: first-class modules or polymorphic variants can handle most of their use cases while being much simpler, and faster than the existing class system. (Objects and object types without the actual classes are maybe ok.)

The only other controversial feature I can think of is Seq and that's just because it can be allocation-heavy. Then again ordinary OCaml lists are not much cheaper (thankfully immutable arrays are already in for 5.4).

StopDisinfo910 · 6h ago
> By the time OCaml "catches up" the field of programming languages has moved far ahead.

Hard to reconcile with the fact that OCaml had 90% of the features people like in Rust today twenty years ago, has a module system which is still better than Haskell's, and is currently implementing a full effect system.

It is still pretty much ahead of every mainstream language.

rwmj · 10h ago
OCaml is doing just fine, thanks.
aseipp · 13h ago
I think it's just the nature of the beast, in this case. Serious "industrial" implementations of a programming language might stick around for a long time, and breaking things a lot can mar the appeal; getting it right the first time pays off in that case.

I think the acceptance threshold can be much lower in other kinds of tooling. "It is what it is", so to speak.

Quekid5 · 8h ago
Add a sane deprecation process and this is much less of an issue -- see e.g. the Java language. Sure, it's not ideal to have multiple implementations of the 'same' data structure (if a better way is found, say)... but at least you aren't stalling everything and causing API interop issues for years and years.
klodolph · 14h ago
Maybe what I read here is “this is how contributions go”…

Get the API right first. Make sure it’s correct, safe, and useful. Iterate on the performance afterwards.

IMO, a lot of contributions should take this shape.

kcsrk · 14h ago
It is often hard to see the shape of these things before a serious PR attempt is made. Each of the PRs reveals more of the shape of the problem being solved. Hard to skip them in practice, especially for new contributors.
zerr · 10h ago
I remember the go-to alternative "standard" library was being developed by some bank from Wall Street. Is it still the case? I.e., do most people still use that third-party lib, or did the real standard library evolve since then?
debugnik · 9h ago
My impression is that most people, as in a majority, aren't using Jane Street's Base and Core. Maybe some or even many, but not most, and especially not in the FOSS ecosystem. I think this idea comes from so many learning materials using their libs that you feel kind of funneled towards them at the start.

But yes, the standard library has added many helper functions that were sorely needed during the last few years, and the upcoming 5.4 keeps adding more. Still not as many goodies as Jane Street's libraries, but nowadays I don't miss them as long as I can use just a few small libraries, mostly by dbunzli and c-cube.

aguluman · 7h ago
Is Stdlib the original, with Base and Core being extensions?
debugnik · 6h ago
Yes, although they're replacements more than extensions. Stdlib is OCaml's builtin standard library, Base is Jane Street's lightweight replacement of Stdlib (although the most basic types are compatible), and Core is Jane Street's full standard library extending Base.

There's nothing inherently wrong with them, aside from their API being unstable, but they're an opinionated wedge in an ecosystem already lacking the cohesion of newer languages.

frou_dh · 9h ago
That's https://opensource.janestreet.com/core/. However I think its importance is often a bit overblown. I doubt MOST people were choosing it at any point, never mind today.
rs186 · 7h ago
That's my concern for even adopting OCaml for my hobby projects. Poking around Jane Street's "alternative" standard library does not give me much confidence about its state. Just the fact that there are Core and Base isn't encouraging. If I ever use OCaml in a project, I want to spend time getting things done, not looking for implementations or writing code myself (unless I am really in a mood to do so).
kubb · 11h ago
For people using OCaml, there’s one thing that kinda discourages me in it, that is exceptions as part of the API in the standard library.

Because exceptions aren’t checked, this effectively means that a language designed for type safety has as much type safety as python, because it’s very easy to forget handling something, and get runtime errors.

How do you deal with this day to day? I assume it’s impossible to just believe that all the code you pull in doesn’t use exceptions?

yawaramin · 1h ago
Every mainstream language has exceptions. Everyone knows how to use exceptions. They're easy to use and get the job done. OCaml suffers no type safety issues from the use of exceptions. It also has option and result types, so people who need more control flow can use those. The OCaml standard library typically uses exceptions for truly exceptional conditions, such as trying to access a key that doesn't exist in a map. Even Rust has panics, which are basically exceptions.

You criticized Haskell as not a great example of error handling. Well, Erlang/Elixir also have exceptions, and they are considered the industry leader in error recovery.

Exceptions are actually fine, it doesn't really take much to install handlers which take care of catching, logging, telemetry, re-raising etc. They mostly get a bad rep because of the latest fashions in the PL space.
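
A minimal sketch of such a handler, assuming a caller-supplied `log` function (the helper name `with_logging` is hypothetical). It runs a computation, logs any exception, and re-raises it with its original backtrace:

```ocaml
(* Hypothetical helper: run f, log any exception, then re-raise it unchanged. *)
let with_logging ~log f =
  try f ()
  with e ->
    (* Capture the backtrace first, before any other code can overwrite it. *)
    let bt = Printexc.get_raw_backtrace () in
    log (Printexc.to_string e);
    Printexc.raise_with_backtrace e bt
```

Wrapping an entry point looks like `with_logging ~log:prerr_endline main`; telemetry would plug in the same way through `log`.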

yodsanklai · 9h ago
> as much type safety as python

That's an exaggeration.

You can use error types / monads like you would in Rust/Haskell. When you use the Core standard library, you can use functions that don't throw exceptions. Those that do follow a specific naming convention (foobar_exn).
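
A sketch of that convention using only the standard library. The two functions are hypothetical and are not taken from Core: the plain name returns a `result`, and the `_exn` suffix marks the variant that may raise.

```ocaml
(* Hypothetical functions illustrating the naming convention; not part of Core. *)
let parse_port s =
  match int_of_string_opt s with
  | Some n when n > 0 && n < 65536 -> Ok n
  | _ -> Error (Printf.sprintf "invalid port: %S" s)

(* The _exn suffix warns callers that this variant may raise. *)
let parse_port_exn s =
  match parse_port s with
  | Ok n -> n
  | Error msg -> failwith msg
```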

debugnik · 8h ago
> as much type safety as python

There's no type unsafety from unchecked exceptions, because uncaught exceptions are not unsound. Even Haskell has them (error and undefined), because from a theoretical standpoint they're equivalent to reaching an infinite loop. (Now, recovering from an exception isn't unsound either, but it might mess with your usual mutable invariants.)

In more practical terms, concerning overall correctness, OCaml has been adding option-returning variants of those functions, so most exceptions raised from the stdlib nowadays are much more likely to be intended by the author.
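
For example, `List.assoc` raises `Not_found` on a missing key while `List.assoc_opt` returns an option. The surrounding names in this sketch are made up for illustration:

```ocaml
let users = [ ("ada", 36); ("alan", 41) ]

(* Raising variant: a missing key surfaces as Not_found at run time. *)
let age_exn name = List.assoc name users

(* Option-returning variant: the type forces callers to consider absence. *)
let age name = List.assoc_opt name users

let describe_age name =
  match age name with
  | Some n -> Printf.sprintf "%s is %d" name n
  | None -> name ^ " is unknown"
```

With the `_opt` variant available, code that still calls the raising function is more likely doing so on purpose.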

ux266478 · 4h ago
I don't think Haskell is a good language to model our idea of error handling off of. It's one of many bugbears I have with that language, that it uses the Maybe monad as an error type. It technically works, but doesn't provide a meaningful distinction between "This function might not return anything, and this is defined behavior" and "This function has a singularity". MonadError exists, but I can't think of anywhere it shows up without digging deep into dragon caves of the compiler. Everything a normal user is going to touch will deal exclusively in Maybes.

I'm not a fan of Rust as a language for many reasons, but I will give it credit for making proper usage of the Result monad. They could have abused Option the same way Haskell abuses Maybe, but they didn't.

debugnik · 3h ago
I was just using Haskell's reputation to push back on the "as much type safety as python" hot take.

> [Haskell] doesn't provide a meaningful distinction between "This function might not return anything, and this is defined behavior" and "This function has a singularity"

I think Haskellers should fear divergence less, or push for SPARK-like static checking. In OCaml, the current trend would be to represent "not return anything" as None; and "has a singularity" by raising Invalid_argument or similar when the singularity check was considered a precondition, or returning Error (or an equivalent variant) for expected inputs.
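
A small sketch of those three conventions side by side, using hypothetical functions:

```ocaml
(* "Might not return anything": absence is defined behavior, so return an option. *)
let head = function [] -> None | x :: _ -> Some x

(* Singularity treated as a precondition: a zero divisor is a caller bug. *)
let div_exn a b =
  if b = 0 then invalid_arg "div_exn: zero divisor" else a / b

(* Singularity treated as an expected input: report it in the return type. *)
let div a b = if b = 0 then Error "division by zero" else Ok (a / b)
```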

Usage of Result in OCaml is also growing, thankfully. It's part of the stdlib, and we can use binding operators (let* foo = result) to do the same as ? in Rust (or let! in F#). OCaml 5.4 is even adding a Result.Syntax module so we can just open it instead of defining (let*) ourselves.
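
A minimal sketch of the binding-operator style. It defines `( let* )` locally from `Result.bind`, so it does not depend on any particular release; the helper names are hypothetical.

```ocaml
(* Defined locally so the sketch does not depend on a particular OCaml release. *)
let ( let* ) = Result.bind

let parse_int s =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error ("not an integer: " ^ s)

(* Each let* stops at the first Error, much like ? in Rust. *)
let add_strings a b =
  let* x = parse_int a in
  let* y = parse_int b in
  Ok (x + y)
```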

On the other hand, Result doesn't give us backtraces, and composes badly with other combinators or imperative flow. In my current project I'm instead giving a try to an effectful result_scope/get_ok API, which composes better.

johnisgood · 8h ago
Your comment does not make much sense even if it is true.

Factor (Forth-like language) implements even its own ":" (defines a word, i.e. a function) using the language itself, it is not builtin, same with "if", and so forth. Thus, "MEMO:" or locals[1] ("::") being implemented as a library does not mean it is a bad thing, on the contrary, in the case of Factor, it makes it quite powerful. The object system is entirely implemented in Factor, too. "Large chunks of functionality are not part of the core language, they are in just as library".[2]

And to compare OCaml's type system to Python's is straight out absurd.

[1] Locals are entirely implemented in Factor, too, which is only about ~500 lines of code. It is not part of the core language, and on top of that, there is no performance penalty whatsoever!

[2] See more here: https://www.youtube.com/watch?v=f_0QlhYlS8g.