WebMonkeys: parallel GPU programming in JavaScript (2016)

114 points by surprisetalk | 5/4/2025, 5:00:04 PM | github.com ↗ | 25 comments

Comments (25)

butokai · 2d ago
By coincidence I was just having a look at the work by the same author on languages based on Interaction Nets. Incredibly cool work, although the main repos seem to have been silent in the last couple of months? This work however is much older and doesn't seem to follow the same approach.
mattdesl · 2d ago
The author is working on a program synthesizer using interaction nets/calculus, which should be released soon. It sounds quite interesting:

https://x.com/VictorTaelin/status/1907976343830106592

FjordWarden · 2d ago
WebMonkeys feels a bit like array programming: you create buffers and then have a simple language to perform operations on those buffers.
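
For illustration, the library's "square a list of numbers" example looks roughly like this (reconstructed from memory of the README, so the exact method names and kernel syntax may differ):

    // One "monkey" per element: upload a buffer, run a tiny kernel, read it back.
    const monkeys = require("WebMonkeys")();

    monkeys.set("nums", [1, 2, 3, 4]);                 // upload a buffer to the GPU
    monkeys.work(4, "nums(i) := nums(i) * nums(i);");  // 4 parallel workers square it in place
    console.log(monkeys.get("nums"));                  // -> [1, 4, 9, 16]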

HVM is one of the most interesting developments in programming languages that I know of. I just don't know if it will prove to be relevant for the problem space it is trying to address. It is a very difficult technology that is trying to solve another very complex problem (AI) by seemingly sight stepping the issues. Like you have to know linear algebra and statistics to do ML, and they are saying: yes, and you have to know category theory too.

foobarbecue · 1d ago
FYI, just in case you didn't know, it's "side-stepping," not "sight-stepping."

Thanks for introducing me to the concept of higher-order virtual machines.

Anduia · 2d ago
The title should say 2016
kreetx · 1d ago
Unfortunately this hasn't been maintained since 2017: https://github.com/VictorTaelin/WebMonkeys/issues/26

Are there other projects doing something similar on current browsers?

kaoD · 1d ago
Still a draft, experimental, and not widely used[0], but WebGPU[1] will bring support for actual compute shaders[2] to the web.

It's much lower level than these "web monkeys", but I'd say that if you really need GPU performance, rather than toy examples like squaring a list of numbers, you really do need to go low level and understand how GPU threads and work batching work (a minimal compute-shader sketch follows the links below).

[0] https://developer.mozilla.org/en-US/docs/Web/API/WebGPU_API

[1] https://en.m.wikipedia.org/wiki/WebGPU

[2] https://webgpufundamentals.org/webgpu/lessons/webgpu-compute...
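
To make the "squaring a list of numbers" toy concrete in WebGPU terms, here is a hedged, untested sketch (it assumes a browser with WebGPU enabled, an async/module context, and it elides adapter/feature checks and error handling):

    // Square an array of floats with a WebGPU compute shader.
    const adapter = await navigator.gpu.requestAdapter();
    const device  = await adapter.requestDevice();

    const shaderModule = device.createShaderModule({
      code: `
        @group(0) @binding(0) var<storage, read_write> nums : array<f32>;

        @compute @workgroup_size(64)
        fn main(@builtin(global_invocation_id) id : vec3<u32>) {
          if (id.x < arrayLength(&nums)) {
            nums[id.x] = nums[id.x] * nums[id.x];
          }
        }
      `,
    });

    const input = new Float32Array([1, 2, 3, 4]);

    // Storage buffer the shader writes in place.
    const storage = device.createBuffer({
      size: input.byteLength,
      usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
      mappedAtCreation: true,
    });
    new Float32Array(storage.getMappedRange()).set(input);
    storage.unmap();

    // Staging buffer so the CPU can read the result back.
    const readback = device.createBuffer({
      size: input.byteLength,
      usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
    });

    const pipeline = device.createComputePipeline({
      layout: "auto",
      compute: { module: shaderModule, entryPoint: "main" },
    });
    const bindGroup = device.createBindGroup({
      layout: pipeline.getBindGroupLayout(0),
      entries: [{ binding: 0, resource: { buffer: storage } }],
    });

    const encoder = device.createCommandEncoder();
    const pass = encoder.beginComputePass();
    pass.setPipeline(pipeline);
    pass.setBindGroup(0, bindGroup);
    pass.dispatchWorkgroups(Math.ceil(input.length / 64));
    pass.end();
    encoder.copyBufferToBuffer(storage, 0, readback, 0, input.byteLength);
    device.queue.submit([encoder.finish()]);

    await readback.mapAsync(GPUMapMode.READ);
    console.log(new Float32Array(readback.getMappedRange())); // [1, 4, 9, 16]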

kreetx · 1d ago
With "going low level" do you mean leaving the browser all together and shipping a native application?

Although I don't currently need anything like this for work, the use case I see for GPU use in the browser is that it's often the easiest way to run a program on the user's machine - anything else requires an explicit install.

kaoD · 1d ago
I meant to compare abstract-ish stuff (like these monkeys) vs. actual low-level work within the GPU realm, i.e. thinking in GPU architecture terms: appropriately choosing a workgroup[0] size, optimizing your buffer layouts for specific access patterns, knowing when and how to read/write from/to VRAM, when (or whether) to split into multiple stages, etc. (see the shader sketch after the links below).

I see space for abstractions over this mess of complexity[1] but there's not a lot of room for simplification.

It's almost like thinking in bare-metal terms but the GPU driver is your interface (and the browser's sandbox of course).

Although WGSL is not low-level itself (in the sense that you're not writing SPIR-V), that's for a good reason: it needs to be portable, and each vendor does its own thing, so the truly low-level details are often hardware dependent.

Going native will still help with performance AFAIK (the aforementioned sandbox has a cost for example) but I agree with you. I love the web as a platform.

[0] https://gpuweb.github.io/gpuweb/wgsl/#compute-shader-workgro...

[1] https://developer.chrome.com/docs/capabilities/web-apis/gpu-...
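
To make "thinking in GPU architecture terms" a little more concrete, here is a hedged WGSL sketch (shader only, untested) of a block-wise sum that stages data in on-chip workgroup memory before writing one partial result per workgroup back to VRAM; the workgroup size of 256 is an arbitrary example, not a recommendation:

    const blockSumWGSL = `
      @group(0) @binding(0) var<storage, read>       input    : array<f32>;
      @group(0) @binding(1) var<storage, read_write> partials : array<f32>;

      // On-chip memory shared by the 256 invocations of one workgroup.
      var<workgroup> tile : array<f32, 256>;

      @compute @workgroup_size(256)
      fn main(@builtin(local_invocation_id)  lid : vec3<u32>,
              @builtin(workgroup_id)         wid : vec3<u32>,
              @builtin(global_invocation_id) gid : vec3<u32>) {
        // Stage one element per invocation (zero-pad past the end of the buffer).
        if (gid.x < arrayLength(&input)) {
          tile[lid.x] = input[gid.x];
        } else {
          tile[lid.x] = 0.0;
        }
        workgroupBarrier();

        // Tree reduction inside the workgroup: 8 steps for 256 elements.
        for (var stride = 128u; stride > 0u; stride = stride / 2u) {
          if (lid.x < stride) {
            tile[lid.x] = tile[lid.x] + tile[lid.x + stride];
          }
          workgroupBarrier();
        }

        // One partial sum per workgroup; a second pass (or the CPU) combines them.
        if (lid.x == 0u) {
          partials[wid.x] = tile[0];
        }
      }
    `;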

sylware · 2d ago
Maybe the guys here know:

Is there a little 3D/GFX/game engine (written in plain and simple C) strapped to a JavaScript interpreter (like QuickJS), without being part of Apple's or Google's gigantic and ultra-complex web engines?

Basically, a set of JavaScript APIs with a runtime for Wayland/Vulkan 3D, FreeType2, and input devices.

FjordWarden · 2d ago
You can access the GPU without a browser using Deno[1] (and probably Node too, if you search for it).

Not to be patronising here, but if you are looking for something that makes 3D/GFX/game programming easier without all the paralysing complexity, you should recalibrate how hard this is going to be.

[1] https://windowing.deno.dev/

gr4vityWall · 1d ago
You can use Node.js or Bun with bindings for stuff like raylib or SDL.

Examples:

https://github.com/RobLoach/node-raylib

https://github.com/kmamal/node-sdl
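
For instance, node-raylib mirrors raylib's C API pretty directly; roughly (from memory of its README, so the package name and constants may be slightly off):

    // npm install raylib   (node-raylib is published under that name, if memory serves)
    const r = require('raylib');

    r.InitWindow(800, 450, 'Hello from Node.js');
    r.SetTargetFPS(60);

    while (!r.WindowShouldClose()) {
      r.BeginDrawing();
      r.ClearBackground(r.RAYWHITE);
      r.DrawText('raylib via Node bindings', 190, 200, 20, r.DARKGRAY);
      r.EndDrawing();
    }
    r.CloseWindow();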

afavour · 1d ago
I assume OP mentioned QuickJS specifically because they're looking for a tiny runtime. Node and Bun aren't that.
gr4vityWall · 22h ago
Ahh, my bad. That makes sense.

I wonder if QuickJS not having JIT capabilities could have a noticeable impact in a bigger game.

sylware · 1d ago
And ideally it could compile without requiring gcc or clang - with, for instance, cproc, tinycc, scc, etc.
ronsor · 1d ago
You could take raylib (https://www.raylib.com) and bolt quickjs to that.
rossant · 1d ago
https://datoviz.org will have a webgpu js backend in a year or so.
jkcxn · 1d ago
You can quite easily make bindings for raylib/sokol-gpu/bgfx from Bun
chirsz · 2d ago
You could use Deno with WebGPU.
zackmorris · 1d ago
This is cool but doesn't actually do any heavy lifting, because it runs GLSL 1.0 code directly instead of transpiling JavaScript to GLSL internally.

Does anyone know of a JavaScript-to-GLSL transpiler?

My interest in this is that the world abandoned true multicore processing 30 years ago around 1995 when 3D video cards went mainstream. Had it not done that, we could have continued with Moore's law and had roughly 100-1000 CPU cores per billion transistors, along with local memories and data-driven processing using hash trees and copy-on-write provided invisibly by the runtime or even in microcode so that we wouldn't have to worry about caching. Apple's M series is the only mainstream CPU I know of that is attempting to do anything close to this, albeit poorly by still having GPU and AI cores instead of emulating single-instruction-multiple-data (SIMD) with multicore.

So I've given up on the world ever offering a 1000+ core CPU for under $1000, even though it would be straightforward to design and build today. The closest approximation would be some kind of multiple-instruction-multiple-data (MIMD) transpiler that converts ordinary C-style code to something like GLSL without intrinsics, pragmas, compiler-hints, annotations, etc.

In practice, that would look like simple for-loops and other control flow being statically analyzed to detect codepaths free of side effects, which would then be auto-parallelized for the GPU. We would never deal with SIMD or with copying buffers to/from VRAM directly. The code would probably end up looking like GNU Octave, MATLAB or Julia, but we could also use things like scatter-gather arrays and higher-order methods like map-reduce, or even green threads. Vanilla fork/join code could potentially run thousands of times faster on a GPU than on a CPU if implemented properly.
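
For instance, a loop like the one below has no cross-iteration dependencies and no side effects beyond its own output slot, which is exactly the shape such a (hypothetical) transpiler could map to one GPU thread per index:

    // a, b and out are plain numeric arrays of length n (assumed for illustration).
    // Each iteration reads only a[i] and b[i] and writes only out[i], so the
    // iterations can run in any order - or all at once on a GPU.
    for (let i = 0; i < n; i++) {
      out[i] = Math.sqrt(a[i] * a[i] + b[i] * b[i]);
    }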

The other reason I'm so interested in this is that GPUs can't easily do genetic programming with thousands of agents acting and evolving independently in a virtual world. So we're missing out on the dozen or so other approaches to AI which are getting overshadowed by LLMs. I would compare the current situation to using React without knowing how simple the HTTP form submit model was in the 1990s, which used declarative programming and idempotent operations to avoid build processes and the imperative hell we've found ourselves in. We're all doing it the hard way with our bare hands and I don't understand why.

ralphc · 1d ago
Would your 1000 core CPU do well on neuroevolution?
zackmorris · 15h ago
Thank you - believe it or not, I hadn't heard that word, so it filled in a piece of the puzzle for me.

https://en.wikipedia.org/wiki/Neuroevolution

https://medium.com/@roopal.tatiwar20/neuroevolution-evolving...

That was the state of the art when I got my ECE degree in 1999. They were using genetic algorithms (GAs) to evolve initial weights for neural net (NN) gradient descent right before the Dot Bomb. Then pretty much all R&D was cut/abandoned overnight, and the 2000s went to offshoring jobs and shuttering factories which led to the Housing Bubble popping. IMHO that set AI research back at least 10 years, maybe 20, at least in the US. I know it derailed my career dreams.

I feel that GPUs, by their SIMD nature, do poorly with a large number of concurrent processes - for example, a population of GA agents running Lisp, with each instruction of the intermediate code (icode) tree encoded as bits in a bitfield that evolves using techniques like mutation, crossover and sexual reproduction.
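
The operators themselves are tiny; a hedged sketch over plain byte-array genomes (the hard part - encoding the Lisp icode tree into those bits - is omitted):

    // Flip each bit of the genome with a small probability.
    function mutate(genome, rate = 1 / 256) {
      const out = Uint8Array.from(genome);
      for (let bit = 0; bit < out.length * 8; bit++) {
        if (Math.random() < rate) out[bit >> 3] ^= 1 << (bit & 7);
      }
      return out;
    }

    // Single-point crossover: the child takes a prefix of one parent and the
    // suffix of the other (parents assumed to have equal length).
    function crossover(a, b) {
      const cut = 1 + Math.floor(Math.random() * (a.length - 1));
      return Uint8Array.of(...a.slice(0, cut), ...b.slice(cut));
    }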

Some of those bits represent conditional branching. Ideally we'd want the opposite of a GPU, more like a transputer, on the order of thousands of independent arithmetic logic units (ALUs) with their own local memories, and perhaps able to use custom instructions. That way each lisp tree can execute concurrently in isolation. There may be branch-avoidance transformations from shaders that could help with this, to transpile the von Neumann SISD and MIMD code we're used to into the narrower SIMD so it can run on GPUs. Companies like Nvidia aren't touching this for some reason, and I don't know if it's just a blind spot or deliberate.

This is one reason why I wanted FPGAs to take off, since they'd make great transputers. Unfortunately they stagnated due to proprietary compilers and a lack of profitable applications. Today it looks like microcontroller units (MCUs) might take their place, but they're still at least 4 orders of magnitude or more too small and 100 times too slow to be cost-effective.

Imagine having a 1 GHz CPU that can run 1000 concurrent UNIX processes running lisp that only communicate through sockets. It's easy to see how mutating them is trivial, then running them as unit tests to see which ones pass is also straightforward. Erlang and Go figured that out years ago.

It's so easy in fact to "just come up with the answer" this way, that I think this technique has been suppressed.

With that foundation and removing the element of time, it's easy to see that this could be a drop-in replacement for gradient descent. Then the work shifts to curating large data and training sets. I suspect that this is the real work of AI research, and that the learning technique doesn't matter so much.

I think of it as: GAs are good for getting to a local minimum error - finding a known-good state. NNs are better for hill climbing - exploring the local solution space.

Another way to say that is, corporations spent billions of dollars to train the first LLM MVPs with gradient descent and other "hands on" techniques. When maybe they could have spent thousands of dollars if they had used GAs instead and "let go".

Now that we're here, refinement is more important, so gradient descent is here to stay. Although I think of LLM blueprints as eventually fitting on a napkin and running on one chip. Then we'll orchestrate large numbers of them to solve problems in a coordinated fashion. At which point it might make sense to use techniques from GAs to create an orchestra of the mind, where each region of an artificial brain is always learning and evolving and bouncing ideas off the others.

ralphc · 14h ago
Erlang figured this out so thoroughly that you've basically described the BEAM. You can have hundreds of thousands of small "processes" communicating through BEAM mailboxes on a modern machine. Elixir adds protocols and macros as improvements on Erlang, and if you really require Lisp there is LFE, Lisp Flavored Erlang, which has true Lisp homoiconicity.
qoez · 1d ago
Awesome stuff. Btw: "For one, the only way to upload data is as 2D textures of pixels. Even worse, your shaders (programs) can't write directly to them." With WebGPU you have atomics, so you can actually write to them.
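
A hedged WGSL sketch of that (not from WebMonkeys, untested): a compute shader writing its storage buffer in place, with atomicAdd so many invocations can safely bump the same slot, e.g. a tiny histogram:

    const histogramWGSL = `
      @group(0) @binding(0) var<storage, read>       values : array<u32>;
      @group(0) @binding(1) var<storage, read_write> bins   : array<atomic<u32>, 16>;

      @compute @workgroup_size(64)
      fn main(@builtin(global_invocation_id) id : vec3<u32>) {
        if (id.x < arrayLength(&values)) {
          // Many invocations may hit the same bin; atomicAdd keeps the count correct.
          atomicAdd(&bins[values[id.x] % 16u], 1u);
        }
      }
    `;
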
punkpeye · 2d ago
So what are the practical use cases for this?