Hi, main developer of Nova here if you want to ask any questions! I'm at a choir event for the rest of the week though, so my answers may tarry a bit.
ameliaquining · 3h ago
Have you been following Meta's work on Static Hermes? It's one of two efforts I'm aware of* to define a subset of JavaScript with different runtime semantics, by putting limitations on dynamic behavior. Their primary goal is performance, but correctness benefits also seem likely, and they share your idea that being intended for embedding in a particular application lets you get away with breaking compatibility in ways that you couldn't in a browser. And if their thing ships, and you want to reduce fragmentation, then maybe you want your "sane subset" to match theirs.
* The other being Google's Closure Compiler, which probably isn't relevant to you as it assumes that its output has to run on existing engines in browsers.
eviks · 11h ago
Given the fact that you were so precise in your time estimate on interleaved garbage collection, how long do you think it would take to get to 99% of the tests?
aapoalas · 10h ago
Haha, I think that was a one time fluke! :D
I'm aiming for something like 75-85% this year; basically get iterators properly done (they're in the engine but not very complete yet), implement ECMAScript modules, and then mostly focus on correctness, builtins, and performance improvements after that. 99% would perhaps be possible by the end of next year, barring unforeseeable surprises.
kumavis · 9h ago
have you considered using js polyfills to help you get closer to 100% coverage and then replacing with native implementations prioritized by performance impact?
aapoalas · 6h ago
Not really, no. It's an interesting proposition, but for the most part I believe I'll be sticking it out the "hard way". The ECMAScript spec is fairly easy to read as well, after all. (Never mind that I spent the single free hour I had today cursing at my inability to understand what is going wrong with my iterator code and what it even should do vis-à-vis the spec :D )
Permik · 11h ago
Gz on your grant!
I must have missed the announcement, but working on OSS for a living (even for just a bit) would be super awesome.
aapoalas · 10h ago
Thank you! I've been a bit bad with announcing it; blog post was a month late and all that. But indeed, it's really cool to be able to do this for half a year!
glutamate · 11h ago
This may be too early to ask, but are you targeting a near-v8 level of performance? Or more like quickjs or duktape?
aapoalas · 11h ago
Of course, and thank you for taking the time to ask!
For the foreseeable future the aim will be at the QuickJS/Duktape level rather than beating V8. But! That is only because they need to be beaten before V8 can be beaten :)
I'm not rushing to build a JIT, and I don't even have exact plans for one right now, but I'm not ruling it out either.
If my life, capabilities, and external support enable it then I do want Nova to either supplant existing mainstream engines, or inspire them to rethink at least some of their heap data structures. But of course it is fairly unlikely I will get there; I will simply try.
afavour · 11h ago
FYI I'm getting an SSL certificate error trying to load the site.
eliassjogreen · 6h ago
It's hosted by GitHub pages with Cloudflare DNS so any issues are probably related to that.
Uses "data-oriented design", so it's likely striving to be faster than other non-JIT runtimes by being more cache-friendly.
Still at early stages, quite incomplete, not nearly ready for real use, AFAICT.
Permik · 11h ago
Essentially implementing JavaScript on top of the ECS architecture :D
aapoalas · 11h ago
Yup! My whole inspiration for this came from a friend explaining ECS to me and me thinking "wouldn't that work for a JS engine?"
SkiFire13 · 10h ago
I've seen this brought up a couple of times now, but I never get it. Why would ECS fit a JS engine? The ECS pattern optimizes for iterating over tons of data, but a JS engine does the opposite of that: it needs to interpret instruction by instruction, each of which could access random data.
aapoalas · 9h ago
Indeed, there's no guarantee that it will fit: I think it will but I don't know and want to find out.
There are strong (IMO) reasons to think it will fit, though. User code can indeed do whatever but it rarely does. Programs written in JS are no less structured and predictable than ones written in C++ or Rust or any other language: they mostly operate on groups of data running iterations, loops, and algorithms over and over again. So the instructions being interpreted are likely to form roughly ECS System-like access patterns.
Furthermore, data that came into the engine at one time (eg. one JSON.parse call or fetch result) is more likely to be iterated through together later. Thus, if the engine can ensure that data is and stays temporally colocated, then it is statistically likely that the interpreter's memory access patterns will not only come from System-like algorithms, they will also access Component-array-like memory.
So: JS objects (and other heap allocated data) are Entities, their data is laid out in arrays of Components (laying object properties out in Component arrays, at least in some cases, is still a TODO), and the program forms the Systems. ECS :)
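To make the mapping concrete, here's a rough Rust sketch of the idea; all names and layouts below are illustrative, not Nova's actual types:

```rust
// Sketch only: heap references are typed indices into per-kind vectors,
// not pointers to individually allocated objects.
#[derive(Clone, Copy)]
struct ObjectIndex(u32);
#[derive(Clone, Copy)]
struct ArrayIndex(u32);
#[derive(Clone, Copy)]
struct StringIndex(u32);

// A JS value is a small tagged handle; heap-allocated kinds carry an index ("Entity").
#[derive(Clone, Copy)]
enum Value {
    Undefined,
    Boolean(bool),
    Integer(i32),
    Object(ObjectIndex),
    Array(ArrayIndex),
    String(StringIndex),
}

// The heap is a struct of per-kind vectors ("Component" arrays).
#[derive(Default)]
struct Heap {
    objects: Vec<ObjectData>,
    arrays: Vec<ArrayData>,
    strings: Vec<String>,
}

#[derive(Default)]
struct ObjectData {
    // shape index, property storage index, ... would live here
}

#[derive(Default)]
struct ArrayData {
    len: u32,
    // elements storage index, backing-object index, ... would live here
}

fn main() {
    let mut heap = Heap::default();
    heap.arrays.push(ArrayData { len: 3 });
    let value = Value::Array(ArrayIndex((heap.arrays.len() - 1) as u32));
    if let Value::Array(ArrayIndex(i)) = value {
        println!("array #{i} has length {}", heap.arrays[i as usize].len);
    }
}
```

The point is only that heap references become small typed indices and each kind of heap data is stored contiguously, so iterating "all Arrays" or "all Strings" walks mostly sequential memory.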
eyelidlessness · 9h ago
Disclaimer: I’m way out of my depth on the theoretical front, despite similarly taking interest in ECS in unconventional places. I’m responding from the perspective of most of my career being in JS/TS.
I think your instincts about program structure are mostly right, but the outliers are pretty far out there.
I’m much less optimistic about how you’re framing arbitrary data access. In my experience, it’s very common for JS code (marginally less common when authored as TS) to treat JSON (or other I/O bound data) as a perpetual blob of uncertainty. Data gets partially resolved into program interfaces haphazardly, at seemingly random points downstream, often with tons of redundancy and internal contradictions.
I’m not sure how much that matters for your goals! But if I were taking on a project like this I’d be looking at that subset of non-ideal patterns frequently to reassess my assumptions.
aapoalas · 6h ago
Hey, thank you for the viewpoint. I'm a career JS/TS programmer myself, and I do appreciate that the lived reality is quite varied.
The partial resolving and haphazardness of JSON data usage shouldn't matter too much. I don't mean to make JSON parsed objects into some special class, per se, or for the memory layout to depend on access patterns on said data. I only force data that was created together to be close together in memory (this is what real production engines already do, but only if possible) and force that data to stay together (again, production engines do this, but only as far as is reasonably possible; I force the issue). So I explicitly choose temporal coherence. Beyond that, I use interface inheritance instead of structural inheritance to reduce memory usage: eg. plain Arrays (used in the common way) I can push down to 9 or even 8 bytes if I accept that Arrays with a length larger than 2^24 are always pessimised. ECS / Struct-of-Arrays data storage then further allows me to choose to move some data onto separate cache lines.
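For concreteness, one hypothetical way such an 8-byte Array could be packed (purely a sketch under the 2^24 length assumption above, not Nova's actual layout): a 32-bit index into shared elements storage, a 24-bit length, and 8 bits of flags.

```rust
// Illustrative packing only: two u32 fields, so the whole record is 8 bytes.
// Arrays longer than 2^24 - 1 would have to fall back to a slower representation.
#[derive(Clone, Copy)]
struct CompactArray {
    elements: u32,      // index into a shared elements storage vector
    len_and_flags: u32, // low 24 bits: length, high 8 bits: flags
}

impl CompactArray {
    const MAX_FAST_LEN: u32 = (1 << 24) - 1;

    fn new(elements: u32, len: u32, flags: u8) -> Option<Self> {
        if len > Self::MAX_FAST_LEN {
            return None; // caller must use the pessimised representation
        }
        Some(Self { elements, len_and_flags: len | ((flags as u32) << 24) })
    }

    fn len(&self) -> u32 {
        self.len_and_flags & Self::MAX_FAST_LEN
    }

    fn flags(&self) -> u8 {
        (self.len_and_flags >> 24) as u8
    }
}

fn main() {
    assert_eq!(std::mem::size_of::<CompactArray>(), 8);
    let a = CompactArray::new(0, 1000, 0b0000_0001).unwrap();
    assert_eq!(a.len(), 1000);
    assert_eq!(a.flags(), 1);
    println!("compact array record is {} bytes", std::mem::size_of::<CompactArray>());
}
```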
But it's definitely true that some programs will just ruin all reasonable access patterns and do everything willy-nilly and mixed up. I expect Nova to perform worse in those kinds of cases: as I am adding indirection to uncommon cases and splitting data onto multiple cache lines to improve common access patterns, I do pessimise the uncommon cases further and further down the drain. I guess I just want to see what happens if I kick those uncommon cases to the curb and say "you want to be slow? feel free." :) I expect I will pay for that arrogance, and I look forward to that day <3
eyelidlessness · 3h ago
Thank you for your response! I've been loosely following the project already and now my interest is piqued even more. Your explanation and approach make a lot of sense to me; now I'm curious to see how it plays out!
nine_k · 8h ago
Hmm, a compacting garbage collector that would try to put live data together, according to its access patterns, might be fun to consider. Along these lines, it could even split objects' attributes along ECS-friendly lines, working in concert with a profiler.
aapoalas · 6h ago
Nova's GC doesn't use access patterns for this, but this is basically what we do, or in some cases aim to do.
Arrays, Objects, ArrayBuffers, Numbers, Strings, BigInts, ... all have their data allocated onto different heap vectors. These heap vectors will eventually be SoA vectors to split objects' attributes along ECS-friendly lines; eg. Array length might be split from the elements storage pointer, the Object shape pointer split from the Object property storage pointer, etc. Importantly, what we already do is that an Array does not hold all of an Object's attributes but instead holds an optional pointer to a "backing Object". If an Array is used like an Object (eg. `array.foo = "something"`) then a backing Object is created and the Array's backing Object pointer is initialised to point to that data. Because we use an SoA structure, that backing Object pointer can be stored in a sparse column, meaning that Arrays that never have a backing Object initialised also don't initialise the memory to hold the pointer.
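Roughly, the sparse-column idea looks like this (an illustrative sketch, not Nova's real data structures; a hash map stands in for whatever sparse storage is actually used):

```rust
use std::collections::HashMap;

// The dense Array columns never reserve space for a backing Object; the rare
// Arrays that grow extra named properties get an entry in a side table instead.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct ArrayIndex(u32);
#[derive(Clone, Copy)]
struct ObjectIndex(u32);

#[derive(Default)]
struct ArrayHeap {
    // Dense columns: every Array has these.
    lengths: Vec<u32>,
    elements: Vec<Vec<f64>>, // stand-in for real element storage
    // Sparse column: only Arrays that were used "like an Object" appear here.
    backing_objects: HashMap<ArrayIndex, ObjectIndex>,
}

impl ArrayHeap {
    fn backing_object(&self, a: ArrayIndex) -> Option<ObjectIndex> {
        self.backing_objects.get(&a).copied()
    }

    fn set_named_property(&mut self, a: ArrayIndex, obj: ObjectIndex) {
        // `array.foo = "something"` would lazily create the backing Object
        // and record it here; plain index accesses never touch this map.
        self.backing_objects.entry(a).or_insert(obj);
    }
}

fn main() {
    let mut heap = ArrayHeap::default();
    heap.lengths.push(3);
    heap.elements.push(vec![1.0, 2.0, 3.0]);
    let a = ArrayIndex(0);
    assert!(heap.backing_object(a).is_none()); // common case: no extra memory used
    heap.set_named_property(a, ObjectIndex(0));
    assert!(heap.backing_object(a).is_some());
}
```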
I'm also interested in maybe splitting Object properties so that they're stored in ECS-friendly lines (at least if eg. they're Objects parsed from an Array in JSON.parse).
Our GC is then a compacting GC on these heap vectors where it simply "drops" data from the vector and moves items down to perform compaction. This also means it gets to perform the compaction in a trivially parallel manner <3
k__ · 8h ago
I had the impression that ECS would boost performance mainly by allowing the systems to run in parallel over the entities. Isn't this kinda moot in a single-threaded runtime?
aapoalas · 5h ago
This is definitely the more meaningful/influential performance benefit of ECS in game development, I believe. JavaScript will not allow for that as you point out. Perhaps a sufficiently crazy JIT might claw some of those benefits back, though? Not sure.
But: the lesser but still impactful performance benefit of ECS is the use of Struct-of-Arrays vectors for data storage. JavaScript can still ruin that benefit by always accessing all parts and features of an Object every time it touches one, but that is less likely to happen. So there is a benefit that JavaScript code itself can enjoy.
Finally, there is one single "true System" in a JavaScript engine's ECS: the garbage collector. The GC will run through a good part of the engine heap, and you can fairly easily write it to be a batched operation where eg. "all newly found ordinary Objects" are iterated through in memory access order, have their mark checked, and then gather up their referents if they were unmarked. Rinse and repeat to find all live/reachable objects by constantly iterating mostly sequential memory in batches. This can also be parallelised, though then the batch queue needs to become shareable across threads.
The sweep of the heap after this is then a True-True System where all items are iterated in order, unmarked ones are ignored, marked ones are copied to their post-compaction location, and any references they hold are shifted down to account for the locations of items changing post-compaction.
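A minimal sketch of that sweep over a single heap vector (illustrative only): unmarked slots are dropped, marked ones slide down, and a remap table records where each survivor moved so references can be rewritten afterwards. Because each heap vector can be swept like this independently, the phase parallelises per vector.

```rust
// Sweep one heap vector: compact survivors in place and return the
// old-index -> new-index table used to shift references afterwards.
fn sweep_and_compact<T>(items: &mut Vec<T>, marks: &[bool]) -> Vec<Option<u32>> {
    let mut remap = vec![None; items.len()];
    let mut write = 0usize;
    for read in 0..items.len() {
        if marks[read] {
            remap[read] = Some(write as u32);
            if read != write {
                items.swap(read, write);
            }
            write += 1;
        }
    }
    items.truncate(write);
    remap
}

fn main() {
    let mut strings = vec!["a".to_string(), "b".to_string(), "c".to_string(), "d".to_string()];
    let marks = [true, false, true, true];
    let remap = sweep_and_compact(&mut strings, &marks);
    // "b" was unreachable; "c" and "d" moved down, and any reference to them
    // would now be rewritten using the remap table.
    assert_eq!(strings, ["a", "c", "d"]);
    assert_eq!(remap, [Some(0), None, Some(1), Some(2)]);
}
```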
k__ · 4h ago
"a sufficiently crazy JIT might claw some of those benefits back"
Good point.
If you know the data can't be accessed in parallel by the user code, that safety guarantee might allow the JIT to do it anyway.
chris37879 · 9h ago
I'll be checking this project out! I'm a big fan of ECS and have lofty goals to use it for a data processing project I've been thinking about for a long time; it has enough in common with a programming language that I've basically been considering it one this whole time. So it's always cool to see ECS turn up somewhere I wouldn't otherwise expect it.
aapoalas · 11h ago
Hi, Nova dev here.
Yes, basically. And removing structural inheritance.
throwaway894345 · 11h ago
Can you elaborate on "And removing structural inheritance"? Does that mean Nova doesn't use traits, and if so, why would that matter?
aapoalas · 10h ago
Traits are a type of interface inheritance; base classes and inherited classes à la C++ are structural inheritance.
So basically it just means that I have to write more interfaces and implementations for them, because I don't have base classes to fall back on. Instead, in derived type/class instances I have an optional (maybe null) "pointer" to a base type/class instance. If the derived instance never uses its base class features, then the pointer stays null and no base instance is created.
Often derived objects in JS are only used for those derived features, so I save live memory. But: the derived object type needs its own version of at least some of the base class methods, so I pay more in instruction memory (executable size).
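A rough sketch of the pattern (illustrative names, not Nova's actual types): both kinds implement the same interface, but the Array only materialises Object state on demand.

```rust
// The derived type carries an optional index to a base Object record that is
// only created when Object-like features are actually used.
#[derive(Clone, Copy)]
struct ObjectIndex(u32);

// The shared "interface", instead of a shared base class.
trait InternalMethods {
    fn backing_object(&self) -> Option<ObjectIndex>;
}

struct OrdinaryObject {
    index: ObjectIndex,
}

struct Array {
    len: u32,
    // None until the Array is used like a plain Object (e.g. `arr.foo = 1`).
    backing: Option<ObjectIndex>,
}

impl InternalMethods for OrdinaryObject {
    fn backing_object(&self) -> Option<ObjectIndex> {
        Some(self.index)
    }
}

impl InternalMethods for Array {
    fn backing_object(&self) -> Option<ObjectIndex> {
        self.backing
    }
}

fn main() {
    let arr = Array { len: 8, backing: None };
    // Common case: the Array never pays for Object state it doesn't use.
    assert!(arr.backing_object().is_none());
    println!("array of length {} has no backing object", arr.len);
}
```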
okthrowman283 · 2h ago
How does Nova compare to Boa? Regardless it’s great to see new js engines popping up
Ericson2314 · 11h ago
More ways for Servo to be all-Rust, OK!
aapoalas · 10h ago
That is one explicit goal, maybe next year realistically: Servo has asked for help making their JS engine bindings layer modular, and I have a self-serving interest in helping achieve that :)
Ericson2314 · 9h ago
Nice to hear!
ComputerGuru · 10h ago
OP, since you're here in the comments, can you talk about the binary and memory size and sandboxing support? Ability to import and export functions/variables across the runtime boundary? Is this a feasible replacement for Lua scripting in a Rust application?
aapoalas · 9h ago
Hmm, sorry, I'm not sure what you mean.
The engine is written with a fair bit of feature flags to disable more complicated or annoying JS features if the embedder so wants: my aim is for this to go quite deep and enable building a very slim, simple, and easily self-optimising JS engine.
That could then perhaps truly serve as an easy and fast scripting engine for embedding use cases.
progval · 9h ago
> written with a fair bit of feature flags
I see you use Cargo features for this. One thing to be aware of is Cargo's feature unification (https://doc.rust-lang.org/cargo/reference/features.html#feat...), ie. if an application embeds crate A that depends on nova_vm with all features and crate B that depends on nova_vm without any security-sensitive features like shared-array-buffer (eg. because it runs highly untrusted JavaScript), then interpreters spawned by crate B will still have all features enabled.
Is there another way for crate B to tell the interpreters it spawns not to enable these features?
ComputerGuru · 9h ago
Nice catch, thanks for pointing that out! This also might be less than ideal if it's the only option (rather than in addition to a runtime startup flag or a per-entrypoint/execution flag), because one could feasibly want to bundle the engine with the app with features x, y, and z enabled, but only allow some scripts to execute with a subset thereof while running other scripts with a different subset.
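For illustration, one hypothetical shape such a runtime gate could take (purely a sketch, not nova_vm's actual API): the capability may be compiled in, but each interpreter instance is handed options saying whether it may actually use it.

```rust
// Hypothetical sketch: even when a capability is compiled in (e.g. via Cargo
// feature unification), per-interpreter options let an embedder refuse it.
#[derive(Default, Clone)]
struct RuntimeOptions {
    allow_shared_array_buffer: bool,
    allow_eval: bool,
}

struct Interpreter {
    options: RuntimeOptions,
}

impl Interpreter {
    fn new(options: RuntimeOptions) -> Self {
        Self { options }
    }

    // Each gated builtin checks the per-instance option before doing anything.
    fn create_shared_array_buffer(&self, _byte_length: usize) -> Result<(), &'static str> {
        if self.options.allow_shared_array_buffer {
            Ok(()) // would actually allocate the buffer here
        } else {
            Err("SharedArrayBuffer is disabled for this interpreter")
        }
    }
}

fn main() {
    // Crate B: runs untrusted scripts, so it opts out regardless of what
    // features crate A's dependency graph pulled in.
    let untrusted = Interpreter::new(RuntimeOptions::default());
    assert!(untrusted.create_shared_array_buffer(1024).is_err());

    // Crate A: a trusted embedding can opt in per interpreter.
    let trusted = Interpreter::new(RuntimeOptions { allow_shared_array_buffer: true, allow_eval: true });
    assert!(trusted.create_shared_array_buffer(1024).is_ok());
}
```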
ComputerGuru · 9h ago
That answers half my question (eg. disabling networking), thank you. The other part was about the overhead of adding this to an app (startup memory usage and increase in binary size), and how much work has been done on interop: can you execute a static Rust function Foo() passing in a Rust singleton Bar, or access properties or methods on a Rust singleton Baz, i.e. call whitelisted Rust code from within the JS env? (Vice versa is important too, but that's possible by default simply by hard-coding a JS snippet to execute, though marshalling the return value of a JS function without (manually) using JSON at the boundary is also a nice QOL uplift.)