Implementing a Struct of Arrays

126 mpweiher 50 5/9/2025, 10:52:15 AM brevzin.github.io ↗

Comments (50)

PessimalDecimal · 32d ago

I started thinking this would be a rehashing of how column-oriented storage formats can be far more efficient for certain access patterns. But it really was a showcase of what a bloated mess C++ has become.

thechao · 32d ago

I showed up to the C++ rodeo right as C++98 was taking off, and got to work with Jaakko & Gaby & Bjarne (at TAMU) on what would be C++11 (C++0x). Man — those were great times. I still can't decide which craziness — reflection or coroutines — I detest worse in "modern" C++. In either case, I really feel like I'm channeling my inner/outer old-man-shakes-fist-at-clouds. On the other hand, I see languages like Nim, Zig, Rust, Swift, Go, ... and, I mean, seriously.

ranger_danger · 31d ago

I wished Swift got more love, I feel like it takes the "most familiar" (to C++ devs) approach to improving things, as it keeps the syntax extremely similar to C++ but fixes a lot of the issues it has, and adds some very nice things too, like memory safety and named function parameters (which Objective-C had since forever).

dexterous8339 · 32d ago

I recently went down this rabbit hole myself and came out with my own column-major array type.

The explanation drifts into thinking of 2D byte arrays as 3D bit matrices, but in the end it was a 20-30x improvement in speed and binary size.

I was honestly surprised that C++ doesn't have anything built in for this, but at least it's trivial to write your own array type

monkeyelite · 32d ago

And you just do not need to be doing this. Just write out which columns you need and use an index. You can write generic sort and remove, and just overload the swap function for your SOA thing.

pragma_x · 32d ago

That was my takeaway as well. Pivoting a conventional structure in a column-major storage system just smells like a look-aside of some kind (a database). Plus, you lose the benefits of bulk copy/move that the compiler will likely give you in a standard struct (e.g. memcpy()), should it be on the big side. From there, we can use lots of other tricks to further speed things up: hashes, trees, bloom filters...

At the same time, I don't completely understand how such a pivot would result in a speedup for random access. I suppose it would speed up sequential access, since the multiple array storage scheme might force many more cache line updates.

thesz · 32d ago

These days one's program rarely does purely sequential random access. Usually, there are several threads or requests, all going into same data structure.

In the extreme case where all memory accesses of a immensely massively parallel program are deferred, the result is complete absence of cache misses [1].

  [1] https://disneyanimation.com/technology/hyperion/

While our programs usually not ran at Hyperion scale, they still can benefit from accesses shared between processing of several requests.

For one such example, consider speech recognition with the HCLG graph. That graph is created through composition of four WFST [2] graphs, the last one is Grammar, derived from SRILM n-gram language model. This HCLG graph has scale free property [3] due to all G FST states has to backoff to the order-0 model, and some states are visited exponentially often than others.

  [2] https://en.wikipedia.org/wiki/Finite-state_transducer#Weighted_automata
  [3] https://en.wikipedia.org/wiki/Scale-free_network

By sorting HCLG graph states by their fan-ins (number of states referring to a particular one), we sped recognition process up 5%. E.g., most referred state gets index 0, second most referred is 1, etc, and that's it.

No code change, just preprocessing step that, I believe, lessen the pressure on the CPU's memory protection subsystem.

The speech recognition process with HCLG graph uses beam search [4], there are several (9-20) hundredths of states in the beam front to be evaluated. Most of them have outgoing arc to some of the most visited (fan-in) states. By having these states close to each other we incur less page faults exception handling during processing.

  [4] https://en.wikipedia.org/wiki/Beam_search

Basically, we shared more MMU information between requests in beam front.

Fun fact: WFST created by OpenFST has "structure of arrays" layout. ;)

nickysielicki · 32d ago

Boo fucking hoo, C++ isn’t C with classes. Anyone with a basic grasp of c++11 metaprogramming can more-or-less follow this easily. God forbid the language grows.

teleforce · 31d ago

The GP is not wrong though.

Perhaps someone can write similar logic in D programming and showcase how much more intuitive the programming will be since Dlang has first class arrays and its metaprogramming facility is second to none [1].

[1] D (programming language):

https://en.wikipedia.org/wiki/D_(programming_language)

Dwedit · 32d ago

Struct of Arrays vs Array of Structs have different performance, depending on how they are accessed, mainly due to how memory will be cached.

If you are iterating over objects sequentially, and looking at every field, they will perform about the same. If you are iterating over objects sequentially, and only looking at a few fields, Struct of Arrays performs better. If you are accessing objects at random, and reading every field, Array of Structs performs better.

Array of Structs has a multiplication step when you calculate the address of an element. Struct of Arrays basically guarantees that the multiplication will be by a power of 2, so it's a bitshift rather than a multiplication. (Unless your struct contains a type whose size is not a power of 2). Multiplication is still very fast though on most architectures, so that's not all that much of a difference.

compiler-guy · 32d ago

Struct of arrays have the potential to be much more memory efficient if the original struct had padding.

struct Foo {

  int64_t i;

  int8_t  c;

  // 7 bytes of padding for alignment.

};

A Foo array will have seven bytes of wasted space per element. That can add up, and can even blow away what would be nice caching effects if looking at both elements.

Access patterns do matter a lot to this, but it isn't as easy as describing one. Always profile.

[Edit, I wish hn supported markdown.]

WalterGR · 32d ago

Lines that start with 2 spaces are monospaced.

https://news.ycombinator.com/formatdoc

silisili · 32d ago

I haven't benched them, but it feels like in some languages bounds checking would hamper SoA performance, no?

bombela · 32d ago

How could it not be a power of 2?

The compiler makes sure to align fields and pad structs to power of 2 anyways.

tomsmeding · 32d ago

If your struct contains a field that is itself a struct, e.g. one containing three int32_t's, then that field has size 12, which is not a power of two.

compiler-guy · 32d ago

Doesn't even need to be a struct within a struct:

struct Foo {

  int8_t a;

  int8_t b;

  int8_t c;

};

will have a size of 3 and an alignment of 1. Alignment is always power of 2, and size modulus alignment will always be zero. But alignment can be less than size, and size need not be a power of two, just have one power-of-two factor.

bombela · 32d ago

Brain was not fully loaded yet, thank you.

I don't think the struct in struct is relevant in your example by the way. i32 requires 4-bytes alignment. An array of a struct{3×i32} will be aligned to 12 bytes. Which maintains the 4 bytes alignment of the fields.

If this structure is embedded into another one, it will be padded accordingly to the alignment requirement of the outer field.

{ { 3×i32 } i32 } will be 16 bytes.

{ { 3×i32 } i64 } will be 12, padding 4, 8. Totalling 24.

I am assuming x86_64.

gr4vityWall · 32d ago

Interesting article. It does show how modern C++ can be quite scary.

It reminded me of a Haxe macro used by the Dune: Spice Wars devs to transform an AoS into a SoA at compile time to increase performance: https://youtu.be/pZcKyqLcjzc?t=941

The end result is quite cool, though those compile time type generation macros always look too magical to me. Makes me wonder if just getting values using an index wouldn't end up being more readable.

perihelions · 32d ago

I attempted to write a minimal version of the idea in Common Lisp, if anyone was curious about how it'd look like in that language,

https://peri.pages.dev/struct-of-arrays-snippet

monkeyelite · 32d ago

Interest in SOA is bringing to mind the “art of the meta object protocol” which argues for a stage between class definition and implementation that would allow you to choose the layout and access method for instances of a class.

corysama · 32d ago

Yep. We’re in a situation where C-like languages couple layout and access interface very tightly. But, now cache is such an overriding issue in optimization, you really want to rapidly experiment with different layouts without rewriting your whole algorithm every time. AOS, SOA, AOSOA, hot/cold data for different stages, etc…

Jon Blow’s Jai language famously added a feature to references that allowed you to easily experiment with moving data members between hot/cold arrays of structs.

https://halide-lang.org/ tackles a related problem. It decouples the math to be done from the access order so as to allow you to rapidly test looping over data in complicated ways to achieve cache-friendly access patterns for your specific hardware target without rewriting your whole core loop every time.

Halide is primarily an about image processing convolution kernels. I’m not sure how general purpose it can get.

tialaramex · 32d ago

AIUI Jai no longer has that SOA versus AOS functionality which Jon was very proud of when he invented it, I expect it's one of those "hot idea" things where for one week it seems as though this is a breakthrough with applications everywhere and then a week later you realise it was not as fundamental and you were just seeing applications everywhere due to recency illusion.

The week after I first saw a Bloom Filter I imagined lots of places this would surely be great, in the decades since I probably have one bona fide use for a Bloom Filter per year, maybe less.

monkeyelite · 32d ago

Exactly. The SOA thing makes sense in so few cases - usually the one big data type your program operates on.

jayd16 · 32d ago

I suppose something like this would be considered a violation of OOO...

That said Java might be able to naturally decompose record types into SoA collections without a big lift. Same for C# struct.

You might even be able to access views types that still code like objects but point to the backing fields in the SoA collection.

mpweiher · 32d ago

Yep. Objective-S[1] takes the ideas of "the art of the metaobject protocol" and expands on them...using software architectural principles [2].

One early result was the idea of storage combinators[3], and with those, AoS/SoA pretty much just falls out.

Storage Combinators are basically an interface for data. Any kind of data, so generalized a bit from the variations of "instances of a class" that you get in CLOS's MOP.

If you code against that interface, it doesn't matter how your data is actually stored: objects, dictionaries, computed on-demand, environment variables, files, http servers, whatever. And the "combinator" part means that these can be stacked/combined.

While you can do this using a library, and a lot is in fact just implemented in libraries, you need the language support so that things like objects/structs/variables can be accessed quickly using this mechanism.

SoA storage for tables has been on the list of things to do for a long time. I just started to work on some table ideas, so might actually get around to it Real Soon Now™.

Currently I am working on other aspects of the table abstraction, so for example just integrated query, so I ca write the following:

    invoices[{Amount>30}]

And have that query work the same against an array of objects (also a kind of table) and a SQL database.

[1] https://objective.st/

[2] https://dl.acm.org/doi/10.1145/3689492.3690052

[3] https://2019.splashcon.org/details/splash-2019-Onward-papers...

mac3n · 32d ago

JOVIAL language had TABLE structures, which could be declared SERIAL or PARALLEL (https://ntrl.ntis.gov/NTRL/dashboard/searchResults/titleDeta... page 281).

the__alchemist · 32d ago

I think I've lost the thread on the abstractions. (Me not being very familiar with Zig outside of its most basic syntax is probably why.) I've been doing a lot of SoA work in rust lately; specifically because I have numerical/scientific code that uses CPU SIMD and CUDA; SoA works great for these.

The workflow is, I set up Vec3x8, and Quaternionx8, which wrap a simd instrinsic for each field (x: f32x8, y: f32x8...) etc.

For the GPU and general flattening, I just pack the args as Vecs, then the fn signature contains them like eps: &[f32], sigma: &[f32] etc. So, I'm having trouble mapping this SoA approach to the abstractions used in the article. Then the (C++-like CUDA) kernel sees these as *float3 params etc. It also feels like a complexity reverse of the Rust/Zig stereotypes...

Examples:

  struct Vec3x8 {
    x: f32x8,
    y: f32x8,
    z: f32x8
  } // appropriate operator overloads...


  struct Setup {
      eps: Vec<f32>,
      sigma: Vec<f32>,
  }

So, Structs of Arrays, plainly. Are the abstractions used here something like Jai is attempting, where the internal implementation is decoupled from the API, so you don't compromise on performance vice ergonomics?

jcelerier · 32d ago

> The workflow is, I set up Vec3x8, and Quaternionx8, which wrap a simd instrinsic for each field (x: f32x8, y: f32x8...) etc.

The point is why do that in a human, error-prone way every time (in the current software I work on, I would have hundreds of those structs for instance) when you can program your compiler to do it once for you

sph · 32d ago

        define_aggregate(^^Pointers,
            nsdms(^^T)
            | std::views::transform([](std::meta::info member){
                return data_member_spec(add_pointer(type_of(member)),
                                        {.name = identifier_of(member)});
            }));

What operator is ^^type?

tialaramex · 32d ago

This is the reflection operator. The committee† spent some time bikeshedding different squiggles for this operator, but eventually it looks like ^^ won because it was least awful.

† WG21 of SC22 of JTC1 between ISO and the IEC, "the C++ committee".

See P2996R11 for the latest draft of the work - https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p29...

diath · 32d ago

https://isocpp.org/files/papers/P2996R4.html#the-reflection-...

tialaramex · 32d ago

back in R4 this operator is spelled ^ but in the accepted draft it was spelled ^^

TheHideout · 32d ago

So, basically it's allowing you to use a struct like you would in OOP, but get the array benefits of ECS when that struct is in a vector?

dgb23 · 32d ago

It's not necessary to think about the data interface in terms of object orientation.

You can think about it as being a composition of fields, which are individually stored in their respective array.

(Slightly beside the point: Often they are also stored in pairs or larger, for example coordinates, slices and so on are almost always operated on in the same function or block.)

The benefit comes from execution. If you have functions that iterate over these structs, they only need to load the arrays that contain the fields they need.

Zig does some magic here based on comptime to make this happen automatically.

An ECS does something similar at a fundamental level. But usually there's a whole bunch of additional stuff on top, such as mapping ids to indices, registering individual components on the fly, selecting a components for entities and so on. So it can be a lot more complicated than what is presented here and more stuff happens at runtime. It is also a bit of a one size fits all kind of deal.

The article recommends watching Andrew Kelley's Talk on DoD, which inspired the post. I agree wholeheartedly, it's a very fun and interesting one.

One of the key takeaways for me is that he didn't just slap on a design pattern (like ECS), but went to each piece individually, thought about memory layout, execution, trade offs in storing information versus recalculating, doing measurements and back of the envelope calculations etc.

So the end result is a conglomerate of cleverly applied principles and learnings.

jayd16 · 32d ago

More like they used reflection to take a struct and generate a SOA collection for that type. Funnily enough, they skip the part where you can actually get at the arrays and focus on the struct type deconstruction and construction.

throwawaymaths · 32d ago

Isn't ECS often Type-Heterogeneous? I think you mean DoD.

jayd16 · 32d ago

Towards the middle you realize this is about the reflection more than anything else.

I do like how directly accessing the fields individually (the whole reason you would do this) is a hypothetical presented as an after thought. Enjoyably absurd.

gpderetta · 32d ago

The use of reflection is interesting, but is there a significant advantage, compared to something like this:

  template<template<class> class G>
  struct Point {
     G<int> x;
     G<int> y;
     auto get() { return std::tie(x,y); }
  };

  template<template<template<class> class> class C>
  struct SOA  {
    template<class T> using Id = T;
    template<class T> using Ref = T&;
    C<std::vector> vs;
    void push_back(C<Id> x) {
        std::apply([&] (auto&&... r) {
            std::apply([&](auto&&... v){ ( (r.push_back(v)),...); }, x.get());
        }, vs.get());
    }
    C<Ref> operator[](size_t i) {
        return std::apply([&] (auto&&... r) { return C<Ref>{ r[i]...}; }, vs.get());
    }
  };

  int main() {
    SOA<Point> soa_point;
    soa_point.push_back({1,2});
    auto [x,y] = soa_point[0];
  }

OskarS · 32d ago

> template<template<template<class> class> class C>

Boy howdy!

gpderetta · 31d ago

If you are not doing three level deep templates, are you even programming?

gwenzek · 32d ago

Interesting to see the new C++ at work. But also I'm sure it's easier to learn Zig full language that just the new C++ metaprogramming

hinkley · 32d ago

The problem I run into with these sorts of structures is the whole aggregation versus composition issue.

With composition the SoA problem is pretty simple. This object belongs to an owner or a category and it just never 'moves' so the most naive solution is just fine.

I haven't looked at C++ in a million years, but looks like this implementation fixes the aggregation problem by handing out handles to the data, but I don't see anything in here where it tries to hand out the same handle for the same offset. I don't know about C++ but that matters in some languages.

ozgrakkurt · 31d ago

I would recommend apache arrow, it has structures that handle nesting and keep data still fully columnar

adzm · 32d ago

Neat to see the reflection / unibrow operator ^^ in the wild!

No comments yet

ozgrakkurt · 31d ago

It is skeptical if this is even needed, why not design the structs etc. properly from the start so they are columnar if columnar works better?

Meta programming is never free

Enthouan · 32d ago

I’m definitely missing the point, but reading the article, I kept thinking "This would’ve been so much easier in C."

quotemstr · 32d ago

No, it would be impossible in C, the "it" here being the automatic conversion of a struct to an array of the struct's fields. Sure, you can do that pretty easily by hand in C, but you can also do it easily by hand in C++, so whatever. The point of the article is the metaprogramming.

zahlman · 32d ago

I get the impression that ("pure") C programmers say things like that because they don't get themselves into situations where they feel they'd want the metaprogramming.

quotemstr · 32d ago

They might think that they don't need metaprograming, but like everyone else, they eventually do.

That's how we get monstrosities like https://github.com/freebsd/freebsd-src/blob/master/sys/sys/q... (plenty of Linux equivalents --- e.g. the tracepoint stuff)

Better to use a language with thoughtful metaprogramming than tack it on badly later.

trealira · 32d ago

I don't see how it would be easier. Whatever you can do in C, you can do in C++. That said, I was also a bit confused, since I guess I don't keep up with the latest C++ updates enough.

Why Vertical AI Agents May Replace RPA in Complex Enterprise Workflows

Ask HN: What is the latest on treatment of Metastatic Breast Cancer?

Ask HN: What cool skill or project interests you, but feels out of reach?

Tell HN: Help restore the tax deduction for software dev in the US (Section 174)

Ask HN: Any Way to Sidestep Stripe's "alternate currency payout fee"

How to get started with writing tech video essays

Ask HN: How to learn CUDA to professional level

Ask HN: Would You Use a Declarative Back End (APIs, DB, Auth, Sync)?

Ask HN: Startup getting spammed with PayPal disputes, what should we do?

Ask HN: In 15 years, what will a gas station visit look like?

Ask HN: Does Google train on Google Docs?

Data manipulation using natural language prompts

Rust password hashing functions: Argon2, scrypt, PBKDF2

Ask HN: Has anybody built search on top of Anna's Archive?

Building an Audit Readiness Platform for Startups – Would Love Your Feedback

Heroku Outage from AWS CloudFront Issue?

Ask HN: Any good tools for viewing congressional bills?

Control PC games with body movements using webcam andPython+ MediaPipe

Ask HN: What API or software are people using for transcription?

List of Countries Without a Value-Added Tax

Ask HN: What's your open source stack?

Ask HN: What Happened to the Apple Vision Pro?

Ask HN: Do founders get honest feedback on their pitch decks?

Ask HN: Is there any demand for Personal CV/Resume website?

Ask HN: What are some of the best books to learn signal processing?

Ask HN: Options for One-Handed Typing

Ask HN: How do you integrate AI assistants into your note taking?

Best place for small remote gigs?

Bill Atkinson has passed away

Ask HN: Anyone else feeling increasingly alienated from the industry?

Ask HN: I'm considering open-sourcing my UI library – how to generate income?

Ask HN: What would you work on if you couldn't fail?

Ask HN: What Does Your Self-Hosted LLM Stack Look Like in 2025?

Ask HN: Share you personal favorite productivity tools, workflows or setup

Ask HN: How long do you ever leave your server running without updates?

Ask HN: Running AI agents in isolated environments

Get Your Dev Tool Mentioned by ChatGPT, Gemini Not Just Ranked on Google

When Profit Overshadows Community: A Look at Golang Conferences

What is a modern successor to HyperCard?

Ask HN: Share real complaints about outsourcing data annotation

Heroku login down for 7 hours

Ask HN: Is anyone still programming the old-fashioned way (without LLMs)?

Reaching my first 100 users without money or audience (at 10K users now)

Ask HN: What do you put in claude.md and what you leave out?

Ask HN: Wrongly banned from Facebook Marketplace – can anyone help at all?:/

Implementing a Struct of Arrays

Comments (50)