This is really clever but better to call this a list rather than an array; functions which expect array semantics will simply not work, and there's no way to transparently pass slices of this data structure around.

In the past I've abused virtual memory systems to block off a bunch of pages after my array. This lets you use an ordinary array data structure, get guard pages that prevent out-of-bounds access, and keep stable pointers into the data structure.

zokier · 2h ago

> Today’s computers use only 48 bits of the 64 bits in a pointer

https://en.wikipedia.org/wiki/Intel_5-level_paging introduced in Ice Lake 6 years ago.

But anyway, isn't this just a variant of std::deque? https://en.cppreference.com/w/cpp/container/deque.html

sigbottle · 1h ago

I don't know the precise details of how deques are implemented in C++, but given the most popular Stack Overflow explanation of them, some immediate pitfalls are that the T* map itself sounds unbounded, and if each chunk allocates only a fixed constant size it's probably horrible for fragmentation or overallocation. The indexing also seems dependent on division.

With this power of twos approach you can't really truly delete from the front of the array but the amount of pointers you store is constant and the memory fragmentation is better. (Though OP never claimed to want to support deque behavior, it shouldn't be that hard to modify, though indexing seems like it has to go thru more arithmetic again)

I haven't used OP's array, but I have been bit plenty of times with std::deque's memory allocation patterns and had to rewrite with raw arrays and pointer tracking.

cornstalks · 1h ago

What kind of setups use over 256 TiB of RAM?

TapamN · 1h ago

It's not necessarily physical RAM. If you memory-map large files, say a large file from a RAID or a network share, you could still need that much virtual address space.

bonzini · 1h ago

In practice it's over 64 TiB because kernels often use a quarter of the available addressing space (half of the kernel addressing space) to map the physical addresses (e.g. FFFFC000_12345678 maps physical address 0x12345678). So 48 virtual address bits can be used with up to 2^46 bytes of RAM.
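The arithmetic in that comment can be checked with a few lines of C. This is only a sketch of the numbers quoted above; `address_space_bytes` and `direct_map_bytes` are hypothetical helper names, and the "one quarter" split is taken from the comment rather than from any particular kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of an address space with `bits` usable address bits. */
static uint64_t address_space_bytes(unsigned bits) {
    assert(bits < 64);               /* a shift by 64 would be undefined */
    return UINT64_C(1) << bits;
}

/* Hypothetical helper: RAM reachable when one quarter of the virtual
   address space is used as a direct map of physical memory. */
static uint64_t direct_map_bytes(unsigned virt_bits) {
    return address_space_bytes(virt_bits) / 4;
}
```

With 48 virtual address bits the whole space is 2^48 bytes (256 TiB) and a quarter of it is 2^46 bytes (64 TiB), which is where the two figures in this subthread come from.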

reorder9695 · 1h ago

"640k ought to be enough for anybody"

mwkaufma · 1h ago

In principle it's not that different from deque, though:

(1) deque uses fixed-size blocks, not increasing-size blocks. (2) deque supports prepending, which adds another level of indirection internally.
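To make difference (1) concrete, here is a minimal sketch of how an index could be mapped to a block in each scheme. It is not the article's code or any standard library's implementation: `FIXED_BLOCK_LEN`, `slot`, `locate_fixed`, and `locate_doubling` are hypothetical names, the doubling layout assumes segment sizes 1, 2, 4, 8, ... with nothing skipped, and `__builtin_clz` is the GCC/Clang intrinsic (assumed here to operate on a 32-bit `unsigned int`).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical block length for the fixed-size scheme. */
#define FIXED_BLOCK_LEN 64u

typedef struct {
    uint32_t block;   /* which block or segment holds the element */
    uint32_t offset;  /* position of the element inside that block */
} slot;

/* Fixed-size blocks: one division and one modulo per lookup. */
static slot locate_fixed(uint32_t index) {
    slot s = { index / FIXED_BLOCK_LEN, index % FIXED_BLOCK_LEN };
    return s;
}

/* Doubling blocks: segment k holds 2^k elements and starts at index
   2^k - 1, so the segment number is the position of the highest set
   bit of index + 1. */
static slot locate_doubling(uint32_t index) {
    assert(index < UINT32_MAX);                     /* keeps index + 1 nonzero */
    uint32_t n = index + 1;
    uint32_t k = 31u - (uint32_t)__builtin_clz(n);  /* highest set bit of n */
    slot s = { k, n - (UINT32_C(1) << k) };
    return s;
}
```

For example, index 6 lands in segment 2 at offset 3 under the doubling layout, while index 130 lands in block 2 at offset 2 with 64-element fixed blocks.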

sigbottle · 1h ago

You can support prepending by mirroring the allocations, probably? eg for the "negative index" case do an exponential thing in the other direction.

Your indexing now has some legitimate math to do, which can be annoying from an efficiency perspective, but I think you can still get O(1) with careful allocation of powers of 2.

o11c · 1h ago

That's fine if you only ever add, but is likely to fail if you pop FIFO-style. This too is ultimately fixable but means we can no longer assume "every bucket size doubles".

That said, IMO "stable pointers" is overrated; "minimize copying" is all that's useful.

sestep · 1h ago

Does std::deque support random access?

mwkaufma · 1h ago

Yes, you can see operator[] in the linked reference docs.

unwind · 1h ago

Very nice, although I think the level of "trickery" with the macros becomes a bit much. I do understand that is The Way in C (I've written C for 30 years), it's just not something I'd do very often.

Also, from a strictly prose point of view, isn't it strange that the `clz` instruction doesn't actually appear in the 10-instruction disassembly of the indexing function? It feels like it was optimized out by the compiler perhaps due to the index being compile-time known or something, but after the setup and explanation that was a bit jarring to me.

mananaysiempre · 59m ago

The POSIX name for the function is clz() [the C23 name is stdc_leading_zeros(), because that's how the committee names things now, while the GCC intrinsic is __builtin_clz()]. The name of the x86 instruction, on the other hand, is BSR (80386+) or LZCNT (Nehalem+, K10+) depending on what semantics you want on zero input (take care that early implementations of BSF/BSR are very slow and take time proportional to the output value). The compiled code uses BSR. (All of these are specified slightly differently, take care if you plan to actually use them.)
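Since the zero-input behaviour is the part that differs between these, a hedged sketch of two small wrappers may help. The names `leading_zeros32` and `highest_set_bit32` are made up for illustration; the only real API used is the GCC/Clang intrinsic `__builtin_clz`, whose result is undefined for a zero argument, and the code assumes `unsigned int` is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Count of leading zero bits in a 32-bit value. __builtin_clz(0) is
   undefined, so zero is special-cased to 32 (an LZCNT-style result). */
static unsigned leading_zeros32(uint32_t x) {
    return x == 0 ? 32u : (unsigned)__builtin_clz(x);
}

/* Index of the highest set bit (a BSR-style result). The caller must
   pass a nonzero value; there is no meaningful answer for zero. */
static unsigned highest_set_bit32(uint32_t x) {
    assert(x != 0);
    return 31u - (unsigned)__builtin_clz(x);
}
```

For nonzero 32-bit inputs the two results always sum to 31, which is why a compiler can emit either instruction form for the same source expression.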

variadix · 1h ago

You can also use virtual memory for a stable resizable vector implementation, up to some max length based on how much virtual memory you reserve initially, then commit as required to grow the physical capacity.
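A minimal sketch of that reserve-then-commit idea on a POSIX system (Linux or macOS), assuming `mmap` with `PROT_NONE` to reserve address space and `mprotect` to make a growing prefix usable. The `vm_arena` type and its functions are hypothetical names for illustration, not an existing library.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    unsigned char *base;  /* start of the reserved range; never moves */
    size_t reserved;      /* bytes of address space reserved */
    size_t committed;     /* bytes currently readable and writable */
} vm_arena;

/* Reserve address space without making any of it accessible. */
static int vm_arena_reserve(vm_arena *a, size_t max_bytes) {
    void *p = mmap(NULL, max_bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    a->base = p;
    a->reserved = max_bytes;
    a->committed = 0;
    return 0;
}

/* Grow the accessible prefix to at least `bytes`, rounded up to pages. */
static int vm_arena_commit(vm_arena *a, size_t bytes) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t want = (bytes + page - 1) / page * page;
    if (want > a->reserved) return -1;    /* past the reservation */
    if (want <= a->committed) return 0;   /* already large enough */
    if (mprotect(a->base + a->committed, want - a->committed,
                 PROT_READ | PROT_WRITE) != 0) return -1;
    a->committed = want;
    return 0;
}

/* Give the whole reservation back to the OS. */
static void vm_arena_release(vm_arena *a) {
    munmap(a->base, a->reserved);
}
```

Because `base` never changes, pointers into the committed prefix stay valid as the arena grows, and the still-uncommitted tail faults on access, so it doubles as a guard region.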

loeg · 1h ago

Yeah, with less runtime overhead, so long as you're ok with the ~4kB minimum allocation size.

fyrn_ · 37m ago

They mention this alternative in the article, and also point out that it does not work in embedded contexts or with WASM.

pfg_ · 42m ago

Zig has this as std.SegmentedList, but it can resize the segment array dynamically

o11c · 1h ago

Can we really call it an array if it's not contiguous (or at least strided)? Only a small fraction of APIs take an `iovec, iovcnt`-equivalent ...

jandrese · 1h ago

Yeah, the limitation that it can't be just dumped into anything that expects a C array is a large one. You need to structure your code around the access primitives this project implements.

dhooper · 1h ago

feel free to call it a "levelwise-allocated pile"

tovej · 1h ago

Very nice! I do wonder if it would be useful to be able to skip even more smaller segments, maybe a ctor argument for the minimum segment size. Or maybe some housekeeping functions to collapse the smallest segments into one.

Mostly the thing that feels strange is that when using, say, n > 10 segments, the smallest segment will be less than a thousandth of the largest, and iterating over the first half will access n-1 or n-2 segments (worse cache behaviour), while iterating over the second half will access one or two segments.
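That imbalance can be counted directly. The sketch below assumes a plain doubling layout (segment k holds 2^k elements, nothing skipped), which is a simplification and not necessarily the article's exact layout; `segment_of` and `segments_touched` are hypothetical names, and `__builtin_clz` is the GCC/Clang intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Segment holding `index` when segment k has 2^k elements and starts
   at index 2^k - 1. */
static unsigned segment_of(uint32_t index) {
    assert(index < UINT32_MAX);                 /* keeps index + 1 nonzero */
    return 31u - (unsigned)__builtin_clz(index + 1);
}

/* Number of distinct segments visited when iterating first..last inclusive. */
static unsigned segments_touched(uint32_t first, uint32_t last) {
    assert(first <= last);
    return segment_of(last) - segment_of(first) + 1;
}
```

With 11 full segments (2047 elements), iterating indices 0 through 1022 visits 10 segments, while indices 1023 through 2046 all sit in the single largest segment.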

Seems like, in most cases, you would want to be able to collapse those earlier segments together.

01HNNWZ0MV43FF · 1h ago

Readers might also find `plf::colony` interesting: https://www.plflib.org/colony.htm

jovial_cavalier · 1h ago

The example code doesn't seem to compile.

seanwessmith · 34m ago

Make sure to create the segment_array.h file. Mine outputted just fine on a Mac M4:

  Desktop gcc main.c        
  Desktop ./a.out

  entities[0].a = 1
  entities[0].a = 1
  entities[1].a = 2

A fast, growable array with stable pointers in C

Comments (25)