Words about Arrays and Tables

46 todsacerdoti 18 7/30/2025, 3:15:30 PM buttondown.com ↗

Comments (18)

Animats · 37m ago
I get the feeling this guy likes Lua. Lua combines arrays and dicts into "tables", which are usually indexed from 1 but can be indexed by any type as a dict. He never mentions Lua, though.

Multidimensional arrays are a blind spot for modern language designers. Not arrays of arrays, as this author points out. FORTRAN had multidimensional arrays, but C didn't. Go and Rust don't have them. Part of the trouble is slices. Once you have slices, people want slicing in multidimensional arrays along any axis, which complicates the basic array representation. Discussions in this area dissolved into bikeshedding and nothing got done for Go and Rust.

zahlman · 13m ago
> Lua combines arrays and dicts into "tables", which are usually indexed from 1 but can be indexed by any type as a dict.

I think it's more accurate to just call them hash tables ("dicts") that happen to support some very limited array-like functionality. "Appending" still looks like key insertion if you don't use a named method (https://stackoverflow.com/questions/27434142); `ipairs` misses those non-integer keys but also misses non-contiguous keys, or even a zero key (https://stackoverflow.com/questions/55108794); the `#` operator is not well defined once you add those non-contiguous keys (https://stackoverflow.com/questions/2705793) (so you can't actually cleanly override the choice to "start from 1"); slicing etc. aren't provided as operators (https://stackoverflow.com/questions/24821045); etc.

mvc · 3h ago
XSLT/XPath is an example of a platform that provides multiple axis through which to access your data structure.

https://developer.mozilla.org/en-US/docs/Web/XML/XPath/Refer...

zahlman · 42m ago
2000 words, specifically. Thanks again, title filter. Even just "Arrays and Tables" would have been better.

I sincerely believe this aspect of the filter is a misfeature. Submissions that have bad reasons to put a number in the title are generally submissions that are just bad (and should be flagged) anyway.

nyrikki · 1h ago
One thing that may help is to drop the typing for generic words or pidgin holes, that will help with the FORTRAN example.

The IBM 704 had 3 index registers, which could be combined together with a bitwise OR, then was subtracted from a base address, that's where FORTRAN for its 7D arrays from.

When you mix that concept of decrementing index registers with hypercubes, you will naturally get to the extract format of BI tools like classic Tableau.

mpweiher · 2h ago
This might be interesting:

Structuring Arrays with Algebraic Shapes

https://dl.acm.org/doi/abs/10.1145/3736112.3736141

https://news.ycombinator.com/item?id=44399757

galaxyLogic · 2h ago
The characteristic feature of Arays is that all but the last element have their next element and all but the first element have their previous element. So what would be the value of having or using indexes other than Integers to index Arrays?

You can also define integer-valued named constants, and use those if you prefer.

larrik · 2h ago
That sounds more like a linked list than an array.

In C, an array is just contiguous bytes, and you reach an index through math (starting location + (index * size)).

Pretty sure lots of languages still do this to some degree, as your lookup times are effectively zero, though altering the number of elements is way more complex/expensive than in a linked list.

deepsun · 1h ago
> and you reach an index through math (starting location + (index * size)).

By the way, that's the only reason why C-derived languages use unintuitive zero-based indexing. There's really no other reason to call the first element a[0] instead of a[1].

dpassens · 9m ago
Ah, but is it unintuitive though? Once one understands that in the vast majority of cases, C deals in pointers, not arrays, it makes perfect sense that indexing uses an offset rather than the nth element.
Jtsummers · 2h ago
> So what would be the value of having or using indexes other than Integers to index Arrays?

It removes a level of indirection, wasted space, and mental overhead. Suppose you have some table of 'a'..'z' -> whatever. What are your options?

- Switch/case. This may be optimized and fast, but that's not guaranteed across all languages and compilers.

- Hash map. This introduces memory overhead even if it is technically O(1) for access times.

- If in C, you could do:

  T* table = (T*)malloc(sizeof(T) * 26) - 'a'
So that the characters can be used naturally without later modification (not a feature in every language, though).

- Manually calculate each offset every time:

  table[c - 'a'] = ...
With language appropriate type conversions, if needed.

Or you can allow any custom range to work, and do the math in that last option automatically because it's trivial for a compiler to figure it out and it removes unnecessary mental overhead from the program reader and writer.

  table['a']
Just works.

> You can also define integer-valued named constants, and use those if you prefer.

Because everyone wants to see `lower_case_a = 0` and the 25 other cases in their code along with all other possible constants. At least use an enum, sane languages even make them type-safe (or at least safer).

mcphage · 1h ago
> The characteristic feature of Arays is that all but the last element have their next element and all but the first element have their previous element.

Where are you getting that from? That's not the characteristic feature of Arrays, that's the characteristic feature of Doubly Linked Lists.

zahlman · 36m ago
I wouldn't call it a characteristic feature of either. Any ordered, finite sequence of elements has those properties. Doubly linked lists simply offer O(1) access to "next elements" and "previous elements" given an element.
kccqzy · 3h ago
I'm not sure what's the insight in this article here. All the things mentioned seem pretty straightforward. But I'll comment on a few:

> Really any finite set can be a "dimension".

This is absolutely true and yet typical programming languages don't do a good job of unleashing this. When is the last time you wish you could define an array where the indices are from a custom finite set, perhaps a range 100 to 200, perhaps an enum? I only know of two languages, Pascal and Haskell, that are sufficiently flexible. All other languages insist that indices for arrays must be integers either starting from 0 or 1. Haskell has a mechanism to let you control what is an array index: https://hackage.haskell.org/package/base-4.18.1.0/docs/Data-... The lack of this ability makes programmers use more general hash maps by default and therefore leave performance on the table.

> We can also transform Row -> PowerSet(Col) into Row -> Col -> Bool, aka a boolean matrix.

I mean sure. Does the author know type signatures and their operations like this are isomorphic to algebraic operations on size of the possibly infinite sets? The function A -> B has exactly |B|^|A| implementations so that's why Col -> Bool is in this sense "same" as PowerSet(Col). And the function operator associates to the right so adding Row -> doesn't change anything. Wait till you learn why sum types are called sum types and why product types are called product types. Here's another thing that might be mind-blowing: arrays of an unknown length can be thought of as the infinite union of arrays of lengths of all natural numbers: Array<T, 0> | Array<T, 1> | … so using the above notation the cardinality is 1 + |T| + |T|^2 + …; but they can also be represented as a standard linked list List<T>=Nil|(T, List<T>) and using the above notation we have |List<T>|=1+|T|*|List<T>| which simplifies to |List<T>|=1/(1-|T|). And guess what, the Taylor expansion of that is just the earlier 1 + |T| + |T|^2 + … precisely.

Jtsummers · 2h ago
> I only know of two languages, Pascal and Haskell, that are sufficiently flexible.

Julia, Fortran, some BASICs allow custom integer ranges.

Ada allows you to use any discrete range (integer, character, enums).

https://en.wikipedia.org/wiki/Comparison_of_programming_lang... - A more comprehensive list

The two columns of interest are "Specifiable index type" and "Specifiable base index".

cestith · 1h ago
Older versions of Perl allowed you to set a lexically-scoped array base integer. It defaulted to zero and the docs mentioned 1-based arrays, but IIRC it could be any integer. This was deprecated several years ago in 5.12 as a harmful practice. Specifying using features of a Perl version of 5.16 or newer actually makes assigning to it (except for 0) a compile-time error.

Interestingly, though, Perl’s native arrays are decidedly not an array of primitives. They are arrays of scalar values, and a scalar value can not only contain a character, a string, an integer, a float, or a reference to another item (scalar, hash, array, blessed object, filehandle, etc) but it can return different values when called in a numeric context vs a string context. There are certainly ways to get at primitive values, but it’s not in Perl’s native array semantics.

pklausler · 1h ago
Non-default lower bounds in Fortran are a famous pitfall, however; they don't persist in many cases where one might think that they do, and are not 100% portable across compilers.
zahlman · 28m ago
> When is the last time you wish you could define an array where the indices are from a custom finite set, perhaps a range 100 to 200, perhaps an enum?

> ...

> The lack of this ability makes programmers use more general hash maps by default and therefore leave performance on the table.

You can simulate this in any language by hash-mapping the custom set elements to indices. (And in the general case, you won't be able to determine the indices faster than a hash table lookup.)

In fact, I imagine there are languages and implementations where hash table lookup is the fastest way to map 'a':'z' to 0:25.

In fact, let me try a (surely totally bogus) micro-benchmark right now:

  $ python -m timeit --setup 'lookup = dict(zip("abcdefghijklmnopqrstuvwxyz", range(26)))' '[lookup[x] for x in "abcdefghijklmnopqrstuvwxyz"]'
  500000 loops, best of 5: 884 nsec per loop
  $ python -m timeit '[ord(x) - 97 for x in "abcdefghijklmnopqrstuvwxyz"]'
  200000 loops, best of 5: 1.21 usec per loop
Huh.

...And surely if you use the hash map directly, instead of using it to grab an index into another container, overall performance only gets better (unless maybe you need to save per-instance memory, and you have multiple containers using the same custom index scheme).

Hash tables are a pretty neat idea.